
cldixon/embeddings_api_example

Simple Embeddings API Example

This is an example of a barebones Embeddings API implementation. The API output mirrors that of the OpenAI Embeddings API. Built on FastAPI, Pydantic, and Sentence Transformers, this project is a learning exercise as much as a starting point for developing custom embedding API interfaces.

The API code is purposely contained in a single main.py file to keep it flexible, and TODO comments mark the places that deserve further development in a production implementation.

Get Started

Before running the API, download the model checkpoints that will be used to generate embeddings, so the pretrained weights are not re-downloaded after server restarts, reloads, etc.

This example API supports serving multiple models, which are specified in the models.txt file (a barebones implementation of a model repo). The example ships with two pretrained models from the Sentence-Transformers library, and more can be added by appending their checkpoint names. More checkpoint options are available here.

Once the models.txt file is ready, run the download.py script to save the model weights and configurations in the artifacts directory.
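As a rough illustration of what such a download step involves, here is a minimal sketch. It is not the repository's download.py: the helper names `read_checkpoints` and `download_checkpoints` are hypothetical, and the sketch assumes models.txt holds one checkpoint name per line (skipping `#` comment lines is also an assumption).

```python
from pathlib import Path


def read_checkpoints(models_file: str = "models.txt") -> list[str]:
    """Return checkpoint names, one per non-empty line.

    Assumption: lines starting with '#' are treated as comments.
    """
    lines = Path(models_file).read_text(encoding="utf-8").splitlines()
    names = [line.strip() for line in lines]
    return [name for name in names if name and not name.startswith("#")]


def download_checkpoints(names: list[str], artifacts_dir: str = "artifacts") -> None:
    """Download each checkpoint and save it under artifacts_dir/<name>."""
    # Imported lazily so the parsing helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    for name in names:
        target = Path(artifacts_dir) / name
        if target.exists():
            continue  # already saved locally, nothing to do
        model = SentenceTransformer(name)  # fetches the weights on first use
        model.save(str(target))  # writes weights and config to disk
```

Calling `download_checkpoints(read_checkpoints())` from the project root would then populate the artifacts directory, one subdirectory per checkpoint.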

Start the API

The API is built with FastAPI, so start the server as follows:

$ uvicorn main:app --reload

Tip

FastAPI provides Swagger documentation out of the box. It can be reached at localhost:port/docs.

Note

The --reload flag restarts the server whenever the source file is edited, which is handy during development. It is not needed otherwise.

API Endpoints

Available Models

This endpoint provides information on the available models (see specified models above), as well as metadata for each model, such as dimension size.

Python
import requests
from pprint import pprint 

response = requests.get(
    url="http://localhost:8000/models",
    headers={"Content-Type": "application/json"}
)

pprint(response.json())
Curl
$ curl -X 'GET' \
    'http://localhost:8000/models' \
    -H 'accept: application/json'
Response
{
    "data": {
        "models": {
            "all-MiniLM-L12-v2": {"dim": 384},
            "all-mpnet-base-v2": {"dim": 768}
        }
    },
    "generated": "2023-12-02 @ 20:39:22",
    "id": "76662a90-55da-47b9-8072-310ed4d090b8"
}
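To make the response shape concrete, the sketch below assembles a payload with the same three top-level keys. The function name `build_models_response` is hypothetical and the actual main.py may differ. The timestamp format is inferred from the sample output above, and with Sentence Transformers loaded the dimension could be read from `SentenceTransformer.get_sentence_embedding_dimension()`.

```python
import uuid
from datetime import datetime


def build_models_response(model_dims: dict[str, int]) -> dict:
    """Assemble a /models-style payload from a name -> dimension mapping."""
    return {
        "data": {
            "models": {name: {"dim": dim} for name, dim in model_dims.items()}
        },
        # Format inferred from the sample response, e.g. "2023-12-02 @ 20:39:22".
        "generated": datetime.now().strftime("%Y-%m-%d @ %H:%M:%S"),
        # A fresh random identifier for each response.
        "id": str(uuid.uuid4()),
    }


payload = build_models_response(
    {"all-MiniLM-L12-v2": 384, "all-mpnet-base-v2": 768}
)
```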

Text Embeddings

The core feature of the API is to generate and return embedding representations of input text sequences. The example below is for a single text sequence, but the API can handle an array of text sequences as input. The input requires the user to select an available model (see above section).

The API provides four fields of data in the response:

  1. model: name of the model selected in the request
  2. generated: generic date/time stamp
  3. id: generic UUID for the inference record
  4. data: contains the returned embedding contents

For each embedding returned, there are three fields of data:

  1. embedding: vector representation of the input text sequence
  2. index: index number of the embedding; relevant if multiple text sequences are provided as input
  3. num_tokens: number of tokens derived from the input text sequence
Python
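import requests
from pprint import pprint

checkpoint = "all-mpnet-base-v2"
text = "Can we mimic the Embeddings API output format from OpenAI? I dunno, but we can try."
headers = {"Content-Type": "application/json"}

response = requests.post(
    url="http://localhost:8000/embeddings",
    json={"model": checkpoint, "input": text},
    headers=headers
)
# Print the decoded JSON body rather than the Response object itself.
pprint(response.json())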
Curl
$ curl -X 'POST' \
    'http://localhost:8000/embeddings' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "all-mpnet-base-v2",
        "input": "Can we mimic the Embeddings API output format from OpenAI? I dunno, but we can try."
    }'
Response
{
    "data": [
        {
            "index": 0,
            "embedding": [
                -0.03878581151366234,
                0.03181726858019829,
                -0.020964443683624268,
                ...
            ],
            "num_tokens": 27
        }
    ],
    "generated": "2023-12-02 @ 20:50:04",
    "id": "83e70cf8-e8d6-4783-a928-b45d9507635e",
    "model": "all-mpnet-base-v2"
}
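
On the client side, the response fields described above map naturally onto a pair of small record types. The following is a sketch using only the standard library. The class and function names are hypothetical and not part of this project, which defines its own schema with Pydantic on the server.

```python
from dataclasses import dataclass


@dataclass
class Embedding:
    index: int
    embedding: list[float]
    num_tokens: int


@dataclass
class EmbeddingsResponse:
    model: str
    generated: str
    id: str
    data: list[Embedding]


def parse_embeddings_response(payload: dict) -> EmbeddingsResponse:
    """Convert the decoded JSON body into typed records, ordered by index."""
    items = [
        Embedding(
            index=item["index"],
            embedding=item["embedding"],
            num_tokens=item["num_tokens"],
        )
        for item in payload["data"]
    ]
    items.sort(key=lambda item: item.index)
    return EmbeddingsResponse(
        model=payload["model"],
        generated=payload["generated"],
        id=payload["id"],
        data=items,
    )
```

With a live server, `parse_embeddings_response(response.json())` would give attribute access such as `parsed.data[0].embedding`.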

Multiple Embeddings

The API also accepts a list of input text sequences, and the response format is unchanged. The embeddings are returned as a list either way.

import requests

checkpoint = "all-mpnet-base-v2"
headers = {"Content-Type": "application/json"}
text = [
    "Can we mimic the Embeddings API output format from OpenAI?",
    "I dunno, but we can try."
]

response = requests.post(
    url="http://localhost:8000/embeddings",
    json={"model": checkpoint, "input": text},
    headers=headers
)

if response.status_code == 200:
    content = response.json()
    print(f"** returned {len(content['data'])} embeddings...")

>>> ** returned 2 embeddings...
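
One way a server can keep the output shape identical for both input types is to normalize the input to a list before encoding. The sketch below illustrates that idea only. The helper names `normalize_input` and `package_embeddings` are hypothetical, and main.py may handle this differently.

```python
from typing import Sequence, Union


def normalize_input(value: Union[str, Sequence[str]]) -> list[str]:
    """Wrap a single string in a list; copy any other sequence into a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


def package_embeddings(
    vectors: Sequence[Sequence[float]], token_counts: Sequence[int]
) -> list[dict]:
    """Build the per-embedding records, indexed in input order."""
    return [
        {"index": i, "embedding": list(vector), "num_tokens": count}
        for i, (vector, count) in enumerate(zip(vectors, token_counts))
    ]


single = normalize_input("I dunno, but we can try.")
several = normalize_input(["first sequence", "second sequence"])
```

Both `single` and `several` are lists, so the downstream encoding and packaging code needs only one code path.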

Invalid Model Selection

In the current implementation, the API returns an error if an invalid (i.e., unspecified or unavailable) model is selected in the request.

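import requests
from pprint import pprint

embeddings_url = "http://localhost:8000/embeddings"
headers = {"Content-Type": "application/json"}

model = "i-ll-take-the-finest-model-you-ve-got!"
text = "..."

response = requests.post(
    url=embeddings_url,
    json={"model": model, "input": text},
    headers=headers
)

# The error payload is JSON, so it can be decoded directly from the response.
error_msg = response.json()
pprint(error_msg)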
Response
{
    "detail": [
        {
            "ctx": {"error": {}},
            "input": "i-ll-take-the-finest-model-you-ve-got!",
            "loc": ["body", "model"],
            "msg": "Assertion failed, Model `i-ll-take-the-finest-model-you-ve-got!` has not been specified and is not available.",
            "type": "assertion_error",
            "url": "https://errors.pydantic.dev/2.5/v/assertion_error"
        }
    ]
}
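
The `assertion_error` type in this response is what Pydantic v2 reports when an `assert` fails inside a field validator. Below is a minimal sketch of a request schema that would produce an error of this kind. The class name `EmbeddingsRequest` and the hard-coded `AVAILABLE_MODELS` set are assumptions for illustration, not code taken from main.py.

```python
from typing import List, Union

from pydantic import BaseModel, ValidationError, field_validator

# Assumption: in the real app this would be derived from models.txt.
AVAILABLE_MODELS = {"all-MiniLM-L12-v2", "all-mpnet-base-v2"}


class EmbeddingsRequest(BaseModel):
    model: str
    input: Union[str, List[str]]

    @field_validator("model")
    @classmethod
    def model_must_be_available(cls, value: str) -> str:
        # Pydantic turns a failed assert into an "assertion_error" entry.
        assert value in AVAILABLE_MODELS, (
            f"Model `{value}` has not been specified and is not available."
        )
        return value


try:
    EmbeddingsRequest(model="i-ll-take-the-finest-model-you-ve-got!", input="...")
except ValidationError as exc:
    first_error = exc.errors()[0]
```

Note that `assert` statements are stripped when Python runs with the `-O` flag, so a production implementation might raise `ValueError` in the validator instead.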

TODOs as Next Steps

Be sure to check out the many TODO comments in the main.py file to see how this example could be further expanded. Also, feel free to submit a pull request that adds more TODOs.

About

Barebones Text Embeddings API Example. Starting point and a learning exercise.
