This is an example of a barebones Embeddings API implementation. The API output mirrors that of the OpenAI Embeddings API. Built on FastAPI, Pydantic, and Sentence Transformers, this project is a learning exercise as much as a starting point for developing custom embedding API interfaces.
The API code is purposely contained in a single main.py file to keep it flexible, and I have included TODO comment tags in places that deserve further development in a production implementation.
Before running the API, it is recommended to download the model checkpoints that will be used to generate embeddings, so the pretrained weights are not re-downloaded after server restarts, reloads, etc.
This example API supports serving multiple model options, which can be specified in the models.txt file (a barebones implementation of a model repo). The example comes with two pretrained models from the Sentence-Transformers library specified, but more can be added by simply adding checkpoint names. More checkpoint options are available here.
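Assuming the models.txt format is one checkpoint name per line (consistent with "adding checkpoint names" above), the file shipped with this example would look like:

```
all-MiniLM-L12-v2
all-mpnet-base-v2
```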
Once the models.txt file is ready, run the download.py script to save the model weights and configurations in the artifacts directory.
The API is built with FastAPI, so start the server as follows:

```shell
$ uvicorn main:app --reload
```

> **Tip:** FastAPI provides Swagger documentation out of the box. It can be reached at `localhost:port/docs#`.

> **Note:** The `--reload` flag allows editing the source file and updating the running app; it is not necessary otherwise.
The `/models` endpoint provides information on the available models (see the specified models above), as well as metadata for each model, such as dimension size.
```python
import requests
from pprint import pprint

response = requests.get(
    url="http://localhost:8000/models",
    headers={"Content-Type": "application/json"},
)
pprint(response.json())
```

```shell
$ curl -X 'GET' \
  'http://localhost:8000/models' \
  -H 'accept: application/json'
```

```json
{
  "data": {
    "models": {
      "all-MiniLM-L12-v2": {"dim": 384},
      "all-mpnet-base-v2": {"dim": 768}
    }
  },
  "generated": "2023-12-02 @ 20:39:22",
  "id": "76662a90-55da-47b9-8072-310ed4d090b8"
}
```

The core feature of the API is to generate and return embedding representations of input text sequences. The example below is for a single text sequence, but the API can handle an array of text sequences as input. The request requires the user to select an available model (see the section above).
The API provides four fields of data in the response:

- `model`: name of the model selected in the request
- `generated`: generic date/time stamp
- `id`: generic UUID for record inference
- `data`: contains the returned embedding contents

For each embedding returned, there are three fields of data:

- `embedding`: vector representation of the input text sequence
- `index`: index number for the embedding; relevant if multiple text sequences are provided as input
- `num_tokens`: number of tokens derived from the input text sequence
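For reference, the response fields described above could be expressed as Pydantic models along these lines; the class names here are illustrative and not necessarily those used in main.py:

```python
from pydantic import BaseModel


class EmbeddingRecord(BaseModel):
    """One embedding in the response `data` list."""
    embedding: list[float]  # vector representation of the input text sequence
    index: int              # position of the input in the request batch
    num_tokens: int         # token count derived from the input sequence


class EmbeddingsResponse(BaseModel):
    """Top-level response envelope, mirroring the fields described above."""
    data: list[EmbeddingRecord]
    generated: str  # generic date/time stamp
    id: str         # generic UUID for record inference
    model: str      # name of the model selected in the request
```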
```python
import requests
from pprint import pprint

checkpoint = "all-mpnet-base-v2"
text = "Can we mimic the Embeddings API output format from OpenAI? I dunno, but we can try."
headers = {"Content-Type": "application/json"}

response = requests.post(
    url="http://localhost:8000/embeddings",
    json={"model": checkpoint, "input": text},
    headers=headers,
)
pprint(response.json())
```

```shell
$ curl -X 'POST' \
  'http://localhost:8000/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "all-mpnet-base-v2",
  "input": "Can we mimic the Embeddings API output format from OpenAI? I dunno, but we can try."
}'
```

```json
{
  "data": [
    {
      "index": 0,
      "embedding": [
        -0.03878581151366234,
        0.03181726858019829,
        -0.020964443683624268,
        ...
      ],
      "num_tokens": 27
    }
  ],
  "generated": "2023-12-02 @ 20:50:04",
  "id": "83e70cf8-e8d6-4783-a928-b45d9507635e",
  "model": "all-mpnet-base-v2"
}
```

The API can accept a list of input text sequences, and the response format does not change; the embeddings are returned as a list either way.
```python
import requests

checkpoint = "all-mpnet-base-v2"
text = [
    "Can we mimic the Embeddings API output format from OpenAI?",
    "I dunno, but we can try.",
]
headers = {"Content-Type": "application/json"}

response = requests.post(
    url="http://localhost:8000/embeddings",
    json={"model": checkpoint, "input": text},
    headers=headers,
)
if response.status_code == 200:
    content = response.json()
    print(f"** returned {len(content['data'])} embeddings...")
```

```
>>> ** returned 2 embeddings...
```

In the current implementation, the API returns an error if an invalid (i.e., unspecified or unavailable) model is selected in the request.
```python
import json

import requests
from pprint import pprint

embeddings_url = "http://localhost:8000/embeddings"
headers = {"Content-Type": "application/json"}

model = "i-ll-take-the-finest-model-you-ve-got!"
text = "..."

response = requests.post(
    url=embeddings_url,
    json={"model": model, "input": text},
    headers=headers,
)
error_msg = json.loads(response.content)
pprint(error_msg)
```

```json
{
  "detail": [
    {
      "ctx": {"error": {}},
      "input": "i-ll-take-the-finest-model-you-ve-got!",
      "loc": ["body", "model"],
      "msg": "Assertion failed, Model `i-ll-take-the-finest-model-you-ve-got!` has not been specified and is not available.",
      "type": "assertion_error",
      "url": "https://errors.pydantic.dev/2.5/v/assertion_error"
    }
  ]
}
```

Be sure to check out the many TODO comments in the main.py file to see how this example could be further expanded. Also, submit a pull request to add more TODO's.
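The `assertion_error` shape above suggests the request body is validated with a Pydantic validator. A minimal sketch of such a validator might look like the following; the class name and the `AVAILABLE_MODELS` registry are illustrative, not taken from main.py:

```python
from typing import Union

from pydantic import BaseModel, field_validator

# Hypothetical registry of available models; in this project it would be
# populated from models.txt.
AVAILABLE_MODELS = {"all-MiniLM-L12-v2", "all-mpnet-base-v2"}


class EmbeddingsRequest(BaseModel):
    model: str
    input: Union[str, list[str]]

    @field_validator("model")
    @classmethod
    def model_is_available(cls, v: str) -> str:
        # A failed assertion surfaces in the response as a Pydantic
        # `assertion_error`, matching the error payload shown above.
        assert v in AVAILABLE_MODELS, (
            f"Model `{v}` has not been specified and is not available."
        )
        return v
```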