Description
(I noticed this error while working on huggingface/huggingface_hub#2556.)
System Info
Using TGI through the Inference API (e.g. mistralai/Mistral-Nemo-Instruct-2407). At the time of opening this issue, /info returns:
{
"model_id": "mistralai/Mistral-Nemo-Instruct-2407",
"model_sha": "e17a136e1dcba9c63ad771f2c85c1c312c563e6b",
"model_pipeline_tag": "text-generation",
"max_concurrent_requests": 128,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_tokens": 16000,
"max_total_tokens": 32768,
"validation_workers": 2,
"max_client_batch_size": 4,
"router": "text-generation-router",
"version": "2.2.1-dev0",
"sha": "a0b6a2434503afa5da5f25fa47a3e4589c80941c",
"docker_label": "sha-a0b6a24"
}
Reproduction
Send a request to a text-only model with a payload containing both text and image content. The following curl command reproduces it. It sends an image as an image_url part and "Describe this image in one sentence." as a text part.
curl -X POST \
-H 'Content-Type: application/json' \
-H 'authorization: Bearer <HF TOKEN>' \
-d '{
"model": "mistralai/Mistral-Nemo-Instruct-2407",
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
{"type": "text", "text": "Describe this image in one sentence."}
]
}
]
}' \
https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/v1/chat/completions
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}
Expected behavior
Currently, TGI returns a successful response with the sentence "The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay." Since the model cannot process images, it seems the image URL is passed directly to the model as text. Because the URL contains Statue-of-Liberty-Island-New-York-Bay.jpg, the answer looks correct, but it is not generated from the image itself.
In such a case, I would expect either a 400 Bad Request or a 422 Unprocessable Entity error.
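A minimal sketch of the kind of check I have in mind, written in Python for readability. It assumes the server knows whether the served model accepts images; the `supports_images` flag, the `validate_chat_messages` function and the `UnprocessableEntity` exception are hypothetical names for illustration, not TGI's actual code.

```python
from typing import Any


class UnprocessableEntity(Exception):
    """Hypothetical error type standing in for an HTTP 422 response."""
    status_code = 422


def validate_chat_messages(messages: list[dict[str, Any]], supports_images: bool) -> None:
    """Reject image content parts when the served model is text-only.

    `supports_images` is an assumed flag; how the server would derive it
    (e.g. from the model configuration) is left open here.
    """
    for i, message in enumerate(messages):
        content = message.get("content")
        # Plain-string content cannot carry an image part.
        if isinstance(content, str):
            continue
        for part in content or []:
            if part.get("type") == "image_url" and not supports_images:
                raise UnprocessableEntity(
                    f"messages[{i}] contains an image_url part, "
                    "but this model only accepts text input"
                )
```

With the payload from the reproduction above, a text-only model would reject the request up front instead of silently tokenizing the URL.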
I also tried with a base64-encoded data URL, and the request fails (max tokens exceeded) since the full base64 string seems to be tokenized.
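A rough size estimate shows why this overflows. Standard base64 turns every 3 bytes into 4 characters, so the text length grows by about a third. The sketch below uses a hypothetical 100,000-byte image as an example; the helper names are illustrative only.

```python
import base64
import math


def base64_len(num_bytes: int) -> int:
    # Standard base64 maps each 3-byte group to 4 characters (padding included).
    return 4 * math.ceil(num_bytes / 3)


def data_url_len(image_bytes: bytes, mime: str = "image/jpeg") -> int:
    # Length of a data URL of the form "data:<mime>;base64,<payload>".
    return len(f"data:{mime};base64,") + len(base64.b64encode(image_bytes))


# Hypothetical 100 KB image: 133,336 base64 characters.
print(base64_len(100_000))
```

Against the max_input_tokens of 16000 reported by /info, 133,336 characters would only fit if the tokenizer averaged more than about 8 characters per token.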