Passing an `image_url` to a text-only model should fail explicitly

(noticed this error while working on https://github.com/huggingface/huggingface_hub/pull/2556)

### System Info

Using TGI through Inference API (e.g. [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)).  At the time I open this issue [`/info`](https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/info) returns 

```js
{
"model_id": "mistralai/Mistral-Nemo-Instruct-2407",
"model_sha": "e17a136e1dcba9c63ad771f2c85c1c312c563e6b",
"model_pipeline_tag": "text-generation",
"max_concurrent_requests": 128,
"max_best_of": 2,
"max_stop_sequences": 4,
"max_input_tokens": 16000,
"max_total_tokens": 32768,
"validation_workers": 2,
"max_client_batch_size": 4,
"router": "text-generation-router",
"version": "2.2.1-dev0",
"sha": "a0b6a2434503afa5da5f25fa47a3e4589c80941c",
"docker_label": "sha-a0b6a24"
}
``` 

### Information

- [ ] Docker
- [ ] The CLI directly

### Tasks

- [ ] An officially supported command
- [ ] My own modifications

### Reproduction

Send a request to a text-only model with a payload containing a text + image content. Here is a curl command to reproduce it. It sends [an image](https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg) as `image_url` and `"Describe this image in one sentence."` as `text`.

```sh
curl -X POST \
  -H 'Content-Type: application/json' \
  -H 'authorization: Bearer <HF TOKEN>' \
  -d '{
    "model": "mistralai/Mistral-Nemo-Instruct-2407",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
          {"type": "text", "text": "Describe this image in one sentence."}
        ]
      }
    ]
  }' \
  https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/v1/chat/completions
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}
```

### Expected behavior

Currently TGI returns successfully with the sentence `"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."`. It seems that since the model is not capable of handling the image, the image url is directly passed to the model. Since the url contains `Statue-of-Liberty-Island-New-York-Bay.jpg`, the answer looks correct but is not generated from the image itself.

```js
{"object":"chat.completion","id":"","created":1727279137,"model":"mistralai/Mistral-Nemo-Instruct-2407","system_fingerprint":"2.2.1-dev0-sha-a0b6a24","choices":[{"index":0,"message":{"role":"assistant","content":"The Statue of Liberty stands tall and proud on its pedestal, facing the New York Bay."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":57,"completion_tokens":19,"total_tokens":76}}% 
```

In such a case I would expect either a **400 Bad request** or a **422 Unprocessable entity**. 

I also tried with a base64-encoded URL and the model fails (max tokens exceeded) since the full base64 encoding seems to be tokenized.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Passing an `image_url` to a text-only model should fail explicitly #2565

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Passing an image_url to a text-only model should fail explicitly #2565

Description

System Info

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Passing an `image_url` to a text-only model should fail explicitly #2565