[Mistral 3](https://mistral.ai/news/mistral-small-3) is a latency-optimized model with far fewer layers, which reduces the time per forward pass. It adds vision understanding and supports long context lengths of up to 128K tokens without compromising performance.
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
You can find the original Mistral 3 checkpoints under the [Mistral AI](https://huggingface.co/mistralai/models?search=mistral-small-3) organization.
It is ideal for:

- Fast-response conversational agents.
- Low-latency function calling.
- Subject matter experts via fine-tuning.
- Local inference for hobbyists and organizations handling sensitive data.
- Programming and math reasoning.
- Long document understanding.
- Visual understanding.
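As a sketch of the function-calling use case above: Transformers chat templates can advertise plain Python functions (with type hints and docstrings) to the model through the `tools` argument of `apply_chat_template`. Everything below (the `get_weather` tool, its city argument, and the stub return value) is hypothetical; the actual template call is shown only in a comment because rendering it requires a loaded processor.

```python
import json

# Hypothetical tool: chat templates can derive a JSON schema for the model
# from this signature and docstring when the function is passed via `tools=`.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    # Hypothetical stub result; a real tool would query a weather API.
    return json.dumps({"city": city, "temp_c": 21})

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# With a loaded processor, the tool-aware prompt would be rendered as:
#   text = processor.apply_chat_template(messages, tools=[get_weather], add_generation_prompt=True)
print(get_weather("Paris"))
```

The model then emits a tool call matching the schema, your code executes the function, and the result is appended to `messages` as a `"tool"` role turn before generating the final answer.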
> [!TIP]
> This model was contributed by [cyrilvallez](https://huggingface.co/cyrilvallez) and [yonigozlan](https://huggingface.co/yonigozlan).
>
> Click on the Mistral3 models in the right sidebar for more examples of how to apply Mistral3 to different tasks.
The original code can be found [here](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/pixtral.py) and [here](https://github.com/mistralai/mistral-common).
The example below demonstrates how to generate text for an image with [`Pipeline`] and the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">

```py
import torch
from transformers import pipeline

# Checkpoint name and image URL are illustrative; use any Mistral3 checkpoint.
pipeline = pipeline(
    "image-text-to-text",
    model="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
pipeline(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
    text="Describe the image.",
    max_new_tokens=50,
)
'The image depicts a vibrant and lush garden scene featuring a variety of wildflowers and plants. The central focus is on a large, pinkish-purple flower, likely a Greater Celandine (Chelidonium majus), with a'
```
</hfoption>
<hfoption id="AutoModel">

This example demonstrates how to perform inference on a single image with the Mistral3 models using chat templates. The checkpoint name and image URL are illustrative.

```py
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_checkpoint = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
processor = AutoProcessor.from_pretrained(model_checkpoint)
model = AutoModelForImageTextToText.from_pretrained(
    model_checkpoint, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

generate_ids = model.generate(**inputs, max_new_tokens=50)
output = processor.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(output)
'The image depicts a vibrant and lush garden scene featuring a variety of wildflowers and plants. The central focus is on a large, pinkish-purple flower, likely a Greater Celandine (Chelidonium majus), with a'
```

</hfoption>
</hfoptions>
## Notes

- Mistral 3 supports text-only generation without providing any image input.
  ```py
  import torch
  from transformers import AutoProcessor, AutoModelForImageTextToText

  # Checkpoint name is illustrative; use any Mistral3 checkpoint.
  model_checkpoint = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
  processor = AutoProcessor.from_pretrained(model_checkpoint)
  model = AutoModelForImageTextToText.from_pretrained(
      model_checkpoint, device_map="auto", torch_dtype=torch.bfloat16
  )

  SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
  user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."

  messages = [
      {"role": "system", "content": SYSTEM_PROMPT},
      {"role": "user", "content": user_prompt},
  ]

  text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = processor(text=text, return_tensors="pt").to(model.device)
  generate_ids = model.generate(**inputs, max_new_tokens=50)
  output = processor.batch_decode(generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
  ```
- Mistral 3 also supports batched image and text inputs with a different number of images for each text. The example below quantizes the model with bitsandbytes.
  ```py
  import torch
  from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

  # Checkpoint name is illustrative; use any Mistral3 checkpoint.
  model_checkpoint = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
  processor = AutoProcessor.from_pretrained(model_checkpoint)

  # Quantize the model to 4-bit with bitsandbytes to reduce memory usage.
  quantization_config = BitsAndBytesConfig(load_in_4bit=True)
  model = AutoModelForImageTextToText.from_pretrained(
      model_checkpoint, quantization_config=quantization_config, device_map="auto"
  )

  # Two conversations with a different number of images each (image URLs are placeholders).
  messages = [
      [
          {
              "role": "user",
              "content": [
                  {"type": "image", "url": "https://llava-vl.github.io/static/images/view.jpg"},
                  {"type": "text", "text": "Write a haiku for this image"},
              ],
          }
      ],
      [
          {
              "role": "user",
              "content": [
                  {"type": "image", "url": "https://example.com/landmark1.jpg"},
                  {"type": "image", "url": "https://example.com/landmark2.jpg"},
                  {"type": "text", "text": "These images depict two different landmarks. Can you identify them?"},
              ],
          }
      ],
  ]

  inputs = processor.apply_chat_template(
      messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", padding=True
  ).to(model.device)
  generate_ids = model.generate(**inputs, max_new_tokens=50)
  outputs = processor.batch_decode(generate_ids, skip_special_tokens=True)
  print(outputs)
  ["Write a haiku for this imageSure, here is a haiku inspired by the image:\n\nCalm lake's wooden path\nSilent forest stands guard\n", "These images depict two different landmarks. Can you identify them? Certainly! The images depict two iconic landmarks:\n\n1. The first image shows the Statue of Liberty in New York City."]
  ```