From b532d3485ff53ac01b6e6a7092a3fa618713539f Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 14:25:40 +0200
Subject: [PATCH 1/7] feat(genapi): update whisper quotas and properties

---
 .../additional-content/organization-quotas.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/pages/organizations-and-projects/additional-content/organization-quotas.mdx b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
index 6f5fe5461f..04cdade983 100644
--- a/pages/organizations-and-projects/additional-content/organization-quotas.mdx
+++ b/pages/organizations-and-projects/additional-content/organization-quotas.mdx
@@ -210,6 +210,11 @@ Generative APIs are rate limited based on:
 | gpt-oss-120b | 200k | 400k |
 | bge-multilingual-gemma2 | 200k | 400k |
 
+| Audio seconds per minute | [Payment method validated](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) | Payment method and [identity validated](/account/how-to/verify-identity/) |
+|-------------|:----------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------:|
+| voxtral-small-24b-2507 | 1800 | 3600 |
+| whisper-large-v3 | 1800 | 3600 |
+
 | Requests per minute | [Payment method validated](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) | Payment method and [identity validated](/account/how-to/verify-identity/) |
 |-------------|:----------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------:|
@@ -228,6 +233,7 @@ Generative APIs are rate limited based on:
 | qwen3-coder-30b-a3b-instruct | 300 | 600 |
 | gpt-oss-120b | 300 | 600 |
 | bge-multilingual-gemma2 | 300 | 600 |
+| whisper-large-v3 | 300 | 600 |
 
 | Concurrent requests | [Payment method validated](/billing/how-to/add-payment-method/#how-to-add-a-credit-card) | Payment method and [identity validated](/account/how-to/verify-identity/) |
 |-------------|:----------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------:|

From 5de53f4c1069ae7cae145127ccd97a453f73f09d Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 14:35:17 +0200
Subject: [PATCH 2/7] feat(genapi): update whisper and voxtral properties

---
 .../generative-apis/reference-content/supported-models.mdx | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/pages/generative-apis/reference-content/supported-models.mdx b/pages/generative-apis/reference-content/supported-models.mdx
index 02581655ad..551cc0a309 100644
--- a/pages/generative-apis/reference-content/supported-models.mdx
+++ b/pages/generative-apis/reference-content/supported-models.mdx
@@ -24,6 +24,13 @@ Our API supports the most popular models for [Chat](/generative-apis/how-to/quer
 |-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
 | Mistral | `voxtral-small-24b-2507` | 32k | 8192 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) |
 
+### Audio transcription models
+
+| Provider | Model string | Maximum audio duration (Minutes) | Chunk size (Seconds) | Maximum file size (MB) | License | Model card |
+|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
+| Mistral | `voxtral-small-24b-2507` | 30 | 30 | 25 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) |
+| OpenAI | `whisper-large-v3` | - | 30 | 25 | [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) | [HF](https://huggingface.co/openai/whisper-large-v3) |
+
 ## Chat models
 
 | Provider | Model string | Context window (Tokens) | Maximum output (Tokens)| License | Model card |

From e2a8ddb6ff7cf59ab4405098426f350e26489e8a Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 15:21:07 +0200
Subject: [PATCH 3/7] feat(inference): update whisper properties

---
 .../reference-content/model-catalog.mdx | 22 +++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index 6be7013993..9724cb9187 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -17,6 +17,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Provider | Maximum Context length (tokens) | Modalities | Compatible Instances (Max Context in tokens\*) | License |
 |------------|----------|--------------|------------|-----------|---------|
 | [`gpt-oss-120b`](#gpt-oss-120b) | OpenAI | 128k | Text | H100 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
+| [`whisper-large-v3`](#whisper-large-v3) | OpenAI | - | Audio transcription | L4, L40S, H100, H100-SXM-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`qwen3-235b-a22b-instruct-2507`](#qwen3-235b-a22b-instruct-2507) | Qwen | 40k | Text | H100-2 | [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
 | [`gemma-3-27b-it`](#gemma-3-27b-it) | Google | 40k | Text, Vision | H100, H100-2 | [Gemma](https://ai.google.dev/gemma/terms) |
 | [`llama-3.3-70b-instruct`](#llama-33-70b-instruct) | Meta | 128k | Text | H100 (15k), H100-2 | [Llama 3.3 Community](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
@@ -48,6 +49,7 @@ A quick overview of available models in Scaleway's catalog and their core attrib
 | Model name | Structured output supported | Function calling | Supported languages |
 | --- | --- | --- | --- |
 | `gpt-oss-120b` | Yes | Yes | English |
+| `whisper-large-v3` | - | - | English, French, German, Chinese, Japanese, Korean and 81 additional languages |
 | `qwen3-235b-a22b-instruct-2507` | Yes | Yes | English, French, German, Chinese, Japanese, Korean and 113 additional languages and dialects |
 | `gemma-3-27b-it` | Yes | Partial | English, Chinese, Japanese, Korean and 31 additional languages |
 | `llama-3.3-70b-instruct` | Yes | Yes | English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai |
@@ -192,6 +194,26 @@ mistral/voxtral-small-24b-2507:fp8
   - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
   - 80ms is equal to 1 input token
 
+## Audio transcription models
+
+### Whisper-large-v3
+Whisper-large-v3 is a model developed by OpenAI to perform audio transcription on many languages.
+This model is optimized for transcription in many languages.
+
+| Attribute | Value |
+|-----------|-------|
+| Supported audio formats | WAV and MP3 |
+| Audio chunk duration | 30 seconds |
+
+#### Model names
+```
+openai/whisper-large-v3:bf16
+```
+
+- Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed.
+- Audio files are processed in 30 seconds chunks:
+  - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
+
 ## Text models
 
 ### Qwen3-235b-a22b-instruct-2507

From 30f0f0738b3e9c32eb62b4bfb5b03d57a3e0dd0d Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 15:44:34 +0200
Subject: [PATCH 4/7] fix(genapi): wording

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/managed-inference/reference-content/model-catalog.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index 9724cb9187..ebc3e73995 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -197,7 +197,7 @@ mistral/voxtral-small-24b-2507:fp8
 ## Audio transcription models
 
 ### Whisper-large-v3
-Whisper-large-v3 is a model developed by OpenAI to perform audio transcription on many languages.
+Whisper-large-v3 is a model developed by OpenAI to transcribe audio in many languages.
 This model is optimized for transcription in many languages.
 
 | Attribute | Value |

From b36fb985eb8ff426f3c9c7a9c2e6fd734d9d896d Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 15:44:47 +0200
Subject: [PATCH 5/7] fix(genapi): wording

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/managed-inference/reference-content/model-catalog.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index ebc3e73995..896e05f23d 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -198,7 +198,7 @@ mistral/voxtral-small-24b-2507:fp8
 
 ### Whisper-large-v3
 Whisper-large-v3 is a model developed by OpenAI to transcribe audio in many languages.
-This model is optimized for transcription in many languages.
+This model is optimized for audio transcription tasks.
 
 | Attribute | Value |
 |-----------|-------|

From 6547b5260f5127106baac156daddd158c77a3529 Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 15:44:54 +0200
Subject: [PATCH 6/7] fix(genapi): wording

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/managed-inference/reference-content/model-catalog.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index 896e05f23d..6ff8ea3fb4 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -211,7 +211,7 @@ openai/whisper-large-v3:bf16
 ```
 
 - Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed.
-- Audio files are processed in 30 seconds chunks:
+- Audio files are processed in 30-second chunks:
   - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
 
 ## Text models

From eaedf1947a6d37211996c24ec4b12c2c51d58dbb Mon Sep 17 00:00:00 2001
From: fpagny
Date: Fri, 24 Oct 2025 15:45:02 +0200
Subject: [PATCH 7/7] fix(genapi): wording

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 pages/managed-inference/reference-content/model-catalog.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/managed-inference/reference-content/model-catalog.mdx b/pages/managed-inference/reference-content/model-catalog.mdx
index 6ff8ea3fb4..a3e863dd25 100644
--- a/pages/managed-inference/reference-content/model-catalog.mdx
+++ b/pages/managed-inference/reference-content/model-catalog.mdx
@@ -210,7 +210,7 @@ This model is optimized for audio transcription tasks.
 openai/whisper-large-v3:bf16
 ```
 
-- Mono and stereo audio formats are supported. For stereo formats, both left and right channels are merged before being processed.
+- Mono and stereo audio formats are supported. For stereo formats, left and right channels are merged before being processed.
 - Audio files are processed in 30-second chunks:
   - If audio sent is less than 30 seconds, the rest of the chunk will be considered silent.
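
Reviewer note: the chunking rule documented in this series (audio is processed in 30-second chunks, and a short final chunk is treated as padded with silence) can be illustrated with a small calculation. The sketch below is only an illustration under that stated rule; the function name `chunk_plan` is hypothetical and is not part of any Scaleway API, and how the service pads audio internally is an assumption.

```python
import math

# Chunk duration taken from the "Audio chunk duration | 30 seconds" table row.
CHUNK_SECONDS = 30


def chunk_plan(audio_seconds: float, chunk_seconds: int = CHUNK_SECONDS) -> tuple[int, float]:
    """Return (number of chunks, seconds of silence assumed in the last chunk).

    Hypothetical helper: it only applies the documented rule that audio is
    split into fixed-size chunks and the remainder of the final chunk is
    considered silent.
    """
    if audio_seconds <= 0:
        return 0, 0.0
    chunks = math.ceil(audio_seconds / chunk_seconds)
    padding = float(chunks * chunk_seconds - audio_seconds)
    return chunks, padding


if __name__ == "__main__":
    for duration in (12.5, 30, 45):
        chunks, padding = chunk_plan(duration)
        print(f"{duration}s of audio: {chunks} chunk(s), {padding}s treated as silence")
```

Under this reading, a 45-second file occupies two chunks, with the last 15 seconds of the second chunk considered silent.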