Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 6 additions & 92 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,97 +1,11 @@
![Release](https://img.shields.io/github/v/release/ahmetoner/whisper-asr-webservice.svg)
![Docker Pulls](https://img.shields.io/docker/pulls/onerahmet/openai-whisper-asr-webservice.svg)
![Build](https://img.shields.io/github/actions/workflow/status/ahmetoner/whisper-asr-webservice/docker-publish.yml.svg)
![Licence](https://img.shields.io/github/license/ahmetoner/whisper-asr-webservice.svg)
The latest version of asr-webservice has added support for the RTX 5090 GPU. By recompiling the source code, we have added support for torch2.7+cuda128. Now, it can efficiently provide transcription services for applications like Speaker using the GPU!

> 🎉 **Join our Discord Community!** Connect with other users, get help, and stay updated on the latest features: [https://discord.gg/4Q5YVrePzZ](https://discord.gg/4Q5YVrePzZ)
You can directly pull the pre-built image I have packaged using the following command:

# Whisper ASR Box
docker pull crpi-n9jif4z5nex2rnkd.cn-hangzhou.personal.cr.aliyuncs.com/docker_2025-images/whisper-asr-webservice_for_5090:latest

Whisper ASR Box is a general-purpose speech recognition toolkit. Whisper Models are trained on a large dataset of diverse audio and is also a multitask model that can perform multilingual speech recognition as well as speech translation and language identification.
为asr-webservice的最新版本添加RTX 5090显卡支持,通过对源代码重新编译,添加了torch2.7+cuda128,现在可以使用GPU来高效地为Speaker等应用提供转写服务啦!

## Features
直接通过以下命令拉取我打包好的镜像即可:

Current release (v1.9.1) supports following whisper models:

- [openai/whisper](https://github.com/openai/whisper)@[v20250625](https://github.com/openai/whisper/releases/tag/v20250625)
- [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.1](https://github.com/SYSTRAN/faster-whisper/releases/tag/v1.1.1)
- [whisperX](https://github.com/m-bain/whisperX)@[v3.4.2](https://github.com/m-bain/whisperX/releases/tag/v3.4.2)

## Quick Usage

### CPU

```shell
docker run -d -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest
```

### GPU

```shell
docker run -d --gpus all -p 9000:9000 \
-e ASR_MODEL=base \
-e ASR_ENGINE=openai_whisper \
onerahmet/openai-whisper-asr-webservice:latest-gpu
```

#### Cache

To reduce container startup time by avoiding repeated downloads, you can persist the cache directory:

```shell
docker run -d -p 9000:9000 \
-v $PWD/cache:/root/.cache/ \
onerahmet/openai-whisper-asr-webservice:latest
```

## Key Features

- Multiple ASR engines support (OpenAI Whisper, Faster Whisper, WhisperX)
- Multiple output formats (text, JSON, VTT, SRT, TSV)
- Word-level timestamps support
- Voice activity detection (VAD) filtering
- Speaker diarization (with WhisperX)
- FFmpeg integration for broad audio/video format support
- GPU acceleration support
- Configurable model loading/unloading
- REST API with Swagger documentation

## Environment Variables

Key configuration options:

- `ASR_ENGINE`: Engine selection (openai_whisper, faster_whisper, whisperx)
- `ASR_MODEL`: Model selection (tiny, base, small, medium, large-v3, etc.)
- `ASR_MODEL_PATH`: Custom path to store/load models
- `ASR_DEVICE`: Device selection (cuda, cpu)
- `MODEL_IDLE_TIMEOUT`: Timeout for model unloading

## Documentation

For complete documentation, visit:
[https://ahmetoner.github.io/whisper-asr-webservice](https://ahmetoner.github.io/whisper-asr-webservice)

## Development

```shell
# Install poetry v2.X
pip3 install poetry

# Install dependencies for cpu
poetry install --extras cpu

# Install dependencies for cuda
poetry install --extras cuda

# Run service
poetry run whisper-asr-webservice --host 0.0.0.0 --port 9000
```

After starting the service, visit `http://localhost:9000` or `http://0.0.0.0:9000` in your browser to access the Swagger UI documentation and try out the API endpoints.

## Credits

- This software uses libraries from the [FFmpeg](http://ffmpeg.org) project under the [LGPLv2.1](http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html)
docker pull crpi-n9jif4z5nex2rnkd.cn-hangzhou.personal.cr.aliyuncs.com/docker_2025-images/whisper-asr-webservice_for_5090:latest