Description
Hello!
When using sherpa-onnx-online-websocket-server with the CUDA provider, GPU VRAM usage keeps increasing as the server handles WebSocket connections.
Even after a client finishes streaming (the Done message is sent, the final result is returned, and the connection is closed), the VRAM allocated for that stream is not released. Over time this leads to out-of-memory (OOM) errors and forces the server process to crash or restart.
This makes it impossible to run the server under heavy load (hundreds of concurrent streams), since VRAM usage grows roughly linearly with the number of completed connections.
The server is started with:

```
./bin/sherpa-onnx-online-websocket-server \
  --port=8080 \
  --num-work-threads=16 \
  --num-io-threads=8 \
  --tokens=./models/tokens.txt \
  --encoder=./models/encoder.onnx \
  --decoder=./models/decoder.onnx \
  --joiner=./models/joiner.onnx \
  --provider=cuda \
  --max-batch-size=128 \
  --loop-interval-ms=10
```
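To track the growth, one can sample used VRAM between batches of connections. A minimal standalone sampler using NVML (my own helper, not part of sherpa-onnx; assumes the NVML development headers are installed):

```cpp
// vram-sampler.cc: print used VRAM on GPU 0 once per second.
// Build: g++ vram-sampler.cc -o vram-sampler -lnvidia-ml
#include <nvml.h>

#include <chrono>
#include <cstdio>
#include <thread>

int main() {
  if (nvmlInit_v2() != NVML_SUCCESS) return 1;

  nvmlDevice_t dev;
  if (nvmlDeviceGetHandleByIndex_v2(0, &dev) != NVML_SUCCESS) {
    nvmlShutdown();
    return 1;
  }

  for (int i = 0; i < 600; ++i) {  // sample once per second for 10 minutes
    nvmlMemory_t mem;
    if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
      std::printf("used: %llu MiB\n",
                  static_cast<unsigned long long>(mem.used >> 20));
    }
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }

  nvmlShutdown();
  return 0;
}
```

With this running alongside the server, used memory steps up with each batch of completed connections and never returns to the post-load baseline.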
Actual behavior

- VRAM usage keeps increasing after each client disconnects.
- Even though connections are removed from `connections_` in `OnlineWebsocketDecoder::ProcessConnections`, the GPU memory is not freed.
- Eventually the server hits OOM and restarts.
Environment

- sherpa-onnx version: v1.12.13 (latest release at the time of writing)
- Build type: prebuilt GPU binaries from https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.13/sherpa-onnx-v1.12.13-cuda-12.x-cudnn-9.x-linux-x64-gpu.tar.bz2
- CUDA version: 12.8
- GPU: NVIDIA H100 80GB
- OS: Docker image nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04
The issue seems related to `OnlineRecognizer` / `OnlineStream` not freeing GPU state after `InputFinished()` is called and the connection is removed from `connections_`. I tried adding manual cleanup (resetting the stream, clearing connections), but VRAM still accumulates; a standalone sketch of the per-stream lifecycle is below.
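To take the WebSocket layer out of the picture, the same per-stream lifecycle can be driven directly against the C++ API. A minimal sketch based on my reading of `sherpa-onnx/csrc/online-recognizer.h` (config field names are approximate and may differ between versions):

```cpp
// leak-repro.cc: run many short-lived streams through a single recognizer,
// watching VRAM between iterations (e.g. with the NVML sampler above).
// NOTE: config field names follow my reading of the headers and may not
// match your sherpa-onnx version exactly.
#include <cstdio>
#include <vector>

#include "sherpa-onnx/csrc/online-recognizer.h"

int main() {
  sherpa_onnx::OnlineRecognizerConfig config;
  config.model_config.transducer.encoder = "./models/encoder.onnx";
  config.model_config.transducer.decoder = "./models/decoder.onnx";
  config.model_config.transducer.joiner = "./models/joiner.onnx";
  config.model_config.tokens = "./models/tokens.txt";
  config.model_config.provider = "cuda";  // may live under provider_config in newer headers

  sherpa_onnx::OnlineRecognizer recognizer(config);

  std::vector<float> samples(16000, 0.0f);  // 1 second of silence @ 16 kHz

  for (int i = 0; i < 1000; ++i) {
    auto s = recognizer.CreateStream();
    s->AcceptWaveform(16000, samples.data(),
                      static_cast<int32_t>(samples.size()));
    s->InputFinished();
    while (recognizer.IsReady(s.get())) {
      sherpa_onnx::OnlineStream *ss[1] = {s.get()};
      recognizer.DecodeStreams(ss, 1);
    }
    // The stream is destroyed at the end of this scope. Per the behavior
    // described above, VRAM is not returned here on the CUDA provider.
    if (i % 100 == 0) std::printf("completed %d streams\n", i);
  }
  return 0;
}
```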
With the CPU provider, memory is released correctly after each stream completes. With the CUDA provider, VRAM grows continuously with each completed stream.
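One possible direction: onnxruntime's CUDA execution provider allocates from a BFC memory arena that by default grows and does not shrink, so even freed tensors keep their VRAM reserved. If sherpa-onnx were patched to pass CUDA provider options into session creation, capping the arena would at least bound the growth. A hedged sketch using the stock onnxruntime C++ API (not an existing sherpa-onnx option as far as I know; the 8 GiB limit is an arbitrary example):

```cpp
// Sketch: cap the CUDA EP memory arena when building session options.
// sherpa-onnx would need a patch to route these options into the sessions
// it creates for the encoder/decoder/joiner.
#include "onnxruntime_cxx_api.h"

Ort::SessionOptions MakeCudaSessionOptions() {
  OrtCUDAProviderOptions cuda_options{};
  cuda_options.device_id = 0;
  // 1 = kSameAsRequested: grow the arena only by what is actually needed,
  // instead of doubling on each extension (0 = kNextPowerOfTwo, the default).
  cuda_options.arena_extend_strategy = 1;
  // Hard cap on the arena size in bytes (example value).
  cuda_options.gpu_mem_limit = 8ULL * 1024 * 1024 * 1024;

  Ort::SessionOptions opts;
  opts.AppendExecutionProvider_CUDA(cuda_options);
  return opts;
}
```

Even if such a cap bounds the growth, the underlying question remains why per-stream state is not reused from the arena across connections, as it appears to be on CPU.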