VRAM not released after client disconnect in sherpa-onnx-online-websocket-server #2631

@Alekksander66

Description

Hello!

When running sherpa-onnx-online-websocket-server with the CUDA provider, GPU VRAM usage keeps increasing as the server handles more WebSocket connections.

Even after a client finishes streaming (the Done message is sent, the final result is returned, and the connection is closed), the VRAM allocated by the model is not released. Over time this leads to out-of-memory (OOM) errors or causes the server process to crash and restart.

This issue makes it impossible to run the server under heavy load (hundreds of concurrent streams), since VRAM usage grows linearly with the number of completed connections.
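If growth really is linear in the number of completed connections, two VRAM readings taken before and after a known number of connections are enough to estimate the per-connection leak and how many more connections fit before OOM. The sketch below does only that arithmetic. The function names are hypothetical helpers, and the inputs (MiB values, connection counts) are meant to come from your own measurements, not from this report.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for estimating a linear VRAM leak. All values are in MiB.
set -euo pipefail

# leak_per_connection BEFORE_MIB AFTER_MIB NUM_CONNECTIONS
# Prints the average VRAM growth per completed connection (two decimals).
leak_per_connection() {
  awk -v before="$1" -v after="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", (after - before) / n }'
}

# connections_until_oom TOTAL_MIB USED_MIB PER_CONNECTION_MIB
# Prints how many more connections fit before VRAM is exhausted,
# assuming growth stays linear (awk's %d truncates the fraction).
connections_until_oom() {
  awk -v total="$1" -v used="$2" -v per="$3" \
    'BEGIN { printf "%d\n", (total - used) / per }'
}
```

With illustrative numbers (not measurements), `leak_per_connection 20000 20500 100` prints `5.00`, and `connections_until_oom 80000 20500 5` prints `11900`.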

./bin/sherpa-onnx-online-websocket-server \
  --port=8080 \
  --num-work-threads=16 \
  --num-io-threads=8 \
  --tokens=./models/tokens.txt \
  --encoder=./models/encoder.onnx \
  --decoder=./models/decoder.onnx \
  --joiner=./models/joiner.onnx \
  --provider=cuda \
  --max-batch-size=128 \
  --loop-interval-ms=10

Actual behavior

VRAM usage keeps increasing after each client disconnect.
Even though connections are removed from connections_ in OnlineWebsocketDecoder::ProcessConnections, the GPU memory is not freed.
Eventually the server hits OOM and restarts.
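One way to capture this growth over a run is to sample the GPU's used memory with nvidia-smi while clients connect and disconnect. This is a minimal sketch. It assumes the server is the only process on GPU 0, so the device-level `memory.used` value reflects the server. The script layout, the `monitor` argument, and the `SAMPLES`/`INTERVAL` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sample used VRAM (MiB) on one GPU at a fixed interval and report
# the growth between the first and last sample.
# Assumptions: nvidia-smi is on PATH and the server is alone on GPU 0.
set -euo pipefail

# Print used memory in MiB for one GPU (default index 0).
sample_vram() {
  nvidia-smi --id="${1:-0}" --query-gpu=memory.used --format=csv,noheader,nounits
}

# Given a file with one integer per line, print last - first.
vram_growth() {
  awk 'NR == 1 { first = $1 } { last = $1 } END { print last - first }' "$1"
}

main() {
  local samples="${SAMPLES:-60}" interval="${INTERVAL:-10}" out="vram_samples.txt"
  : > "$out"
  for ((i = 0; i < samples; i++)); do
    sample_vram 0 | tee -a "$out"
    sleep "$interval"
  done
  echo "VRAM growth over run: $(vram_growth "$out") MiB"
}

# Only start sampling when invoked as: ./watch_vram.sh monitor
if [[ "${1:-}" == "monitor" ]]; then
  main
fi
```

Running this once while a batch of clients completes, and again on an idle server, should make it easy to tell steady growth apart from a one-time allocation.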

Environment

sherpa-onnx version: v1.12.13 (the latest release at the time of reporting)

Build type: prebuilt binaries downloaded from https://github.com/k2-fsa/sherpa-onnx/releases/download/v1.12.13/sherpa-onnx-v1.12.13-cuda-12.x-cudnn-9.x-linux-x64-gpu.tar.bz2

CUDA version: 12.8

GPU: NVIDIA H100 80GB

OS: Ubuntu 22.04 (Docker image nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04)

The issue seems to be related to OnlineRecognizer / OnlineStream not freeing GPU state after InputFinished is called and the connection is removed from connections_.
I tried adding manual cleanup (resetting the stream, clearing connections), but VRAM still accumulates.

With the CPU provider, memory is released correctly.
With the CUDA provider, VRAM grows continuously with each completed stream.
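To put numbers on the CPU side of this comparison, the server's host memory (RSS) can be tracked during a CPU-provider run. This is a minimal sketch. It assumes a single server process whose command line contains `sherpa-onnx-online-websocket-server`, and that `pgrep` and `ps` (procps) are installed. The `monitor` argument and `INTERVAL` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the server's resident set size (MiB) at a fixed interval.
# Assumptions: exactly one matching server process; procps tools available.
set -euo pipefail

# Convert a KiB value (what `ps -o rss=` reports) to whole MiB (rounded down).
kib_to_mib() {
  echo $(( $1 / 1024 ))
}

# Look up the first matching server PID and print its RSS in MiB.
server_rss_mib() {
  local pid
  pid="$(pgrep -f sherpa-onnx-online-websocket-server | head -n 1)"
  kib_to_mib "$(ps -o rss= -p "$pid" | tr -d ' ')"
}

# Only start sampling when invoked as: ./watch_rss.sh monitor
if [[ "${1:-}" == "monitor" ]]; then
  while true; do
    echo "$(date +%T) rss_mib=$(server_rss_mib)"
    sleep "${INTERVAL:-10}"
  done
fi
```

A flat RSS curve on the CPU run next to a rising VRAM curve on the CUDA run would document the difference described above.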
