Example of TensorRT-LLM Whisper backend for PyTriton

**Describe the solution you'd like**
With the recent [TensorRT-LLM support for Whisper](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper), and now that PyTriton supports TensorRT-LLM, it would be great to get examples of efficient client and server code, as well as decoupled mode examples.

**Describe alternatives you've considered**
I've experimented with [WhisperS2T](https://github.com/shashikg/WhisperS2T) served through FastAPI and through PyTriton, and both setups perform well. It would be great to get a more involved example, along the lines of the [PyTriton TensorRT-LLM example](https://github.com/triton-inference-server/pytriton/tree/54b85de6723c010065d94f9d772c6b58c8d596e1/examples/tensorrt_llm) and the [python_backend decoupled examples](https://github.com/triton-inference-server/python_backend/tree/main/examples/decoupled).

