[Question] Tensor parallelism for tensorrt_llm

**Is your feature request related to a problem? Please describe.**
I am aware that PyTriton already has an example of using PyTriton with tensorrt_llm, but I noticed that the example only supports single-GPU inference. Are there any other examples or reference docs that use tensorrt_llm with PyTriton and support tensor parallelism?

**Describe the solution you'd like**
I think the current example is excellent, but it would be more comprehensive if it also covered multi-GPU inference (tensor-parallel inference), since this is a widely used case.


[Question] Tensor parallelism for tensorrt_llm #79