Description
Our current alora-vllm implementation for the openai backend assumes the vllm server is running locally on the same machine. We should consider adding support for checking alora availability on a remote vllm server. The granite_common / rag-intrinsics folks have a script for downloading and loading aloras and loras during server instantiation: https://huggingface.co/ibm-granite/rag-intrinsics-lib/blob/main/run_vllm_alora.sh.
The main obstacle here is the lack of a naming convention for these aloras. We would need to synchronize naming with that script (or perhaps allow the alora_path variable to be used for this). Then, when "loading" an alora / lora, we can just check the vllm model list to see if it's already there, rather than forcing vllm to let users load/unload aloras at runtime. A sketch of what that check could look like follows below.
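
A minimal sketch of the availability check, not the current implementation: query the remote server's OpenAI-compatible /v1/models endpoint (which lists the base model and any adapters the server was started with) and verify the adapter name appears there. The base URL, adapter name, and helper function below are placeholders, assuming whatever naming convention we settle on.

```python
from openai import OpenAI


def alora_is_served(client: OpenAI, adapter_name: str) -> bool:
    """Return True if `adapter_name` appears in the server's model list."""
    served_ids = {model.id for model in client.models.list()}
    return adapter_name in served_ids


# Point the client at the remote vLLM server instead of localhost.
client = OpenAI(
    base_url="http://remote-vllm-host:8000/v1",  # placeholder host/port
    api_key="EMPTY",  # vLLM ignores the key unless --api-key was set
)

# Placeholder adapter name; the real name depends on the agreed convention.
if not alora_is_served(client, "granite-3.3-8b-alora-uncertainty"):
    raise RuntimeError(
        "aLoRA adapter not found on the remote vLLM server; it must be "
        "loaded at server startup (e.g. via run_vllm_alora.sh)."
    )
```

With a check like this, "loading" an alora on the openai backend reduces to confirming the adapter is already registered server-side, which avoids needing runtime load/unload support in vllm.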