add support to openai backend for adding aloras to remote servers #213

@jakelorocco

Description

Our current alora-vllm implementation for the openai backend assumes the server is running on the same machine. We should consider adding support for checking alora availability on a remote vllm server from the openai backend. The granite_common / rag-intrinsic folks have a script for downloading and loading aloras and loras during server instantiation: https://huggingface.co/ibm-granite/rag-intrinsics-lib/blob/main/run_vllm_alora.sh.

The main obstacle is the lack of a naming convention for these aloras. We would need to synchronize on one (or perhaps allow the alora_path variable to be used for this). Then, when "loading" an alora / lora, we can simply check the vllm model list to confirm it is already being served (see the sketch below). That way we would not have to force vllm to let users load/unload aloras at runtime.
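
As a rough illustration, here is a minimal sketch of what that availability check could look like against a remote vllm server's OpenAI-compatible API. The base URL and the adapter name below are placeholders, not a proposed convention; the real name would come from whatever we synchronize on (or from alora_path).

```python
import openai

# Placeholder endpoint and adapter name; actual values depend on how the
# remote vllm server was launched and on the naming convention we agree on.
client = openai.OpenAI(base_url="http://remote-vllm-host:8000/v1", api_key="EMPTY")

def alora_is_available(adapter_name: str) -> bool:
    """Return True if the adapter appears in the server's model list.

    vllm lists every served lora/alora adapter as a model id under /v1/models,
    so a membership check is enough to know whether requests can target it.
    """
    served_ids = {model.id for model in client.models.list()}
    return adapter_name in served_ids

# Hypothetical usage:
if alora_is_available("granite-alora-requirement-check"):
    print("alora already loaded on the remote server")
```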

Metadata

Labels: enhancement (New feature or request)