[Feature]: improve GGUF loading from HuggingFace user experience

### 🚀 The feature, motivation and pitch

Running something like `vllm serve unsloth/Qwen3-4B-Instruct-2507-GGUF:Q4_K_XL` should simply work: vLLM should download the model from Hugging Face automatically and cache it.

Alternatively, the model could be specified by its full file path, `unsloth/Qwen3-4B-Instruct-2507-GGUF/Qwen3-4B-Instruct-2507-Q4_K_XL.gguf`, although I think this form is less preferable.
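The sketch below shows one way the two proposed spellings could be resolved to a single GGUF file. It assumes the repo's file list is already known. `GGUFSpec`, `parse_gguf_spec` and `select_gguf_file` are hypothetical names, not existing vLLM APIs. The matching rule is also an assumption: the quant tag must sit directly before `.gguf` and be preceded by `-`, `.` or `/`, so that `Q4_XS` does not accidentally match `IQ4_XS`. In practice the file list could come from `huggingface_hub.list_repo_files(repo_id)`, and the chosen file could then be fetched and cached with `huggingface_hub.hf_hub_download(repo_id, filename)`. Sharded (multi-file) GGUF models are not handled here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GGUFSpec:
    """Hypothetical parsed form of a GGUF model argument."""
    repo_id: str
    quant: Optional[str] = None     # from "org/repo:QUANT"
    filename: Optional[str] = None  # from "org/repo/file.gguf"


def parse_gguf_spec(spec: str) -> GGUFSpec:
    """Parse 'org/repo:QUANT' or 'org/repo/file.gguf' (assumed syntax)."""
    if spec.lower().endswith(".gguf"):
        parts = spec.split("/", 2)
        if len(parts) != 3:
            raise ValueError(f"expected 'org/repo/file.gguf', got {spec!r}")
        org, repo, filename = parts
        return GGUFSpec(repo_id=f"{org}/{repo}", filename=filename)
    repo_id, _, quant = spec.partition(":")
    # An empty or missing tag means "no quant requested".
    return GGUFSpec(repo_id=repo_id, quant=quant or None)


def select_gguf_file(spec: GGUFSpec, repo_files: List[str]) -> str:
    """Pick exactly one .gguf file from the repo listing, or raise."""
    ggufs = [f for f in repo_files if f.lower().endswith(".gguf")]
    if spec.filename is not None:
        if spec.filename not in ggufs:
            raise FileNotFoundError(f"{spec.filename!r} not in {spec.repo_id}")
        return spec.filename
    if spec.quant is None:
        # Without a tag, only an unambiguous single-file repo can be resolved.
        if len(ggufs) == 1:
            return ggufs[0]
        raise ValueError(f"{spec.repo_id} has {len(ggufs)} GGUF files; pass a quant tag")
    # Assumed rule: tag right before ".gguf", preceded by start, '-', '.' or '/'.
    pattern = re.compile(
        rf"(?:^|[-./]){re.escape(spec.quant)}\.gguf$", re.IGNORECASE
    )
    matches = [f for f in ggufs if pattern.search(f)]
    if len(matches) != 1:
        raise ValueError(f"expected one file for {spec.quant!r}, found {matches}")
    return matches[0]
```

For example, given a hypothetical listing `["README.md", "model-Q4_K_M.gguf", "model-Q4_K_XL.gguf"]`, the spec `org/model-GGUF:Q4_K_XL` would resolve to `model-Q4_K_XL.gguf`, while a repo with several GGUF files and no tag raises an error instead of guessing.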

### Alternatives

_No response_

### Additional context

This is a follow-up to [#20084](https://github.com/vllm-project/vllm/issues/20084) and https://github.com/vllm-project/vllm/pull/20793. I personally think those are incomplete solutions and not what users expect based on other projects such as llama.cpp, Ollama, etc.

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
