Title: Create SGLang Model Configuration Cookbook: Hardware-Optimized Configs for Llama, Qwen, DeepSeek & More
📖 Description
We need to establish a comprehensive, community-driven cookbook that provides optimal SGLang configurations for running popular AI models across different hardware platforms. We previously created a repository for this purpose, but it lacked sufficient community engagement and ownership. This issue aims to restart the effort with a clear structure and an SGLang-specific optimization focus.
🎯 Objectives
- Create standardized SGLang benchmark recipes for popular AI models
- Provide hardware-specific SGLang runtime optimization configs
- Build a sustainable community contribution system for SGLang configurations
- Establish clear ownership and maintenance protocols for SGLang cookbooks
🖥️ Target Hardware Platforms
Enterprise/Data Center:
- NVIDIA B200
- NVIDIA H200
- NVIDIA H100
Consumer/Prosumer:
- NVIDIA RTX 5090
- NVIDIA RTX 4090
Note: We'll need to arrange hardware access for comprehensive SGLang testing
🤖 Model Priority List
🚨 HIGH PRIORITY - Currently Missing:
- Llama models (3.1, 3.2, various sizes) with SGLang optimization
Additional Models:
- DeepSeek R1
- DeepSeek V3
- Qwen3-Next
- Open-source GPT models
📋 Deliverables
For Each Model + Hardware Combination:
- Optimal SGLang runtime configuration (`--tp`, `--dp`, memory settings)
- SGLang-specific optimization flags and parameters
- Structured generation performance benchmarks
- Memory efficiency with SGLang runtime
- Throughput benchmarks (tokens/sec, requests/sec)
- Latency measurements for structured outputs
- Batching strategies for SGLang workloads
- JSON schema performance comparisons
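As a starting point for contributors, here is a minimal launch sketch for one model/hardware combination. The model path and flag values are illustrative assumptions only, not benchmarked recommendations; each cookbook entry should replace them with validated settings for its target hardware.

```shell
# Hypothetical example: serve Llama 3.1 70B Instruct across 4 GPUs with
# tensor parallelism. All values below are placeholders, to be replaced
# by the benchmarked configuration for the specific GPU platform.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-70B-Instruct \
  --tp 4 \
  --mem-fraction-static 0.85 \
  --port 30000
```

Each entry should document why its chosen `--tp`/`--dp` split and memory fraction suit the target GPU's VRAM and interconnect.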
Repository Structure:
Follow the existing model write-up format at https://app.gitbook.com/invite/TvLfyTxdRQeudJH7e5QW/Yrdt6Nb7fPPjefF5OfCV
🤝 How to Contribute
- Claim a Model/Hardware Combo: Comment below with the combination you want to take and your SGLang experience level
- Follow SGLang Templates: Use provided SGLang-specific templates
- Submit SGLang Benchmarks: Include runtime configs and structured generation examples
- Share Optimization Tips: Document SGLang-specific tuning discoveries
- Validate Configurations: Test others' SGLang setups and provide feedback
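For benchmark submissions, a command along these lines can produce comparable throughput and latency numbers against a locally running SGLang server. The dataset choice and request counts here are illustrative assumptions; submissions should state the exact parameters used so results are reproducible.

```shell
# Hypothetical benchmark run against a local SGLang server (values illustrative).
# Reports request throughput, token throughput, and latency percentiles.
python -m sglang.bench_serving \
  --backend sglang \
  --dataset-name random \
  --num-prompts 200 \
  --random-input-len 1024 \
  --random-output-len 256
```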
🏷️ Labels
sglang
enhancement
community
benchmarking
documentation
help-wanted
good-first-issue
performance
Who has SGLang experience and is interested in taking ownership or contributing to specific model/hardware combinations? Please comment below with your SGLang background! 🚀