Create SGLang Model Configuration Cookbook: Hardware-Optimized Configs for Llama, Qwen, DeepSeek & More #15

@Richardczl98

Description

📖 Description

We need to establish a comprehensive, community-driven cookbook that provides optimal SGLang configurations for running popular AI models across different hardware platforms. While we previously created a repository for this purpose, it lacked sufficient community engagement and ownership. This issue aims to restart the effort with clear structure and SGLang-specific optimization focus.

🎯 Objectives

  • Create standardized SGLang benchmark recipes for popular AI models
  • Provide hardware-specific SGLang runtime optimization configs
  • Build a sustainable community contribution system for SGLang configurations
  • Establish clear ownership and maintenance protocols for SGLang cookbooks

🖥️ Target Hardware Platforms

Enterprise/Data Center:

  • NVIDIA B200
  • NVIDIA H200
  • NVIDIA H100

Consumer/Prosumer:

  • NVIDIA RTX 5090
  • NVIDIA RTX 4090

Note: We'll need to arrange hardware access for comprehensive SGLang testing.

🤖 Model Priority List

🚨 HIGH PRIORITY - Currently Missing:

  • Llama models (3.1, 3.2, various sizes) with SGLang optimization

Additional Models:

  • DeepSeek R1
  • DeepSeek V3
  • Qwen3-Next
  • Open-source GPT models

📋 Deliverables

For Each Model + Hardware Combination:

  • Optimal SGLang runtime configuration (parallelism via --tp/--dp, memory via --mem-fraction-static, etc.)
  • SGLang-specific optimization flags and parameters
  • Structured generation performance benchmarks
  • Memory efficiency with SGLang runtime
  • Throughput benchmarks (tokens/sec, requests/sec)
  • Latency measurements for structured outputs
  • Batching strategies for SGLang workloads
  • JSON schema performance comparisons

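To make the deliverables above concrete, a recipe entry could pair a launch command with a reproducible benchmark run. The sketch below is an assumed example, not a tuned result: the model path names a real checkpoint, but the flag values (--tp 2, --mem-fraction-static 0.85, 500 prompts) are placeholders that each recipe would replace with its measured optima for the target hardware.

```shell
# Hypothetical recipe sketch: Llama 3.1 8B Instruct on 2x GPUs.
# Flag values are placeholders, not tuned optima.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --tp 2 \
  --mem-fraction-static 0.85 \
  --port 30000

# In a second shell: throughput benchmark against the running server.
python -m sglang.bench_serving \
  --backend sglang \
  --num-prompts 500 \
  --port 30000
```

Pinning both the launch flags and the benchmark invocation in one place is what makes a recipe reproducible across contributors.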
Repository Structure:

Follow the existing model write-up format at https://app.gitbook.com/invite/TvLfyTxdRQeudJH7e5QW/Yrdt6Nb7fPPjefF5OfCV
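Since structured-output performance (the JSON schema deliverable above) is a headline metric, recipes will want a consistent request shape to benchmark against. A minimal sketch follows; the schema contents and model name are illustrative assumptions, and the response_format nesting reflects SGLang's OpenAI-compatible structured-output interface, which contributors should verify against the current SGLang docs.

```python
import json

# Hypothetical schema for a structured-output benchmark;
# the fields are illustrative, not prescribed by SGLang.
CITY_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

def request_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat request constrained to CITY_SCHEMA."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "city", "schema": CITY_SCHEMA},
        },
    }

payload = request_payload(
    "meta-llama/Llama-3.1-8B-Instruct",
    "Name one large city and its population as JSON.",
)
print(json.dumps(payload, indent=2))
```

Timing this request end-to-end against an SGLang server, versus the same prompt without response_format, gives the structured-output latency comparison the deliverables call for.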

🤝 How to Contribute

  1. Claim a Model/Hardware Combo: Comment with your SGLang experience level
  2. Follow SGLang Templates: Use provided SGLang-specific templates
  3. Submit SGLang Benchmarks: Include runtime configs, structured generation examples
  4. Share Optimization Tips: Document SGLang-specific tuning discoveries
  5. Validate Configurations: Test others' SGLang setups and provide feedback
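For step 3, the exact benchmarking harness matters less than reporting comparable numbers. The tiny helper below (a hypothetical sketch, not part of SGLang) shows the arithmetic a submission's tokens/sec and requests/sec figures should reflect:

```python
def throughput(output_tokens: int, requests: int, wall_seconds: float) -> dict:
    """Headline numbers: generated tokens/sec and completed requests/sec."""
    return {
        "tokens_per_sec": output_tokens / wall_seconds,
        "requests_per_sec": requests / wall_seconds,
    }

# e.g. 120,000 generated tokens across 500 requests in a 60 s run
print(throughput(120_000, 500, 60.0))
# -> tokens_per_sec: 2000.0, requests_per_sec: ~8.33
```

Reporting the raw counts (tokens, requests, wall time) alongside the derived rates lets reviewers re-check and compare submissions across hardware.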

🏷️ Labels

sglang, enhancement, community, benchmarking, documentation, help-wanted, good-first-issue, performance

Who has SGLang experience and is interested in taking ownership or contributing to specific model/hardware combinations? Please comment below with your SGLang background! 🚀
