@@ -34,7 +34,7 @@ TensorRT LLM classifies APIs into two categories:
 All API schemas are:
 - Stored as YAML files in the codebase
 - Protected by unit tests in `tests/unittest/api_stability/`
-- Automatically validated to ensure consistency
+- Automatically validated to ensure consistency
 
 ## API Change Principles
 
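The schemas above are described as automatically validated against the code. As an illustration only, such a consistency check can be sketched with stdlib dataclasses. `ExampleArgs`, `find_schema_mismatches`, the inlined `REFERENCE` dict, and storing the API status in field metadata are all hypothetical choices for this sketch, not TensorRT LLM's actual test code.

```python
from dataclasses import dataclass, field, fields

# Hypothetical reference entry, shaped like the YAML schema files.
# A real test would load it with a YAML parser; it is inlined to stay stdlib-only.
REFERENCE = {
    "garbage_collection_gen0_threshold": {
        "type": "int",
        "default": 20000,
        "status": "beta",
    },
}


@dataclass
class ExampleArgs:
    """Hypothetical stand-in for an arguments class (not the real LlmArgs)."""

    # Assumption: the API status is attached as dataclass field metadata.
    garbage_collection_gen0_threshold: int = field(
        default=20000, metadata={"status": "beta"}
    )


def find_schema_mismatches(cls, reference):
    """Compare a dataclass against a reference schema and list every mismatch."""
    problems = []
    code_fields = {f.name: f for f in fields(cls)}
    for name, spec in reference.items():
        f = code_fields.get(name)
        if f is None:
            problems.append(f"{name}: in reference but missing in code")
            continue
        # Annotations are classes here; they become strings under postponed evaluation.
        type_name = f.type if isinstance(f.type, str) else f.type.__name__
        if type_name != spec["type"]:
            problems.append(f"{name}: type {type_name!r} != {spec['type']!r}")
        if f.default != spec["default"]:
            problems.append(f"{name}: default {f.default!r} != {spec['default']!r}")
        if f.metadata.get("status") != spec["status"]:
            problems.append(
                f"{name}: status {f.metadata.get('status')!r} != {spec['status']!r}"
            )
    # Fields added in code but never recorded in the reference are flagged too.
    for name in sorted(code_fields.keys() - reference.keys()):
        problems.append(f"{name}: in code but missing in reference")
    return problems


print(find_schema_mismatches(ExampleArgs, REFERENCE))  # []
```

A stale `status` in the reference, or a field added in code without a reference entry, shows up as one message per mismatch, so a unit test only needs to assert that the returned list is empty.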
@@ -44,22 +44,26 @@ All API schemas are:
 
 Argument names should describe what the argument represents, not how it is used internally.
 
-✅ **Good**: `max_new_tokens` (clear meaning)
+✅ **Good**: `max_new_tokens` (clear meaning)
+
 ❌ **Bad**: `num` (ambiguous)
 
 **Reflect Argument Type and Granularity**
 
 - For **boolean** knobs, prefix with verbs like `enable_` and so on.
+
   Examples: `enable_cache`, `enable_flash_attention`
 
-- For **numerical threshold** knobs, suffix with `_limit`, `_size`, `_count`, `_len` or `_ratio`
+- For **numerical threshold** knobs, suffix with `_limit`, `_size`, `_count`, `_len` or `_ratio`
+
   Examples: `max_seq_len`, `prefill_batch_size`
 
 **Avoid Redundant Prefixes**
 
 Example (in `MoeConfig`):
 
-✅ **Good**: `backend`
+✅ **Good**: `backend`
+
 ❌ **Bad**: `moe_backend` (redundant since it's already in `MoeConfig`)
 
 **Use Specific Names for Narrow Scenarios**
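Two of the naming rules above, verb-prefixed boolean knobs and no redundant owner prefix, are mechanical enough to lint. A minimal sketch follows; `naming_warnings`, `BOOL_PREFIXES`, and the fields of both config classes are hypothetical. The threshold-suffix rule is left out because code alone cannot tell which integers are thresholds.

```python
from dataclasses import dataclass, fields

# Assumption: only `enable_` is accepted; extend with other verbs a project allows.
BOOL_PREFIXES = ("enable_",)


def naming_warnings(cls, owner_prefix):
    """Flag boolean knobs without a verb prefix and names repeating the owner's prefix."""
    warnings = []
    for f in fields(cls):
        if f.type is bool and not f.name.startswith(BOOL_PREFIXES):
            warnings.append(
                f"{f.name}: boolean knob should start with one of {', '.join(BOOL_PREFIXES)}"
            )
        if f.name.startswith(owner_prefix + "_"):
            warnings.append(f"{f.name}: '{owner_prefix}_' is redundant inside {cls.__name__}")
    return warnings


@dataclass
class MoeConfig:
    """Hypothetical fields, for illustration only."""

    backend: str = "default"
    enable_overlap: bool = False


@dataclass
class BadMoeConfig:
    """Deliberately violates both rules."""

    moe_backend: str = "default"
    overlap: bool = False


print(naming_warnings(MoeConfig, "moe"))  # []
for warning in naming_warnings(BadMoeConfig, "moe"):
    print(warning)
```

The owner prefix is passed explicitly rather than derived from the class name, since names such as `KvCacheConfig` do not map cleanly onto a field prefix.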
@@ -68,7 +72,8 @@ When adding knobs for specific use cases, make the name convey the restriction c
 
 Example (argument to the LLM class):
 
-✅ **Good**: `rope_scaling_factor` → clearly indicates it's for RoPE
+✅ **Good**: `rope_scaling_factor` → clearly indicates it's for RoPE
+
 ❌ **Bad**: `scaling_factor` → too generic and prone to misuse
 
 ### 2. Hierarchical Configuration
@@ -77,13 +82,16 @@ Organize complex or hierarchical arguments into **dedicated configuration datacl
 
 **Guidelines**
 
-- Use the `XxxConfig` suffix consistently
+- Use the `XxxConfig` suffix consistently
+
   Examples: `ModelConfig`, `ParallelConfig`, `MoeConfig`
-
-- **Reflect conceptual hierarchy**
+
+- **Reflect conceptual hierarchy**
+
   The dataclass name should represent a coherent functional unit, not an arbitrary grouping
-
-- **Avoid over-nesting**
+
+- **Avoid over-nesting**
+
   Use only one level of configuration hierarchy whenever possible (e.g., `LlmArgs → ParallelConfig`) to balance readability and modularity
 
 ### 3. Prefer `LlmArgs` Over Environment Variables
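The hierarchy guidelines above can be sketched with plain dataclasses: a top-level arguments class holds one `XxxConfig` group and nothing deeper. `LlmArgsSketch`, the `ParallelConfig` fields, and the `config_depth` helper are hypothetical simplifications, not the real TensorRT LLM classes.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class ParallelConfig:
    """Hypothetical parallelism knobs grouped into one functional unit."""

    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1

    @property
    def world_size(self) -> int:
        return self.tensor_parallel_size * self.pipeline_parallel_size


@dataclass
class LlmArgsSketch:
    """Simplified stand-in for a top-level arguments class (not the real LlmArgs)."""

    model: str
    max_seq_len: int = 2048
    enable_cache: bool = True
    # One level of hierarchy: LlmArgsSketch -> ParallelConfig.
    parallel_config: ParallelConfig = field(default_factory=ParallelConfig)


def config_depth(cls) -> int:
    """Count nested dataclass levels below cls (0 means flat)."""
    nested = [1 + config_depth(f.type) for f in fields(cls) if is_dataclass(f.type)]
    return max(nested, default=0)


args = LlmArgsSketch(model="my-model", parallel_config=ParallelConfig(tensor_parallel_size=2))
print(args.parallel_config.world_size)  # 2
print(config_depth(LlmArgsSketch))  # 1
print(asdict(args)["parallel_config"])  # {'tensor_parallel_size': 2, 'pipeline_parallel_size': 1}
```

Keeping `config_depth` at 1 matches the one-level guideline; a test could assert it for each top-level arguments class.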
@@ -154,15 +162,15 @@ garbage_collection_gen0_threshold: int = Field(
 
 Add the field to the appropriate schema file:
 
-- **Non-committed arguments**: `tests/unittest/api_stability/references/llm_args.yaml`
+- **Non-committed arguments**: `tests/unittest/api_stability/references/llm.yaml`
   ```yaml
   garbage_collection_gen0_threshold:
     type: int
     default: 20000
     status: beta  # Must match the status in code
   ```
 
-- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm_args.yaml`
+- **Committed arguments**: `tests/unittest/api_stability/references_committed/llm.yaml`
   ```yaml
   garbage_collection_gen0_threshold:
     type: int
@@ -196,16 +204,16 @@ For non-committed APIs, use the `@set_api_status` decorator:
 ```python
 @set_api_status("beta")
 def generate_with_streaming(
-    self,
-    prompts: List[str],
+    self,
+    prompts: List[str],
     **kwargs
 ) -> Iterator[GenerationOutput]:
     """Generate text with streaming output.
-
+
     Args:
         prompts: Input prompts for generation
         **kwargs: Additional generation parameters
-
+
     Returns:
         Iterator of generation outputs
     """