-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
Closed as not planned
Labels
performancePerformance-related issuesPerformance-related issuesstaleOver 90 days of inactivityOver 90 days of inactivitystructured-output
Description
Proposal to improve performance
With a single request / online mode I'm getting:
- no guided 300 tok/sec
outlines
150 tok/sec (2x slower)lm-format-enforcer
90 tok/sec (~3x slower)
with offline mode I get:
outlines
is about 10-20x slower than no guided generationlm-format-enforcer
is about 4x faster thanoutlines
(note that it is slower thanoutlines
for online)
for online I was using this schema:
json_template = {
"type": "object",
"properties": {
"criteria": {"type": "array", "items": {"type": "string"}, "minItems": 1},
"response": { "type": "string" }
},
"required": ["criteria", "response"]
}
for offline I was using an even simpler schema:
{
"type":"object",
"properties":{
"name":{
"type":"string", "minLength":2, "maxLength":5
},
"age":{
"type":"integer"
}
},
"required":[ "name", "age"]
}
the huge performance hit in the offline mode is very strange for both backends.
2x slow down in the online mode is pretty bad too as it's already a huge impact. The offline mode can actually tolerate 2x no problem as there is no human in the loop, but 10-20x is a way impractical.
vllm=0.6.0
and outlines==0.0.46
Muhtasham, apohllo, dbuades, CptCaptain, bilzard and 4 more
Metadata
Metadata
Assignees
Labels
performancePerformance-related issuesPerformance-related issuesstaleOver 90 days of inactivityOver 90 days of inactivitystructured-output
Type
Projects
Status
Done