[Performance]: guided generation is very slow in offline mode #8313

@stas00

Proposal to improve performance

With a single request in online mode I'm getting:

  • no guided generation: 300 tok/sec
  • outlines: 150 tok/sec (2x slower)
  • lm-format-enforcer: 90 tok/sec (~3x slower)
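
For reference, the slowdown factors follow directly from the throughput figures above. A minimal sketch of the arithmetic (the helper name `slowdown` is only illustrative):

```python
# Throughput figures quoted above for single-request online mode (tokens/sec).
throughput = {
    "no guided": 300,
    "outlines": 150,
    "lm-format-enforcer": 90,
}


def slowdown(baseline_tps: float, guided_tps: float) -> float:
    """Return how many times slower guided decoding is than the baseline."""
    return baseline_tps / guided_tps


baseline = throughput["no guided"]
slowdowns = {
    name: round(slowdown(baseline, tps), 2)
    for name, tps in throughput.items()
    if name != "no guided"
}
print(slowdowns)  # {'outlines': 2.0, 'lm-format-enforcer': 3.33}
```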

With offline mode I get:

  • outlines is about 10-20x slower than unguided generation
  • lm-format-enforcer is about 4x faster than outlines (note that it is slower than outlines in online mode)
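
To make comparisons like the ones above reproducible, a small timing helper can be wrapped around any generation call. This is only a sketch: the issue does not say how the numbers were measured, and `tokens_per_second` and `fake_generate` are hypothetical names. A real run would pass a function that calls the engine and returns the generated token ids per prompt.

```python
import time
from typing import Callable, List, Sequence


def tokens_per_second(
    generate: Callable[[Sequence[str]], List[List[int]]],
    prompts: Sequence[str],
) -> float:
    """Time one call to `generate` and return generated tokens per second."""
    start = time.perf_counter()
    outputs = generate(prompts)
    elapsed = time.perf_counter() - start
    total_tokens = sum(len(token_ids) for token_ids in outputs)
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def fake_generate(prompts: Sequence[str]) -> List[List[int]]:
    """Stand-in for a real engine call: returns 16 dummy token ids per prompt."""
    time.sleep(0.01)
    return [[0] * 16 for _ in prompts]


tps = tokens_per_second(fake_generate, ["prompt one", "prompt two"])
print(f"{tps:.0f} tok/sec")
```

Running the same helper with and without guided decoding, on the same prompts and sampling settings, gives the ratio directly.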

For online mode I was using this schema:

json_template = {
    "type": "object",
    "properties": {
        "criteria": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "response": { "type": "string" }
    },
    "required": ["criteria", "response"]
}
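
A sketch of how such a schema might be attached to an online request. It assumes the vLLM OpenAI-compatible server accepts `guided_json` and `guided_decoding_backend` as extra request fields, which should be checked against the installed version. The model name and prompt are placeholders, and `build_guided_request` is a hypothetical helper. Nothing is sent over the network here; the code only builds the request body.

```python
import json

json_template = {
    "type": "object",
    "properties": {
        "criteria": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "response": {"type": "string"},
    },
    "required": ["criteria", "response"],
}


def build_guided_request(prompt: str, schema: dict, backend: str,
                         model: str = "my-model") -> dict:
    """Build a completions request body with guided-JSON fields (assumed names)."""
    return {
        "model": model,            # placeholder model name
        "prompt": prompt,
        "max_tokens": 256,
        "guided_json": schema,     # assumption: vLLM extra request field
        "guided_decoding_backend": backend,  # "outlines" or "lm-format-enforcer"
    }


body = build_guided_request("Evaluate the answer.", json_template, "outlines")
payload = json.dumps(body)  # what would be POSTed to the completions endpoint
print(payload)
```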

For offline mode I was using an even simpler schema:


{
   "type":"object",
   "properties":{
      "name":{
         "type":"string", "minLength":2, "maxLength":5
      },
      "age":{
         "type":"integer"
      }
   },
   "required":[ "name", "age"]
}
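
To show how little this schema demands, here is a hand-written check of the same constraints using only the standard library. It is a simplified sketch, not a full JSON Schema validator (for example, it rejects `30.0` as an integer, which some schema drafts accept), and `matches_offline_schema` is a hypothetical name.

```python
import json


def matches_offline_schema(text: str) -> bool:
    """Check that `text` is a JSON object with a 2-5 character string `name`
    and an integer `age`, mirroring the schema above."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    name = obj.get("name")
    age = obj.get("age")
    name_ok = isinstance(name, str) and 2 <= len(name) <= 5
    # bool is a subclass of int in Python, so exclude it explicitly.
    age_ok = isinstance(age, int) and not isinstance(age, bool)
    return name_ok and age_ok


print(matches_offline_schema('{"name": "Ann", "age": 30}'))  # True
print(matches_offline_schema('{"name": "A", "age": 30}'))    # False
```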

The huge performance hit in offline mode is very strange for both backends.

A 2x slowdown in online mode is pretty bad too, as that is already a huge impact. Offline mode can actually tolerate 2x with no problem, since there is no human in the loop, but 10-20x is way too slow to be practical.

vllm==0.6.0 and outlines==0.0.46
