[Performance]: guided generation is very slow in offline mode

### Proposal to improve performance

With a single request / online mode I'm getting:

- no guided 300 tok/sec
- `outlines` 150 tok/sec (2x slower)
- `lm-format-enforcer` 90 tok/sec (~3x slower)

with offline mode I get:
- `outlines` **is about 10-20x slower than no guided generation**
- `lm-format-enforcer` is about 4x faster than `outlines` (note that it is slower than `outlines` for online)

for online I was using this schema:

```
json_template = {
    "type": "object",
    "properties": {
        "criteria": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "response": { "type": "string" }
    },
    "required": ["criteria", "response"]
}
```

for offline I was using an even simpler schema:
```

{
   "type":"object",
   "properties":{
      "name":{
         "type":"string", "minLength":2, "maxLength":5
      },
      "age":{
         "type":"integer"
      }
   },
   "required":[ "name", "age"]
}
```
the huge performance hit in the offline mode is very strange for both backends.

2x slow down in the online mode is pretty bad too as it's already a huge impact. The offline mode can actually tolerate 2x no problem as there is no human in the loop, but 10-20x is a way impractical.

`vllm=0.6.0` and `outlines==0.0.46`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Performance]: guided generation is very slow in offline mode #8313

Proposal to improve performance

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Performance]: guided generation is very slow in offline mode #8313

Description

Proposal to improve performance

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions