Proposal: Add thinking Case to Generation Enum #350

@ronaldmannak

Currently, the Generation enum has three cases: chunk, info, and toolCall.

Many newer APIs now expose "thinking" output as a dedicated property in their response data structures (for example, Ollama's thinking property in Message), rather than encoding it as special tokens in the generated text.

Rationale

Different models may use varying tokens to represent "thinking," making it complicated to detect or filter these tokens at the application layer. Moving the responsibility for handling these special tokens to the inference engine would simplify integration and keep application code cleaner.
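To illustrate the kind of application-layer filtering described above, here is a minimal sketch. The helper `splitThinking` is hypothetical, and the `<think>`/`</think>` delimiter pair is an assumption: other models may use different markers, which is exactly the problem.

```swift
import Foundation

/// Hypothetical helper: splits a complete response into "thinking" and answer text.
/// The default delimiters are an assumption and differ between model families.
func splitThinking(
    from text: String,
    open: String = "<think>",
    close: String = "</think>"
) -> (thinking: String, answer: String) {
    guard let start = text.range(of: open),
          let end = text.range(of: close, range: start.upperBound..<text.endIndex)
    else {
        // No complete delimiter pair found: treat everything as answer text.
        return ("", text)
    }
    // Text between the delimiters is the thinking part.
    let thinking = String(text[start.upperBound..<end.lowerBound])
    // Everything before the opening and after the closing delimiter is the answer.
    let answer = String(text[..<start.lowerBound]) + String(text[end.upperBound...])
    return (
        thinking.trimmingCharacters(in: .whitespacesAndNewlines),
        answer.trimmingCharacters(in: .whitespacesAndNewlines)
    )
}
```

This only works on a complete string. With streamed chunks, a delimiter can be split across two chunks, so the app would also need buffering logic on top of per-model delimiter tables.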

Proposed Change

Add a new .thinking case to the Generation enum:

public enum Generation: Sendable {
    /// A generated token represented as a String.
    case chunk(String)

    /// A generated "thinking" token, represented as a String.
    case thinking(String)

    /// Completion information summarizing token counts and performance metrics.
    case info(GenerateCompletionInfo)

    /// A tool call from the language model.
    case toolCall(ToolCall)
    ...
}
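A rough sketch of how application code might consume the proposed case. The two structs are reduced stand-ins for the real MLX-Swift types so the example compiles on its own, and `collect` is a hypothetical helper that takes an array rather than a live stream to keep the sketch short.

```swift
// Stand-ins for the real MLX-Swift types, reduced so the sketch is self-contained.
struct GenerateCompletionInfo: Sendable { let tokenCount: Int }
struct ToolCall: Sendable { let name: String }

// Simplified copy of the enum with the proposed case added.
enum Generation: Sendable {
    case chunk(String)
    case thinking(String)   // proposed case
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Hypothetical consumer: accumulates answer text and thinking text separately.
func collect(_ events: [Generation]) -> (answer: String, thinking: String) {
    var answer = ""
    var thinking = ""
    for event in events {
        switch event {
        case .chunk(let text):
            answer += text
        case .thinking(let text):
            thinking += text
        case .info, .toolCall:
            break   // not relevant for this sketch
        }
    }
    return (answer, thinking)
}
```

With this shape, the app never inspects the text for model-specific markers; it only decides whether to show, collapse, or discard the thinking text.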

Considerations

  • Breaking Change: Adding a new enum case will require updates to any exhaustive switch statements that handle Generation, both in the mlx-swift-examples code and in third-party apps using MLX-Swift.
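One way client code could reduce its exposure to this kind of break is a switch with a `default` clause. The sketch below uses a simplified copy of the enum (payloads reduced to String) and a hypothetical helper `visibleText`; it is not taken from MLX-Swift.

```swift
// Simplified copy of the enum; the info and toolCall payloads are reduced
// to String for this sketch only.
enum Generation: Sendable {
    case chunk(String)
    case thinking(String)   // proposed case
    case info(String)
    case toolCall(String)
}

/// Hypothetical helper: returns the user-visible text for an event, if any.
/// Cases it does not name, including any added later, fall through to `default`
/// instead of causing a compile error.
func visibleText(for event: Generation) -> String? {
    switch event {
    case .chunk(let text):
        return text
    default:
        return nil
    }
}
```

The trade-off is that `default` also hides new cases silently: code written this way would compile after the change but would drop thinking tokens without any warning, whereas an exhaustive switch forces the author to decide how to handle them.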

Looking for feedback!
