Description
Currently, the `Generation` enum has three cases: `chunk`, `info`, and `toolCall`.
Many newer APIs (such as Ollama's `thinking` property on `Message`) now return "thinking" output in a dedicated field of their response data structures, rather than encoding those tokens in the generated text.
Rationale
Different models use different tokens to delimit "thinking" output, which makes it complicated to detect or filter that output at the application layer. Moving the responsibility for handling these special tokens into the inference engine would simplify integration and keep application code cleaner.
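To illustrate why filtering at the application layer is awkward, here is a minimal sketch of a streaming splitter. It assumes the model wraps its reasoning in `<think>` and `</think>` tags (a common convention, but the tag strings are an assumption and not every model uses them), and `ThinkTagSplitter` is a hypothetical app-side type. Because a tag can be split across two streamed chunks, the splitter has to hold back any trailing text that might be the start of a tag, and it still only handles one tag convention.

```swift
import Foundation

/// Hypothetical app-side splitter that separates "thinking" text from answer text
/// in a stream of chunks. Assumes reasoning is delimited by <think>...</think>.
struct ThinkTagSplitter {
    let openTag = "<think>"
    let closeTag = "</think>"
    private var buffer = ""
    private var inThinking = false

    /// Feed one streamed chunk; returns the thinking and answer text that is safe to emit now.
    mutating func feed(_ chunk: String) -> (thinking: String, answer: String) {
        buffer += chunk
        var thinking = ""
        var answer = ""
        while true {
            // Look for the tag that would switch the current mode.
            let tag = inThinking ? closeTag : openTag
            if let range = buffer.range(of: tag) {
                let before = String(buffer[..<range.lowerBound])
                if inThinking { thinking += before } else { answer += before }
                buffer = String(buffer[range.upperBound...])
                inThinking.toggle()
            } else {
                // Hold back a suffix that could be the beginning of a tag split across chunks.
                let keep = partialTagSuffixLength(tag)
                let emitEnd = buffer.index(buffer.endIndex, offsetBy: -keep)
                let emit = String(buffer[..<emitEnd])
                if inThinking { thinking += emit } else { answer += emit }
                buffer = String(buffer[emitEnd...])
                return (thinking, answer)
            }
        }
    }

    /// Emit whatever is still buffered once the stream ends.
    mutating func finish() -> (thinking: String, answer: String) {
        defer { buffer = "" }
        return inThinking ? (thinking: buffer, answer: "") : (thinking: "", answer: buffer)
    }

    /// Length of the longest suffix of `buffer` that is a proper prefix of `tag`.
    private func partialTagSuffixLength(_ tag: String) -> Int {
        var n = min(tag.count - 1, buffer.count)
        while n > 0 {
            if buffer.hasSuffix(String(tag.prefix(n))) { return n }
            n -= 1
        }
        return 0
    }
}
```

Every application that streams from such a model would need something like this, tuned to that model's tags, which is the duplication the proposal aims to remove.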
Proposed Change
Add a new `.thinking` case to the `Generation` enum:
```swift
public enum Generation: Sendable {
    /// A generated token represented as a String.
    case chunk(String)
    /// A generated "thinking" token, represented as a String.
    case thinking(String)
    /// Completion information summarizing token counts and performance metrics.
    case info(GenerateCompletionInfo)
    /// A tool call from the language model.
    case toolCall(ToolCall)
    // ...
}
```
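With the proposed case, a consumer no longer parses tags at all; it simply switches on each event. The sketch below is self-contained, so `GenerateCompletionInfo` and `ToolCall` are minimal stand-ins (their fields here are assumptions, not MLX-Swift's actual definitions), and `TranscriptCollector` is a hypothetical app-side type.

```swift
// Minimal stand-ins so the sketch compiles on its own. The real MLX-Swift types
// differ; these field names are assumptions for illustration only.
struct GenerateCompletionInfo: Sendable {
    let promptTokenCount: Int
    let generationTokenCount: Int
}

struct ToolCall: Sendable {
    let name: String
}

/// The proposed enum, including the new `.thinking` case.
enum Generation: Sendable {
    case chunk(String)
    case thinking(String)
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Hypothetical consumer: routes reasoning and answer text to separate buffers.
/// No tag parsing is needed because the engine has already classified each piece.
struct TranscriptCollector {
    var answer = ""
    var reasoning = ""
    var toolCalls: [String] = []
    var info: GenerateCompletionInfo?

    mutating func handle(_ generation: Generation) {
        switch generation {
        case .chunk(let text): answer += text
        case .thinking(let text): reasoning += text
        case .info(let completion): info = completion
        case .toolCall(let call): toolCalls.append(call.name)
        }
    }
}
```

The app decides how to present reasoning (hide it, collapse it, log it) without knowing which tokens a particular model uses.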
Considerations
- Breaking Change: Adding a new enum case will require updates to any exhaustive `switch` statements that handle `Generation`, both in the mlx-swift-examples code and in third-party apps using MLX-Swift.
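For downstream code, the break appears as a compile error in every exhaustive `switch` over `Generation`. A minimal sketch of two ways to migrate, again using empty stand-in payload types rather than the real MLX-Swift ones; `visibleText` and `visibleTextTolerant` are hypothetical helpers.

```swift
// Empty stand-ins for the real payload types (assumption for a self-contained sketch).
struct GenerateCompletionInfo: Sendable {}
struct ToolCall: Sendable {}

enum Generation: Sendable {
    case chunk(String)
    case thinking(String)
    case info(GenerateCompletionInfo)
    case toolCall(ToolCall)
}

/// Option A: handle the new case explicitly. The compiler will flag this
/// switch again if yet another case is added later.
func visibleText(_ g: Generation) -> String {
    switch g {
    case .chunk(let text): return text
    case .thinking: return ""  // hide reasoning from the UI
    case .info, .toolCall: return ""
    }
}

/// Option B: a `default` branch keeps compiling when new cases are added,
/// at the cost of silently ignoring them.
func visibleTextTolerant(_ g: Generation) -> String {
    switch g {
    case .chunk(let text): return text
    default: return ""
    }
}
```

The explicit form keeps the exhaustiveness check useful for future cases; the `default` form avoids this kind of break next time but hides any case added later.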
Looking for feedback!