feat: added responses api streaming #598
Conversation
This stack of pull requests is managed by Graphite.
📝 Walkthrough

Unified streaming handlers and new streaming entry points added across providers; Responses and Embedding public surfaces reintroduced for many providers; schemas extended for Responses streaming, embeddings, speech, transcriptions, and chat choice/log-prob types; transports detect Responses streams and adjust SSE framing; tests updated/added for text and responses streaming.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Provider as Provider (e.g., OpenAI)
    participant StreamHandler as Chat/Responses StreamHandler
    participant HTTP as HTTP Client
    participant Transport as Bifrost Transport

    Client->>Provider: ChatCompletionStream / ResponsesStream request
    Provider->>StreamHandler: call (http client, url, req, auth, extraHeaders, postHookRunner)
    StreamHandler->>HTTP: POST streaming endpoint (/v1/chat/completions or /v1/responses)
    HTTP-->>StreamHandler: SSE / chunked stream
    StreamHandler->>Transport: emit *schemas.BifrostStream chunks
    Transport->>Client: SSE frames per chunk
    alt First non-nil chunk indicates Responses stream
        note right of Transport #EEF6FF: emit "event: <Type>\n data: <json>\n" and omit final [DONE]
    else Other stream types
        note right of Transport #F6FFEE: emit "data: <json>\n" and final "data: [DONE]"
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
core/providers/vertex.go (2)
585-601: Fix unused authHeader and avoid duplicate map literal (compile blocker)

authHeader is created but not used; the inline map literal is passed instead. This causes an unused-variable compile error and duplicates logic. Pass authHeader to the helper.
Apply this diff:
```diff
  authHeader := map[string]string{}
  if key.Value != "" {
      authHeader["Authorization"] = "Bearer " + key.Value
  }
  // Use shared OpenAI streaming logic
  return handleOpenAIChatCompletionStreaming(
      ctx,
      client,
      url,
      request,
-     map[string]string{"Authorization": "Bearer " + key.Value},
+     authHeader,
      provider.networkConfig.ExtraHeaders,
      schemas.Vertex,
      postHookRunner,
      provider.logger,
  )
```
225-235: Use net.Error Timeout() for http.Client timeouts (fasthttp.ErrTimeout won't match)

fasthttp.ErrTimeout is not returned by net/http, so you'll misclassify timeouts as generic errors. Prefer context.DeadlineExceeded or net.Error.Timeout().
Apply these diffs to use a helper:
```diff
- if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+ if isTimeout(err) {
      return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
  }
```

Repeat the same replacement at all occurrences shown in this comment.
Add this helper and import:
```diff
 import (
+    "net"
 )

+func isTimeout(err error) bool {
+    if err == nil {
+        return false
+    }
+    if errors.Is(err, context.DeadlineExceeded) {
+        return true
+    }
+    var ne net.Error
+    return errors.As(err, &ne) && ne.Timeout()
+}
```

Also applies to: 262-268, 417-421, 449-454
🧹 Nitpick comments (8)
transports/bifrost-http/handlers/inference.go (1)
889-907: Stream Responses: send only the event payload JSON, not the entire BifrostStream

For OpenAI Responses streaming, you compute eventType from ResponsesStreamResponse but still write the whole BifrostStream as data. Clients expect the event payload object. Marshal ResponsesStreamResponse (falling back to data if nil).
Apply this diff:
```diff
- if requestType == schemas.ResponsesStreamRequest {
-     // For responses API, use OpenAI-compatible format with event line
-     eventType := ""
-     if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
-         eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
-     }
-
-     if eventType != "" {
-         if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
-             h.logger.Warn(fmt.Sprintf("Failed to write SSE event: %v", err))
-             break
-         }
-     }
-
-     if _, err := fmt.Fprintf(w, "data: %s\n\n", responseJSON); err != nil {
-         h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
-         break
-     }
- } else {
+ if requestType == schemas.ResponsesStreamRequest {
+     // For responses API, use OpenAI-compatible format with event line + payload-only JSON
+     eventType := ""
+     if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
+         eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
+     }
+     if eventType != "" {
+         if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
+             h.logger.Warn(fmt.Sprintf("Failed to write SSE event: %v", err))
+             break
+         }
+     }
+     // Marshal only the stream payload
+     var payload interface{} = data
+     if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
+         payload = response.BifrostResponse.ResponsesStreamResponse
+     }
+     b, err := sonic.Marshal(payload)
+     if err != nil {
+         h.logger.Warn(fmt.Sprintf("Failed to marshal streaming response: %v", err))
+         break
+     }
+     if _, err := fmt.Fprintf(w, "data: %s\n\n", b); err != nil {
+         h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
+         break
+     }
+ } else {
      // For other APIs, use standard format
      if _, err := fmt.Fprintf(w, "data: %s\n\n", responseJSON); err != nil {
          h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
          break
      }
  }
```

core/schemas/responses.go (2)
88-114: Error shape in ResponsesResponse may be too narrow

OpenAI Responses failures often include type/param; consider extending the struct to capture them for parity with streamed error payloads.
You can augment the struct (outside this hunk) like:
```go
type ResponsesResponseError struct {
	Type    *string `json:"type,omitempty"`
	Code    string  `json:"code"`
	Message string  `json:"message"`
	Param   *string `json:"param,omitempty"`
}
```

Please confirm the exact non-streaming error shape you expect from providers to avoid losing fields.
1686-1715: Consider a small helper for terminal/error detection

Downstream code must special-case event types to end streams. Add a method:
- IsTerminal() bool for completed/failed/incomplete
- ToBifrostError() for error-ish events mapping Message/Param/Code
This keeps providers/transports simpler and consistent.
core/providers/openai.go (5)
193-205: Stale comment refers to wrong function

The comment says handleOpenAIChatCompletionStreaming but the function is handleOpenAITextCompletionStreaming. Please update.
526-744: Improve scanner robustness for large SSE lines; clean up TODO
- SSE lines (tool-call args, long deltas) can exceed bufio.Scanner’s ~64KB default. Increase the buffer to avoid truncated tokens.
- Remove/resolve the TODO “FIX THIS WITHING RESPONSE STRUCT” (typo) or link to a tracking issue.
Apply near scanner creation:
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ // Allow larger SSE lines (up to 10MB)
+ scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)
```

Also update the TODO line or delete it.
746-763: Copy/paste nit in comment

"Check if chat completion is allowed" — this handler is for Responses; update the comment for clarity.
861-1038: Responses streaming: handle failed/incomplete as errors and mark stream end on [DONE]
- Currently only type=="error" is converted to BifrostError. Treat response.failed (and likely response.incomplete) as terminal errors to propagate proper failure state.
- If an upstream sends [DONE] without a response.completed event, the code breaks out without setting the stream end indicator. Set it to avoid dangling streams.
- Increase scanner buffer like chat for long lines.
Apply these diffs:
- Increase scanner buffer
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ // Allow larger SSE lines (up to 10MB)
+ scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)
```
- Mark end-of-stream on [DONE]
```diff
- if line == "data: [DONE]" {
-     break
- }
+ if line == "data: [DONE]" {
+     // Some providers may not emit response.completed; ensure downstream sees end.
+     ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
+     return
+ }
```
- Treat response.failed/incomplete as errors
```diff
- if response.ResponsesStreamResponse.Type == schemas.ResponsesStreamResponseTypeCompleted {
+ switch response.ResponsesStreamResponse.Type {
+ case schemas.ResponsesStreamResponseTypeCompleted:
      ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
      processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
      return
- }
+ case schemas.ResponsesStreamResponseTypeFailed, schemas.ResponsesStreamResponseTypeIncomplete:
+     // Build a BifrostError from stream payload
+     be := &schemas.BifrostError{
+         IsBifrostError: false,
+         Type:           schemas.Ptr(string(response.ResponsesStreamResponse.Type)),
+         Error:          &schemas.ErrorField{},
+     }
+     if response.ResponsesStreamResponse.Message != nil {
+         be.Error.Message = *response.ResponsesStreamResponse.Message
+     } else if response.ResponsesStreamResponse.Response != nil && response.ResponsesStreamResponse.Response.Error != nil {
+         be.Error.Message = response.ResponsesStreamResponse.Response.Error.Message
+         be.Error.Code = response.ResponsesStreamResponse.Response.Error.Code
+     }
+     if response.ResponsesStreamResponse.Param != nil {
+         be.Error.Param = *response.ResponsesStreamResponse.Param
+     }
+     ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
+     processAndSendBifrostError(ctx, postHookRunner, be, responseChan, logger)
+     return
+ }
```
- Verify if upstream always emits response.completed and never [DONE] for Responses API. If [DONE] can occur, the above change is necessary to reliably end streams.
861-895: Stream client timeout for long-lived SSE

The stream http.Client uses a global Timeout. For SSE, this can abort healthy long streams. Consider a dedicated stream client with Timeout=0 (no overall deadline) and per-connection timeouts at the transport level.
Confirm your DefaultRequestTimeoutInSeconds is large enough for typical streams, or switch to a separate no-timeout stream client.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (8 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
core/schemas/transcriptions.go (1)
- core/schemas/bifrost.go (1): LogProbs (339-344)

transports/bifrost-http/handlers/inference.go (2)
- core/schemas/bifrost.go (4): RequestType (97-97), BifrostResponse (236-252), BifrostError (392-401), ResponsesStreamRequest (105-105)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

core/schemas/speech.go (1)
- core/schemas/bifrost.go (1): AudioLLMUsage (302-307)

core/schemas/responses.go (1)
- plugins/mocker/main.go (1): Response (97-103)

core/schemas/chatcompletions.go (1)
- core/schemas/bifrost.go (1): LogProbs (339-344)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): RequestType (97-97), BifrostResponse (236-252), BifrostError (392-401), ResponsesStreamRequest (105-105)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

core/providers/openai.go (6)
- core/schemas/provider.go (3): PostHookRunner (191-191), DefaultStreamBufferSize (17-17), Provider (194-219)
- core/schemas/bifrost.go (13): BifrostChatRequest (187-193), BifrostStream (381-384), BifrostError (392-401), OpenAI (53-53), ModelProvider (50-50), ErrorField (409-416), LLMUsage (294-300), BifrostContextKeyStreamEndIndicator (123-123), BifrostResponse (236-252), BifrostResponsesRequest (195-201), ResponsesRequest (104-104), BifrostEmbeddingRequest (203-209), EmbeddingRequest (106-106)
- core/utils.go (1): Ptr (12-14)
- core/schemas/chatcompletions.go (1): ChatStreamOptions (43-46)
- core/schemas/providers/openai/responses.go (1): ToOpenAIResponsesRequest (30-51)
- core/schemas/responses.go (3): ResponsesStreamResponse (1686-1715), ResponsesStreamResponseTypeError (1683-1683), ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (18)
core/providers/openrouter.go (1)
131-131: LGTM: Streaming handler updated for clarity.

The refactor from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming improves code clarity by explicitly indicating this handler is for chat completion streaming. All arguments and behavior are preserved.

core/providers/cerebras.go (1)

138-138: LGTM: Consistent streaming handler refactor.

The update to handleOpenAIChatCompletionStreaming maintains consistency with the broader refactor across all OpenAI-compatible providers.

core/providers/gemini.go (1)

180-180: LGTM: Streaming handler aligned with chat completion semantics.

The handler rename correctly reflects that this path handles chat completion streaming specifically, improving code maintainability.
core/providers/azure.go (1)
371-371: LGTM: Azure streaming handler updated consistently.

The handler rename preserves Azure's specific deployment-based URL construction and authentication logic while aligning with the unified chat streaming pathway.
core/providers/groq.go (1)
179-179: LGTM: Groq streaming handler refactored consistently.

The update to handleOpenAIChatCompletionStreaming maintains consistency across all OpenAI-compatible providers without altering functionality.

core/providers/parasail.go (1)

121-121: LGTM: Parasail streaming handler updated.

The handler rename improves clarity by explicitly indicating this is for chat completion streaming.
core/providers/ollama.go (1)
151-151: LGTM: Ollama streaming handler refactored.

The update to handleOpenAIChatCompletionStreaming aligns Ollama with the unified chat streaming approach across providers.

core/providers/sgl.go (1)

153-153: LGTM: SGL streaming handler consistently updated.

The final provider update completes the consistent refactor across all OpenAI-compatible providers, improving code clarity and separation between chat completion and responses streaming pathways.
core/providers/mistral.go (1)
133-144: Rename to chat-specific streaming helper looks good

The call switch to handleOpenAIChatCompletionStreaming aligns with the refactor. No issues spotted.
core/providers/vertex.go (1)
590-600: Chat streaming path switch LGTM (once header fix above is applied)

The switch to handleOpenAIChatCompletionStreaming matches the refactor. With the authHeader fix, this is good.
Ensure handleOpenAIChatCompletionStreaming accepts the same header semantics as prior handleOpenAIStreaming for Vertex.
transports/bifrost-http/handlers/inference.go (2)
867-873: Request type detection on first chunk looks correct

Deriving requestType from ExtraFields on the first non-nil item is fine and keeps the stream formatting consistent.

922-929: [DONE] sentinel gating matches Responses API behavior

Emitting [DONE] only for non-responses streams is correct per the OpenAI Responses spec.
transports/bifrost-http/integrations/utils.go (3)
456-472: Request type capture for stream-format branching LGTM

Capturing RequestType once and branching format accordingly is clean and avoids per-chunk checks.

568-609: Responses streaming event/data formatting is correct

Emitting event: and then data: for ResponsesStreamRequest matches OpenAI Responses streaming. Falling back to standard SSE for other types is good.
Confirm DeriveOpenAIStreamFromBifrostResponse returns the event payload object (not a pre-wrapped SSE string) for Responses streams.
619-626: Conditional [DONE] emission is correct

No [DONE] for the Responses API; emit it for others. Matches expected client behavior.
core/schemas/bifrost.go (1)
250-252: Embedded ResponsesStreamResponse pointer is appropriate

Keeps the stream payload accessible without bloating non-stream responses. Watch for field-name collisions across embeddings (none apparent).
Please ensure any JSON schema/contract tests cover presence/absence of these fields across streaming vs non-streaming responses.
core/providers/openai.go (2)
500-521: ChatCompletionStream entrypoint: LGTM

The new entrypoint and routing to the shared chat streaming handler look consistent with the Provider interface.
Confirm all other providers in this stack updated to call handleOpenAIChatCompletionStreaming (as per PR summary).
1040-1136: Embedding request path: LGTM

Consistent with other non-streaming handlers; proper extra fields set.
Force-pushed from d0f4414 to 2fe27b0
Actionable comments posted: 1
♻️ Duplicate comments (1)
core/schemas/responses.go (1)
1623-1684: Fix streaming event type constant values per previous review.

Several streaming event constant values remain incorrect and do not match the OpenAI Responses API specification. This is a duplicate of the previous review comment:

- Line 1635: ResponsesStreamResponseTypeOutputTextAdded value should be "response.output_text.delta" (not .added)
- Line 1641: ResponsesStreamResponseTypeFunctionCallArgumentsAdded value should be "response.function_call_arguments.delta" (not .added)
- Line 1673: ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta value should be "response.code_interpreter_call.code.delta" (missing dot between call and code)
- Line 1674: ResponsesStreamResponseTypeCodeInterpreterCallCodeDone value should be "response.code_interpreter_call.code.done" (missing dot between call and code)
- Line 1676: ResponsesStreamResponseTypeOutputTextAnnotationAdded value should be "response.output_text.annotation.added" (missing dot before annotation)
- Missing constants: "response.cancelled" (terminal event) and "done" (final SSE sentinel)

Based on learnings
Apply these corrections:
```diff
- ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+ ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
  ResponsesStreamResponseTypeOutputTextDone ResponsesStreamResponseType = "response.output_text.done"
  ResponsesStreamResponseTypeRefusalDelta ResponsesStreamResponseType = "response.refusal.delta"
  ResponsesStreamResponseTypeRefusalDone ResponsesStreamResponseType = "response.refusal.done"
- ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.added"
+ ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.delta"
  ResponsesStreamResponseTypeFunctionCallArgumentsDone ResponsesStreamResponseType = "response.function_call_arguments.done"

  ResponsesStreamResponseTypeCodeInterpreterCallInProgress ResponsesStreamResponseType = "response.code_interpreter_call.in_progress"
  ResponsesStreamResponseTypeCodeInterpreterCallInterpreting ResponsesStreamResponseType = "response.code_interpreter_call.interpreting"
  ResponsesStreamResponseTypeCodeInterpreterCallCompleted ResponsesStreamResponseType = "response.code_interpreter_call.completed"
- ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta ResponsesStreamResponseType = "response.code_interpreter_call_code.delta"
- ResponsesStreamResponseTypeCodeInterpreterCallCodeDone ResponsesStreamResponseType = "response.code_interpreter_call_code.done"
+ ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta ResponsesStreamResponseType = "response.code_interpreter_call.code.delta"
+ ResponsesStreamResponseTypeCodeInterpreterCallCodeDone ResponsesStreamResponseType = "response.code_interpreter_call.code.done"
- ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+ ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
```

Add missing terminal events:

```diff
  ResponsesStreamResponseTypeCustomToolCallInputDelta ResponsesStreamResponseType = "response.custom_tool_call_input.delta"
  ResponsesStreamResponseTypeCustomToolCallInputDone ResponsesStreamResponseType = "response.custom_tool_call_input.done"
+ ResponsesStreamResponseTypeCancelled ResponsesStreamResponseType = "response.cancelled"
+ ResponsesStreamResponseTypeDone ResponsesStreamResponseType = "done"
+ ResponsesStreamResponseTypeError ResponsesStreamResponseType = "error"
)
```
🧹 Nitpick comments (3)
core/schemas/transcriptions.go (1)
60-66: Consider omitting empty text in streams

Without omitempty, streaming deltas that only deliver delta events will still serialize "text": "", which deviates from the OpenAI payloads we mirror. Making text optional will trim that noise while retaining the field whenever it's populated.
567-582: Consider extracting duplicated error handling logic.

The error handling for http.NewRequestWithContext (lines 567-582) and client.Do (lines 594-609) contains identical logic for context cancellation, timeout, and generic errors. This pattern appears multiple times across different streaming handlers in this file.

Extract the error handling into a helper function:

```go
func handleStreamingRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
```

Then use it:

```diff
  req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
  if err != nil {
-     if errors.Is(err, context.Canceled) {
-         return nil, &schemas.BifrostError{
-             IsBifrostError: false,
-             Error: &schemas.ErrorField{
-                 Type:    schemas.Ptr(schemas.RequestCancelled),
-                 Message: schemas.ErrRequestCancelled,
-                 Error:   err,
-             },
-         }
-     }
-     if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-         return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-     }
-     return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+     return nil, handleStreamingRequestError(err, providerName)
  }
```

Also applies to: 594-609
970-991: Event line filtering logic is sound but could be simplified.

The logic to skip "event:" lines when parsing SSE data is correct for the Responses API streaming format. However, the nested conditions could be more readable.

Consider simplifying:

```diff
  var jsonData string
  // Parse SSE data
  if after, ok := strings.CutPrefix(line, "data: "); ok {
      jsonData = after
- } else if !strings.HasPrefix(line, "event:") {
-     // Handle raw JSON errors (without "data: " prefix) but skip event lines
-     jsonData = line
- } else {
-     // This is an event line, skip it
+ } else if strings.HasPrefix(line, "event:") {
+     // Skip event lines
      continue
+ } else {
+     // Handle raw JSON errors (without "data: " prefix)
+     jsonData = line
  }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (8 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- core/providers/sgl.go
- core/providers/vertex.go
- core/schemas/textcompletions.go
- core/providers/azure.go
- core/providers/groq.go
- core/schemas/embedding.go
- core/providers/gemini.go
🧰 Additional context used
🧬 Code graph analysis (8)
core/schemas/transcriptions.go (1)
- core/schemas/bifrost.go (1): LogProbs (339-344)

core/schemas/chatcompletions.go (1)
- core/schemas/bifrost.go (1): LogProbs (339-344)

core/schemas/speech.go (1)
- core/schemas/bifrost.go (1): AudioLLMUsage (302-307)

transports/bifrost-http/handlers/inference.go (2)
- core/schemas/bifrost.go (4): RequestType (97-97), BifrostResponse (236-252), BifrostError (407-416), ResponsesStreamRequest (105-105)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): RequestType (97-97), BifrostResponse (236-252), BifrostError (407-416), ResponsesStreamRequest (105-105)
- core/schemas/responses.go (1): ResponsesStreamResponse (1686-1715)

core/schemas/responses.go (2)
- plugins/mocker/main.go (1): Response (97-103)
- ui/lib/types/logs.ts (1): Annotation (236-239)

core/providers/openai.go (4)
- core/schemas/provider.go (7): PostHookRunner (191-191), ErrProviderJSONMarshaling (26-26), ErrRequestCancelled (23-23), ErrProviderRequestTimedOut (22-22), ErrProviderRequest (24-24), DefaultStreamBufferSize (17-17), Provider (194-219)
- core/schemas/bifrost.go (14): BifrostChatRequest (187-193), BifrostStream (381-384), BifrostError (407-416), OpenAI (53-53), ModelProvider (50-50), ErrorField (424-431), RequestCancelled (376-376), LLMUsage (294-300), BifrostContextKeyStreamEndIndicator (123-123), BifrostResponse (236-252), RequestType (97-97), BifrostResponsesRequest (195-201), ResponsesRequest (104-104), ResponsesStreamRequest (105-105)
- core/schemas/chatcompletions.go (1): ChatStreamOptions (43-46)
- core/schemas/providers/openai/responses.go (1): ToOpenAIResponsesRequest (30-51)
🔇 Additional comments (10)
core/providers/mistral.go (1)
133-143: Streaming handler swap looks good

The migration to handleOpenAIChatCompletionStreaming keeps the wiring identical while opting into the new chat-specific path. ✅

core/schemas/bifrost.go (1)
251-251: LGTM!

The embedded *ResponsesStreamResponse pointer enables BifrostResponse to carry Responses API streaming data alongside existing fields. The pointer semantics ensure backward compatibility.

core/schemas/responses.go (2)
88-88: LGTM!

The Error field addition allows ResponsesResponse to capture error states directly in the response structure, which is appropriate for the Responses API.

1686-1715: LGTM with dependency on event constant fixes.

The ResponsesStreamResponse struct provides appropriate fields to carry granular streaming event payloads. The optional pointer fields support the union-style type pattern needed for different event types. However, ensure the event type constants (lines 1623-1684) are corrected before this struct is used in production.

core/providers/openai.go (6)
508-526: LGTM!

The ChatCompletionStream method properly delegates to the new shared handleOpenAIChatCompletionStreaming handler. The operation check and parameter passing are correct.

757-850: LGTM!

The Responses method and handleOpenAIResponsesRequest function correctly handle non-streaming Responses API requests. The request type is properly set to schemas.ResponsesRequest and the endpoint is correctly formed.

852-870: LGTM!

The ResponsesStream method properly delegates to handleOpenAIResponsesStreaming. The operation check for ResponsesStreamRequest is correct.

1054-1072: LGTM!

The Embedding method properly delegates to the shared handleOpenAIEmbeddingRequest handler with correct parameters.

1076-1147: LGTM!

The handleOpenAIEmbeddingRequest function correctly handles non-streaming embedding requests. Request construction, error handling, and response processing are all implemented properly.

1033-1046: Ensure final response is emitted on stream end

The handler sends a final response when Type == ResponsesStreamResponseTypeCompleted, but if the scanner finishes without that event (no scanner.Err()), no response is sent. Confirm whether the OpenAI Responses API always emits a completed event before closing; if it may close without one, add an else block after if err := scanner.Err() to emit a final chunk (mirroring chat completion streaming).
Force-pushed from 2fe27b0 to cc59aff
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)
585-599: Do not override Vertex’s OAuth header on streaming calls.
authHeader is populated to skip the Authorization header when key.Value is empty, yet the call still hardcodes map[string]string{"Authorization": "Bearer " + key.Value}. With OAuth-based keys (key.Value == "") this forces Authorization: Bearer and prevents the Google client from attaching its own token, resulting in 401s on streaming. Pass the authHeader map you just built instead.

```diff
  return handleOpenAIChatCompletionStreaming(
      ctx,
      client,
      url,
      request,
-     map[string]string{"Authorization": "Bearer " + key.Value},
+     authHeader,
      provider.networkConfig.ExtraHeaders,
      schemas.Vertex,
      postHookRunner,
      provider.logger,
  )
```
♻️ Duplicate comments (2)
core/schemas/embedding.go (1)
122-145: Reset embedding union before decoding
UnmarshalJSON never clears the other shapes before trying the next decode, so reusing the same BifrostEmbeddingResponse instance can leak the previous branch (e.g., a prior string sticks around when the next payload is a vector). Nil out every variant up front so each decode starts clean.

```diff
 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+    be.EmbeddingStr = nil
+    be.EmbeddingArray = nil
+    be.Embedding2DArray = nil
+
     // First, try to unmarshal as a direct string
     var stringContent string
     if err := sonic.Unmarshal(data, &stringContent); err == nil {
         be.EmbeddingStr = &stringContent
         return nil
```
1635-1677: Align Responses stream event names with OpenAI spec

Several event literals don't match the Responses API stream (response.output_text.delta, response.function_call_arguments.delta, response.output_text.annotation.added). With the current strings we'll never classify those chunks correctly.

```diff
- ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+ ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
 @@
- ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.added"
+ ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.delta"
 @@
- ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+ ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
```
🧹 Nitpick comments (3)
core/providers/openai.go (3)
568-582: Deduplicate error handling logic.

The error handling blocks after `http.NewRequestWithContext` (lines 568-582) and `client.Do` (lines 594-609) are identical. Consider extracting this into a helper function to reduce duplication and improve maintainability.

Example refactor:

```go
func handleHTTPError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
```

Then use it as:

```go
if err != nil {
	return nil, handleHTTPError(err, providerName)
}
```

Also applies to: 594-609
973-994: Simplify event line filtering logic.

The event line filtering has redundant checks:

- Line 974 skips all lines starting with `"event:"`
- Line 988's condition `!strings.HasPrefix(line, "event:")` and the else block at lines 991-994 are unreachable because event lines are already filtered at line 974

Apply this diff to simplify the logic:

```diff
 // Skip empty lines, comments, and event lines
 if line == "" || strings.HasPrefix(line, ":") || strings.HasPrefix(line, "event:") {
 	continue
 }

 // Check for end of stream
 if line == "data: [DONE]" {
 	break
 }

 var jsonData string

 // Parse SSE data
 if after, ok := strings.CutPrefix(line, "data: "); ok {
 	jsonData = after
-} else if !strings.HasPrefix(line, "event:") {
-	// Handle raw JSON errors (without "data: " prefix) but skip event lines
+} else {
+	// Handle raw JSON errors (without "data: " prefix)
 	jsonData = line
-} else {
-	// This is an event line, skip it
-	continue
 }
```
872-1056: Consider adding usage tracking for consistency.

The `handleOpenAIResponsesStreaming` function does not accumulate usage information like `handleOpenAIChatCompletionStreaming` does (lines 690-709). If the Responses API returns usage data in stream chunks, consider adding similar usage tracking logic for consistency and completeness.

Review the Responses API specification to determine if usage information should be collected and returned in the final response. If so, implement similar logic to the chat completion streaming handler.
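The chat handler's accumulation pattern can be sketched as below; `Usage`, `chunk`, and `accumulateUsage` are hypothetical stand-ins for the real `schemas.LLMUsage` and stream-chunk types, not the project's actual API:

```go
package main

import "fmt"

// Usage mirrors the token-usage shape reported by OpenAI-style streams.
type Usage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// chunk is a stand-in for a decoded stream event; only some chunks carry usage.
type chunk struct {
	Usage *Usage
}

// accumulateUsage folds per-chunk usage into a running total, returning nil
// when no chunk reported usage at all.
func accumulateUsage(chunks []chunk) *Usage {
	var total *Usage
	for _, c := range chunks {
		if c.Usage == nil {
			continue
		}
		if total == nil {
			total = &Usage{}
		}
		total.PromptTokens += c.Usage.PromptTokens
		total.CompletionTokens += c.Usage.CompletionTokens
		total.TotalTokens += c.Usage.TotalTokens
	}
	return total
}

func main() {
	got := accumulateUsage([]chunk{
		{Usage: &Usage{PromptTokens: 10, TotalTokens: 10}},
		{}, // most delta chunks carry no usage
		{Usage: &Usage{CompletionTokens: 7, TotalTokens: 7}},
	})
	fmt.Println(got.PromptTokens, got.CompletionTokens, got.TotalTokens)
}
```

In the real handler the accumulated total would be attached to the final chunk before the stream-end indicator is set.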
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (8 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- transports/bifrost-http/handlers/server.go
- core/schemas/providers/mistral/types.go
- core/schemas/providers/mistral/embedding.go
- plugins/telemetry/main.go
✅ Files skipped from review due to trivial changes (1)
- core/schemas/providers/gemini/chat.go
🚧 Files skipped from review as they are similar to previous changes (7)
- core/providers/cerebras.go
- core/providers/azure.go
- core/providers/openrouter.go
- core/providers/mistral.go
- core/providers/parasail.go
- core/providers/groq.go
- core/schemas/speech.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/responses.go (2)
plugins/mocker/main.go (1)
Response (97-103)

core/schemas/bifrost.go (1)
LogProbs(341-346)
core/providers/openai.go (5)
core/schemas/provider.go (7)
PostHookRunner (191-191), ErrProviderJSONMarshaling (26-26), ErrRequestCancelled (23-23), ErrProviderRequestTimedOut (22-22), ErrProviderRequest (24-24), DefaultStreamBufferSize (17-17), Provider (194-219)

core/schemas/bifrost.go (16)

BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), OpenAI (55-55), ChatCompletionStreamRequest (105-105), ModelProvider (52-52), RequestCancelled (378-378), LLMUsage (296-302), BifrostContextKeyStreamEndIndicator (125-125), BifrostResponse (238-254), RequestType (99-99), BifrostResponsesRequest (197-203), ResponsesRequest (106-106), ResponsesStreamRequest (107-107), BifrostEmbeddingRequest (205-211), EmbeddingRequest (108-108)

core/schemas/chatcompletions.go (1)

ChatStreamOptions (43-46)

core/schemas/providers/openai/responses.go (1)

ToOpenAIResponsesRequest (30-51)

core/schemas/responses.go (3)

ResponsesStreamResponse (1686-1715), ResponsesStreamResponseTypeError (1683-1683), ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (5)
core/providers/openai.go (5)
505-526: LGTM! Clean refactor to shared streaming handler.

The refactored `ChatCompletionStream` method properly delegates to the shared `handleOpenAIChatCompletionStreaming` function with correct parameters and authorization header construction.
757-774: LGTM! Well-structured Responses API method.

The `Responses` method follows the established pattern and properly delegates to the shared handler with correct parameters.
776-850: LGTM! Consistent implementation.

The `handleOpenAIResponsesRequest` function follows the established pattern for non-streaming requests with proper error handling and response construction.
852-870: LGTM! Properly delegates to streaming handler.

The `ResponsesStream` method correctly delegates to the shared streaming handler with appropriate parameters.
1058-1154: LGTM! Well-implemented embedding support.

The `Embedding` method and `handleOpenAIEmbeddingRequest` function follow the established patterns for non-streaming requests with proper error handling, authorization, and response construction.
298c854 to f9dda5e
cc59aff to 4a84b84
Actionable comments posted: 1
🧹 Nitpick comments (1)
core/schemas/chatcompletions.go (1)
351-363: Update comment to reflect all three embedded pointer types.

The comment at lines 353-354 states "IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil at a time" but the struct embeds three pointer types: `BifrostTextCompletionResponseChoice`, `BifrostNonStreamResponseChoice`, and `BifrostStreamResponseChoice`. The comment should mention all three to accurately document the constraint.

Apply this diff to update the comment:

```diff
 // BifrostChatResponseChoice represents a choice in the completion result.
 // This struct can represent either a streaming or non-streaming response choice.
-// IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice
-// should be non-nil at a time.
+// IMPORTANT: Only one of BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice,
+// or BifrostStreamResponseChoice should be non-nil at a time.
 type BifrostChatResponseChoice struct {
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (8 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- core/schemas/providers/mistral/types.go
- core/schemas/providers/mistral/embedding.go
- plugins/telemetry/main.go
- transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (10)
- core/providers/cerebras.go
- core/providers/openrouter.go
- core/providers/azure.go
- core/providers/gemini.go
- core/providers/ollama.go
- core/schemas/textcompletions.go
- core/schemas/speech.go
- core/providers/sgl.go
- core/providers/vertex.go
- core/providers/parasail.go
🧰 Additional context used
🧬 Code graph analysis (6)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
Response(97-103)
core/providers/openai.go (4)
core/schemas/provider.go (7)
PostHookRunner (191-191), ErrProviderJSONMarshaling (26-26), ErrRequestCancelled (23-23), ErrProviderRequestTimedOut (22-22), ErrProviderRequest (24-24), DefaultStreamBufferSize (17-17), Provider (194-219)

core/schemas/bifrost.go (16)

BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), OpenAI (55-55), ChatCompletionStreamRequest (105-105), ModelProvider (52-52), RequestCancelled (378-378), LLMUsage (296-302), BifrostContextKeyStreamEndIndicator (125-125), BifrostResponse (238-254), RequestType (99-99), BifrostResponsesRequest (197-203), ResponsesRequest (106-106), ResponsesStreamRequest (107-107), BifrostEmbeddingRequest (205-211), EmbeddingRequest (108-108)

core/schemas/providers/openai/responses.go (1)

ToOpenAIResponsesRequest (30-51)

core/schemas/responses.go (3)

ResponsesStreamResponse (1686-1715), ResponsesStreamResponseTypeError (1683-1683), ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (7)
core/schemas/transcriptions.go (1)
56-81: LGTM!

The transcription types follow a clean embedded pointer pattern that clearly separates streaming and non-streaming fields. The structure is consistent with the broader changes in this PR.
core/schemas/embedding.go (1)
97-145: LGTM!

The embedding types and custom marshaling logic are well-implemented. The union-type pattern with exclusive field usage is clear and consistent with similar patterns elsewhere in the codebase.
core/schemas/providers/gemini/chat.go (1)
638-641: LGTM!

Removing the explicit nil check is correct since `len(nil)` returns 0 for slices in Go. This simplification makes the code more idiomatic while preserving behavior.

core/schemas/responses.go (1)
1618-1715: LGTM!

The streaming types comprehensively model the OpenAI Responses API streaming events. The `ResponsesStreamResponse` struct with its optional pointer fields appropriately handles the union-type nature of different streaming event payloads.

core/providers/mistral.go (1)
133-143: LGTM!

The rename to `handleOpenAIChatCompletionStreaming` clarifies that this is specifically for chat completion streaming, distinguishing it from the new Responses API streaming path. This aligns with the broader PR refactoring.

core/providers/groq.go (1)
179-189: LGTM!

Consistent with the Mistral provider changes, this rename clarifies the streaming handler's purpose and aligns with the separation of chat completion and Responses API streaming paths.
transports/bifrost-http/handlers/inference.go (1)
828-895: LGTM!

The dynamic SSE format detection elegantly handles both standard and Responses API streaming:

- `requestType` is determined from the first non-nil chunk's metadata, ensuring consistent formatting throughout the stream
- For Responses API (`ResponsesStreamRequest`), optional `event:` lines are emitted before `data:` lines, matching OpenAI's SSE format
- The conditional `[DONE]` marker correctly omits the marker for Responses API streams (which signal completion by closing the stream) while preserving it for other streaming APIs

The implementation correctly handles the format differences between streaming API types.
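The two SSE framings described above can be sketched with a small helper; `formatSSEFrame` is an illustrative name, real chunks are JSON-encoded `BifrostStream` values, and the `data: [DONE]` sentinel would be appended separately for non-Responses streams:

```go
package main

import (
	"fmt"
	"strings"
)

// formatSSEFrame renders one stream chunk. Responses API streams prefix an
// "event:" line and signal completion by closing the stream; other streams
// are data-only and are terminated with a separate "data: [DONE]" frame.
func formatSSEFrame(isResponsesAPI bool, eventType, payload string) string {
	var b strings.Builder
	if isResponsesAPI && eventType != "" {
		fmt.Fprintf(&b, "event: %s\n", eventType)
	}
	fmt.Fprintf(&b, "data: %s\n\n", payload)
	return b.String()
}

func main() {
	// Responses API chunk: event line + data line.
	fmt.Print(formatSSEFrame(true, "response.completed", `{"type":"response.completed"}`))
	// Chat-completion chunk: data line only.
	fmt.Print(formatSSEFrame(false, "", `{"choices":[]}`))
}
```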
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
transports/bifrost-http/handlers/inference.go (1)
828-846: Populate RequestType on error chunks
`BifrostError.ExtraFields.RequestType` isn't set in any streaming error path, so if the first chunk is an error the handler's `requestType` remains empty and SSE framing/`[DONE]` emission will fail. Add `ExtraFields.RequestType` when constructing `BifrostError` in streaming flows (e.g. in core/bifrost.go and provider error wrappers).
🧹 Nitpick comments (6)
transports/bifrost-http/handlers/inference.go (1)
856-880: LGTM with minor observation.

The conditional SSE formatting correctly distinguishes between Responses API streaming (with event lines) and standard streaming (data-only). The logic appropriately handles both formats.
Note: This logic is duplicated in
`transports/bifrost-http/integrations/utils.go`. Consider extracting to a shared helper function to maintain consistency and reduce maintenance burden.

transports/bifrost-http/integrations/utils.go (2)
456-471: Verify request type population across all streaming paths.

The runtime detection of `requestType` from the first chunk's `ExtraFields.RequestType` is identical to the logic in `inference.go`. Ensure all streaming code paths consistently populate this field.

Note: This detection logic is duplicated between `handlers/inference.go` and `integrations/utils.go`. Consider extracting to a shared utility function.
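A shared helper for this first-chunk detection might look like the sketch below; `streamChunk` and `detectRequestType` are illustrative stand-ins, not existing Bifrost identifiers:

```go
package main

import "fmt"

// streamChunk is a stand-in for *schemas.BifrostStream: a chunk may be nil,
// may carry no metadata, or may declare the request type in its ExtraFields.
type streamChunk struct {
	RequestType string
}

// detectRequestType returns the request type carried by the first chunk that
// declares one, so both SSE writers can share a single detection pass.
func detectRequestType(chunks []*streamChunk) string {
	for _, c := range chunks {
		if c != nil && c.RequestType != "" {
			return c.RequestType
		}
	}
	return ""
}

func main() {
	chunks := []*streamChunk{nil, {}, {RequestType: "responses_stream"}}
	fmt.Println(detectRequestType(chunks))
}
```

In the real transports the chunks arrive over a channel rather than a slice, but the first-non-empty rule is the same.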
568-609: LGTM with refactoring suggestion.

The conditional SSE formatting correctly handles both Responses API (with event lines) and standard streaming (data-only) formats based on the detected request type.
The logic here closely mirrors
`handlers/inference.go` lines 856-880. Consider extracting the common SSE formatting logic into a shared helper function to reduce duplication and ensure consistency.

core/schemas/chatcompletions.go (1)
351-363: Consider adding custom JSON marshaling to enforce the one-of constraint.

The comment states that only one of the three embedded pointer structs should be non-nil at a time, but there's no runtime enforcement of this invariant. Other similar types in the codebase (e.g., `ChatMessageContent` and `ChatToolChoice` in this same file) implement custom `MarshalJSON` and `UnmarshalJSON` methods to validate and enforce one-of semantics.

Without validation, callers could accidentally set multiple fields, leading to ambiguous JSON output when the embedded struct fields are flattened.

Consider implementing custom JSON marshaling similar to the pattern used for `ChatToolChoice` (lines 128-165):

```diff
+// MarshalJSON implements custom JSON marshalling for BifrostChatResponseChoice.
+func (bc BifrostChatResponseChoice) MarshalJSON() ([]byte, error) {
+	// Count how many embedded structs are set
+	set := 0
+	if bc.BifrostTextCompletionResponseChoice != nil {
+		set++
+	}
+	if bc.BifrostNonStreamResponseChoice != nil {
+		set++
+	}
+	if bc.BifrostStreamResponseChoice != nil {
+		set++
+	}
+	if set > 1 {
+		return nil, fmt.Errorf("multiple embedded response types are set; only one should be non-nil")
+	}
+	// Proceed with default marshaling if validation passes
+	type Alias BifrostChatResponseChoice
+	return sonic.Marshal((*Alias)(&bc))
+}
```

core/schemas/transcriptions.go (1)
56-66: Consider adding custom JSON marshaling to enforce mutual exclusivity.
`BifrostTranscribe` embeds two optional pointer structs (`*BifrostTranscribeNonStreamResponse` and `*BifrostTranscribeStreamResponse`). Similar to the pattern in `core/schemas/chatcompletions.go` (see `BifrostChatResponseChoice`), without custom marshaling, callers could set both fields, leading to ambiguous JSON output when the embedded struct fields are flattened.

Consider implementing validation to ensure only one embedded struct is non-nil at a time:

```diff
+// MarshalJSON implements custom JSON marshalling for BifrostTranscribe.
+func (bt BifrostTranscribe) MarshalJSON() ([]byte, error) {
+	if bt.BifrostTranscribeNonStreamResponse != nil && bt.BifrostTranscribeStreamResponse != nil {
+		return nil, fmt.Errorf("both non-stream and stream response types are set; only one should be non-nil")
+	}
+	type Alias BifrostTranscribe
+	return sonic.Marshal((*Alias)(&bt))
+}
```

core/providers/openai.go (1)
593-609: Remove redundant context cancellation check after `client.Do`.

The context cancellation check on lines 595-604 is redundant. If the context is canceled after the request is created but before `client.Do` is called, the HTTP client will return an error (wrapped `context.Canceled`) immediately. The error handling on line 595 already checks for `context.Canceled`, so the explicit check on lines 595-604 is unnecessary.

The same pattern appears in several streaming functions (`handleOpenAIResponsesStreaming`, `SpeechStream`, `TranscriptionStream`), suggesting this refactor could be applied consistently.

Apply this diff to simplify the error handling:

```diff
 // Make the request
 resp, err := client.Do(req)
 if err != nil {
-	if errors.Is(err, context.Canceled) {
-		return nil, &schemas.BifrostError{
-			IsBifrostError: false,
-			Error: &schemas.ErrorField{
-				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: schemas.ErrRequestCancelled,
-				Error:   err,
-			},
-		}
-	}
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-	}
-	return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+	return nil, handleHTTPClientError(err, providerName)
 }
```

And define a helper function to reduce duplication:

```go
func handleHTTPClientError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (8 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- core/schemas/providers/mistral/types.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/embedding.go
- transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (7)
- core/providers/sgl.go
- core/providers/vertex.go
- core/schemas/speech.go
- core/providers/groq.go
- core/providers/cerebras.go
- core/providers/gemini.go
- core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)

core/schemas/responses.go (1)
ResponsesStreamResponse(1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
Response(97-103)
core/providers/openai.go (5)
core/schemas/provider.go (7)
PostHookRunner (191-191), ErrProviderJSONMarshaling (26-26), ErrRequestCancelled (23-23), ErrProviderRequestTimedOut (22-22), ErrProviderRequest (24-24), DefaultStreamBufferSize (17-17), Provider (194-219)

core/schemas/bifrost.go (16)

BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), OpenAI (55-55), ModelProvider (52-52), ErrorField (426-433), RequestCancelled (378-378), LLMUsage (296-302), BifrostContextKeyStreamEndIndicator (125-125), BifrostResponse (238-254), RequestType (99-99), BifrostResponsesRequest (197-203), ResponsesRequest (106-106), ResponsesStreamRequest (107-107), BifrostEmbeddingRequest (205-211), EmbeddingRequest (108-108)

core/utils.go (1)

Ptr (12-14)

core/schemas/providers/openai/responses.go (1)

ToOpenAIResponsesRequest (30-51)

core/schemas/responses.go (3)

ResponsesStreamResponse (1686-1715), ResponsesStreamResponseTypeError (1683-1683), ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (16)
core/providers/parasail.go (1)
121-121: LGTM!

The function rename from `handleOpenAIStreaming` to `handleOpenAIChatCompletionStreaming` aligns with the PR's objective to separate chat completion streaming from Responses API streaming. The change is consistent with similar updates across other providers.

core/providers/openrouter.go (1)
131-131: LGTM!

The function rename is consistent with the broader refactoring across providers to use specialized chat completion streaming handlers.
core/providers/ollama.go (1)
151-151: LGTM!

Consistent with the streaming handler refactoring across all providers.
core/providers/azure.go (1)
371-371: LGTM!

The function rename maintains consistency with the chat-centric streaming refactoring.
core/providers/mistral.go (1)
133-133: LGTM!

Completes the consistent refactoring of streaming handlers across all providers.
transports/bifrost-http/handlers/inference.go (1)
889-895: LGTM!

Correctly omits the `[DONE]` marker for OpenAI Responses API, which ends when the stream closes rather than with an explicit marker.

transports/bifrost-http/integrations/utils.go (1)
619-625: LGTM!

Correctly omits the `[DONE]` marker for Responses API streaming, which terminates on stream closure per OpenAI's specification.

core/schemas/textcompletions.go (1)
72-78: Fix JSON tag formatting error.

Line 76 has an extra space in the JSON tag: `json:" Tokens"` should be `json:"tokens"` to match the Go naming convention and avoid serialization issues.

Apply this diff:

```diff
-	Tokens []string `json:" Tokens"`
+	Tokens []string `json:"tokens"`
```

Likely an incorrect or invalid review comment.
core/schemas/embedding.go (1)
97-145: LGTM!

The custom JSON marshaling logic correctly handles the three possible embedding formats (string, 1D array, 2D array). While the `UnmarshalJSON` method doesn't reset fields before decoding (as noted in a previous review), the author confirmed that instances are not reused in their usage patterns, making the implementation safe for the current use case.

core/schemas/bifrost.go (1)
253-253: LGTM!

The addition of `*ResponsesStreamResponse` as an embedded pointer field is consistent with the existing pattern used in `BifrostResponse` (e.g., `*ResponsesResponse` on line 252). This enables support for the new Responses API streaming format.

core/schemas/responses.go (2)
88-88: LGTM!

The addition of the `Error` field to `ResponsesResponse` enables proper error handling for the Responses API. The field is optional (pointer type) and uses the well-defined `ResponsesResponseError` type.
1618-1715: LGTM!

The streaming types introduced here provide comprehensive coverage of the OpenAI Responses API streaming events. The `ResponsesStreamResponseType` constants define a wide range of event types, and the `ResponsesStreamResponse` struct uses optional fields to accommodate the various event payload structures.

Based on the previous review discussion, the event type string literals have been verified against the official OpenAI documentation.
core/providers/openai.go (4)
757-870: LGTM!

The `Responses` and `ResponsesStream` methods follow consistent patterns with the existing chat completion implementations. The URL endpoints (`/v1/responses`) are correct for the OpenAI Responses API, and error handling is properly implemented.
1058-1154: LGTM!

The `Embedding` method and its handler `handleOpenAIEmbeddingRequest` follow the established request/response pattern used throughout the provider implementations. The endpoint (`/v1/embeddings`) is correct, and error handling is properly implemented.
1008-1029: LGTM!

The error handling for `ResponsesStreamResponseTypeError` properly constructs a `BifrostError` from the stream response fields and sets the stream end indicator before propagating the error through the channel.
973-994: SSE event lines can be safely skipped: the `type` field is mandatory in the JSON payload (`ResponsesStreamResponse.Type` is non-omitempty), so event lines need not be parsed separately.
f9dda5e to fbe1158
4a84b84 to bb24cd9
Actionable comments posted: 2
♻️ Duplicate comments (3)
core/schemas/responses.go (1)
1618-1684: Fix event type string values to match OpenAI Responses API specification.

The event type constants still contain incorrect values that were flagged in a previous review. These must be corrected to ensure compatibility with the OpenAI Responses API streaming format:

- Line 1635: Change `"response.output_text.added"` to `"response.output_text.delta"`
- Line 1641: Change `"response.function_call_arguments.added"` to `"response.function_call_arguments.delta"`
- Line 1676: Change `"response.output_text_annotation.added"` to `"response.output_text.annotation.added"` (add dot before `annotation`)

Apply this diff to fix the event type values:

```diff
-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
 	ResponsesStreamResponseTypeOutputTextDone ResponsesStreamResponseType = "response.output_text.done"
 	ResponsesStreamResponseTypeRefusalDelta ResponsesStreamResponseType = "response.refusal.delta"
 	ResponsesStreamResponseTypeRefusalDone ResponsesStreamResponseType = "response.refusal.done"
-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.delta"
 	ResponsesStreamResponseTypeFunctionCallArgumentsDone ResponsesStreamResponseType = "response.function_call_arguments.done"
-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
```

These corrections are based on the official OpenAI Responses API streaming documentation.
core/providers/openai.go (2)
1026-1043: Preserve Responses API error events instead of downgrading to generic errors.

When encountering `ResponsesStreamResponseTypeError`, the code converts it to a generic `BifrostError` via `processAndSendBifrostError`. This breaks compatibility with the OpenAI Responses API streaming specification, where clients expect structured error events with the `event: error` line and the full error schema (type/code/message/param fields).

Impact: Clients consuming the Responses API will not receive properly formatted error events, breaking their error handling logic.

Solution: Forward the error as a normal responses-stream event (preserving all fields from `ResponsesStreamResponse`) so the transport layer can emit the correct `event: error` line with the structured payload. Only set the stream-end indicator and terminate after sending that structured event.

Consider this approach:
ResponsesStreamResponse) so the transport layer can emit the correctevent: errorline with the structured payload. Only set the stream-end indicator and terminate after sending that structured event.Consider this approach:
if response.ResponsesStreamResponse.Type == schemas.ResponsesStreamResponseTypeError { - bifrostErr := &schemas.BifrostError{ - Type: schemas.Ptr(string(schemas.ResponsesStreamResponseTypeError)), - IsBifrostError: false, - Error: &schemas.ErrorField{}, - } - - if response.ResponsesStreamResponse.Message != nil { - bifrostErr.Error.Message = *response.ResponsesStreamResponse.Message - } - if response.ResponsesStreamResponse.Param != nil { - bifrostErr.Error.Param = *response.ResponsesStreamResponse.Param - } - + // Send error as a structured responses-stream event, then terminate + response.ExtraFields.RequestType = schemas.ResponsesStreamRequest + response.ExtraFields.Provider = providerName + response.ExtraFields.ModelRequested = request.Model + response.ExtraFields.ChunkIndex = response.ResponsesStreamResponse.SequenceNumber + response.ExtraFields.Latency = time.Since(startTime).Milliseconds() ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true) - processAndSendBifrostError(ctx, postHookRunner, bifrostErr, responseChan, logger) + processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger) return }This preserves the OpenAI error event structure for proper client handling.
671-671: Fix TODO comment typo and address the underlying issue.

The TODO comment contains a typo: "WITHING" should be "WITHIN". More importantly, this comment was marked as addressed in commit cc59aff in a previous review, but the typo is still present. This suggests the underlying issue about error response handling within the response struct may not have been fully resolved either. Apply this diff to fix the typo:

```diff
-	//TODO: FIX THIS WITHING RESPONSE STRUCT
+	//TODO: FIX THIS WITHIN RESPONSE STRUCT
```

Additionally, consider addressing or clarifying what needs to be fixed regarding error response handling, or remove the TODO if the issue has been resolved.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (7 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
- transports/bifrost-http/handlers/server.go
- core/schemas/providers/mistral/embedding.go
🚧 Files skipped from review as they are similar to previous changes (10)
- core/schemas/providers/gemini/chat.go
- core/providers/parasail.go
- core/providers/ollama.go
- core/providers/gemini.go
- core/providers/openrouter.go
- transports/bifrost-http/integrations/utils.go
- core/providers/vertex.go
- core/providers/cerebras.go
- core/providers/sgl.go
- core/schemas/embedding.go
🧰 Additional context used
🧬 Code graph analysis (7)
- core/schemas/chatcompletions.go (1)
  - core/schemas/bifrost.go (1): `LogProbs` (341-346)
- core/schemas/transcriptions.go (1)
  - core/schemas/bifrost.go (1): `LogProbs` (341-346)
- core/schemas/bifrost.go (1)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- core/schemas/responses.go (1)
  - plugins/mocker/main.go (1): `Response` (97-103)
- core/schemas/speech.go (1)
  - core/schemas/bifrost.go (1): `AudioLLMUsage` (304-309)
- transports/bifrost-http/handlers/inference.go (2)
  - core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- core/providers/openai.go (5)
  - core/schemas/provider.go (7): `PostHookRunner` (191-191), `ErrProviderJSONMarshaling` (26-26), `ErrRequestCancelled` (23-23), `ErrProviderRequestTimedOut` (22-22), `ErrProviderRequest` (24-24), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
  - core/schemas/bifrost.go (14): `BifrostChatRequest` (189-195), `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ModelProvider` (52-52), `ErrorField` (426-433), `RequestCancelled` (378-378), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `BifrostEmbeddingRequest` (205-211), `EmbeddingRequest` (108-108)
  - core/schemas/chatcompletions.go (1): `ChatStreamOptions` (43-46)
  - core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
  - core/schemas/responses.go (3): `ResponsesStreamResponse` (1686-1715), `ResponsesStreamResponseTypeError` (1683-1683), `ResponsesStreamResponseTypeCompleted` (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Graphite / mergeability_check (×20)
🔇 Additional comments (14)
core/schemas/textcompletions.go (1)

72-78: LGTM! The `TextCompletionLogProb` type is well-structured and follows Go conventions. The JSON tags are properly defined, and the type aligns with similar log-probability structures in the codebase.

core/schemas/speech.go (1)

86-95: LGTM! The `BifrostSpeech` and `BifrostSpeechStreamResponse` types follow the same embedding pattern used throughout this PR for streaming support. The structure allows for conditional streaming fields while maintaining non-streaming fields.

core/schemas/chatcompletions.go (1)

389-402: LGTM! The `LogProb` and `ContentLogProb` types are well-structured and consistent with log-probability representations used elsewhere in the codebase.

core/schemas/transcriptions.go (1)

56-81: LGTM! The `BifrostTranscribe` structure follows the consistent embedding pattern used throughout this PR. The separation of common fields, non-streaming fields, and streaming fields is clear and maintainable.

core/providers/groq.go (1)

179-179: LGTM! The switch from `handleOpenAIStreaming` to `handleOpenAIChatCompletionStreaming` aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming.

core/providers/azure.go (1)

372-372: LGTM! Consistent with the broader refactoring to distinguish chat completion streaming from other streaming types.

core/providers/mistral.go (1)

133-133: LGTM! Consistent with the streaming handler refactoring across providers.

transports/bifrost-http/handlers/inference.go (3)

828-846: LGTM! The `requestType` extraction logic correctly captures the request type from the first non-nil chunk, handling both success and error cases. The empty-string default ensures the code falls back to standard SSE formatting when no chunks are received.
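The first-chunk extraction described above can be factored into one helper, since the same logic appears in both the handler and the integrations layer. A minimal sketch under simplified, hypothetical stand-in types (the real `BifrostStream` embeds `*BifrostResponse`/`*BifrostError`, each carrying `ExtraFields.RequestType`):

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the schema types discussed in this review.
type extraFields struct{ RequestType string }
type bifrostResponse struct{ ExtraFields extraFields }
type bifrostError struct{ ExtraFields extraFields }
type bifrostStream struct {
	Response *bifrostResponse
	Err      *bifrostError
}

// requestTypeOf pulls the request type from whichever side of the chunk is
// populated, returning "" when neither carries one (the standard-SSE fallback).
func requestTypeOf(s *bifrostStream) string {
	switch {
	case s == nil:
		return ""
	case s.Response != nil:
		return s.Response.ExtraFields.RequestType
	case s.Err != nil:
		return s.Err.ExtraFields.RequestType
	default:
		return ""
	}
}

func main() {
	first := &bifrostStream{Response: &bifrostResponse{ExtraFields: extraFields{RequestType: "responses_stream"}}}
	fmt.Println(requestTypeOf(first))
}
```

Either transport could then sniff the first non-nil chunk with one call instead of duplicating the nil checks.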
856-880: LGTM! The conditional SSE formatting correctly distinguishes between Responses API streams (with optional event lines) and standard chat completion streams. The logic properly extracts the event type and formats the output accordingly.

889-896: LGTM! The conditional `[DONE]` marker emission correctly follows OpenAI's Responses API specification, which ends streams by closing the connection rather than sending a `[DONE]` signal. The explanatory comment is helpful.

core/schemas/bifrost.go (1)
253-253: LGTM! The addition of `*ResponsesStreamResponse` is consistent with the existing pattern of embedded pointer fields in `BifrostResponse`. The placement alongside `*ResponsesResponse` is logical, and the custom `MarshalJSON` on `BifrostStream` will handle any potential conflicts between embedded fields.

core/schemas/responses.go (1)

88-88: LGTM! The addition of the `Error` field to `ResponsesResponse` provides proper error-handling support for the Responses API. The field is appropriately optional (pointer type) and follows the established pattern for error fields in the codebase.

core/providers/openai.go (2)

987-1008: LGTM! Proper SSE event line handling. The implementation correctly handles the Server-Sent Events (SSE) format by:

- skipping `event:` lines in the initial check (line 988)
- filtering out event lines when parsing data (lines 1002-1008)
- only processing actual `data:` lines or raw JSON

This defensive approach ensures clean parsing of the Responses API streaming format.
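The filtering steps above can be illustrated with a small sketch (a hypothetical helper; the actual handler reads lines from a buffered scanner rather than a string):

```go
package main

import (
	"fmt"
	"strings"
)

// extractData returns the JSON payloads from a raw SSE message, skipping
// "event:" lines the way the Responses streaming parser described above does.
func extractData(raw string) []string {
	var out []string
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		switch {
		case line == "" || strings.HasPrefix(line, "event:"):
			continue // event lines only carry the type; payloads are on data lines
		case strings.HasPrefix(line, "data:"):
			out = append(out, strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		default:
			out = append(out, line) // tolerate raw JSON without a data: prefix
		}
	}
	return out
}

func main() {
	fmt.Println(extractData("event: response.completed\ndata: {\"ok\":true}"))
}
```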
1072-1168: LGTM! Clean implementation of embedding support. The new `Embedding` method and `handleOpenAIEmbeddingRequest` helper follow the established patterns in this file:

- proper authorization via Bearer token
- consistent error handling using `parseOpenAIError`
- use of the shared converter function `ToOpenAIEmbeddingRequest`
- appropriate metadata in `ExtraFields`

The implementation is well-structured and maintainable.
Actionable comments posted: 0
🧹 Nitpick comments (4)
core/schemas/transcriptions.go (1)
55-81: Document or enforce mutual exclusivity of embedded streaming fields.

The embedding of both `*BifrostTranscribeNonStreamResponse` and `*BifrostTranscribeStreamResponse` allows both to be non-nil simultaneously, which may not reflect the intended usage (one for streaming, one for non-streaming). Consider:
- Adding a comment clarifying that exactly one should be set based on context
- Adding a validation method to ensure mutual exclusivity at runtime
Example validation:
```go
func (t *BifrostTranscribe) Validate() error {
	if t.BifrostTranscribeNonStreamResponse != nil && t.BifrostTranscribeStreamResponse != nil {
		return fmt.Errorf("only one of non-stream or stream response should be set")
	}
	return nil
}
```

transports/bifrost-http/handlers/inference.go (1)
840-846: Verify edge case: `requestType` remains empty if the first chunk lacks `RequestType`.

Similar to the logic in transports/bifrost-http/integrations/utils.go, if the first chunk is an error without `ExtraFields.RequestType` set, `requestType` remains empty and the code falls through to standard SSE format. Ensure this is the intended fallback.

Additionally, note that this logic is duplicated between handlers/inference.go and integrations/utils.go. Consider extracting it to a shared helper function to reduce duplication.

core/schemas/chatcompletions.go (1)
351-363: Update the comment to reflect all three embedded pointer types.

The comment states "Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil at a time," but the struct actually has three embedded pointer types: `BifrostTextCompletionResponseChoice`, `BifrostNonStreamResponseChoice`, and `BifrostStreamResponseChoice`. Apply this diff to clarify the comment:

```diff
 // BifrostChatResponseChoice represents a choice in the completion result.
 // This struct can represent either a streaming or non-streaming response choice.
-// IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice
-// should be non-nil at a time.
+// IMPORTANT: Only one of BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice,
+// or BifrostStreamResponseChoice should be non-nil at a time.
 type BifrostChatResponseChoice struct {
```

core/providers/openai.go (1)
573-617: Consider extracting duplicate error-handling logic.

The error-handling blocks for `NewRequestWithContext` (lines 573-590) and `client.Do` (lines 601-617) are nearly identical, checking for `context.Canceled`, `fasthttp.ErrTimeout`, and `context.DeadlineExceeded`. While this duplication provides clarity at each call site, extracting it into a helper function would reduce repetition. Example helper function:

```go
func handleRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
```

Then use it at both call sites:

```go
req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
if err != nil {
	return nil, handleRequestError(err, providerName)
}
// ...
resp, err := client.Do(req)
if err != nil {
	return nil, handleRequestError(err, providerName)
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (7 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- core/schemas/providers/mistral/embedding.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
- transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (10)
- core/providers/ollama.go
- core/providers/gemini.go
- core/providers/sgl.go
- core/schemas/providers/gemini/chat.go
- core/providers/vertex.go
- core/providers/parasail.go
- core/providers/mistral.go
- core/schemas/responses.go
- core/schemas/speech.go
- core/providers/azure.go
🧰 Additional context used
🧬 Code graph analysis (5)
- transports/bifrost-http/integrations/utils.go (2)
  - core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- transports/bifrost-http/handlers/inference.go (2)
  - core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- core/schemas/chatcompletions.go (1)
  - core/schemas/bifrost.go (1): `LogProbs` (341-346)
- core/schemas/bifrost.go (1)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- core/providers/openai.go (6)
  - core/schemas/provider.go (3): `PostHookRunner` (191-191), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
  - core/schemas/bifrost.go (12): `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ModelProvider` (52-52), `ErrorField` (426-433), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `RequestType` (99-99), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `ResponsesStreamRequest` (107-107)
  - core/utils.go (1): `Ptr` (12-14)
  - core/schemas/chatcompletions.go (1): `ChatStreamOptions` (43-46)
  - core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
  - core/schemas/responses.go (3): `ResponsesStreamResponse` (1686-1715), `ResponsesStreamResponseTypeError` (1683-1683), `ResponsesStreamResponseTypeCompleted` (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Graphite / mergeability_check (×20)
🔇 Additional comments (14)
core/providers/groq.go (1)

179-179: LGTM! The function rename to `handleOpenAIChatCompletionStreaming` clearly distinguishes chat completion streaming from other streaming types, aligning with the PR's goal to separate streaming handlers.

core/providers/openrouter.go (1)

132-132: LGTM! Consistent with other providers switching to `handleOpenAIChatCompletionStreaming`.

core/providers/cerebras.go (1)

139-139: LGTM! Consistent refactor aligning with the chat completion streaming handler separation.

transports/bifrost-http/integrations/utils.go (3)

456-471: Verify edge case: `requestType` remains empty if the first chunk lacks `RequestType`.

If the first chunk is an error without `ExtraFields.RequestType` set, `requestType` remains empty and the code falls through to standard SSE format. Ensure this is the intended fallback behavior. Consider adding a log or default value if `requestType` is still empty after the first chunk:

```diff
 if requestType == "" {
 	if response.BifrostResponse != nil {
 		requestType = response.BifrostResponse.ExtraFields.RequestType
 	} else if response.BifrostError != nil {
 		requestType = response.BifrostError.ExtraFields.RequestType
 	}
+	if requestType == "" {
+		// Default to standard streaming format if type cannot be determined
+		log.Printf("Warning: Unable to determine request type from first chunk, defaulting to standard format")
+	}
 }
```
568-609: Consider skipping the event line when eventType is empty.

At lines 577-581, if `eventType` is empty, the code writes `event: \n`, which is technically valid SSE but unusual. Consider skipping the event line entirely when the type is unavailable:

```diff
-if eventType != "" {
-	if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
-		return // Network error, stop streaming
-	}
-}
+// Only write event line if we have a non-empty event type
+if eventType != "" {
+	if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
+		return // Network error, stop streaming
+	}
+}
```

The current check already does this, so the logic appears correct. However, ensure that `ResponsesStreamResponse.Type` is always populated when `requestType == schemas.ResponsesStreamRequest`.
619-626: LGTM! The conditional `[DONE]` marker emission correctly omits it for the Responses API, aligning with OpenAI's API specification where the Responses API stream ends without a `[DONE]` sentinel.
core/schemas/textcompletions.go (1)
71-78: LGTM! The `TextCompletionLogProb` type provides a clean structure for log-probability data, consistent with similar types elsewhere in the codebase (e.g., `LogProb` in chat completions).

transports/bifrost-http/handlers/inference.go (2)

856-880: LGTM! The streaming format branching correctly distinguishes between the Responses API (event+data format) and other APIs (data-only format), with proper handling of empty event types.

889-895: LGTM! The conditional `[DONE]` marker emission correctly aligns with OpenAI's API specification, omitting it for the Responses API.
core/schemas/embedding.go (1)
109-120: LGTM! The marshaling logic correctly selects between string, float32 array, or 2D array based on which field is non-nil, with appropriate error handling when all fields are nil.
core/schemas/bifrost.go (1)
253-253: LGTM! The addition of the embedded `*ResponsesStreamResponse` field properly extends `BifrostResponse` to support Responses API streaming, consistent with the PR's objective.

core/providers/openai.go (3)

511-533: LGTM! The refactoring to use the dedicated `handleOpenAIChatCompletionStreaming` handler improves code organization by separating chat completion streaming from other streaming types (Responses, TextCompletion).

885-1070: LGTM! The Responses API streaming handler correctly:

- skips `event:` lines (lines 987-990, 1002-1007), which is specific to the Responses streaming format
- handles error responses by constructing and sending a `BifrostError` (lines 1026-1043)
- sets the stream-end indicator for completion events (lines 1050-1055)
- tracks latency per chunk and for the overall stream

The implementation properly differentiates Responses streaming from chat completion streaming.

1072-1168: LGTM! The embedding implementation follows the established patterns for OpenAI request handling:

- proper operation allowance checking
- centralized request body conversion via `ToOpenAIEmbeddingRequest`
- consistent error handling and response parsing
- appropriate metadata population in `ExtraFields`
Force-pushed: bb24cd9 to b5451ac, fbe1158 to 7ceab9c, b5451ac to a5f25b3, 7ceab9c to a55ad25, a5f25b3 to 1ba34d3.
Actionable comments posted: 1
♻️ Duplicate comments (5)
core/schemas/embedding.go (1)
122-144: Reset embedding fields before decoding.

`UnmarshalJSON` never clears `EmbeddingStr`, `EmbeddingArray`, or `Embedding2DArray` before decoding. If the same `BifrostEmbeddingResponse` instance is reused (common when unmarshalling into preallocated structs), stale state survives: e.g., a prior 1-D decode leaves `EmbeddingArray` non-nil, then decoding a string populates `EmbeddingStr` but the old array remains and `MarshalJSON` serializes the wrong shape. Nil out every variant up front so each decode starts from a clean slate.

```diff
 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
```

core/schemas/responses.go (3)
1635-1635: Fix event type string: should be "delta", not "added".

Line 1635 defines `ResponsesStreamResponseTypeOutputTextAdded`, but the string value should be `"response.output_text.delta"` to match the OpenAI Responses API streaming specification. Apply this diff:

```diff
-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
```

1641-1641: Fix event type string: should be "delta", not "added".

Line 1641 defines `ResponsesStreamResponseTypeFunctionCallArgumentsAdded`, but the string value should be `"response.function_call_arguments.delta"` to match the OpenAI Responses API streaming specification. Apply this diff:

```diff
-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded ResponsesStreamResponseType = "response.function_call_arguments.delta"
```

1676-1676: Fix event type string: missing dot before "annotation".

Line 1676 defines `ResponsesStreamResponseTypeOutputTextAnnotationAdded`, but the string value should be `"response.output_text.annotation.added"` (note the dot between "text" and "annotation"). Apply this diff:

```diff
-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
```

core/providers/openai.go (1)

671-671: Fix TODO comment typo.

Line 671 contains a typo: "WITHING" should be "WITHIN". Apply this diff:

```diff
-	//TODO: FIX THIS WITHING RESPONSE STRUCT
+	//TODO: FIX THIS WITHIN RESPONSE STRUCT
```
🧹 Nitpick comments (1)
core/schemas/chatcompletions.go (1)
351-363: Clarify the mutually exclusive choice comment.

The struct now embeds three shapes (`BifrostTextCompletionResponseChoice`, `BifrostNonStreamResponseChoice`, `BifrostStreamResponseChoice`), yet the comment still says only the last two are mutually exclusive. Please update the doc to spell out that exactly one of the three embeds should be populated, and add a short note on when each variant is used. This keeps downstream implementers from assuming the text-completion pointer can coexist with the others.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- core/providers/azure.go (1 hunks)
- core/providers/cerebras.go (1 hunks)
- core/providers/gemini.go (1 hunks)
- core/providers/groq.go (1 hunks)
- core/providers/mistral.go (1 hunks)
- core/providers/ollama.go (1 hunks)
- core/providers/openai.go (7 hunks)
- core/providers/openrouter.go (1 hunks)
- core/providers/parasail.go (1 hunks)
- core/providers/sgl.go (1 hunks)
- core/providers/vertex.go (5 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
- transports/bifrost-http/handlers/server.go
- core/schemas/providers/mistral/embedding.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (14)
- core/providers/groq.go
- core/providers/openrouter.go
- core/providers/ollama.go
- core/providers/sgl.go
- core/schemas/providers/gemini/chat.go
- core/providers/gemini.go
- core/providers/parasail.go
- core/providers/vertex.go
- core/schemas/textcompletions.go
- core/providers/azure.go
- core/providers/mistral.go
- core/schemas/bifrost.go
- core/schemas/transcriptions.go
- core/providers/cerebras.go
🧰 Additional context used
🧬 Code graph analysis (6)
- transports/bifrost-http/integrations/utils.go (2)
  - core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- transports/bifrost-http/handlers/inference.go (2)
  - core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
  - core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)
- core/schemas/speech.go (1)
  - core/schemas/bifrost.go (1): `AudioLLMUsage` (304-309)
- core/schemas/responses.go (1)
  - plugins/mocker/main.go (1): `Response` (97-103)
- core/schemas/chatcompletions.go (1)
  - core/schemas/bifrost.go (1): `LogProbs` (341-346)
- core/providers/openai.go (5)
  - core/schemas/provider.go (7): `PostHookRunner` (191-191), `ErrProviderJSONMarshaling` (26-26), `ErrRequestCancelled` (23-23), `ErrProviderRequestTimedOut` (22-22), `ErrProviderRequest` (24-24), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
  - core/schemas/bifrost.go (17): `BifrostChatRequest` (189-195), `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ChatCompletionStreamRequest` (105-105), `ModelProvider` (52-52), `ErrorField` (426-433), `RequestCancelled` (378-378), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `RequestType` (99-99), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `ResponsesStreamRequest` (107-107), `BifrostEmbeddingRequest` (205-211), `EmbeddingRequest` (108-108)
  - core/schemas/chatcompletions.go (1): `ChatStreamOptions` (43-46)
  - core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
  - core/schemas/responses.go (3): `ResponsesStreamResponse` (1686-1715), `ResponsesStreamResponseTypeError` (1683-1683), `ResponsesStreamResponseTypeCompleted` (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Graphite / mergeability_check (×21)
🔇 Additional comments (5)
core/schemas/responses.go (1)
88-88: LGTM! Error field addition is properly structured.

The addition of the `Error` field to `ResponsesResponse` follows Go conventions with a pointer type for optionality and proper JSON tagging. The field integrates well with the existing struct definition.

core/providers/openai.go (4)

511-533: LGTM! Clean delegation to shared streaming logic.

The `ChatCompletionStream` method properly delegates to the shared `handleOpenAIChatCompletionStreaming` function, reducing code duplication. Authorization headers and configuration are correctly passed through.

769-862: LGTM! Responses method implementation is solid.

The `Responses` method and `handleOpenAIResponsesRequest` follow the established pattern used for chat completions. Proper authorization, error handling, and metadata population are all in place.

864-1070: LGTM! Responses streaming implementation handles SSE format correctly.

The `handleOpenAIResponsesStreaming` function properly handles the Responses API's SSE format:

- correctly skips `event:` lines (lines 988, 1002-1007), since responses streaming uses typed events
- defensive nil-check for `ResponsesStreamResponse` (line 1022)
- proper terminal event handling for both error (lines 1026-1043) and completion (lines 1050-1055) states
- stream-end context marker set appropriately for terminal events

1072-1168: LGTM! Embedding implementation follows established patterns.

The `Embedding` method and `handleOpenAIEmbeddingRequest` follow the same solid patterns used throughout the provider:

- proper operation validation
- standard error handling and resource management
- correct metadata population with the `EmbeddingRequest` type
- consistent with other non-streaming methods
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)
585-602: Fix the Authorization header regression.

Line 596 currently hardcodes `Authorization: Bearer` even when `key.Value` is empty, because the computed `authHeader` map is never used. That overwrites the service-account token that the OAuth client injects and turns every default-credential stream into a 401. Pass through the conditional header map instead, so we only set the header when a direct key exists.

```diff
-	authHeader := map[string]string{}
-	if key.Value != "" {
-		authHeader["Authorization"] = "Bearer " + key.Value
-	}
+	var authHeader map[string]string
+	if key.Value != "" {
+		authHeader = map[string]string{"Authorization": "Bearer " + key.Value}
+	}

 	// Use shared OpenAI streaming logic
 	return handleOpenAIChatCompletionStreaming(
 		ctx,
 		client,
 		url,
 		request,
-		map[string]string{"Authorization": "Bearer " + key.Value},
+		authHeader,
 		provider.networkConfig.ExtraHeaders,
 		provider.sendBackRawResponse,
 		schemas.Vertex,
 		postHookRunner,
 		provider.logger,
 	)
```
♻️ Duplicate comments (3)
core/schemas/chatcompletions.go (1)
351-363: Clarify which embedded pointers are mutually exclusive.

The comment states "Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil," but `BifrostChatResponseChoice` embeds three pointers. The documentation should clarify which combinations are valid (e.g., only one of the three should be non-nil at any time, or specific pairings are allowed).

core/schemas/responses.go (1)
1617-1684: Event type constants previously flagged for verification.

A previous review identified potential issues with some streaming event type string constants (e.g., `ResponsesStreamResponseTypeOutputTextAdded`, `ResponsesStreamResponseTypeFunctionCallArgumentsAdded`, `ResponsesStreamResponseTypeOutputTextAnnotationAdded`) and requested verification against OpenAI's official Responses API documentation to ensure correctness.

core/schemas/embedding.go (1)
122-145: Reset embedding variants before decoding.

Reusing a pooled `BifrostEmbeddingResponse` will leave whichever variant was populated first, because Line 126 sets (say) `EmbeddingStr` but later decode paths never clear it. On the next unmarshal, the string pointer stays non-nil, so `MarshalJSON` returns the old string instead of the newly decoded vector. Clear every variant up front so each decode starts from a clean slate.

```diff
 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
 		be.EmbeddingStr = &stringContent
 		return nil
```
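To make the decode-and-reset contract concrete, here is a minimal standalone sketch. The type and field names are illustrative stand-ins for `BifrostEmbeddingResponse`, and the standard `encoding/json` package substitutes for sonic; it is not the actual Bifrost implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingValue mirrors the union idea: after a decode, exactly one of
// the three shape fields is populated. Names here are hypothetical.
type embeddingValue struct {
	Str     *string
	Array   *[]float32
	Array2D *[][]float32
}

func (e *embeddingValue) UnmarshalJSON(data []byte) error {
	// Reset all variants so a reused value never keeps stale data.
	e.Str, e.Array, e.Array2D = nil, nil, nil

	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		e.Str = &s
		return nil
	}
	var arr []float32
	if err := json.Unmarshal(data, &arr); err == nil {
		e.Array = &arr
		return nil
	}
	var arr2 [][]float32
	if err := json.Unmarshal(data, &arr2); err == nil {
		e.Array2D = &arr2
		return nil
	}
	return fmt.Errorf("embedding: value is none of string, []float32, [][]float32")
}

func main() {
	var v embeddingValue
	_ = v.UnmarshalJSON([]byte(`"aGVsbG8="`)) // string variant (e.g. base64 payload)
	fmt.Println(v.Str != nil)

	// Reuse the same value: without the reset, Str would still be non-nil here.
	_ = v.UnmarshalJSON([]byte(`[0.1, 0.2]`))
	fmt.Println(v.Str == nil, v.Array != nil)
}
```

Decoding a second payload of a different shape into the same value demonstrates why the up-front reset matters.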
🧹 Nitpick comments (1)
core/providers/openai.go (1)
574-617: Consider extracting duplicate error handling into a helper function.

The error handling logic for `context.Canceled`, `fasthttp.ErrTimeout`, and `context.DeadlineExceeded` is duplicated in two places (lines 576-590 after `NewRequestWithContext` and lines 603-617 after `client.Do`). This duplication makes the code harder to maintain.

Consider extracting this into a helper function:

```go
func handleStreamRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
```

Then use it in both places:

```diff
 req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
 if err != nil {
-	if errors.Is(err, context.Canceled) {
-		return nil, &schemas.BifrostError{
-			IsBifrostError: false,
-			Error: &schemas.ErrorField{
-				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: schemas.ErrRequestCancelled,
-				Error:   err,
-			},
-		}
-	}
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-	}
-	return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+	return nil, handleStreamRequestError(err, providerName)
 }

 // ... later ...

 resp, err := client.Do(req)
 if err != nil {
-	if errors.Is(err, context.Canceled) {
-		return nil, &schemas.BifrostError{
-			IsBifrostError: false,
-			Error: &schemas.ErrorField{
-				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: schemas.ErrRequestCancelled,
-				Error:   err,
-			},
-		}
-	}
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-	}
-	return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+	return nil, handleStreamRequestError(err, providerName)
 }
```

This same pattern appears in other streaming functions in this file (`handleOpenAIResponsesStreaming`, `SpeechStream`, `TranscriptionStream`), so the helper would reduce duplication across all of them.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
`core/providers/azure.go` (1 hunks), `core/providers/cerebras.go` (1 hunks), `core/providers/gemini.go` (1 hunks), `core/providers/groq.go` (1 hunks), `core/providers/mistral.go` (1 hunks), `core/providers/ollama.go` (1 hunks), `core/providers/openai.go` (7 hunks), `core/providers/openrouter.go` (1 hunks), `core/providers/parasail.go` (1 hunks), `core/providers/sgl.go` (1 hunks), `core/providers/vertex.go` (5 hunks), `core/schemas/bifrost.go` (1 hunks), `core/schemas/chatcompletions.go` (1 hunks), `core/schemas/embedding.go` (1 hunks), `core/schemas/providers/gemini/chat.go` (1 hunks), `core/schemas/providers/mistral/embedding.go` (0 hunks), `core/schemas/providers/mistral/types.go` (0 hunks), `core/schemas/responses.go` (2 hunks), `core/schemas/speech.go` (2 hunks), `core/schemas/textcompletions.go` (1 hunks), `core/schemas/transcriptions.go` (1 hunks), `plugins/telemetry/main.go` (0 hunks), `transports/bifrost-http/handlers/inference.go` (4 hunks), `transports/bifrost-http/handlers/server.go` (0 hunks), `transports/bifrost-http/integrations/utils.go` (3 hunks)
💤 Files with no reviewable changes (4)
- core/schemas/providers/mistral/embedding.go
- transports/bifrost-http/handlers/server.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (7)
- core/providers/sgl.go
- core/providers/groq.go
- core/providers/cerebras.go
- core/providers/mistral.go
- core/providers/azure.go
- core/providers/gemini.go
- core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/schemas/speech.go (1)
- core/schemas/bifrost.go (1): `AudioLLMUsage` (304-309)

transports/bifrost-http/handlers/inference.go (2)
- core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

core/schemas/chatcompletions.go (1)
- core/schemas/bifrost.go (1): `LogProbs` (341-346)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

core/schemas/responses.go (1)
- plugins/mocker/main.go (1): `Response` (97-103)

core/providers/openai.go (5)
- core/schemas/provider.go (7): `PostHookRunner` (191-191), `ErrProviderJSONMarshaling` (26-26), `ErrRequestCancelled` (23-23), `ErrProviderRequestTimedOut` (22-22), `ErrProviderRequest` (24-24), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
- core/schemas/bifrost.go (14): `BifrostChatRequest` (189-195), `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ModelProvider` (52-52), `ErrorField` (426-433), `RequestCancelled` (378-378), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `BifrostEmbeddingRequest` (205-211), `EmbeddingRequest` (108-108)
- core/schemas/chatcompletions.go (1): `ChatStreamOptions` (43-46)
- core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
- core/schemas/responses.go (3): `ResponsesStreamResponse` (1686-1715), `ResponsesStreamResponseTypeError` (1683-1683), `ResponsesStreamResponseTypeCompleted` (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
- GitHub Check: Graphite / mergeability_check
🔇 Additional comments (12)
core/schemas/speech.go (2)
16-18: LGTM: Formatting improvements.

The whitespace alignment changes improve readability without altering functionality.

86-95: LGTM: New streaming speech types.

The new `BifrostSpeech` and `BifrostSpeechStreamResponse` types follow the established pattern of embedding pointers for streaming variants. The structure aligns with similar types introduced in this PR (e.g., `BifrostTranscribe*`, `BifrostChatResponseChoice`).

core/schemas/textcompletions.go (1)

72-78: LGTM: Log probability support for text completions.

The new `TextCompletionLogProb` type provides token-level log probability information for text completions, complementing the similar log-probability structures added in `core/schemas/chatcompletions.go` (`LogProb`, `ContentLogProb`).

core/schemas/transcriptions.go (1)

56-81: LGTM: Transcription response types.

The new `BifrostTranscribe*` types follow the established pattern of embedding variant-specific pointers to distinguish streaming from non-streaming fields. The structure is consistent with other Bifrost response types introduced in this PR.

core/schemas/chatcompletions.go (1)

365-402: LGTM: Chat completion streaming types.

The new types (`BifrostTextCompletionResponseChoice`, `BifrostNonStreamResponseChoice`, `BifrostStreamResponseChoice`, `BifrostStreamDelta`, `LogProb`, `ContentLogProb`) provide comprehensive support for both streaming and non-streaming chat completion responses with token-level log probabilities.

core/schemas/responses.go (2)

88-88: LGTM: Error field addition.

The new `Error` field enables structured error reporting in responses, following the standard pattern of optional pointer fields (`omitempty`).

1686-1720: LGTM: Streaming response structures.

The `ResponsesStreamResponse` and `ResponsesStreamResponseStruct` types provide a comprehensive structure for handling various streaming event payloads. The use of optional pointer fields allows different event types to carry their specific data.

core/providers/parasail.go (1)

121-121: LGTM: Streaming handler refactor.

The switch from `handleOpenAIStreaming` to `handleOpenAIChatCompletionStreaming` improves code clarity by distinguishing chat completion streaming from other streaming types (e.g., Responses API streaming). This change is consistent with the same refactor applied across other providers in this PR.

core/providers/openrouter.go (1)

132-132: LGTM: Streaming handler refactor.

The switch to `handleOpenAIChatCompletionStreaming` aligns with the streaming handler refactor applied across all providers in this PR, improving code organization by separating chat completion streaming from other streaming paths.

core/providers/ollama.go (1)

152-152: LGTM: Streaming handler refactor.

The switch to `handleOpenAIChatCompletionStreaming` completes the consistent refactor across all providers, distinguishing chat completion streaming from other streaming types (e.g., Responses API streaming).

core/providers/openai.go (2)

769-862: LGTM! Clean implementation of non-streaming Responses API.

The implementation follows the same proven pattern as `ChatCompletion`, with appropriate error handling, latency tracking, and metadata population. The use of the centralized `ToOpenAIResponsesRequest` converter maintains consistency.

1072-1168: LGTM! Clean implementation of Embedding API.

The embedding request handler follows the established pattern used for chat completions and responses. The implementation is straightforward with appropriate error handling and metadata tracking.
1ba34d3 to 791aa46 (Compare)
791aa46 to fe69930 (Compare)
fe69930 to ea5a3e4 (Compare)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
`core/providers/azure.go` (1 hunks), `core/providers/cerebras.go` (1 hunks), `core/providers/gemini.go` (1 hunks), `core/providers/groq.go` (1 hunks), `core/providers/mistral.go` (1 hunks), `core/providers/ollama.go` (1 hunks), `core/providers/openai.go` (7 hunks), `core/providers/openrouter.go` (1 hunks), `core/providers/parasail.go` (1 hunks), `core/providers/sgl.go` (1 hunks), `core/providers/vertex.go` (5 hunks), `core/schemas/bifrost.go` (1 hunks), `core/schemas/chatcompletions.go` (1 hunks), `core/schemas/embedding.go` (1 hunks), `core/schemas/providers/gemini/chat.go` (1 hunks), `core/schemas/providers/mistral/embedding.go` (0 hunks), `core/schemas/providers/mistral/types.go` (0 hunks), `core/schemas/responses.go` (2 hunks), `core/schemas/speech.go` (2 hunks), `core/schemas/textcompletions.go` (1 hunks), `core/schemas/transcriptions.go` (1 hunks), `plugins/telemetry/main.go` (0 hunks), `transports/bifrost-http/handlers/inference.go` (4 hunks), `transports/bifrost-http/handlers/server.go` (0 hunks), `transports/bifrost-http/integrations/utils.go` (3 hunks)
💤 Files with no reviewable changes (4)
- core/schemas/providers/mistral/embedding.go
- transports/bifrost-http/handlers/server.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (11)
- core/providers/groq.go
- core/providers/cerebras.go
- core/providers/azure.go
- core/schemas/speech.go
- core/schemas/textcompletions.go
- core/providers/vertex.go
- core/providers/gemini.go
- core/schemas/transcriptions.go
- core/providers/sgl.go
- core/schemas/providers/gemini/chat.go
- core/schemas/chatcompletions.go
🧰 Additional context used
🧬 Code graph analysis (5)
transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

transports/bifrost-http/handlers/inference.go (2)
- core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

core/schemas/responses.go (1)
- plugins/mocker/main.go (1): `Response` (97-103)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1686-1715)

core/providers/openai.go (4)
- core/schemas/provider.go (7): `PostHookRunner` (191-191), `ErrProviderJSONMarshaling` (26-26), `ErrRequestCancelled` (23-23), `ErrProviderRequestTimedOut` (22-22), `ErrProviderRequest` (24-24), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
- core/schemas/bifrost.go (17): `BifrostChatRequest` (189-195), `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ChatCompletionStreamRequest` (105-105), `ModelProvider` (52-52), `ErrorField` (426-433), `RequestCancelled` (378-378), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `RequestType` (99-99), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `ResponsesStreamRequest` (107-107), `BifrostEmbeddingRequest` (205-211), `EmbeddingRequest` (108-108)
- core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
- core/schemas/responses.go (3): `ResponsesStreamResponse` (1686-1715), `ResponsesStreamResponseTypeError` (1684-1684), `ResponsesStreamResponseTypeCompleted` (1625-1625)
🔇 Additional comments (13)
core/schemas/embedding.go (1)
97-145: LGTM! Type definitions and marshaling logic are correct.

The custom marshaling and unmarshaling logic properly handles the three embedding format variants (string, float32 array, 2D float32 array) and provides clear error messages when none match.

core/providers/parasail.go (1)

121-121: LGTM! Streaming handler update aligns with PR refactoring objectives.

The switch to `handleOpenAIChatCompletionStreaming` is consistent with the broader refactor to separate chat completion streaming from Responses API streaming across all providers.

core/providers/openrouter.go (1)

132-132: LGTM! Consistent streaming handler refactor.

The change mirrors the pattern applied across other providers in this PR, maintaining argument compatibility while routing through the chat-specific streaming handler.

core/providers/ollama.go (1)

152-152: LGTM! Handler rename aligns with project-wide refactor.

core/providers/mistral.go (1)

133-133: LGTM! Streaming handler update matches PR scope.

transports/bifrost-http/handlers/inference.go (4)

828-829: LGTM! Request type detection enables format-specific streaming.

The local `requestType` variable properly tracks the API type from the first chunk, enabling the correct SSE format selection downstream.

840-846: LGTM! Safe extraction of request type with defensive nil checks.

The code correctly checks both `BifrostResponse` and `BifrostError` for the request type, ensuring the value is captured from whichever appears first in the stream.

856-880: LGTM! Proper SSE format selection based on API type.

The branching logic correctly:
- Emits `event:` lines for Responses API streaming (OpenAI Responses format)
- Uses standard `data:`-only format for other APIs (chat completions, etc.)
- Safely handles nil `ResponsesStreamResponse` with the check at line 859

889-895: LGTM! Correct end-of-stream signaling per API spec.

The conditional `[DONE]` emission aligns with the OpenAI API specification:
- Chat completions and other APIs: send the `[DONE]` marker
- Responses API: no `[DONE]` marker (stream close indicates end)

transports/bifrost-http/integrations/utils.go (3)
456-471: LGTM! Request type detection matches the pattern in handlers/inference.go.

The implementation correctly initializes the `requestType` variable and extracts it from the first streaming chunk with proper nil checks.

569-609: LGTM! Streaming format logic correctly distinguishes API types.

The branching between Responses API format (with `event:` lines) and standard SSE format (data-only) is implemented correctly:
- Line 572: Safe nil check for `ResponsesStreamResponse` before accessing `Type`
- Lines 577-581: Event line only written when `eventType` is non-empty
- Lines 595-608: Standard SSE format for other request types
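The two framing modes described in this review can be sketched in isolation. `frameSSE` and `terminalFrame` below are hypothetical helpers written for illustration, not functions from the Bifrost codebase; they only capture the branching rule (typed `event:` line and no `[DONE]` for Responses streams, plain `data:` plus a final `[DONE]` for chat completions):

```go
package main

import (
	"fmt"
	"strings"
)

// frameSSE renders one SSE frame. Responses API chunks get a typed
// "event:" line before the data line; other streams use data-only frames.
func frameSSE(eventType, payload string, isResponsesStream bool) string {
	var b strings.Builder
	if isResponsesStream && eventType != "" {
		fmt.Fprintf(&b, "event: %s\n", eventType)
	}
	fmt.Fprintf(&b, "data: %s\n\n", payload)
	return b.String()
}

// terminalFrame returns the end-of-stream marker, if the API type uses one.
func terminalFrame(isResponsesStream bool) string {
	if isResponsesStream {
		return "" // Responses API: closing the stream signals the end
	}
	return "data: [DONE]\n\n"
}

func main() {
	fmt.Print(frameSSE("response.output_text.delta", `{"delta":"hi"}`, true))
	fmt.Print(frameSSE("", `{"choices":[]}`, false))
	fmt.Print(terminalFrame(false))
}
```

A client parsing the Responses variant dispatches on the `event:` field, while chat-completion clients watch for the `[DONE]` sentinel instead.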
619-625: LGTM! End-of-stream marker correctly omitted for Responses API.

The conditional `[DONE]` logic matches the OpenAI specification and is consistent with the implementation in `handlers/inference.go`.

core/schemas/bifrost.go (1)

252-253: LGTM! Embedded streaming response field complements existing response structure.

The `*ResponsesStreamResponse` embedded field is correctly positioned alongside `*ResponsesResponse`. These fields are mutually exclusive in practice (one for non-streaming, one for streaming responses), so no JSON marshaling conflicts are expected.
ea5a3e4 to 3f77d16 (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (6)
core/schemas/embedding.go (1)
122-144: Reset embedding fields before decoding to avoid stale data.

`UnmarshalJSON` never clears the other shape fields before decoding. If the same `BifrostEmbeddingResponse` instance is reused (common when unmarshalling successive SSE chunks), decoding a later payload with a different shape leaves the previously populated fields intact. Example: decode a base64 string first (sets `EmbeddingStr`), then decode a float array; `EmbeddingStr` remains non-nil, so `MarshalJSON` (or any caller that checks the fields) will still see the old string instead of the new array. Clear every variant up front so each branch assigns a clean shape.

```diff
 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	// Reset all shapes so repeated unmarshals don't leak previous values
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
```

core/providers/vertex.go (3)
262-264: Remove incorrect `fasthttp.ErrTimeout` check.

Same issue as Line 225: `fasthttp.ErrTimeout` will never match when using `net/http`.

Apply this diff:

```diff
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
 		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 	}
```

417-419: Remove incorrect `fasthttp.ErrTimeout` check.

Same issue as Lines 225 and 262: `fasthttp.ErrTimeout` will never match when using `net/http`.

Apply this diff:

```diff
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
 		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 	}
```

448-450: Remove incorrect `fasthttp.ErrTimeout` check.

Same issue as Lines 225, 262, and 417: `fasthttp.ErrTimeout` will never match when using `net/http`.

Apply this diff:

```diff
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
 		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 	}
```

core/schemas/responses.go (1)
1677-1677: Verify annotation event type string format.

Line 1677 defines `ResponsesStreamResponseTypeOutputTextAnnotationAdded` with an underscore:

```go
ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
```

Based on past review comments and the OpenAI Responses API SSE format, this should use a dot separator for consistency with other hierarchical event types (e.g., `response.output_text.delta`). The correct format appears to be:

```go
"response.output_text.annotation.added"
```

Please verify whether the current underscore format (`output_text_annotation`) matches the actual SSE events emitted by the OpenAI Responses API, or if it should use dot notation (`output_text.annotation`) for proper event matching.

core/providers/openai.go (1)
671-671: Fix TODO comment typo.

The TODO comment contains a typo: "WITHING" should be "WITHIN". This issue was supposedly addressed in a previous commit (cc59aff) according to past review comments, but the typo remains in the code.

Apply this diff to fix the typo:

```diff
-	//TODO: FIX THIS WITHING RESPONSE STRUCT
+	//TODO: FIX THIS WITHIN RESPONSE STRUCT
```

Additionally, consider clarifying what needs to be fixed in the response struct or creating a tracked issue if this represents ongoing work.
🧹 Nitpick comments (5)
tests/core-providers/scenarios/responses_stream.go (3)
17-21: Consider adding a dedicated ResponsesStream flag for clarity.

The test uses the generic `CompletionStream` flag (line 18) to gate Responses API streaming tests. While this works, it would be clearer to have a dedicated `ResponsesStream` flag in `TestScenarios`, similar to how `TextCompletionStream` was separated from `CompletionStream`. This would allow finer-grained control over which streaming tests run for each provider.

If you agree, consider adding a `ResponsesStream bool` field to `TestScenarios` in `tests/core-providers/config/account.go` and updating the test configurations accordingly.
279-283: Consider using retry wrapper for consistency.

Unlike the main test (line 61), this tool calling test doesn't use the `WithStreamRetry` wrapper. For consistency and to handle transient provider issues, consider wrapping this call as well:

```diff
-	responseChannel, err := client.ResponsesStreamRequest(ctx, request)
-	RequireNoError(t, err, "Responses stream with tools failed")
-	if responseChannel == nil {
-		t.Fatal("Response channel should not be nil")
-	}
+	retryConfig := StreamingRetryConfig()
+	retryContext := TestRetryContext{
+		ScenarioName: "ResponsesStreamWithTools",
+		ExpectedBehavior: map[string]interface{}{
+			"should_call_tool": true,
+		},
+		TestMetadata: map[string]interface{}{
+			"provider": testConfig.Provider,
+			"model":    testConfig.ChatModel,
+		},
+	}
+
+	responseChannel, err := WithStreamRetry(t, retryConfig, retryContext, func() (chan *schemas.BifrostStream, *schemas.BifrostError) {
+		return client.ResponsesStreamRequest(ctx, request)
+	})
+	RequireNoError(t, err, "Responses stream with tools failed")
+	if responseChannel == nil {
+		t.Fatal("Response channel should not be nil")
+	}
```
388-392: Consider using retry wrapper for consistency.

Similar to the tool calling test, this reasoning test also lacks retry wrapping. Consider adding `WithStreamRetry` here for consistent error handling across all test scenarios.

core/schemas/bifrost.go (1)
237-254: Document the union constraint for ResponsesResponse and ResponsesStreamResponse.

The `BifrostResponse` struct now embeds both `*ResponsesResponse` and `*ResponsesStreamResponse`, following a union-type pattern similar to other structs in the codebase. However, the struct lacks documentation clarifying that only one of these embedded pointers should be non-nil at a time.

Add a comment above the struct or in the struct documentation to clarify the usage:

```diff
 // BifrostResponse represents the complete result from any bifrost request.
+//
+// For Responses API requests: Only one of ResponsesResponse (non-stream) or
+// ResponsesStreamResponse (stream) should be non-nil at a time.
 type BifrostResponse struct {
 	ID string `json:"id,omitempty"`
```

tests/core-providers/scenarios/text_completion_stream.go (1)
146-148: Consider documenting or extracting magic numbers for streaming chunk limits.

The test uses different maximum chunk counts across sub-tests (500, 100, 150) without clear documentation of why these specific values were chosen.

Consider either:
- Extracting these as named constants with explanatory comments:

```go
const (
	maxChunksMainTest      = 500 // High limit for full story generation
	maxChunksVariedPrompts = 100 // Lower limit for short prompt completions
	maxChunksParameterTest = 150 // Mid-range for parameter variations
)
```

- Or adding inline comments explaining the rationale:

```go
if responseCount > 500 { // Safety limit: story generation should complete well under 500 chunks
	t.Fatal("Received too many streaming chunks, something might be wrong")
}
```

This improves test maintainability and makes the test's assumptions explicit.
Also applies to: 279-281, 381-383
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (46)
`core/providers/azure.go` (1 hunks), `core/providers/cerebras.go` (1 hunks), `core/providers/gemini.go` (1 hunks), `core/providers/groq.go` (1 hunks), `core/providers/mistral.go` (1 hunks), `core/providers/ollama.go` (1 hunks), `core/providers/openai.go` (7 hunks), `core/providers/openrouter.go` (1 hunks), `core/providers/parasail.go` (1 hunks), `core/providers/sgl.go` (1 hunks), `core/providers/vertex.go` (5 hunks), `core/schemas/bifrost.go` (1 hunks), `core/schemas/chatcompletions.go` (1 hunks), `core/schemas/embedding.go` (1 hunks), `core/schemas/providers/gemini/chat.go` (1 hunks), `core/schemas/providers/mistral/embedding.go` (0 hunks), `core/schemas/providers/mistral/types.go` (0 hunks), `core/schemas/responses.go` (2 hunks), `core/schemas/speech.go` (2 hunks), `core/schemas/textcompletions.go` (1 hunks), `core/schemas/transcriptions.go` (1 hunks), `plugins/telemetry/main.go` (0 hunks), `tests/core-providers/anthropic_test.go` (1 hunks), `tests/core-providers/azure_test.go` (1 hunks), `tests/core-providers/bedrock_test.go` (1 hunks), `tests/core-providers/cerebras_test.go` (1 hunks), `tests/core-providers/cohere_test.go` (1 hunks), `tests/core-providers/config/account.go` (12 hunks), `tests/core-providers/custom_test.go` (1 hunks), `tests/core-providers/gemini_test.go` (1 hunks), `tests/core-providers/groq_test.go` (1 hunks), `tests/core-providers/mistral_test.go` (1 hunks), `tests/core-providers/ollama_test.go` (1 hunks), `tests/core-providers/openai_test.go` (1 hunks), `tests/core-providers/openrouter_test.go` (1 hunks), `tests/core-providers/parasail_test.go` (1 hunks), `tests/core-providers/scenarios/chat_completion_stream.go` (1 hunks), `tests/core-providers/scenarios/responses_stream.go` (1 hunks), `tests/core-providers/scenarios/text_completion_stream.go` (1 hunks), `tests/core-providers/scenarios/utils.go` (0 hunks), `tests/core-providers/sgl_test.go` (1 hunks), `tests/core-providers/tests.go` (2 hunks), `tests/core-providers/vertex_test.go` (1 hunks), `transports/bifrost-http/handlers/inference.go` (4 hunks), `transports/bifrost-http/handlers/server.go` (0 hunks), `transports/bifrost-http/integrations/utils.go` (3 hunks)
💤 Files with no reviewable changes (5)
- plugins/telemetry/main.go
- tests/core-providers/scenarios/utils.go
- transports/bifrost-http/handlers/server.go
- core/schemas/providers/mistral/embedding.go
- core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (9)
- core/providers/cerebras.go
- core/providers/sgl.go
- transports/bifrost-http/handlers/inference.go
- core/providers/parasail.go
- core/providers/openrouter.go
- core/schemas/providers/gemini/chat.go
- core/providers/gemini.go
- core/providers/groq.go
- core/providers/mistral.go
🧰 Additional context used
🧬 Code graph analysis (10)
core/schemas/speech.go (1)
- core/schemas/bifrost.go (1): `AudioLLMUsage` (304-309)

tests/core-providers/groq_test.go (1)
- tests/core-providers/config/account.go (1): `TestScenarios` (20-40)

tests/core-providers/scenarios/responses_stream.go (3)
- tests/core-providers/config/account.go (1): `ComprehensiveTestConfig` (43-55)
- core/schemas/responses.go (23): `ResponsesMessage` (280-292), `ResponsesInputMessageRoleUser` (567-567), `ResponsesMessageContent` (573-576), `ResponsesParameters` (33-60), `ResponsesStreamResponseType` (1620-1620), `ResponsesStreamResponse` (1687-1716), `ResponsesStreamResponseTypeOutputTextDelta` (1636-1636), `ResponsesStreamResponseTypeOutputItemAdded` (1629-1629), `ResponsesStreamResponseTypeContentPartAdded` (1632-1632), `ResponsesStreamResponseTypeCreated` (1623-1623), `ResponsesStreamResponseTypeCompleted` (1625-1625), `ResponsesStreamResponseTypeError` (1684-1684), `ResponsesTool` (1341-1356), `ResponsesToolFunction` (1359-1362), `ResponsesStreamResponseTypeFunctionCallArgumentsAdded` (1642-1642), `ResponsesMessageTypeFunctionCall` (261-261), `ResponsesParametersReasoning` (210-214), `ResponsesStreamResponseTypeReasoningSummaryPartAdded` (1652-1652), `ResponsesStreamResponseTypeReasoningSummaryTextDelta` (1654-1654), `ResponsesMessageTypeReasoning` (273-273), `ResponsesResponse` (85-114), `ResponsesMessageTypeMessage` (256-256), `ResponsesInputMessageRoleAssistant` (566-566)
- core/schemas/bifrost.go (7): `BifrostResponsesRequest` (197-203), `BifrostStream` (383-386), `BifrostError` (409-418), `BifrostResponse` (238-254), `ModelProvider` (52-52), `BifrostResponseExtraFields` (349-358), `SGL` (64-64)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1687-1716)

core/schemas/chatcompletions.go (1)
- core/schemas/bifrost.go (1): `LogProbs` (341-346)

transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): `RequestType` (99-99), `BifrostResponse` (238-254), `BifrostError` (409-418), `ResponsesStreamRequest` (107-107)
- core/schemas/responses.go (1): `ResponsesStreamResponse` (1687-1716)

core/schemas/responses.go (1)
- plugins/mocker/main.go (1): `Response` (97-103)

tests/core-providers/scenarios/text_completion_stream.go (4)
- tests/core-providers/config/account.go (1): `ComprehensiveTestConfig` (43-55)
- core/schemas/textcompletions.go (2): `TextCompletionInput` (9-12), `TextCompletionParameters` (50-70)
- core/schemas/bifrost.go (9): `BifrostTextCompletionRequest` (134-140), `BifrostStream` (383-386), `BifrostError` (409-418), `TextCompletionStreamRequest` (103-103), `BifrostResponse` (238-254), `ModelProvider` (52-52), `BifrostResponseExtraFields` (349-358), `RequestType` (99-99), `TextCompletionRequest` (102-102)
- core/schemas/chatcompletions.go (3): `BifrostStreamResponseChoice` (376-378), `BifrostTextCompletionResponseChoice` (365-367), `BifrostChatResponseChoice` (355-363)

core/providers/openai.go (4)
- core/schemas/provider.go (6): `PostHookRunner` (191-191), `ErrRequestCancelled` (23-23), `ErrProviderRequestTimedOut` (22-22), `ErrProviderRequest` (24-24), `DefaultStreamBufferSize` (17-17), `Provider` (194-219)
- core/schemas/bifrost.go (14): `BifrostChatRequest` (189-195), `BifrostStream` (383-386), `BifrostError` (409-418), `OpenAI` (55-55), `ModelProvider` (52-52), `ErrorField` (426-433), `RequestCancelled` (378-378), `LLMUsage` (296-302), `BifrostContextKeyStreamEndIndicator` (125-125), `BifrostResponse` (238-254), `BifrostResponsesRequest` (197-203), `ResponsesRequest` (106-106), `BifrostEmbeddingRequest` (205-211), `EmbeddingRequest` (108-108)
- core/schemas/providers/openai/responses.go (1): `ToOpenAIResponsesRequest` (30-51)
- core/schemas/responses.go (3): `ResponsesStreamResponse` (1687-1716), `ResponsesStreamResponseTypeError` (1684-1684), `ResponsesStreamResponseTypeCompleted` (1625-1625)

tests/core-providers/tests.go (3)
- tests/core-providers/scenarios/text_completion_stream.go (1): `RunTextCompletionStreamTest` (16-399)
- tests/core-providers/scenarios/chat_completion_stream.go (1): `RunChatCompletionStreamTest` (16-298)
- tests/core-providers/scenarios/responses_stream.go (1): `RunResponsesStreamTest` (17-467)
🔇 Additional comments (22)
tests/core-providers/custom_test.go (1)
31-31: CompletionStream rename is consistent
Verified that `TestScenarios` defines `CompletionStream` and all test files reference it.

tests/core-providers/gemini_test.go (1)
35-35: LGTM: Test configuration updated for streaming refactor. The field rename from `ChatCompletionStream` to `CompletionStream` aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming. The change maintains the same test coverage.

tests/core-providers/sgl_test.go (1)

28-28: LGTM: Test configuration updated for streaming refactor. Consistent with the PR-wide refactoring to rename streaming flags for clarity.
tests/core-providers/bedrock_test.go (1)
33-33: LGTM: Test configuration updated for streaming refactor. The field rename maintains test coverage while improving naming clarity.
core/providers/azure.go (1)
372-372: LGTM: Provider updated to use chat-specific streaming handler. The function call change aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming. The call site preserves all arguments and behavior.
tests/core-providers/azure_test.go (1)
28-28: LGTM: Test configuration updated for streaming refactor. Consistent field rename maintaining test coverage.
tests/core-providers/scenarios/chat_completion_stream.go (1)
17-17: LGTM: Test scenario gating updated for config refactor. The condition now references `CompletionStream` instead of `ChatCompletionStream`, aligning with the updated test configuration structure. The test logic remains unchanged.

core/providers/ollama.go (1)
152-152: LGTM: Verified handler signature and call arguments align.tests/core-providers/tests.go (2)
29-32: LGTM! New streaming test scenarios properly wired.The addition of
RunTextCompletionStreamTestandRunResponsesStreamTestcorrectly extends the test suite to cover the new streaming capabilities introduced in this PR.
71-71: LGTM! Field rename aligns with broader refactoring.The rename from "ChatCompletionStream" to "CompletionStream" provides clearer semantics and is consistently applied across all test configurations.
tests/core-providers/vertex_test.go (1)
28-28: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

tests/core-providers/ollama_test.go (1)
27-27: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

core/providers/vertex.go (1)
591-603: LGTM! Function rename aligns with streaming refactoring. The rename from `handleOpenAIStreaming` to `handleOpenAIChatCompletionStreaming` properly reflects the chat-specific streaming behavior and aligns with the PR's goal of separating chat completion streaming from Responses API streaming.

tests/core-providers/anthropic_test.go (1)
28-28: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

tests/core-providers/cohere_test.go (1)
28-28: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

tests/core-providers/groq_test.go (2)
22-26: LGTM! Text completion enabled via chat conversion. Enabling text completion and text completion streaming for Groq by leveraging chat completion conversion is a reasonable approach. The inline comments clearly document this implementation strategy.
28-28: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

tests/core-providers/mistral_test.go (1)
28-28: LGTM! Field rename is consistent. The update from `ChatCompletionStream` to `CompletionStream` aligns with the broader test configuration refactoring.

tests/core-providers/scenarios/responses_stream.go (1)
541-544: LGTM - StreamingValidationResult type. The `StreamingValidationResult` type provides a clean way to collect and report validation issues without failing tests immediately. This approach is appropriate for streaming tests where some provider variations are expected.

transports/bifrost-http/integrations/utils.go (1)
568-609: Add nil check for ResponsesStreamResponse to prevent potential panic. When `requestType == schemas.ResponsesStreamRequest`, the code assumes `response.BifrostResponse.ResponsesStreamResponse` is non-nil (line 572), but there's no explicit nil check. While unlikely, if a response has `requestType` set to `ResponsesStreamRequest` but `ResponsesStreamResponse` is nil, line 572 will panic when accessing `.Type`. Apply this diff to add a defensive nil check:

```diff
 } else {
 	// Handle different streaming formats based on request type
 	if requestType == schemas.ResponsesStreamRequest {
+		// Verify ResponsesStreamResponse is present
+		if response.BifrostResponse.ResponsesStreamResponse == nil {
+			log.Printf("Warning: ResponsesStreamResponse is nil for ResponsesStreamRequest")
+			continue
+		}
+
 		// OPENAI RESPONSES FORMAT: Use event: and data: lines for OpenAI responses API compatibility
-		eventType := ""
-		if response.BifrostResponse.ResponsesStreamResponse != nil {
-			eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
-		}
+		eventType := string(response.BifrostResponse.ResponsesStreamResponse.Type)

 		// Send event line if available
 		if eventType != "" {
```

Likely an incorrect or invalid review comment.
core/providers/openai.go (2)
987-1008: LGTM: Proper event-line filtering for Responses API streaming. The event-line handling correctly:

- Skips SSE event metadata lines (lines starting with `event:`) at lines 988 and 1002
- Extracts only the `data:` payload for JSON parsing
- Prevents event-type strings from being parsed as JSON data

This ensures clean SSE parsing for the Responses API streaming format.
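The filtering described above can be sketched as follows. `parseSSE` is a stand-in, not the provider's actual scanner loop: `event:` metadata lines are skipped and only `data:` payloads are kept for JSON decoding.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSE keeps only the "data:" payloads of an SSE stream, skipping
// "event:" metadata lines so they are never handed to the JSON decoder.
func parseSSE(stream string) []string {
	var payloads []string
	scanner := bufio.NewScanner(strings.NewReader(stream))
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "event:") {
			continue // event-type lines are not JSON
		}
		if strings.HasPrefix(line, "data:") {
			payloads = append(payloads, strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		}
	}
	return payloads
}

func main() {
	s := "event: response.output_text.delta\ndata: {\"delta\":\"hi\"}\n\n"
	fmt.Println(parseSSE(s))
}
```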
1026-1043: LGTM: Error event handling follows the correct pattern. The error handling for `ResponsesStreamResponseTypeError`:

- Properly constructs a `BifrostError` from the error event fields
- Sets the stream-end indicator in the context (line 1040)
- Routes through `processAndSendBifrostError` for consistent error propagation

This ensures that Responses API streaming errors are handled consistently with the rest of the streaming infrastructure.
bf5a247 to
43efca8
Compare
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
transports/bifrost-http/integrations/utils.go (2)
629-652: Honor custom SSE strings in initial stream error too (parity with in-loop errors). `sendStreamError` always JSON-wraps the error and ignores preformatted SSE strings from `StreamErrorConverter`. Mirror `handleStreaming`'s behavior: if the converter returns a string, write it verbatim; else marshal and send as `data:`.

```diff
 func (g *GenericRouter) sendStreamError(ctx *fasthttp.RequestCtx, config RouteConfig, bifrostErr *schemas.BifrostError) {
 	var errorResponse interface{}
 	@@
-	errorJSON, err := json.Marshal(map[string]interface{}{
-		"error": errorResponse,
-	})
-	if err != nil {
-		log.Printf("Failed to marshal error for SSE: %v", err)
-		ctx.SetStatusCode(fasthttp.StatusInternalServerError)
-		return
-	}
-
-	if _, err := fmt.Fprintf(ctx, "data: %s\n\n", errorJSON); err != nil {
-		log.Printf("Failed to write SSE error: %v", err)
-	}
+	// If converter returned a raw SSE string, send as-is
+	if sseString, ok := errorResponse.(string); ok {
+		if _, err := fmt.Fprint(ctx, sseString); err != nil {
+			log.Printf("Failed to write SSE error: %v", err)
+		}
+		return
+	}
+	// Default: wrap as data: {json}
+	errorJSON, err := json.Marshal(errorResponse)
+	if err != nil {
+		log.Printf("Failed to marshal error for SSE: %v", err)
+		ctx.SetStatusCode(fasthttp.StatusInternalServerError)
+		return
+	}
+	if _, err := fmt.Fprintf(ctx, "data: %s\n\n", errorJSON); err != nil {
+		log.Printf("Failed to write SSE error: %v", err)
+	}
 }
```
792-812: Bug: reflection looks for unexported field name "fallbacks". Go struct fields are exported as "Fallbacks", so `FieldByName("fallbacks")` will always miss, dropping fallbacks. Search for "Fallbacks" or read by JSON tag.

```diff
-	// Look for the "fallbacks" field
-	fallbacksField := reqValue.FieldByName("fallbacks")
+	// Look for the exported "Fallbacks" field
+	fallbacksField := reqValue.FieldByName("Fallbacks")
 	if !fallbacksField.IsValid() {
 		return nil, nil // No fallbacks field found
 	}
```

Optional: iterate struct fields and match the `json:"fallbacks"` tag for maximum robustness.

core/providers/openai.go (2)
278-286: Increase Scanner buffer to avoid token-too-long on SSE lines. The default `bufio.Scanner` buffer (64K) can fail on large SSE chunks. Set a larger max buffer.

```diff
 	scanner := bufio.NewScanner(resp.Body)
+	// Increase max token size for large SSE payloads
+	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
1331-1336: Speech/Transcription streams: enlarge Scanner buffers. Prevent scan failures on large audio/transcription SSE chunks.

```diff
 	scanner := bufio.NewScanner(resp.Body)
+	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```

Also applies to: 1610-1616
♻️ Duplicate comments (2)
core/schemas/embedding.go (1)
122-144: Clear previous embedding shape before decoding. `UnmarshalJSON` never resets `EmbeddingStr`, `EmbeddingArray`, or `Embedding2DArray`. If this struct is reused (common with pooling or external consumers), decoding a new payload after a prior shape leaves the old field populated. Because `MarshalJSON` prefers the string branch, a prior string decode will still be emitted even after successfully decoding an array. Please nil out every shape up front to avoid stale state.

```diff
 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
```

core/providers/openai.go (1)
671-671: Fix TODO typo (already noted previously). WITHING → WITHIN.

```diff
-	//TODO: FIX THIS WITHING RESPONSE STRUCT
+	//TODO: FIX THIS WITHIN RESPONSE STRUCT
```
🧹 Nitpick comments (6)
transports/bifrost-http/integrations/utils.go (2)
619-625: Prevent stray [DONE] on Responses streams. If `ExtraFields.RequestType` isn't set, you may emit `[DONE]` for Responses streams, which is spec-incompatible. The detection fix above mitigates; ensure post-hooks set RequestType or keep the heuristic.
Would you also confirm all Responses streaming chunks have ExtraFields.RequestType=responses_stream? See verification script below.
368-373: SSE proxying hardening (optional): disable proxy buffering. Consider adding `X-Accel-Buffering: no` to improve SSE behavior behind nginx.

```diff
 	ctx.Response.Header.Set("Cache-Control", "no-cache")
 	ctx.Response.Header.Set("Connection", "keep-alive")
 	ctx.Response.Header.Set("Access-Control-Allow-Origin", "*")
+	ctx.Response.Header.Set("X-Accel-Buffering", "no")
```

core/providers/ollama.go (1)
152-160: ResponsesStream delegation: verify RequestType propagation. Same concern as others: ensure the post-hook or conversion sets `ExtraFields.RequestType=responses_stream` so the HTTP layer emits event: lines and omits [DONE].
core/providers/mistral.go (1)
130-137: ResponsesStream delegation: verify RequestType propagation. Ensure the combined post-hook sets `ExtraFields.RequestType=responses_stream`; otherwise SSE formatting will be wrong.
tests/core-providers/scenarios/text_completion_stream.go (1)
413-417: Use stream request type in consolidated response. The consolidated response represents a streamed scenario. Set RequestType to `TextCompletionStreamRequest` for accuracy.

```diff
-	ExtraFields: schemas.BifrostResponseExtraFields{
-		Provider:    provider,
-		RequestType: schemas.TextCompletionRequest,
-	},
+	ExtraFields: schemas.BifrostResponseExtraFields{
+		Provider:    provider,
+		RequestType: schemas.TextCompletionStreamRequest,
+	},
```

core/providers/openai.go (1)
1057-1061: Responses streaming: include RawResponse when enabled. For parity with chat/text streaming, attach the raw JSON to aid debugging/telemetry.

```diff
-	processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
+	if sendBackRawResponse {
+		response.ExtraFields.RawResponse = jsonData
+	}
+	processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (48)
- core/providers/azure.go (2 hunks)
- core/providers/cerebras.go (2 hunks)
- core/providers/gemini.go (1 hunk)
- core/providers/groq.go (2 hunks)
- core/providers/mistral.go (2 hunks)
- core/providers/ollama.go (2 hunks)
- core/providers/openai.go (9 hunks)
- core/providers/openrouter.go (2 hunks)
- core/providers/parasail.go (2 hunks)
- core/providers/sgl.go (2 hunks)
- core/providers/utils.go (1 hunk)
- core/providers/vertex.go (6 hunks)
- core/schemas/bifrost.go (1 hunk)
- core/schemas/chatcompletions.go (1 hunk)
- core/schemas/embedding.go (1 hunk)
- core/schemas/mux.go (1 hunk)
- core/schemas/providers/gemini/chat.go (1 hunk)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunk)
- core/schemas/transcriptions.go (1 hunk)
- plugins/telemetry/main.go (0 hunks)
- tests/core-providers/anthropic_test.go (1 hunk)
- tests/core-providers/azure_test.go (1 hunk)
- tests/core-providers/bedrock_test.go (1 hunk)
- tests/core-providers/cerebras_test.go (1 hunk)
- tests/core-providers/cohere_test.go (1 hunk)
- tests/core-providers/config/account.go (12 hunks)
- tests/core-providers/custom_test.go (1 hunk)
- tests/core-providers/gemini_test.go (1 hunk)
- tests/core-providers/groq_test.go (1 hunk)
- tests/core-providers/mistral_test.go (1 hunk)
- tests/core-providers/ollama_test.go (1 hunk)
- tests/core-providers/openai_test.go (1 hunk)
- tests/core-providers/openrouter_test.go (1 hunk)
- tests/core-providers/parasail_test.go (1 hunk)
- tests/core-providers/scenarios/chat_completion_stream.go (1 hunk)
- tests/core-providers/scenarios/responses_stream.go (1 hunk)
- tests/core-providers/scenarios/text_completion_stream.go (1 hunk)
- tests/core-providers/scenarios/utils.go (0 hunks)
- tests/core-providers/sgl_test.go (1 hunk)
- tests/core-providers/tests.go (2 hunks)
- tests/core-providers/vertex_test.go (1 hunk)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
- tests/core-providers/scenarios/utils.go
- core/schemas/providers/mistral/embedding.go
- core/schemas/providers/mistral/types.go
- transports/bifrost-http/handlers/server.go
- plugins/telemetry/main.go
🚧 Files skipped from review as they are similar to previous changes (18)
- core/providers/gemini.go
- tests/core-providers/azure_test.go
- tests/core-providers/groq_test.go
- tests/core-providers/anthropic_test.go
- tests/core-providers/parasail_test.go
- tests/core-providers/mistral_test.go
- core/schemas/speech.go
- core/schemas/chatcompletions.go
- core/schemas/providers/gemini/chat.go
- core/providers/sgl.go
- core/schemas/textcompletions.go
- tests/core-providers/bedrock_test.go
- tests/core-providers/gemini_test.go
- tests/core-providers/cohere_test.go
- core/providers/vertex.go
- tests/core-providers/scenarios/responses_stream.go
- tests/core-providers/tests.go
- tests/core-providers/scenarios/chat_completion_stream.go
🧰 Additional context used
🧬 Code graph analysis (15)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/groq.go (2)
core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/utils.go (2)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/bifrost.go (4)
BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesStreamRequest(107-107)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Ollama(62-62)BifrostResponsesRequest(197-203)
tests/core-providers/scenarios/text_completion_stream.go (4)
tests/core-providers/config/account.go (1)
ComprehensiveTestConfig(43-55)core/schemas/textcompletions.go (2)
TextCompletionInput(9-12)TextCompletionParameters(50-70)core/schemas/bifrost.go (9)
BifrostTextCompletionRequest(134-140)BifrostStream(383-386)BifrostError(409-418)TextCompletionStreamRequest(103-103)BifrostResponse(238-254)ModelProvider(52-52)BifrostResponseExtraFields(349-358)RequestType(99-99)TextCompletionRequest(102-102)core/schemas/chatcompletions.go (3)
BifrostStreamResponseChoice(376-378)BifrostTextCompletionResponseChoice(365-367)BifrostChatResponseChoice(355-363)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
Response(97-103)
core/schemas/mux.go (3)
core/schemas/bifrost.go (1)
BifrostResponse(238-254)core/schemas/responses.go (16)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesMessage(280-292)ResponsesMessageTypeMessage(256-256)ResponsesInputMessageRoleAssistant(566-566)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeRefusalDelta(1639-1639)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesToolMessage(692-715)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeIncomplete(1627-1627)ResponsesStreamResponseTypeOutputItemDone(1630-1630)ResponsesStreamResponseStruct(1719-1722)ResponsesResponseUsage(236-239)ResponsesExtendedResponseUsage(229-234)core/schemas/chatcompletions.go (1)
BifrostStreamResponseChoice(376-378)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/azure.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (9)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)Azure(56-56)EmbeddingRequest(108-108)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/parasail.go (3)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)transports/bifrost-http/handlers/inference.go (1)
ResponsesRequest(195-199)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/openrouter.go (2)
core/schemas/bifrost.go (4)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)BifrostStream(383-386)core/schemas/provider.go (1)
PostHookRunner(191-191)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Mistral(61-61)BifrostResponsesRequest(197-203)
core/providers/openai.go (4)
core/schemas/bifrost.go (16)
TextCompletionStreamRequest(103-103)BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)OpenAI(55-55)ChatCompletionStreamRequest(105-105)ModelProvider(52-52)ErrorField(426-433)LLMUsage(296-302)BifrostContextKeyStreamEndIndicator(125-125)BifrostResponse(238-254)RequestType(99-99)BifrostResponsesRequest(197-203)ResponsesRequest(106-106)ResponsesStreamRequest(107-107)EmbeddingRequest(108-108)core/schemas/provider.go (3)
PostHookRunner(191-191)DefaultStreamBufferSize(17-17)Provider(194-219)core/schemas/providers/openai/responses.go (1)
ToOpenAIResponsesRequest(30-51)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
🔇 Additional comments (19)
tests/core-providers/config/account.go (1)
22-24: LGTM: Field rename executed consistently. The renaming of `ChatCompletionStream` to `CompletionStream` and addition of `TextCompletionStream` properly separates text completion streaming from chat completion streaming. This change is applied consistently across all provider configurations in the file.

core/schemas/transcriptions.go (1)
56-81: LGTM: Well-structured transcription response types. The new `BifrostTranscribe` type with embedded pointers to streaming and non-streaming variants follows the established pattern in the codebase. This design allows a single type to flexibly represent both response formats.

core/schemas/bifrost.go (1)
253-253: LGTM: Responses streaming support added. The embedded `*ResponsesStreamResponse` pointer enables streaming support for the Responses API while maintaining backward compatibility through nil-safe access.

core/schemas/responses.go (2)
88-88: LGTM: Error field added to response. The addition of the `Error` field enables proper error reporting in responses while maintaining backward compatibility through the pointer type.

1617-1722: LGTM: Comprehensive streaming support added. The streaming event types and response structures are now properly defined. The event type names appear to have been corrected from earlier review feedback:

- Both `.added` and `.delta` variants exist where needed (e.g., `response.output_text.added` and `response.output_text.delta`)
- Annotation events use proper dot notation (`response.output_text.annotation.added`)
- Code interpreter events use correct underscore format (`response.code_interpreter_call_code.delta`)

The extensive set of constants covers the full spectrum of streaming events for the Responses API.
tests/core-providers/sgl_test.go (1)
28-28: LGTM: Test configuration updated for streaming refactor. The field rename from `ChatCompletionStream` to `CompletionStream` is consistent with the broader refactoring effort.

tests/core-providers/ollama_test.go (1)
27-27: LGTM: Test configuration updated for streaming refactor. The field rename is consistent with the broader refactoring effort.
tests/core-providers/openrouter_test.go (1)
32-32: LGTM: Test configuration updated for streaming refactor. The field rename is consistent with the broader refactoring effort and was properly addressed in the previous review cycle.
tests/core-providers/vertex_test.go (1)
28-28: LGTM: Test configuration updated for streaming refactor. The field rename from `ChatCompletionStream` to `CompletionStream` is consistent with the broader refactoring effort across all provider tests.

tests/core-providers/custom_test.go (1)
31-31: Rename to CompletionStream looks correct. Matches the new scenario flag naming and unified streaming path.
tests/core-providers/openai_test.go (1)
30-33: Streaming flags update LGTM. Enabling TextCompletionStream and CompletionStream aligns with the refactor.
tests/core-providers/cerebras_test.go (1)
26-29: Streaming flags update LGTM. Using TextCompletionStream and CompletionStream is consistent with the unified streaming setup.
transports/bifrost-http/handlers/inference.go (1)
828-896: Verify error chunks carry the correct RequestType
Confirm that every error emitted into the stream (i.e., via `processAndSendError` or a post-hook) has its `ExtraFields.RequestType` set to `ResponsesStreamRequest`. Without that, the first-chunk-error edge case will misclassify the stream and emit a stray `[DONE]`.

core/providers/openrouter.go (3)
118-130: LGTM: switched to chat-specific streaming handler. Correct use of `handleOpenAIChatCompletionStreaming` with the OpenRouter provider key.
146-153: RequestType is already set in the converter
The CombinedPostHookRunner from `getResponsesChunkConverter` invokes `result.ToResponsesStream()` and then sets `result.ExtraFields.RequestType = schemas.ResponsesStreamRequest` for every non-nil chunk. No changes needed.
132-145: No changes needed for OpenRouter Responses endpoint. The current `provider.networkConfig.BaseURL + "/alpha/responses"` correctly yields `https://openrouter.ai/api/alpha/responses` per the OpenRouter docs.
118-136: LGTM: added chat-specific streaming with shared handler. Consistent with other providers and uses correct provider key.
core/providers/mistral.go (1)
96-114: LGTM: added chat-specific streaming with shared handler. Matches provider conventions and uses correct provider key.
transports/bifrost-http/integrations/utils.go (1)
456-625: No action needed: ExtraFields.RequestType is correctly set for responses streams
43efca8 to
d4f79e1
Compare
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
transports/bifrost-http/handlers/inference.go (1)
695-699: Bug: multipart file may be partially read. Use io.ReadAll/io.ReadFull. A single Read doesn't guarantee full file contents; this can truncate audio and break transcription.

Apply this diff:

```diff
-	// Read file data
-	fileData := make([]byte, fileHeader.Size)
-	if _, err := file.Read(fileData); err != nil {
-		SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
-		return
-	}
+	// Read file data (ensure full read)
+	fileData, err := io.ReadAll(file)
+	if err != nil {
+		SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
+		return
+	}
```
♻️ Duplicate comments (6)
core/schemas/responses.go (1)
1635-1645: Streaming event string literals still incorrect. The past review identified that the OpenAI Responses API uses `response.output_text.delta` and `response.function_call_arguments.delta` for streaming content, not `.added` variants. While `Delta` constants exist (lines 1636, 1644), the `Added` constants at lines 1635 and 1643 don't match the official API event names and could cause event mismatches.

Additionally, line 1678 uses an underscore in `"response.output_text_annotation.added"` but should use a dot: `"response.output_text.annotation.added"`. Apply this diff to fix the annotation event:

```diff
-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
```

For the `.added` constants, either remove them if unused or verify they match actual API events. Based on official OpenAI documentation, only `.delta` variants exist for output_text and function_call_arguments.

core/providers/utils.go (1)
552-575: Resolved: set RequestType on error chunks for responses streams. This implements the earlier feedback to tag error-first chunks, preventing SSE misclassification.
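The tagging pattern being confirmed here can be sketched as follows. The types below are stand-ins mirroring the schema names used in this PR, not the actual Bifrost structs:

```go
package main

import "fmt"

// Stand-in types; the real definitions live in core/schemas.
type RequestType string

const ResponsesStreamRequest RequestType = "responses_stream"

type ExtraFields struct{ RequestType RequestType }
type BifrostError struct{ ExtraFields ExtraFields }

// tagError stamps the request type on an error chunk so the SSE writer
// can choose the Responses framing even when the first chunk is an error.
func tagError(e *BifrostError, rt RequestType) *BifrostError {
	e.ExtraFields.RequestType = rt
	return e
}

func main() {
	e := tagError(&BifrostError{}, ResponsesStreamRequest)
	fmt.Println(e.ExtraFields.RequestType)
}
```

Without this stamp, a stream whose very first chunk is an error cannot be classified, which is exactly the misclassification the resolved comment describes.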
core/providers/vertex.go (1)
225-227: Remove incorrect `fasthttp.ErrTimeout` check. This is the same issue flagged in previous reviews: the Vertex provider uses `net/http` (not fasthttp), so the `fasthttp.ErrTimeout` check will never match and should be removed.

Apply this diff:

```diff
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
 		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 	}
```

The same issue exists at lines 262, 516, and 547.
core/providers/openai.go (3)
632-632: Enlarge Scanner buffer to handle large streaming chunks. The default `bufio.Scanner` buffer (64KB max) can fail on large SSE chunks containing extensive logprobs or deltas. A past review comment flagged this issue, but it remains unaddressed.

Apply this diff:

```diff
 	scanner := bufio.NewScanner(resp.Body)
+	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
 	chunkIndex := -1
```
671-671: Fix TODO comment typo. The TODO comment contains a typo: "WITHING" should be "WITHIN". A past review flagged this, but it remains unfixed.

Apply this diff:

```diff
-	//TODO: FIX THIS WITHING RESPONSE STRUCT
+	//TODO: FIX THIS WITHIN RESPONSE STRUCT
```
982-982: Enlarge Scanner buffer for Responses streaming. Same issue as chat streaming: the default scanner buffer can fail on long response events.

Apply this diff:

```diff
 	scanner := bufio.NewScanner(resp.Body)
+	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
🧹 Nitpick comments (5)
tests/core-providers/tests.go (1)
71-87: Add TextCompletionStream to the printed summary. The runner exists, but the summary omits it. Add a row for visibility.

```diff
 	testScenarios := []struct {
 		name      string
 		supported bool
 	}{
 		{"TextCompletion", testConfig.Scenarios.TextCompletion && testConfig.TextModel != ""},
+		{"TextCompletionStream", testConfig.Scenarios.TextCompletionStream},
 		{"SimpleChat", testConfig.Scenarios.SimpleChat},
 		{"CompletionStream", testConfig.Scenarios.CompletionStream},
```

core/providers/gemini.go (1)
194-201: Fix docstring provider name. The comment says "Anthropic's API" in the Gemini provider.

```diff
-// Responses performs a chat completion request to Anthropic's API.
+// Responses performs a chat completion request using Gemini's OpenAI-compatible API.
```

transports/bifrost-http/handlers/inference.go (1)
840-846: Make requestType inference robust (fallback to response shape). If `ExtraFields.RequestType` is missing on early chunks, Responses stream formatting won't trigger. Fall back to detecting a Responses stream by the presence of `ResponsesStreamResponse`.

Apply this diff:

```diff
-	if requestType == "" {
-		if chunk.BifrostResponse != nil {
-			requestType = chunk.BifrostResponse.ExtraFields.RequestType
-		} else if chunk.BifrostError != nil {
-			requestType = chunk.BifrostError.ExtraFields.RequestType
-		}
-	}
+	if requestType == "" {
+		if chunk.BifrostResponse != nil {
+			// Prefer explicit request type
+			requestType = chunk.BifrostResponse.ExtraFields.RequestType
+			// Fallback: infer from shape (Responses stream emits ResponsesStreamResponse)
+			if requestType == "" && chunk.BifrostResponse.ResponsesStreamResponse != nil {
+				requestType = schemas.ResponsesStreamRequest
+			}
+		} else if chunk.BifrostError != nil {
+			requestType = chunk.BifrostError.ExtraFields.RequestType
+		}
+	}
```
119-137: Prefer provider.GetProviderKey() for consistency. Use the instance method for provider identity, matching the rest of the file.

Apply this diff:

```diff
-	schemas.SGL,
+	provider.GetProviderKey(),
```
118-136: Use provider.GetProviderKey() for consistency. Minor consistency tweak across methods.

Apply this diff:

```diff
-	schemas.Ollama,
+	provider.GetProviderKey(),
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (48)
- core/providers/azure.go (2 hunks)
- core/providers/cerebras.go (2 hunks)
- core/providers/gemini.go (1 hunk)
- core/providers/groq.go (2 hunks)
- core/providers/mistral.go (2 hunks)
- core/providers/ollama.go (2 hunks)
- core/providers/openai.go (9 hunks)
- core/providers/openrouter.go (2 hunks)
- core/providers/parasail.go (2 hunks)
- core/providers/sgl.go (2 hunks)
- core/providers/utils.go (1 hunk)
- core/providers/vertex.go (6 hunks)
- core/schemas/bifrost.go (1 hunk)
- core/schemas/chatcompletions.go (1 hunk)
- core/schemas/embedding.go (1 hunk)
- core/schemas/mux.go (1 hunk)
- core/schemas/providers/gemini/chat.go (1 hunk)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunk)
- core/schemas/transcriptions.go (1 hunk)
- plugins/telemetry/main.go (0 hunks)
- tests/core-providers/anthropic_test.go (1 hunk)
- tests/core-providers/azure_test.go (1 hunk)
- tests/core-providers/bedrock_test.go (1 hunk)
- tests/core-providers/cerebras_test.go (1 hunk)
- tests/core-providers/cohere_test.go (1 hunk)
- tests/core-providers/config/account.go (12 hunks)
- tests/core-providers/custom_test.go (1 hunk)
- tests/core-providers/gemini_test.go (1 hunk)
- tests/core-providers/groq_test.go (1 hunk)
- tests/core-providers/mistral_test.go (1 hunk)
- tests/core-providers/ollama_test.go (1 hunk)
- tests/core-providers/openai_test.go (1 hunk)
- tests/core-providers/openrouter_test.go (1 hunk)
- tests/core-providers/parasail_test.go (1 hunk)
- tests/core-providers/scenarios/chat_completion_stream.go (1 hunk)
- tests/core-providers/scenarios/responses_stream.go (1 hunk)
- tests/core-providers/scenarios/text_completion_stream.go (1 hunk)
- tests/core-providers/scenarios/utils.go (0 hunks)
- tests/core-providers/sgl_test.go (1 hunk)
- tests/core-providers/tests.go (2 hunks)
- tests/core-providers/vertex_test.go (1 hunk)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
- transports/bifrost-http/handlers/server.go
- plugins/telemetry/main.go
- core/schemas/providers/mistral/types.go
- core/schemas/providers/mistral/embedding.go
- tests/core-providers/scenarios/utils.go
🚧 Files skipped from review as they are similar to previous changes (10)
- tests/core-providers/gemini_test.go
- tests/core-providers/config/account.go
- tests/core-providers/cerebras_test.go
- tests/core-providers/cohere_test.go
- tests/core-providers/custom_test.go
- tests/core-providers/mistral_test.go
- core/schemas/providers/gemini/chat.go
- tests/core-providers/bedrock_test.go
- core/schemas/embedding.go
- tests/core-providers/groq_test.go
🧰 Additional context used
🧬 Code graph analysis (21)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/providers/openrouter.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (4)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)BifrostStream(383-386)core/schemas/provider.go (1)
PostHookRunner(191-191)
core/providers/utils.go (2)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/bifrost.go (4)
BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesStreamRequest(107-107)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
AudioLLMUsage(304-309)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
BifrostResponse(238-254)RequestType(99-99)ResponsesStreamRequest(107-107)core/schemas/responses.go (16)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesMessage(280-292)ResponsesMessageTypeMessage(256-256)ResponsesInputMessageRoleAssistant(566-566)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeRefusalDelta(1639-1639)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesToolMessage(692-715)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeIncomplete(1627-1627)ResponsesStreamResponseTypeOutputItemDone(1630-1630)ResponsesStreamResponseStruct(1719-1722)ResponsesResponseUsage(236-239)ResponsesExtendedResponseUsage(229-234)core/schemas/chatcompletions.go (1)
BifrostStreamResponseChoice(376-378)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (2)
Key(8-17)VertexKeyConfig(35-39)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Vertex(60-60)BifrostResponsesRequest(197-203)core/schemas/providers/anthropic/chat.go (1)
ToAnthropicChatCompletionRequest(365-596)core/utils.go (1)
Ptr(12-14)
tests/core-providers/scenarios/responses_stream.go (4)
core/bifrost.go (1)
Bifrost(33-50)tests/core-providers/config/account.go (1)
ComprehensiveTestConfig(43-55)core/schemas/responses.go (21)
ResponsesMessage(280-292)ResponsesMessageContent(573-576)ResponsesParameters(33-60)ResponsesStreamResponseType(1620-1620)ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesStreamResponseTypeContentPartAdded(1632-1632)ResponsesStreamResponseTypeCreated(1623-1623)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeError(1685-1685)ResponsesTool(1341-1356)ResponsesToolFunction(1359-1362)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesParametersReasoning(210-214)ResponsesStreamResponseTypeReasoningSummaryPartAdded(1653-1653)ResponsesStreamResponseTypeReasoningSummaryTextDelta(1655-1655)ResponsesMessageTypeReasoning(273-273)ResponsesResponse(85-114)ResponsesMessageTypeMessage(256-256)core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostStream(383-386)BifrostError(409-418)BifrostResponse(238-254)ModelProvider(52-52)BifrostResponseExtraFields(349-358)
tests/core-providers/scenarios/text_completion_stream.go (5)
core/bifrost.go (1)
Bifrost(33-50)tests/core-providers/config/account.go (1)
ComprehensiveTestConfig(43-55)core/schemas/textcompletions.go (2)
TextCompletionInput(9-12)TextCompletionParameters(50-70)core/schemas/bifrost.go (9)
BifrostTextCompletionRequest(134-140)BifrostStream(383-386)BifrostError(409-418)TextCompletionStreamRequest(103-103)BifrostResponse(238-254)ModelProvider(52-52)BifrostResponseExtraFields(349-358)RequestType(99-99)TextCompletionRequest(102-102)core/schemas/chatcompletions.go (3)
BifrostStreamResponseChoice(376-378)BifrostTextCompletionResponseChoice(365-367)BifrostChatResponseChoice(355-363)
core/providers/azure.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (8)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)Azure(56-56)EmbeddingRequest(108-108)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/sgl.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)SGL(64-64)BifrostResponsesRequest(197-203)
core/providers/groq.go (2)
core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Ollama(62-62)BifrostResponsesRequest(197-203)
core/providers/parasail.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
Response(97-103)
tests/core-providers/tests.go (3)
tests/core-providers/scenarios/text_completion_stream.go (1)
RunTextCompletionStreamTest(16-399)tests/core-providers/scenarios/chat_completion_stream.go (1)
RunChatCompletionStreamTest(16-298)tests/core-providers/scenarios/responses_stream.go (1)
RunResponsesStreamTest(17-467)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/openai.go (5)
core/schemas/bifrost.go (19)
TextCompletionStreamRequest(103-103)BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)OpenAI(55-55)ChatCompletionStreamRequest(105-105)ModelProvider(52-52)ErrorField(426-433)RequestCancelled(378-378)LLMUsage(296-302)BifrostContextKeyStreamEndIndicator(125-125)BifrostResponse(238-254)RequestType(99-99)BifrostResponsesRequest(197-203)ResponsesRequest(106-106)ResponsesStreamRequest(107-107)BifrostErrorExtraFields(435-439)BifrostEmbeddingRequest(205-211)EmbeddingRequest(108-108)core/schemas/provider.go (7)
PostHookRunner(191-191)ErrProviderJSONMarshaling(26-26)ErrRequestCancelled(23-23)ErrProviderRequestTimedOut(22-22)ErrProviderRequest(24-24)DefaultStreamBufferSize(17-17)Provider(194-219)core/schemas/chatcompletions.go (1)
ChatStreamOptions(43-46)core/schemas/providers/openai/responses.go (1)
ToOpenAIResponsesRequest(30-51)core/schemas/responses.go (3)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeError(1685-1685)ResponsesStreamResponseTypeCompleted(1625-1625)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Mistral(61-61)BifrostResponsesRequest(197-203)
🔇 Additional comments (44)
core/schemas/speech.go (1)
86-91: Pointer embedding is safe; nil-safety checks are in place. Code consistently checks BifrostSpeechStreamResponse for nil before accessing Type, preventing runtime panics; no changes needed.
core/schemas/textcompletions.go (1)
71-78: LGTM! Clean schema addition for log probability support. The new TextCompletionLogProb type is well-structured with proper JSON tags and aligns with the PR's broader streaming schema enhancements.
core/schemas/transcriptions.go (1)
55-81: LGTM! Well-designed transcription response structure. The separation of common fields from streaming/non-streaming specifics via embedded pointers is clean and follows the patterns established in other schema files in this PR.
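The embedded-pointer pattern praised above can be sketched in isolation (type names here are illustrative stand-ins, not the actual schema types): the streaming-only part stays nil for non-streaming responses, so any accessor must check the embedded pointer before touching its promoted fields.

```go
package main

import "fmt"

// StreamPayload is a hypothetical streaming-only section of a response.
type StreamPayload struct {
	Type string
}

// Response embeds the streaming section as a pointer so non-streaming
// responses can simply leave it nil.
type Response struct {
	Text string
	*StreamPayload // nil for non-streaming responses
}

// StreamType reads the promoted Type field safely: accessing r.Type
// directly would panic when the embedded pointer is nil.
func StreamType(r Response) string {
	if r.StreamPayload == nil {
		return ""
	}
	return r.Type
}

func main() {
	fmt.Printf("%q\n", StreamType(Response{Text: "final"}))
	fmt.Printf("%q\n", StreamType(Response{StreamPayload: &StreamPayload{Type: "delta"}}))
}
```

This is why the review calls out the existing nil checks: each consumer of the embedded pointer must perform the guard shown in StreamType.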
core/schemas/responses.go (1)
88-88: LGTM! Error field addition is correct. The Error field properly uses omitempty and follows the established pattern.
tests/core-providers/scenarios/chat_completion_stream.go (1)
17-17: LGTM! Test config field rename aligns with streaming API refactor. The change from ChatCompletionStream to CompletionStream is part of the broader streaming API standardization across the codebase.
tests/core-providers/sgl_test.go (1)
28-28: LGTM! Consistent test config update. The field rename matches the streaming API standardization across all test configurations.
tests/core-providers/parasail_test.go (1)
27-27: LGTM! Test config updated consistently. The field rename is part of the coordinated streaming API refactor across all provider tests.
tests/core-providers/openrouter_test.go (1)
32-32: LGTM! Test config field rename completed. The field rename from ChatCompletionStream to CompletionStream aligns with the streaming API standardization. This was previously reviewed and addressed.
tests/core-providers/ollama_test.go (1)
27-27: LGTM! Final test config update looks good. The field rename completes the streaming API standardization across all provider test configurations.
tests/core-providers/anthropic_test.go (1)
28-28: LGTM: migrated to CompletionStream flag. Matches the new unified streaming gating across scenarios.
tests/core-providers/tests.go (1)
29-33: Good: added text and responses stream scenario runners. New runners are properly wired into the comprehensive suite.
tests/core-providers/azure_test.go (1)
28-28: LGTM: migrated to CompletionStream flag. Consistent with other providers and updated scenarios.
tests/core-providers/vertex_test.go (1)
28-28: LGTM: migrated to CompletionStream flag. Aligned with the new streaming architecture.
tests/core-providers/openai_test.go (1)
30-33: LGTM: added TextCompletionStream and unified CompletionStream. Covers both text and chat streaming paths for OpenAI.
core/providers/gemini.go (2)
180-192: LGTM: switched to chat-specific streaming handler. handleOpenAIChatCompletionStreaming is the right path for OpenAI‑compatible chat SSE.
919-921: Intentional: ResponsesStream unsupported for Gemini. If this is expected, consider noting it in provider capabilities/config to avoid accidental test invocation.
transports/bifrost-http/integrations/utils.go (2)
456-472: Detect request type from first chunk. Solid approach; keeps formatting decisions consistent throughout the stream.
Please confirm all providers’ responses/responses_stream post-hooks propagate ExtraFields.RequestType on both success and error chunks (using the new combined post-hook), so detection never falls back ambiguously.
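What this verification asks for can be modeled with a small stamping wrapper (Chunk and its fields are hypothetical stand-ins for BifrostStream and ExtraFields; the real stamping happens inside the combined post-hook runner):

```go
package main

import "fmt"

// Chunk stands in for one BifrostStream element; the real type carries
// ExtraFields.RequestType on its embedded BifrostResponse.
type Chunk struct {
	RequestType string
	ErrMsg      string // non-empty for error chunks
}

// stampRequestType mirrors the invariant under review: every chunk,
// including error chunks, is tagged with the request type, so the HTTP
// layer can choose SSE framing from whichever chunk arrives first.
func stampRequestType(in <-chan Chunk, requestType string) <-chan Chunk {
	out := make(chan Chunk)
	go func() {
		defer close(out)
		for c := range in {
			c.RequestType = requestType // applied on success and error paths alike
			out <- c
		}
	}()
	return out
}

func main() {
	in := make(chan Chunk, 2)
	in <- Chunk{}
	in <- Chunk{ErrMsg: "upstream failure"}
	close(in)
	for c := range stampRequestType(in, "responses_stream") {
		fmt.Println(c.RequestType, c.ErrMsg)
	}
}
```

If any provider's error path skips this stamping, the transport's first-chunk detection falls back ambiguously, which is exactly the failure mode the comment warns about.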
568-609: Correct Responses API SSE: event lines + no [DONE]
- Emits event: lines and data: payload for Responses streams.
- Suppresses [DONE] for responses API.
Looks correct per OpenAI Responses streaming semantics.
Also applies to: 619-626
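The framing rule being approved here can be sketched in isolation (helper names are invented for illustration; the real logic lives in the transport's stream loop): Responses streams carry an explicit "event:" line per frame and end without a sentinel, while chat-style streams use bare "data:" frames terminated by [DONE].

```go
package main

import (
	"fmt"
	"strings"
)

// formatSSEFrame emits one SSE frame. Responses streams get an explicit
// "event:" line naming the chunk type; other streams get plain "data:" frames.
func formatSSEFrame(isResponsesStream bool, eventType, payload string) string {
	var b strings.Builder
	if isResponsesStream && eventType != "" {
		fmt.Fprintf(&b, "event: %s\n", eventType)
	}
	fmt.Fprintf(&b, "data: %s\n\n", payload)
	return b.String()
}

// terminator returns the stream-ending frame: the Responses API defines no
// [DONE] sentinel, unlike chat-completions style streams.
func terminator(isResponsesStream bool) string {
	if isResponsesStream {
		return ""
	}
	return "data: [DONE]\n\n"
}

func main() {
	fmt.Print(formatSSEFrame(true, "response.output_text.delta", `{"delta":"hi"}`))
	fmt.Print(formatSSEFrame(false, "", `{"choices":[]}`))
	fmt.Print(terminator(false))
}
```

The per-stream decision only needs the request type detected from the first chunk, which is why both framing branches hang off a single boolean here.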
core/providers/sgl.go (1)
153-160: Confirm ResponsesStream sets requestType=responses_stream on first chunk. HTTP SSE formatting depends on this to emit event lines and omit [DONE]. Verify getResponsesChunkConverterCombinedPostHookRunner sets ExtraFields.RequestType accordingly for the initial chunk (including error paths).
core/schemas/chatcompletions.go (1)
351-402: LGTM: union choice types and deltas are clear and align with streaming needs. Structure and tags look good.
core/providers/openrouter.go (3)
118-129: LGTM: shared chat streaming handler usage. Consistent with other providers.
146-153: Confirm ResponsesStream requestType and consider direct Responses streaming.
- Ensure the post-hook sets requestType=responses_stream on the first chunk (to drive SSE event lines and omit [DONE]).
- If OpenRouter supports direct Responses streaming, consider using it for parity (event fidelity, server-side behavior).
132-144: No changes needed for OpenRouter Responses endpoint.
‘/alpha/responses’ correctly maps to https://openrouter.ai/api/alpha/responses and supports both non-streaming (default) and streaming via stream: true.
core/providers/ollama.go (1)
152-159: Confirm ResponsesStream sets requestType=responses_stream on first chunk. Needed by HTTP SSE layer to format events and avoid [DONE].
core/providers/mistral.go (2)
96-114: LGTM! The streaming implementation correctly uses handleOpenAIChatCompletionStreaming with proper endpoint, headers, and parameters.
130-137: LGTM! The ResponsesStream implementation properly delegates to ChatCompletionStream with the responses-specific post-hook converter, following the established pattern across providers.
core/providers/vertex.go (2)
349-437: LGTM! The streaming implementation properly handles both Claude (Anthropic-style) and non-Claude (OpenAI-style) models with appropriate validation, authentication, and streaming paths.
453-460: LGTM! The ResponsesStream implementation correctly delegates to ChatCompletionStream with the responses-specific post-hook converter.
core/schemas/mux.go (1)
885-1027: LGTM! Previous issue has been resolved. The ToResponsesStream method now correctly sets br.ExtraFields.RequestType = ResponsesStreamRequest at line 1024, addressing the concern raised in previous reviews. The implementation comprehensively handles various streaming delta types (role, content, thought, refusal, tool calls) and properly maps finish reasons to completion states.
tests/core-providers/scenarios/responses_stream.go (2)
17-467: LGTM! The test implementation is comprehensive and well-structured, covering:
- Basic streaming with proper timeout handling and context cancellation
- Tool calls validation when supported
- Reasoning scenarios with appropriate event tracking
- Sequence number validation and event type tracking
- Good error handling and diagnostic logging
The streaming validation logic properly handles the Responses API streaming format with appropriate checks for content, events, and metadata.
470-637: LGTM! The helper functions provide clear validation logic for responses streaming:
- validateResponsesStreamingStructure: validates sequence ordering and event distribution
- createConsolidatedResponsesResponse: constructs consolidated response for validation
- validateResponsesStreamingResponse: performs comprehensive streaming validation
The validation approach is thorough and handles provider-specific variations appropriately.
core/providers/cerebras.go (2)
120-132: LGTM! The streaming implementation correctly uses handleOpenAIChatCompletionStreaming for Cerebras's OpenAI-compatible streaming format.
134-160: LGTM! The Responses and ResponsesStream implementations follow the established pattern:
- Responses: properly converts chat response to responses format with correct metadata
- ResponsesStream: correctly delegates through ChatCompletionStream with the responses-specific post-hook converter
- Embedding: appropriately returns unsupported error for Cerebras
core/providers/parasail.go (1)
102-142: LGTM! Clean implementation of Responses API for Parasail provider.The changes correctly implement the Responses and ResponsesStream methods by routing through chat completion endpoints with appropriate conversions. The use of
getResponsesChunkConverterCombinedPostHookRunnerensures streaming chunks are properly transformed to the Responses format.core/providers/groq.go (1)
160-200: LGTM! Consistent implementation across providers. The Groq provider changes mirror the Parasail implementation, maintaining consistency across the codebase. The routing of Responses API calls through chat completion endpoints is the correct approach for providers without native Responses support.
core/providers/azure.go (1)
320-393: LGTM! Azure provider fully implements Responses and Embedding APIs. The changes correctly implement the Responses streaming architecture while also providing full Embedding support (unlike other providers which stub it out). The implementation properly leverages Azure's deployment-based URL construction and maintains consistency with Azure's authentication patterns.
core/schemas/bifrost.go (1)
253-253: LGTM! Embedded pointer enables Responses streaming support. Adding *ResponsesStreamResponse as an embedded pointer allows BifrostResponse to carry streaming-specific data for the Responses API. This follows the established pattern of using embedded pointers (like *ResponsesResponse on line 252) to extend the response structure.
tests/core-providers/scenarios/text_completion_stream.go (3)
114-121: LGTM! Nil Delta check properly implemented. The code now correctly guards against nil Delta before accessing Delta.Content (line 117: if delta != nil && delta.Content != nil). This addresses the past review comment.
269-276: LGTM! Nil Delta check properly implemented in varied prompts loop. The nil check for Delta is correctly implemented here as well (line 272: if delta != nil && delta.Content != nil), addressing the past review concern.
402-433: LGTM! Well-structured consolidation helper. The createConsolidatedTextCompletionResponse function properly constructs a synthetic response for validation, correctly copying metadata from the last streaming response when available.
core/providers/openai.go (4)
737-740: LGTM! Nil Delta guard prevents potential panic. The code now properly checks for nil Delta before accessing its fields, addressing the past review comment.
990-1011: LGTM! Event line handling is correct for Responses API format. The special handling for event: lines (skipping them) is intentional and correct for the OpenAI Responses API streaming format, which uses SSE with separate event and data lines.
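The scanning behavior described above can be sketched as a standalone loop (function name is illustrative; the provider code additionally runs post-hooks and error handling): "event:" lines are skipped because the same type name is repeated inside the JSON payload.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// extractDataPayloads scans an SSE body, skipping "event:" lines and blank
// separators, and collects only the "data:" payloads.
func extractDataPayloads(stream string) []string {
	scanner := bufio.NewScanner(strings.NewReader(stream))
	// Enlarged buffer, per the review: the default 64KB max token size can
	// fail on large chunks (logprobs, big tool-call arguments).
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	var payloads []string
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "event:") {
			continue // event name is also carried in the JSON "type" field
		}
		if after, ok := strings.CutPrefix(line, "data: "); ok {
			payloads = append(payloads, after)
		}
	}
	return payloads
}

func main() {
	sse := "event: response.created\ndata: {\"type\":\"response.created\"}\n\n" +
		"event: response.completed\ndata: {\"type\":\"response.completed\"}\n\n"
	fmt.Println(extractDataPayloads(sse))
}
```

Parsing only the data lines keeps the handler agnostic to whether the upstream emits event lines at all, which matters when the same loop serves multiple OpenAI-compatible providers.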
1029-1054: LGTM! Error handling properly sets metadata for transport layer. The error handling correctly populates ExtraFields with RequestType, Provider, and ModelRequested (lines 1034-1038), and properly extracts the error code (lines 1047-1049). This ensures the transport layer can emit correct SSE events.
1083-1179: LGTM! Embedding implementation follows established patterns. The Embedding method and handleOpenAIEmbeddingRequest handler are well-structured and consistent with other request handlers in the provider.
99da354 to 39465aa
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
core/schemas/speech.go (1)
56-84: Bug: UnmarshalJSON doesn’t handle single VoiceConfig object (only string or array). Comment says “object/array,” but a single object isn’t parsed. Add object handling.
Apply this diff:
 func (vi *SpeechVoiceInput) UnmarshalJSON(data []byte) error {
 	// Reset receiver state before attempting any decode to avoid stale data
 	vi.Voice = nil
 	vi.MultiVoiceConfig = nil
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
 		vi.Voice = &stringContent
 		return nil
 	}
+	// Try to unmarshal as a single VoiceConfig object
+	var single VoiceConfig
+	if err := sonic.Unmarshal(data, &single); err == nil {
+		if single.Voice == "" {
+			return fmt.Errorf("voice config has empty voice field")
+		}
+		vi.MultiVoiceConfig = []VoiceConfig{single}
+		return nil
+	}
 	// Try to unmarshal as an array of VoiceConfig objects
 	var voiceConfigs []VoiceConfig
 	if err := sonic.Unmarshal(data, &voiceConfigs); err == nil {
 		// Validate each VoiceConfig and build a new slice deterministically
 		validConfigs := make([]VoiceConfig, 0, len(voiceConfigs))
 		for _, config := range voiceConfigs {
 			if config.Voice == "" {
 				return fmt.Errorf("voice config has empty voice field")
 			}
 			validConfigs = append(validConfigs, config)
 		}
 		vi.MultiVoiceConfig = validConfigs
 		return nil
 	}
-	return fmt.Errorf("voice field is neither a string, nor an array of VoiceConfig objects")
+	return fmt.Errorf("voice field is neither a string, a VoiceConfig object, nor an array of VoiceConfig objects")
 }
core/providers/groq.go (1)
36-39: http.Client.Timeout will prematurely cut off SSE streams. A finite Timeout aborts long-running streams. Use 0 (no overall timeout) or a dedicated streaming timeout plus per-phase timeouts (Dial/TLS/ResponseHeaderTimeout) on Transport.
Apply this minimal safe change:
-streamClient := &http.Client{
-	Timeout: time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
-}
+streamClient := &http.Client{
+	// Do not set an overall timeout for streaming; rely on transport-level timeouts.
+	Timeout: 0,
+}
If you have a stream-specific timeout in config, prefer that instead.
core/providers/openrouter.go (1)
36-39: http.Client.Timeout will prematurely cut off SSE streams. Same as Groq: avoid overall Timeout for streaming.
-streamClient := &http.Client{
-	Timeout: time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
-}
+streamClient := &http.Client{
+	Timeout: 0,
+}
♻️ Duplicate comments (4)
core/schemas/responses.go (1)
1617-1722: Note: Streaming event types have open discussion from past reviews. The streaming response types and constants are comprehensive. However, past review comments raised concerns about specific event type string literals (lines 1635, 1643) using "added" vs "delta" suffixes, with extensive discussion between reviewer and author. This remains an open item from previous reviews.
core/providers/openai.go (3)
671-671: Fix TODO comment typo. The TODO comment contains a typo: "WITHING" should be "WITHIN". This was previously flagged and marked as addressed but the typo remains.
Apply this diff:
-//TODO: FIX THIS WITHING RESPONSE STRUCT
+//TODO: FIX THIS WITHIN RESPONSE STRUCT
632-632: Enlarge scanner buffer to prevent errors on large chunks. The bufio.NewScanner uses a default 64KB max buffer, which can cause scan errors when the API returns large chunks (e.g., extensive logprobs, large tool call arguments, or verbose content). This was previously flagged and marked as addressed but the fix is not present in the current code. Apply this diff to enlarge the buffer:
 scanner := bufio.NewScanner(resp.Body)
+scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
 chunkIndex := -1
982-982: Enlarge scanner buffer for Responses streaming. Similar to chat streaming, the Responses API scanner needs a larger buffer to handle long response events. The default 64KB max can cause scan errors on large structured payloads.
Apply this diff to enlarge the buffer:
 scanner := bufio.NewScanner(resp.Body)
+scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
🧹 Nitpick comments (8)
core/providers/mistral.go (1)
100-114: Avoid client-wide Timeout for SSE; rely on ctx instead. A non-zero http.Client.Timeout can abort long-lived SSE. Prefer Timeout=0 on the streaming client and enforce limits via context deadlines per request.
core/providers/vertex.go (4)
224-227: Remove http.ErrHandlerTimeout check; use context deadline. http.ErrHandlerTimeout is server-side; client timeouts surface as context.DeadlineExceeded (or net.Error Timeout). Keep only the context check.
Apply this diff:
-if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+if errors.Is(err, context.DeadlineExceeded) {
 	return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 }
261-263: Same here: drop http.ErrHandlerTimeout check. Use only context.DeadlineExceeded (or a net.Error timeout test).
-if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+if errors.Is(err, context.DeadlineExceeded) {
 	return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 }
515-517: Timeout check: remove http.ErrHandlerTimeout here too.
-if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+if errors.Is(err, context.DeadlineExceeded) {
 	return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 }
546-548: Timeout check: same fix for Do error branch.
-if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+if errors.Is(err, context.DeadlineExceeded) {
 	return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 }
core/providers/groq.go (2)
41-49: Proxy not applied to streamClient (net/http). Only the fasthttp client is proxied; streaming requests may bypass proxies. Mirror proxy settings on streamClient.Transport.
Example:
 streamClient := &http.Client{
-	Timeout: 0,
+	Timeout: 0,
+	Transport: &http.Transport{
+		Proxy: http.ProxyFromEnvironment, // or build from config.ProxyConfig
+		// set Dial/TLS/Idle timeouts as needed
+	},
 }
119-136: Avoid shadowing channel var in range. for response := range response { … } is confusing. Rename the loop variable.
-for response := range response {
-	if response.BifrostError != nil {
-		responseChan <- response
+for chunk := range response {
+	if chunk.BifrostError != nil {
+		responseChan <- chunk
 		continue
 	}
-	response.ToTextCompletionResponse()
-	if response.BifrostResponse != nil {
-		response.BifrostResponse.ExtraFields.RequestType = schemas.TextCompletionRequest
-		response.BifrostResponse.ExtraFields.Provider = provider.GetProviderKey()
-		response.BifrostResponse.ExtraFields.ModelRequested = request.Model
+	chunk.ToTextCompletionResponse()
+	if chunk.BifrostResponse != nil {
+		chunk.BifrostResponse.ExtraFields.RequestType = schemas.TextCompletionRequest
+		chunk.BifrostResponse.ExtraFields.Provider = provider.GetProviderKey()
+		chunk.BifrostResponse.ExtraFields.ModelRequested = request.Model
 	}
-	responseChan <- response
+	responseChan <- chunk
 }
41-43: Apply proxy config to streamClient as well. Streaming requests should honor the same proxy settings.
 streamClient := &http.Client{
 	Timeout: 0,
+	Transport: &http.Transport{
+		Proxy: http.ProxyFromEnvironment, // or from config.ProxyConfig
+	},
 }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (51)
- core/providers/anthropic.go (2 hunks)
- core/providers/azure.go (2 hunks)
- core/providers/bedrock.go (2 hunks)
- core/providers/cerebras.go (2 hunks)
- core/providers/cohere.go (2 hunks)
- core/providers/gemini.go (5 hunks)
- core/providers/groq.go (2 hunks)
- core/providers/mistral.go (2 hunks)
- core/providers/ollama.go (2 hunks)
- core/providers/openai.go (16 hunks)
- core/providers/openrouter.go (2 hunks)
- core/providers/parasail.go (2 hunks)
- core/providers/sgl.go (2 hunks)
- core/providers/utils.go (1 hunks)
- core/providers/vertex.go (6 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/mux.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- tests/core-providers/anthropic_test.go (1 hunks)
- tests/core-providers/azure_test.go (1 hunks)
- tests/core-providers/bedrock_test.go (1 hunks)
- tests/core-providers/cerebras_test.go (1 hunks)
- tests/core-providers/cohere_test.go (1 hunks)
- tests/core-providers/config/account.go (12 hunks)
- tests/core-providers/custom_test.go (1 hunks)
- tests/core-providers/gemini_test.go (1 hunks)
- tests/core-providers/groq_test.go (1 hunks)
- tests/core-providers/mistral_test.go (1 hunks)
- tests/core-providers/ollama_test.go (1 hunks)
- tests/core-providers/openai_test.go (1 hunks)
- tests/core-providers/openrouter_test.go (1 hunks)
- tests/core-providers/parasail_test.go (1 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/responses_stream.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/utils.go (0 hunks)
- tests/core-providers/sgl_test.go (1 hunks)
- tests/core-providers/tests.go (2 hunks)
- tests/core-providers/vertex_test.go (1 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
- core/schemas/providers/mistral/types.go
- core/schemas/providers/mistral/embedding.go
- tests/core-providers/scenarios/utils.go
- plugins/telemetry/main.go
- transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (15)
- core/schemas/providers/gemini/chat.go
- tests/core-providers/tests.go
- tests/core-providers/scenarios/chat_completion_stream.go
- tests/core-providers/gemini_test.go
- core/providers/gemini.go
- tests/core-providers/openrouter_test.go
- core/providers/ollama.go
- tests/core-providers/openai_test.go
- core/providers/sgl.go
- core/schemas/chatcompletions.go
- tests/core-providers/groq_test.go
- transports/bifrost-http/integrations/utils.go
- tests/core-providers/parasail_test.go
- tests/core-providers/scenarios/text_completion_stream.go
- tests/core-providers/vertex_test.go
🧰 Additional context used
🧬 Code graph analysis (16)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (2)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeError(1685-1685)
core/providers/utils.go (2)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/bifrost.go (4)
BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesStreamRequest(107-107)
tests/core-providers/scenarios/responses_stream.go (3)
tests/core-providers/config/account.go (1)
ComprehensiveTestConfig(43-55)core/schemas/responses.go (20)
ResponsesMessage(280-292)ResponsesMessageContent(573-576)ResponsesParameters(33-60)ResponsesStreamResponseType(1620-1620)ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesStreamResponseTypeContentPartAdded(1632-1632)ResponsesStreamResponseTypeCreated(1623-1623)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeError(1685-1685)ResponsesTool(1341-1356)ResponsesToolFunction(1359-1362)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesParametersReasoning(210-214)ResponsesStreamResponseTypeReasoningSummaryPartAdded(1653-1653)ResponsesStreamResponseTypeReasoningSummaryTextDelta(1655-1655)ResponsesMessageTypeReasoning(273-273)ResponsesResponse(85-114)core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostStream(383-386)BifrostError(409-418)BifrostResponse(238-254)ModelProvider(52-52)BifrostResponseExtraFields(349-358)SGL(64-64)
core/providers/openrouter.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (4)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)BifrostStream(383-386)core/schemas/provider.go (1)
PostHookRunner(191-191)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (2)
Key(8-17)VertexKeyConfig(35-39)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Vertex(60-60)BifrostResponsesRequest(197-203)core/schemas/providers/anthropic/chat.go (1)
ToAnthropicChatCompletionRequest(365-596)core/utils.go (1)
Ptr(12-14)
core/providers/parasail.go (2)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Mistral(61-61)BifrostResponsesRequest(197-203)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
LogProbs(341-346)
core/providers/groq.go (2)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/openai.go (5)
core/schemas/bifrost.go (17)
TextCompletionStreamRequest(103-103)BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)OpenAI(55-55)ChatCompletionStreamRequest(105-105)ModelProvider(52-52)ErrorField(426-433)LLMUsage(296-302)BifrostContextKeyStreamEndIndicator(125-125)BifrostResponse(238-254)RequestType(99-99)BifrostResponsesRequest(197-203)ResponsesRequest(106-106)ResponsesStreamRequest(107-107)BifrostErrorExtraFields(435-439)EmbeddingRequest(108-108)core/schemas/provider.go (2)
PostHookRunner(191-191)Provider(194-219)core/schemas/chatcompletions.go (1)
ChatStreamOptions(43-46)core/schemas/providers/openai/responses.go (1)
ToOpenAIResponsesRequest(30-51)core/schemas/responses.go (3)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeError(1685-1685)ResponsesStreamResponseTypeCompleted(1625-1625)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
BifrostResponse(238-254)RequestType(99-99)ResponsesStreamRequest(107-107)core/schemas/responses.go (16)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesMessage(280-292)ResponsesMessageTypeMessage(256-256)ResponsesInputMessageRoleAssistant(566-566)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeRefusalDelta(1639-1639)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesToolMessage(692-715)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeIncomplete(1627-1627)ResponsesStreamResponseTypeOutputItemDone(1630-1630)ResponsesStreamResponseStruct(1719-1722)ResponsesResponseUsage(236-239)ResponsesExtendedResponseUsage(229-234)core/schemas/chatcompletions.go (1)
BifrostStreamResponseChoice(376-378)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
AudioLLMUsage(304-309)
core/providers/azure.go (2)
core/schemas/bifrost.go (9)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)Azure(56-56)EmbeddingRequest(108-108)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/responses.go (2)
plugins/mocker/main.go (1)
Response(97-103)core/schemas/bifrost.go (1)
LogProbs(341-346)
🔇 Additional comments (40)
tests/core-providers/sgl_test.go (1)
28-28: Flag rename is correct
The `CompletionStream` flag consistently replaces `ChatCompletionStream` across tests and matches the `TestScenarios` definition for both chat and responses streaming; no change needed.
core/providers/anthropic.go (1)
398-398: LGTM: Timeout handling updated to standard library.The migration from
`fasthttp.ErrTimeout` to `http.ErrHandlerTimeout` aligns with the repository-wide timeout handling standardization. The logic remains unchanged; only the error sentinel has been swapped.
Also applies to: 425-425
core/providers/cohere.go (1)
380-380: LGTM: Timeout handling migrated to http package.The replacement of
`fasthttp.ErrTimeout` with `http.ErrHandlerTimeout` is consistent with the timeout handling standardization across providers.
Also applies to: 408-408
core/providers/bedrock.go (1)
158-158: LGTM: Complete migration from fasthttp timeout handling.The removal of the
`fasthttp` import and migration to `http.ErrHandlerTimeout` completes the timeout handling standardization for this provider.
Also applies to: 473-473
tests/core-providers/config/account.go (2)
22-24: LGTM: Test configuration fields updated consistently.The addition of
`TextCompletionStream` and `CompletionStream` (replacing `ChatCompletionStream`) aligns with the broader streaming surface expansion across the test suite.
466-768: LGTM: Provider configurations updated consistently.
All provider test configurations have been updated to use `CompletionStream` instead of `ChatCompletionStream`, with appropriate values for each provider's streaming capabilities.
core/schemas/embedding.go (1)
97-145: LGTM: Embedding types and JSON handling implemented correctly.
The `BifrostEmbedding` and `BifrostEmbeddingResponse` types with their JSON marshaling methods provide flexible representation for embedding data (string, 1D array, or 2D array).
Note: A previous review raised a concern about `UnmarshalJSON` not resetting fields before decoding, which could cause stale state if instances are reused. The maintainer confirmed that instances are not reused in the current codebase, which addresses the concern.
tests/core-providers/ollama_test.go (1)
27-27: LGTM: Test configuration field updated.The change from
`ChatCompletionStream` to `CompletionStream` is consistent with the test configuration refactoring across all provider tests.
tests/core-providers/cohere_test.go (1)
28-28: LGTM: Test configuration field updated.The field rename from
`ChatCompletionStream` to `CompletionStream` matches the unified streaming flag naming across the test suite.
tests/core-providers/bedrock_test.go (1)
33-33: LGTM: Test configuration field updated.The change from
`ChatCompletionStream` to `CompletionStream` completes the consistent test configuration migration across all provider tests.
tests/core-providers/mistral_test.go (1)
28-28: LGTM on field rename.Consistent with the new TestScenarios API.
core/schemas/speech.go (2)
16-19: No semantic change.Whitespace-only adjustments.
86-95: BifrostSpeech struct verified; no duplicates found
`Audio` (`[]byte`) base64-encodes as expected; the embedded `*BifrostSpeechStreamResponse` omits nil fields.
tests/core-providers/anthropic_test.go (1)
28-28: LGTM on field rename.Matches updated TestScenarios schema.
tests/core-providers/custom_test.go (1)
31-31: LGTM on CompletionStream usage.Consistent with other providers’ tests.
core/providers/mistral.go (1)
130-137: ResponsesStream wiring looks good.Delegation via getResponsesChunkConverterCombinedPostHookRunner is consistent with other providers.
tests/core-providers/scenarios/responses_stream.go (1)
318-325: Verify tool call name path.Does ResponsesMessage expose Name at the top-level? If Name is nested (e.g., via ResponsesToolMessage), adjust access accordingly to avoid nil/compile issues.
core/providers/vertex.go (1)
452-459: ResponsesStream wiring looks good.Delegation via combined post-hook is consistent with other providers.
core/providers/parasail.go (2)
101-114: LGTM on streaming handler swap.OpenAI-compatible streaming path and headers are correct.
116-128: Responses/ResponsesStream/Embedding surfaces look consistent.Responses adapts Chat->Responses; ResponsesStream delegates with post-hook; Embedding correctly returns unsupported.
Also applies to: 130-137, 139-142
core/providers/cerebras.go (2)
120-132: LGTM on streaming handler swap.Switch to handleOpenAIChatCompletionStreaming is consistent.
134-146: Responses/ResponsesStream/Embedding surfaces look consistent.Adapters mirror other providers; unsupported Embedding is explicit.
Also applies to: 148-155, 157-160
core/providers/azure.go (3)
320-331: LGTM on chat streaming refactor.Handler swap preserves Azure auth header handling and deployment URL.
334-349: Responses and ResponsesStream wiring look good.Uses Chat path conversion and combined post-hook runner as elsewhere.
Also applies to: 351-358
360-394: Embedding implementation looks correct.
Conversion, completeRequest reuse, metadata, and optional RawResponse handled properly.
core/schemas/bifrost.go (1)
core/schemas/bifrost.go (1)
253-253: LGTM: Clean addition for streaming responses support.
The embedded `*ResponsesStreamResponse` pointer follows the existing pattern and enables the Responses API streaming surface without breaking changes. The nil-gated pointer ensures backward compatibility.
core/schemas/responses.go (1)
88-88: LGTM: Error field addition is well-structured.
Adding `Error *ResponsesResponseError` to `ResponsesResponse` enables proper error handling in non-streaming responses. The pointer type correctly allows optional presence.
core/providers/openai.go (3)
259-259: LGTM: Comprehensive timeout error handling.
The additions of `http.ErrHandlerTimeout` checks alongside `context.DeadlineExceeded` provide proper coverage for both context-based and HTTP client timeouts across all streaming handlers. This prevents silent failures and ensures consistent error reporting.
Also applies to: 587-587, 936-936, 963-963, 1296-1296, 1323-1323, 1575-1575, 1602-1602
303-305: LGTM: Modern Go idiom for prefix handling.
Using `strings.CutPrefix` with the ok-check is the preferred Go 1.20+ idiom. It's more efficient and explicit than `TrimPrefix` and properly distinguishes between "prefix present" and "prefix absent" cases.
Also applies to: 658-663
511-533: LGTM: Well-structured new public API methods.The new and refactored methods follow consistent patterns:
- `ChatCompletionStream` properly delegates to shared `handleOpenAIChatCompletionStreaming`
- `Responses` and `ResponsesStream` follow the same patterns as chat methods
- `Embedding` uses the standard request handler pattern
- All methods properly validate operations and set metadata
The refactoring to shared handlers reduces duplication and improves maintainability.
Also applies to: 772-789, 867-886, 1083-1104
core/providers/groq.go (4)
160-171: Streaming handler switch looks good.Using handleOpenAIChatCompletionStreaming aligns with the unified streaming path.
174-186: Responses path conversion is correct.Chat→Responses conversion with metadata set (RequestType/Provider/Model) is consistent.
188-195: ResponsesStream delegation is appropriate.Reusing Chat stream with the combined post-hook converter is the right pattern for non-OpenAI providers.
30-34: Confirm config field for MaxConnsPerHost
core/providers/groq.go:33 uses `BufferSize`, but providers like OpenRouter use `Concurrency`; ensure this matches the intended connection limit.
core/providers/openrouter.go (3)
118-130: Streaming handler switch looks good.handleOpenAIChatCompletionStreaming usage aligns with unified streaming logic.
146-153: ResponsesStream delegation is appropriate.Reusing Chat stream with responses converter matches the non-OpenAI pattern.
132-144: No change needed: OpenRouter’s Responses endpoint should use `/alpha/responses` as per the official docs.
core/schemas/transcriptions.go (1)
56-81: New transcription schemas look consistent.
Composition of common/non-stream/stream fields is clean and aligns with existing schema patterns.
tests/core-providers/cerebras_test.go (1)
tests/core-providers/cerebras_test.go (1)
26-29: Test flags updated appropriately.Switch to TextCompletionStream and unified CompletionStream looks correct.
core/schemas/mux.go (1)
885-1027: Approve ToResponsesStream conversion Covers role/text/refusal/tool-call deltas, finish reasons, usage mapping, sets RequestType, clears chat fields; no duplicate definitions found.
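The approved mapping can be illustrated with a minimal sketch; the types, field names, and dispatch order below are simplified stand-ins for illustration, not the actual `ToResponsesStream` signature:

```go
package main

import "fmt"

// chatDelta is a simplified stand-in for a chat streaming delta.
type chatDelta struct {
	Role     string // set on the first chunk
	Content  string // text delta
	Refusal  string // refusal delta
	ToolArgs string // tool-call arguments delta
}

// toResponsesEvent maps a chat delta onto a Responses-style event type,
// mirroring the kind of dispatch a chat->responses converter performs.
func toResponsesEvent(d chatDelta) string {
	switch {
	case d.ToolArgs != "":
		return "response.function_call_arguments.delta"
	case d.Refusal != "":
		return "response.refusal.delta"
	case d.Content != "":
		return "response.output_text.delta"
	case d.Role != "":
		return "response.output_item.added"
	default:
		return "response.completed" // finish chunk with no delta payload
	}
}

func main() {
	fmt.Println(toResponsesEvent(chatDelta{Role: "assistant"}))
	fmt.Println(toResponsesEvent(chatDelta{Content: "Hello"}))
	fmt.Println(toResponsesEvent(chatDelta{ToolArgs: `{"city":"SF"}`}))
	fmt.Println(toResponsesEvent(chatDelta{}))
}
```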
39465aa to 19cab6c
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
transports/bifrost-http/handlers/inference.go (1)
694-701: Fix partial file read and enforce size guard.
Single Read may return fewer bytes; also no size guard is enforced. Use io.ReadAll and reject oversize files.
```diff
- // Read file data
- fileData := make([]byte, fileHeader.Size)
- if _, err := file.Read(fileData); err != nil {
-     SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
-     return
- }
+ // Enforce size limit and read file data safely
+ if fileHeader.Size > MaxFileSize {
+     SendError(ctx, fasthttp.StatusBadRequest, fmt.Sprintf("File size exceeds maximum limit of %d MB", MaxFileSize/1024/1024), h.logger)
+     return
+ }
+ fileData, err := io.ReadAll(file)
+ if err != nil {
+     SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
+     return
+ }
```
core/providers/openai.go (3)
278-283: Increase Scanner buffer to avoid truncated SSE chunks.
Default 64KB limit can fail on large deltas/logprobs. Set a larger buffer.
Apply:
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
1342-1346: SpeechStream: enlarge Scanner buffer.
Audio SSE chunks can be large; increase max token size.
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
1621-1625: TranscriptionStream: enlarge Scanner buffer.
Avoid scan errors on long chunks.
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
♻️ Duplicate comments (8)
tests/core-providers/azure_test.go (1)
28-28: Rename to CompletionStream looks good.
Matches the config struct rename; no issues spotted here.
tests/core-providers/anthropic_test.go (1)
tests/core-providers/anthropic_test.go (1)
28-28: LGTM on CompletionStream flag.
Consistent with the field rename across tests.
tests/core-providers/config/account.go (1)
tests/core-providers/config/account.go (1)
20-25: Struct rename/additions look good (CompletionStream, TextCompletionStream)
Ensure all scenario helpers reference Scenarios.CompletionStream (not ChatCompletionStream).
```shell
#!/bin/bash
# Confirm no lingering ChatCompletionStream references
rg -nP '\bChatCompletionStream\b' -S -g '!**/vendor/**'
```
tests/core-providers/scenarios/responses_stream.go (1)
529-535: Fix dereference: read fields from lastResponse.BifrostResponse.
BifrostStream wraps BifrostResponse. Accessing fields directly on lastResponse won’t compile.
```diff
- if lastResponse != nil && lastResponse.BifrostResponse != nil {
-     consolidatedResponse.Usage = lastResponse.Usage
-     consolidatedResponse.Model = lastResponse.Model
-     consolidatedResponse.ID = lastResponse.ID
-     consolidatedResponse.Created = lastResponse.Created
- }
+ if lastResponse != nil && lastResponse.BifrostResponse != nil {
+     consolidatedResponse.Usage = lastResponse.BifrostResponse.Usage
+     consolidatedResponse.Model = lastResponse.BifrostResponse.Model
+     consolidatedResponse.ID = lastResponse.BifrostResponse.ID
+     consolidatedResponse.Created = lastResponse.BifrostResponse.Created
+ }
```
core/providers/openai.go (4)
1029-1054: Good: Responses stream error metadata and code forwarded.ExtraFields (RequestType/Provider/Model) and Error.Code are now included; enables correct SSE emission and error handling.
671-672: Fix TODO typo and clarify/remove TODO.
Typo “WITHING” → “WITHIN”. Consider replacing with an actionable comment or removing.
```diff
- //TODO: FIX THIS WITHING RESPONSE STRUCT
+ // TODO: Fix this within response struct (clarify what needs fixing or remove)
```
632-636: Same: enlarge Scanner buffer for chat streaming.
Prevent scan errors on big tool-call/logprob chunks.
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
982-986: Responses streaming: enlarge Scanner buffer.
Large response events can exceed the default buffer.
```diff
- scanner := bufio.NewScanner(resp.Body)
+ scanner := bufio.NewScanner(resp.Body)
+ scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
```
🧹 Nitpick comments (4)
transports/bifrost-http/handlers/inference.go (1)
603-606: Remove duplicate initialization of SpeechParameters.
Lines 603-606 repeat the same initialization done at lines 599-601. Drop the second block to avoid confusion.
```diff
- // Extract extra params
- if req.SpeechParameters == nil {
-     req.SpeechParameters = &schemas.SpeechParameters{}
- }
```
core/providers/mistral.go (1)
96-114: Guard Authorization header on presence of key.
Safer and consistent with other providers; omit the header if key is empty.
```diff
- return handleOpenAIChatCompletionStreaming(
+ authHeader := map[string]string{}
+ if key.Value != "" {
+     authHeader["Authorization"] = "Bearer " + key.Value
+ }
+ return handleOpenAIChatCompletionStreaming(
      ctx,
      provider.streamClient,
      provider.networkConfig.BaseURL+"/v1/chat/completions",
      request,
-     map[string]string{"Authorization": "Bearer " + key.Value},
+     authHeader,
      provider.networkConfig.ExtraHeaders,
      provider.sendBackRawResponse,
      schemas.Mistral,
      postHookRunner,
      provider.logger,
  )
```
core/providers/vertex.go (1)
224-227: Prefer context/net timeout checks over http.ErrHandlerTimeout.
`http.ErrHandlerTimeout` is rarely surfaced here. Use `context.DeadlineExceeded` or `net.Error` `Timeout()`.
```diff
- if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+ if errors.Is(err, context.DeadlineExceeded) {
      return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
  }
```
Optionally:
```go
var nerr net.Error
if errors.As(err, &nerr) && nerr.Timeout() { /* treat as timeout */ }
```
Apply similarly to the other occurrences noted in the ranges.
Also applies to: 261-263, 515-518, 546-548
core/providers/parasail.go (1)
102-114: Set Authorization header only when key is provided.
Avoid sending "Bearer " when no key is configured.
```diff
- return handleOpenAIChatCompletionStreaming(
+ authHeader := map[string]string{}
+ if key.Value != "" {
+     authHeader["Authorization"] = "Bearer " + key.Value
+ }
+ return handleOpenAIChatCompletionStreaming(
      ctx,
      provider.streamClient,
      provider.networkConfig.BaseURL+"/v1/chat/completions",
      request,
-     map[string]string{"Authorization": "Bearer " + key.Value},
+     authHeader,
      provider.networkConfig.ExtraHeaders,
      provider.sendBackRawResponse,
      schemas.Parasail,
      postHookRunner,
      provider.logger,
  )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (51)
- core/providers/anthropic.go (2 hunks)
- core/providers/azure.go (2 hunks)
- core/providers/bedrock.go (2 hunks)
- core/providers/cerebras.go (2 hunks)
- core/providers/cohere.go (2 hunks)
- core/providers/gemini.go (5 hunks)
- core/providers/groq.go (2 hunks)
- core/providers/mistral.go (2 hunks)
- core/providers/ollama.go (2 hunks)
- core/providers/openai.go (16 hunks)
- core/providers/openrouter.go (2 hunks)
- core/providers/parasail.go (2 hunks)
- core/providers/sgl.go (2 hunks)
- core/providers/utils.go (1 hunks)
- core/providers/vertex.go (6 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/mux.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- tests/core-providers/anthropic_test.go (1 hunks)
- tests/core-providers/azure_test.go (1 hunks)
- tests/core-providers/bedrock_test.go (1 hunks)
- tests/core-providers/cerebras_test.go (1 hunks)
- tests/core-providers/cohere_test.go (1 hunks)
- tests/core-providers/config/account.go (12 hunks)
- tests/core-providers/custom_test.go (1 hunks)
- tests/core-providers/gemini_test.go (1 hunks)
- tests/core-providers/groq_test.go (1 hunks)
- tests/core-providers/mistral_test.go (1 hunks)
- tests/core-providers/ollama_test.go (1 hunks)
- tests/core-providers/openai_test.go (1 hunks)
- tests/core-providers/openrouter_test.go (1 hunks)
- tests/core-providers/parasail_test.go (1 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/responses_stream.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/utils.go (0 hunks)
- tests/core-providers/sgl_test.go (1 hunks)
- tests/core-providers/tests.go (2 hunks)
- tests/core-providers/vertex_test.go (1 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
- plugins/telemetry/main.go
- transports/bifrost-http/handlers/server.go
- tests/core-providers/scenarios/utils.go
- core/schemas/providers/mistral/types.go
- core/schemas/providers/mistral/embedding.go
🚧 Files skipped from review as they are similar to previous changes (21)
- core/schemas/textcompletions.go
- core/schemas/chatcompletions.go
- tests/core-providers/custom_test.go
- tests/core-providers/scenarios/chat_completion_stream.go
- tests/core-providers/vertex_test.go
- core/providers/bedrock.go
- tests/core-providers/gemini_test.go
- core/providers/utils.go
- tests/core-providers/openai_test.go
- core/providers/sgl.go
- core/schemas/embedding.go
- tests/core-providers/mistral_test.go
- tests/core-providers/scenarios/text_completion_stream.go
- tests/core-providers/tests.go
- core/providers/gemini.go
- core/providers/cohere.go
- tests/core-providers/groq_test.go
- core/providers/groq.go
- core/schemas/transcriptions.go
- tests/core-providers/cerebras_test.go
- core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (15)
core/providers/openrouter.go (2)
core/schemas/bifrost.go (4)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)BifrostStream(383-386)core/schemas/provider.go (1)
PostHookRunner(191-191)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Ollama(62-62)BifrostResponsesRequest(197-203)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (2)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeError(1685-1685)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Mistral(61-61)BifrostResponsesRequest(197-203)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
RequestType(99-99)BifrostResponse(238-254)BifrostError(409-418)ResponsesStreamRequest(107-107)core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
tests/core-providers/scenarios/responses_stream.go (3)
tests/core-providers/config/account.go (1)
ComprehensiveTestConfig(43-55)core/schemas/responses.go (22)
ResponsesMessage(280-292)ResponsesInputMessageRoleUser(567-567)ResponsesMessageContent(573-576)ResponsesParameters(33-60)ResponsesStreamResponseType(1620-1620)ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesStreamResponseTypeContentPartAdded(1632-1632)ResponsesStreamResponseTypeCreated(1623-1623)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeError(1685-1685)ResponsesTool(1341-1356)ResponsesToolFunction(1359-1362)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesParametersReasoning(210-214)ResponsesStreamResponseTypeReasoningSummaryPartAdded(1653-1653)ResponsesStreamResponseTypeReasoningSummaryTextDelta(1655-1655)ResponsesMessageTypeReasoning(273-273)ResponsesResponse(85-114)ResponsesMessageTypeMessage(256-256)core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostStream(383-386)BifrostError(409-418)BifrostResponse(238-254)ModelProvider(52-52)BifrostResponseExtraFields(349-358)SGL(64-64)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
BifrostResponse(238-254)RequestType(99-99)ResponsesStreamRequest(107-107)core/schemas/responses.go (16)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeOutputItemAdded(1629-1629)ResponsesMessage(280-292)ResponsesMessageTypeMessage(256-256)ResponsesInputMessageRoleAssistant(566-566)ResponsesStreamResponseTypeOutputTextDelta(1636-1636)ResponsesStreamResponseTypeRefusalDelta(1639-1639)ResponsesStreamResponseTypeFunctionCallArgumentsAdded(1642-1642)ResponsesMessageTypeFunctionCall(261-261)ResponsesToolMessage(692-715)ResponsesStreamResponseTypeCompleted(1625-1625)ResponsesStreamResponseTypeIncomplete(1627-1627)ResponsesStreamResponseTypeOutputItemDone(1630-1630)ResponsesStreamResponseStruct(1719-1722)ResponsesResponseUsage(236-239)ResponsesExtendedResponseUsage(229-234)core/schemas/chatcompletions.go (1)
BifrostStreamResponseChoice(376-378)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (7)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)RequestType(99-99)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
ResponsesStreamResponse(1688-1717)
core/providers/azure.go (3)
core/schemas/account.go (1)
Key(8-17)core/schemas/bifrost.go (8)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)Azure(56-56)EmbeddingRequest(108-108)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
PostHookRunner(191-191)core/schemas/account.go (2)
Key(8-17)VertexKeyConfig(35-39)core/schemas/bifrost.go (5)
BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)Vertex(60-60)BifrostResponsesRequest(197-203)core/schemas/providers/anthropic/chat.go (1)
ToAnthropicChatCompletionRequest(365-596)core/utils.go (1)
Ptr(12-14)
core/providers/parasail.go (2)
core/schemas/bifrost.go (6)
BifrostResponsesRequest(197-203)BifrostResponse(238-254)BifrostError(409-418)ResponsesRequest(106-106)BifrostStream(383-386)BifrostEmbeddingRequest(205-211)core/schemas/provider.go (2)
Provider(194-219)PostHookRunner(191-191)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
Response(97-103)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
AudioLLMUsage(304-309)
core/providers/openai.go (5)
core/schemas/bifrost.go (19)
TextCompletionStreamRequest(103-103)BifrostChatRequest(189-195)BifrostStream(383-386)BifrostError(409-418)OpenAI(55-55)ChatCompletionStreamRequest(105-105)ModelProvider(52-52)ErrorField(426-433)RequestCancelled(378-378)LLMUsage(296-302)BifrostContextKeyStreamEndIndicator(125-125)BifrostResponse(238-254)RequestType(99-99)BifrostResponsesRequest(197-203)ResponsesRequest(106-106)ResponsesStreamRequest(107-107)BifrostErrorExtraFields(435-439)BifrostEmbeddingRequest(205-211)EmbeddingRequest(108-108)core/schemas/provider.go (3)
PostHookRunner(191-191)DefaultStreamBufferSize(17-17)Provider(194-219)core/utils.go (1)
Ptr(12-14)core/schemas/providers/openai/responses.go (1)
ToOpenAIResponsesRequest(30-51)core/schemas/responses.go (3)
ResponsesStreamResponse(1688-1717)ResponsesStreamResponseTypeError(1685-1685)ResponsesStreamResponseTypeCompleted(1625-1625)
🔇 Additional comments (32)
core/providers/anthropic.go (1)
398-398: LGTM! Timeout error handling correctly aligned with http.Client usage.The changes from
`fasthttp.ErrTimeout` to `http.ErrHandlerTimeout` are correct since the `handleAnthropicStreaming` function uses `http.Client` for streaming requests, not `fasthttp.Client`. The timeout handling remains robust with the existing `context.DeadlineExceeded` checks.
Also applies to: 425-425
core/schemas/speech.go (1)
86-95: LGTM: Speech streaming types added.The new
`BifrostSpeech` and `BifrostSpeechStreamResponse` types extend the speech API to support streaming. The pointer embedding of `BifrostSpeechStreamResponse` follows Go conventions for optional fields and aligns with the broader streaming infrastructure additions across the PR.
core/schemas/responses.go (3)
88-88: Error field addition looks good.The new
`Error` field in `ResponsesResponse` appropriately extends error handling capabilities for the Responses API.
1635-1637: Verify necessity of `.added` event type constants.
The code defines both
`.added` and `.delta` variants for output text and function call arguments:
- `ResponsesStreamResponseTypeOutputTextAdded` (`.added`)
- `ResponsesStreamResponseTypeOutputTextDelta` (`.delta`)
- `ResponsesStreamResponseTypeFunctionCallArgumentsAdded` (`.added`)
- `ResponsesStreamResponseTypeFunctionCallArgumentsDelta` (`.delta`)
Based on past review discussions, OpenAI's Responses API uses
.deltafor streaming chunks. Having both variants could be correct if the API emits distinct event types at different stages, but it may also indicate that the.addedvariants were supposed to be removed when.deltawas introduced.Confirm whether both event types are emitted by the OpenAI Responses API or if only
.deltavariants should be retained for streaming events.Also applies to: 1643-1645
1688-1722: Streaming response structures look comprehensive.The
`ResponsesStreamResponse` and `ResponsesStreamResponseStruct` types provide a well-structured foundation for handling streaming events with appropriate fields for different event types (deltas, items, parts, annotations, etc.).
tests/core-providers/ollama_test.go (1)
27-27: Field rename aligns with streaming refactor.The rename from
`ChatCompletionStream` to `CompletionStream` is consistent with the broader refactoring across all provider tests to unify streaming flag naming.
tests/core-providers/openrouter_test.go (1)
32-32: Consistent with streaming flag refactor.The
`CompletionStream` rename maintains consistency across all provider test configurations.
tests/core-providers/parasail_test.go (1)
27-27: Test configuration updated consistently.
The streaming flag rename follows the same pattern applied across all provider tests.
tests/core-providers/cohere_test.go (1)
tests/core-providers/cohere_test.go (1)
28-28: Cohere test config aligned with refactor.The field rename maintains consistency with the unified
`CompletionStream` naming convention.
tests/core-providers/sgl_test.go (1)
28-28: SGL test scenarios updated.The streaming flag rename is consistent with the broader test suite refactoring.
tests/core-providers/bedrock_test.go (1)
33-33: Bedrock test config updated consistently.The rename completes the unified
`CompletionStream` naming convention across all provider test configurations.
transports/bifrost-http/integrations/utils.go (1)
568-609: SSE formatting by request type and DONE gating look correct
- Emits event: lines for Responses streams and standard data: lines otherwise.
- Suppresses [DONE] for Responses streaming; uses stream close. Good.
Also applies to: 619-626
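The two framings described above can be sketched in isolation. This is a minimal, standalone illustration; `formatSSEChunk` is a hypothetical helper, not the transport's actual function:

```go
package main

import "fmt"

// formatSSEChunk sketches the transport's framing rule: Responses streams
// get an explicit "event:" line and no [DONE] sentinel; other streams use
// plain "data:" frames and terminate with "data: [DONE]".
func formatSSEChunk(isResponsesStream bool, eventType, payload string) string {
	if isResponsesStream {
		return fmt.Sprintf("event: %s\ndata: %s\n\n", eventType, payload)
	}
	return fmt.Sprintf("data: %s\n\n", payload)
}

func main() {
	fmt.Print(formatSSEChunk(true, "response.output_text.delta", `{"delta":"hi"}`))
	fmt.Print(formatSSEChunk(false, "", `{"choices":[]}`))
}
```

For non-Responses streams, the caller would additionally write a final `data: [DONE]\n\n` frame before closing the stream.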
core/schemas/mux.go (1)
885-1028: Single ToResponsesStream implementation confirmed. Only one `ToResponsesStream` definition exists across the codebase.

transports/bifrost-http/handlers/inference.go (3)

631-634: Speech SSE gating: LGTM. Streaming only when stream_format == "sse" is clear and safe.

891-898: Correct: don't emit [DONE] for Responses streams. This matches OpenAI Responses API behavior.

840-846: LGTM: RequestType set on first chunk. The utils.go post-hook converter and openai.go's manual assignments ensure ExtraFields.RequestType is initialized on the initial chunk for all streaming providers.
118-130: Chat streaming migration: LGTM. Uses the shared handleOpenAIChatCompletionStreaming with proper headers and provider key.

146-154: Verified: converter sets request type and stream response. `getResponsesChunkConverterCombinedPostHookRunner` calls ToResponsesStream() to populate ResponsesStreamResponse and assigns ExtraFields.RequestType = ResponsesStreamRequest on both results and errors.

132-144: Responses endpoint path is correct. The default BaseURL (https://openrouter.ai/api) plus `/alpha/responses` yields https://openrouter.ai/api/alpha/responses, matching the official API.

152-160: ResponsesStream delegation: LGTM. Post-hook reuse keeps behavior consistent.

core/providers/mistral.go (1)

130-138: ResponsesStream delegation: LGTM. The combined post-hook approach is consistent.

core/providers/vertex.go (1)

418-435: Auth header usage: LGTM. Uses the computed authHeader, allowing the OAuth client to manage Authorization when key.Value is empty.

core/providers/parasail.go (1)

116-138: Responses methods: LGTM. Non-streaming wraps chat; streaming uses the combined post-hook.

core/providers/cerebras.go (4)

120-131: Good switch to specialized chat streaming wrapper. Delegation and params look correct.

134-146: Responses path wiring LGTM. Conversion + ToResponsesOnly + ExtraFields set correctly.

148-155: ResponsesStream delegation LGTM. Using the combined post-hook converter is the right approach.

157-161: Clear unsupported Embedding surface. Appropriate explicit error for Cerebras.

core/providers/azure.go (4)

320-332: Chat streaming wrapper usage LGTM. Headers/auth preserved; consistent with other providers.

334-349: Responses path LGTM. Correct conversion and metadata population.

351-358: ResponsesStream delegation LGTM. The combined post-hook converter is appropriate.

360-393: Embedding implementation LGTM. Correct URL construction via completeRequest; ExtraFields set; RawResponse gated by flag.

core/schemas/bifrost.go (1)

253-254: Schema extension for Responses streaming LGTM. Clean embedding of the stream variant into BifrostResponse.
Actionable comments posted: 0
♻️ Duplicate comments (3)
core/schemas/mux.go (1)
918-935: Avoid fallthrough; handle role+content explicitly.
`fallthrough` executes the next case body without re-checking its condition, which can misclassify role-only chunks and preclude handling of other fields in the same chunk. Replace with an explicit nested content check.

```diff
-	case delta.Role != nil:
+	case delta.Role != nil: // Role initialization - typically the first chunk
 		streamResp.Type = ResponsesStreamResponseTypeOutputItemAdded
 		streamResp.Item = &ResponsesMessage{
 			Type: Ptr(ResponsesMessageTypeMessage),
 			Role: Ptr(ResponsesInputMessageRoleAssistant),
 		}
 		if *delta.Role == "assistant" {
 			streamResp.Item.Role = Ptr(ResponsesInputMessageRoleAssistant)
 		}
-		fallthrough
-
-	case delta.Content != nil && *delta.Content != "":
-		if delta.Content != nil && *delta.Content != "" { // Need this check again because of the fallthrough
-			// Text content delta
-			streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
-			streamResp.Delta = delta.Content
-		}
+		// If content is also present in this chunk, emit text delta
+		if delta.Content != nil && *delta.Content != "" {
+			streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
+			streamResp.Delta = delta.Content
+		}
+		// No implicit fallthrough
+
+	case delta.Content != nil && *delta.Content != "":
+		// Text content delta
+		streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
+		streamResp.Delta = delta.Content
```
72-78: Allow nulls in logprobs arrays. OpenAI can return nulls for token_logprobs and for elements of top_logprobs; the current types will fail to unmarshal them.

```diff
 type TextCompletionLogProb struct {
 	TextOffset    []int                 `json:"text_offset"`
-	TokenLogProbs []float64             `json:"token_logprobs"`
+	TokenLogProbs []*float64            `json:"token_logprobs"`
 	Tokens        []string              `json:"tokens"`
-	TopLogProbs   []map[string]float64  `json:"top_logprobs"`
+	TopLogProbs   []*map[string]float64 `json:"top_logprobs"`
 }
```

Ensure readers handle nil entries safely.
tests/core-providers/scenarios/responses_stream.go (1)
529-535: Use explicit field access through BifrostResponse. While Go's embedded pointer allows `lastResponse.Usage` as shorthand, the explicit path is clearer and avoids potential confusion. This mirrors the past review concern at the same lines. Apply this diff for clarity:

```diff
-	consolidatedResponse.Usage = lastResponse.Usage
-	consolidatedResponse.Model = lastResponse.Model
-	consolidatedResponse.ID = lastResponse.ID
-	consolidatedResponse.Created = lastResponse.Created
+	consolidatedResponse.Usage = lastResponse.BifrostResponse.Usage
+	consolidatedResponse.Model = lastResponse.BifrostResponse.Model
+	consolidatedResponse.ID = lastResponse.BifrostResponse.ID
+	consolidatedResponse.Created = lastResponse.BifrostResponse.Created
```
🧹 Nitpick comments (4)
tests/core-providers/scenarios/chat_completion_stream.go (1)
17-20: Align log text with renamed flag. The message still says “Chat completion stream...”; update it to “Completion stream...” for consistency.

```diff
-	t.Logf("Chat completion stream not supported for provider %s", testConfig.Provider)
+	t.Logf("Completion stream not supported for provider %s", testConfig.Provider)
```
893-896: Also set RequestType when early-returning. If `ResponsesStreamResponse` is already present, mark the request type before returning to ensure transports format SSE correctly.

```diff
-	if br.ResponsesStreamResponse != nil {
-		br.Choices = nil
-		return
-	}
+	if br.ResponsesStreamResponse != nil {
+		br.ExtraFields.RequestType = ResponsesStreamRequest
+		br.Choices = nil
+		return
+	}
```
351-363: Consider adding runtime validation for union-style embedded pointers. The comment correctly states that only one of the three embedded pointers should be non-nil at a time, but there's no runtime enforcement of this invariant. If multiple pointers are set simultaneously, the JSON marshaling could produce unexpected results due to field overlaps or overwriting.

Consider adding a validation method:

```go
// Validate ensures only one embedded pointer is non-nil
func (c *BifrostChatResponseChoice) Validate() error {
	count := 0
	if c.BifrostTextCompletionResponseChoice != nil {
		count++
	}
	if c.BifrostNonStreamResponseChoice != nil {
		count++
	}
	if c.BifrostStreamResponseChoice != nil {
		count++
	}
	if count > 1 {
		return fmt.Errorf("multiple embedded pointers set; only one should be non-nil")
	}
	return nil
}
```
56-66: Consider adding runtime validation for union-style embedded pointers. Similar to `BifrostChatResponseChoice`, this struct uses a union pattern with embedded pointers where only one should be non-nil at a time. Without runtime validation, setting both `BifrostTranscribeNonStreamResponse` and `BifrostTranscribeStreamResponse` simultaneously could lead to unexpected JSON serialization behavior.

Consider adding a validation method similar to the one suggested for `BifrostChatResponseChoice`:

```go
// Validate ensures only one embedded pointer is non-nil
func (t *BifrostTranscribe) Validate() error {
	count := 0
	if t.BifrostTranscribeNonStreamResponse != nil {
		count++
	}
	if t.BifrostTranscribeStreamResponse != nil {
		count++
	}
	if count > 1 {
		return fmt.Errorf("both streaming and non-streaming pointers set; only one should be non-nil")
	}
	return nil
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (51)
- core/providers/anthropic.go (2 hunks)
- core/providers/azure.go (2 hunks)
- core/providers/bedrock.go (2 hunks)
- core/providers/cerebras.go (2 hunks)
- core/providers/cohere.go (2 hunks)
- core/providers/gemini.go (5 hunks)
- core/providers/groq.go (2 hunks)
- core/providers/mistral.go (2 hunks)
- core/providers/ollama.go (3 hunks)
- core/providers/openai.go (21 hunks)
- core/providers/openrouter.go (2 hunks)
- core/providers/parasail.go (2 hunks)
- core/providers/sgl.go (3 hunks)
- core/providers/utils.go (1 hunks)
- core/providers/vertex.go (6 hunks)
- core/schemas/bifrost.go (1 hunks)
- core/schemas/chatcompletions.go (1 hunks)
- core/schemas/embedding.go (1 hunks)
- core/schemas/mux.go (1 hunks)
- core/schemas/providers/gemini/chat.go (1 hunks)
- core/schemas/providers/mistral/embedding.go (0 hunks)
- core/schemas/providers/mistral/types.go (0 hunks)
- core/schemas/responses.go (2 hunks)
- core/schemas/speech.go (2 hunks)
- core/schemas/textcompletions.go (1 hunks)
- core/schemas/transcriptions.go (1 hunks)
- plugins/telemetry/main.go (0 hunks)
- tests/core-providers/anthropic_test.go (1 hunks)
- tests/core-providers/azure_test.go (1 hunks)
- tests/core-providers/bedrock_test.go (1 hunks)
- tests/core-providers/cerebras_test.go (1 hunks)
- tests/core-providers/cohere_test.go (1 hunks)
- tests/core-providers/config/account.go (12 hunks)
- tests/core-providers/custom_test.go (1 hunks)
- tests/core-providers/gemini_test.go (1 hunks)
- tests/core-providers/groq_test.go (1 hunks)
- tests/core-providers/mistral_test.go (1 hunks)
- tests/core-providers/ollama_test.go (1 hunks)
- tests/core-providers/openai_test.go (1 hunks)
- tests/core-providers/openrouter_test.go (1 hunks)
- tests/core-providers/parasail_test.go (1 hunks)
- tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/responses_stream.go (1 hunks)
- tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
- tests/core-providers/scenarios/utils.go (0 hunks)
- tests/core-providers/sgl_test.go (1 hunks)
- tests/core-providers/tests.go (2 hunks)
- tests/core-providers/vertex_test.go (1 hunks)
- transports/bifrost-http/handlers/inference.go (4 hunks)
- transports/bifrost-http/handlers/server.go (0 hunks)
- transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
- transports/bifrost-http/handlers/server.go
- core/schemas/providers/mistral/types.go
- tests/core-providers/scenarios/utils.go
- core/schemas/providers/mistral/embedding.go
- plugins/telemetry/main.go
🚧 Files skipped from review as they are similar to previous changes (20)
- tests/core-providers/groq_test.go
- core/providers/anthropic.go
- core/schemas/providers/gemini/chat.go
- tests/core-providers/azure_test.go
- core/providers/utils.go
- core/providers/gemini.go
- tests/core-providers/openrouter_test.go
- tests/core-providers/custom_test.go
- core/providers/mistral.go
- core/providers/bedrock.go
- tests/core-providers/bedrock_test.go
- core/schemas/embedding.go
- tests/core-providers/openai_test.go
- core/providers/cohere.go
- tests/core-providers/cohere_test.go
- tests/core-providers/anthropic_test.go
- tests/core-providers/scenarios/text_completion_stream.go
- tests/core-providers/mistral_test.go
- tests/core-providers/gemini_test.go
- tests/core-providers/sgl_test.go
🧰 Additional context used
🧬 Code graph analysis (19)
transports/bifrost-http/integrations/utils.go (2)
- core/schemas/bifrost.go (4): RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)
- core/schemas/responses.go (1): ResponsesStreamResponse (1688-1717)

core/schemas/chatcompletions.go (1)
- core/schemas/bifrost.go (1): LogProbs (341-346)

core/providers/vertex.go (5)
- core/schemas/provider.go (1): PostHookRunner (191-191)
- core/schemas/account.go (2): Key (8-17), VertexKeyConfig (35-39)
- core/schemas/bifrost.go (5): BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), Vertex (60-60), BifrostResponsesRequest (197-203)
- core/schemas/providers/anthropic/chat.go (1): ToAnthropicChatCompletionRequest (365-596)
- core/utils.go (1): Ptr (12-14)

tests/core-providers/scenarios/responses_stream.go (4)
- tests/core-providers/config/account.go (1): ComprehensiveTestConfig (43-55)
- core/schemas/responses.go (22): ResponsesMessage (280-292), ResponsesInputMessageRoleUser (567-567), ResponsesMessageContent (573-576), ResponsesParameters (33-60), ResponsesStreamResponseType (1620-1620), ResponsesStreamResponse (1688-1717), ResponsesStreamResponseTypeOutputTextDelta (1636-1636), ResponsesStreamResponseTypeOutputItemAdded (1629-1629), ResponsesStreamResponseTypeContentPartAdded (1632-1632), ResponsesStreamResponseTypeCreated (1623-1623), ResponsesStreamResponseTypeCompleted (1625-1625), ResponsesStreamResponseTypeError (1685-1685), ResponsesTool (1341-1356), ResponsesToolFunction (1359-1362), ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642), ResponsesMessageTypeFunctionCall (261-261), ResponsesParametersReasoning (210-214), ResponsesStreamResponseTypeReasoningSummaryPartAdded (1653-1653), ResponsesStreamResponseTypeReasoningSummaryTextDelta (1655-1655), ResponsesMessageTypeReasoning (273-273), ResponsesResponse (85-114), ResponsesInputMessageRoleAssistant (566-566)
- core/schemas/bifrost.go (7): BifrostResponsesRequest (197-203), BifrostStream (383-386), BifrostError (409-418), BifrostResponse (238-254), ModelProvider (52-52), BifrostResponseExtraFields (349-358), SGL (64-64)
- core/schemas/chatcompletions.go (1): ToolFunctionParameters (73-79)

core/providers/cerebras.go (3)
- core/schemas/account.go (1): Key (8-17)
- core/schemas/bifrost.go (7): BifrostResponsesRequest (197-203), BifrostResponse (238-254), BifrostError (409-418), RequestType (99-99), ResponsesRequest (106-106), BifrostStream (383-386), BifrostEmbeddingRequest (205-211)
- core/schemas/provider.go (2): Provider (194-219), PostHookRunner (191-191)

core/providers/groq.go (2)
- core/schemas/bifrost.go (6): BifrostResponsesRequest (197-203), BifrostResponse (238-254), BifrostError (409-418), ResponsesRequest (106-106), BifrostStream (383-386), BifrostEmbeddingRequest (205-211)
- core/schemas/provider.go (2): Provider (194-219), PostHookRunner (191-191)

core/schemas/transcriptions.go (1)
- core/schemas/bifrost.go (1): LogProbs (341-346)

core/schemas/speech.go (1)
- core/schemas/bifrost.go (1): AudioLLMUsage (304-309)

core/providers/azure.go (2)
- core/schemas/bifrost.go (7): BifrostResponsesRequest (197-203), BifrostResponse (238-254), BifrostError (409-418), ResponsesRequest (106-106), BifrostStream (383-386), BifrostEmbeddingRequest (205-211), EmbeddingRequest (108-108)
- core/schemas/provider.go (2): Provider (194-219), PostHookRunner (191-191)

tests/core-providers/tests.go (3)
- tests/core-providers/scenarios/text_completion_stream.go (1): RunTextCompletionStreamTest (16-399)
- tests/core-providers/scenarios/chat_completion_stream.go (1): RunChatCompletionStreamTest (16-298)
- tests/core-providers/scenarios/responses_stream.go (1): RunResponsesStreamTest (17-467)

core/providers/sgl.go (3)
- core/schemas/provider.go (1): PostHookRunner (191-191)
- core/schemas/account.go (1): Key (8-17)
- core/schemas/bifrost.go (5): BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), SGL (64-64), BifrostResponsesRequest (197-203)

core/providers/ollama.go (3)
- core/schemas/provider.go (1): PostHookRunner (191-191)
- core/schemas/account.go (1): Key (8-17)
- core/schemas/bifrost.go (5): BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), Ollama (62-62), BifrostResponsesRequest (197-203)

core/schemas/responses.go (1)
- plugins/mocker/main.go (1): Response (97-103)

core/schemas/bifrost.go (1)
- core/schemas/responses.go (1): ResponsesStreamResponse (1688-1717)

transports/bifrost-http/handlers/inference.go (2)
- core/schemas/bifrost.go (4): RequestType (99-99), BifrostResponse (238-254), BifrostError (409-418), ResponsesStreamRequest (107-107)
- core/schemas/responses.go (2): ResponsesStreamResponse (1688-1717), ResponsesStreamResponseTypeError (1685-1685)

core/providers/openrouter.go (2)
- core/schemas/bifrost.go (4): BifrostResponsesRequest (197-203), BifrostResponse (238-254), BifrostError (409-418), BifrostStream (383-386)
- core/schemas/provider.go (1): PostHookRunner (191-191)

core/providers/openai.go (4)
- core/schemas/bifrost.go (18): TextCompletionStreamRequest (103-103), BifrostChatRequest (189-195), BifrostStream (383-386), BifrostError (409-418), OpenAI (55-55), ChatCompletionStreamRequest (105-105), ModelProvider (52-52), ErrorField (426-433), LLMUsage (296-302), BifrostContextKeyStreamEndIndicator (125-125), BifrostResponse (238-254), RequestType (99-99), BifrostResponsesRequest (197-203), ResponsesRequest (106-106), ResponsesStreamRequest (107-107), BifrostErrorExtraFields (435-439), BifrostEmbeddingRequest (205-211), EmbeddingRequest (108-108)
- core/schemas/provider.go (3): PostHookRunner (191-191), DefaultStreamBufferSize (17-17), Provider (194-219)
- core/schemas/account.go (1): Key (8-17)
- core/schemas/responses.go (3): ResponsesStreamResponse (1688-1717), ResponsesStreamResponseTypeError (1685-1685), ResponsesStreamResponseTypeCompleted (1625-1625)

core/providers/parasail.go (3)
- core/schemas/account.go (1): Key (8-17)
- core/schemas/bifrost.go (7): BifrostResponsesRequest (197-203), BifrostResponse (238-254), BifrostError (409-418), RequestType (99-99), ResponsesRequest (106-106), BifrostStream (383-386), BifrostEmbeddingRequest (205-211)
- core/schemas/provider.go (2): Provider (194-219), PostHookRunner (191-191)

core/schemas/mux.go (4)
- core/schemas/bifrost.go (3): BifrostResponse (238-254), RequestType (99-99), ResponsesStreamRequest (107-107)
- core/schemas/responses.go (15): ResponsesStreamResponse (1688-1717), ResponsesMessage (280-292), ResponsesMessageTypeMessage (256-256), ResponsesInputMessageRoleAssistant (566-566), ResponsesStreamResponseTypeOutputTextDelta (1636-1636), ResponsesStreamResponseTypeRefusalDelta (1639-1639), ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642), ResponsesMessageTypeFunctionCall (261-261), ResponsesToolMessage (692-715), ResponsesStreamResponseTypeCompleted (1625-1625), ResponsesStreamResponseTypeIncomplete (1627-1627), ResponsesStreamResponseTypeOutputItemDone (1630-1630), ResponsesStreamResponseStruct (1719-1722), ResponsesResponseUsage (236-239), ResponsesExtendedResponseUsage (229-234)
- core/schemas/chatcompletions.go (1): BifrostStreamResponseChoice (376-378)
- plugins/mocker/main.go (2): Usage (138-142), Response (97-103)
🔇 Additional comments (41)
core/schemas/speech.go (1)
86-95: New speech streaming surface looks good. Types are minimal and compatible with existing BifrostResponse.Speech usage.
transports/bifrost-http/integrations/utils.go (1)
568-609: SSE branching by request type looks correct.
- Uses Responses event line when type is responses_stream.
- Falls back to standard SSE and emits [DONE] only for non-responses.
Confirm that all streaming responses set ExtraFields.RequestType; if any converter bypasses it, consider inferring from presence of ResponsesStreamResponse as a fallback.
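The suggested fallback inference can be sketched in isolation. The types below are stand-ins for the schemas package, and `inferRequestType` is a hypothetical helper, not existing code:

```go
package main

import "fmt"

// streamChunk mimics the relevant shape of a stream chunk: an explicit
// request type plus an optional Responses-stream payload.
type streamChunk struct {
	RequestType             string
	ResponsesStreamResponse *struct{}
}

// inferRequestType trusts the explicit request type when set, and
// otherwise infers a Responses stream from payload presence.
func inferRequestType(c streamChunk) string {
	if c.RequestType != "" {
		return c.RequestType
	}
	if c.ResponsesStreamResponse != nil {
		return "responses_stream"
	}
	return "chat_completion_stream"
}

func main() {
	fmt.Println(inferRequestType(streamChunk{ResponsesStreamResponse: &struct{}{}}))
}
```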
tests/core-providers/ollama_test.go (1)
27-27: LGTM! Consistent field rename across test suite. The rename from `ChatCompletionStream` to `CompletionStream` aligns with the broader refactoring effort to unify streaming terminology across all provider tests.

tests/core-providers/vertex_test.go (1)

28-28: LGTM! Consistent field rename across test suite. The rename from `ChatCompletionStream` to `CompletionStream` maintains consistency with the broader test configuration refactoring.

tests/core-providers/parasail_test.go (1)

27-27: LGTM! Consistent field rename across test suite. The rename from `ChatCompletionStream` to `CompletionStream` is consistent with the test configuration updates across all providers.

tests/core-providers/tests.go (2)
29-32: LGTM! New streaming test scenarios properly integrated. The addition of `RunTextCompletionStreamTest` and `RunResponsesStreamTest` expands test coverage for different streaming formats. Based on the relevant code snippets, both test implementations properly handle streaming with timeouts, validation, and error handling.

71-71: LGTM! Summary updated to match field rename. The test summary correctly references the renamed `CompletionStream` field, maintaining consistency with the test configuration changes.

tests/core-providers/cerebras_test.go (1)
26-28: LGTM! Text completion streaming enabled for Cerebras. The addition of `TextCompletionStream: true` and the rename of `ChatCompletionStream` to `CompletionStream` enable comprehensive streaming test coverage for the Cerebras provider. This aligns with the broader test configuration refactoring.

tests/core-providers/config/account.go (3)
22-24: LGTM! Test configuration fields properly refactored. The addition of `TextCompletionStream` and the rename of `ChatCompletionStream` to `CompletionStream` provide clearer separation between text completion streaming and chat completion streaming scenarios. This change is consistently propagated across all provider configurations in `AllProviderConfigs`.

466-468: LGTM! OpenAI configuration updated consistently. The test configuration correctly disables `TextCompletionStream` (OpenAI doesn't support legacy text completions in newer models) and enables `CompletionStream` for chat completion streaming.

496-496: LGTM! All provider configurations updated consistently. The `CompletionStream` field is consistently enabled across the Anthropic, Bedrock, Cohere, Azure, Vertex, Ollama, Groq, OpenAI Custom, Gemini, and OpenRouter provider configurations, maintaining the same streaming capabilities as before the refactor.

Also applies to: 523-523, 550-550, 577-577, 604-604, 657-657, 684-684, 711-711, 741-741, 768-768
core/schemas/chatcompletions.go (1)
389-402: Drop duplication concern: `LogProb` and `ContentLogProb` are defined once in chatcompletions.go and merely referenced (not redefined) by `LogProbs` in bifrost.go. Likely an incorrect or invalid review comment.
transports/bifrost-http/handlers/inference.go (1)
828-898: LGTM: Correct Responses API streaming detection and SSE formatting. The implementation correctly:

- Detects request type from the first chunk's ExtraFields
- Emits `event: <type>` lines for Responses streaming (OpenAI Responses API format)
- Uses the standard `data: ...` format for other streaming APIs
- Omits the `[DONE]` marker for the Responses API (per OpenAI spec)

Minor observation: if `requestType` is never populated (no chunks with BifrostResponse/BifrostError), it defaults to the standard SSE format, which is a reasonable fallback.

core/providers/openrouter.go (1)
118-153: LGTM: Clean refactoring and new Responses API surfaces. The changes correctly:

- Switch to the specialized `handleOpenAIChatCompletionStreaming` for chat completions
- Add a `Responses` method using OpenRouter's `/alpha/responses` endpoint
- Add a `ResponsesStream` method that delegates to `ChatCompletionStream` with a post-hook converter

The pattern is consistent with other providers in this PR.
core/providers/vertex.go (2)
348-436: LGTM: Correct streaming implementation with proper auth handling. The implementation correctly:

- Validates key config (projectID, region)
- Routes Claude models to Anthropic streaming, others to OpenAI streaming
- Sets the Authorization header only when `key.Value` is non-empty (addresses the past review concern about OAuth conflicts)
- Handles auth client creation errors

The conditional auth header approach at lines 418-421 is appropriate: when using OAuth (via `getAuthClient`), `key.Value` would typically be empty, so `authHeader` remains empty and OAuth handles authentication.
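The conditional header rule can be sketched as a standalone helper; `buildAuthHeader` is illustrative, not the provider's actual code:

```go
package main

import "fmt"

// buildAuthHeader sets Authorization only when an explicit key is present,
// leaving it unset so an OAuth-aware HTTP client can inject its own
// credentials instead.
func buildAuthHeader(keyValue string) map[string]string {
	headers := map[string]string{}
	if keyValue != "" {
		headers["Authorization"] = "Bearer " + keyValue
	}
	return headers
}

func main() {
	fmt.Println(buildAuthHeader("sk-123")) // explicit key: Bearer header set
	fmt.Println(buildAuthHeader(""))       // OAuth flow: no header set
}
```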
452-459: LGTM: Standard ResponsesStream implementation. Correctly delegates to `ChatCompletionStream` with a post-hook converter and request transformation.
94-136: LGTM: Correct auth handling for Ollama.Passing
nilas the auth header (lines 94, 129) is correct for Ollama, which typically doesn't require authentication for local deployments. This addresses the past review concern about sending"Authorization: Bearer "with an empty key causing 401 errors.
152-159: LGTM: Standard ResponsesStream implementation.Correctly delegates to
ChatCompletionStreamwith post-hook converter.tests/core-providers/scenarios/responses_stream.go (1)
1-637: LGTM: Comprehensive Responses streaming test coverage.The test file correctly implements:
- Basic streaming with event type and sequence number tracking
- Timeout handling with context cancellation
- Tool calls streaming validation
- Reasoning streaming validation
- Proper event structure validation
- Content accumulation across streaming chunks
The three test scenarios (basic, tools, reasoning) provide solid coverage for the Responses API streaming path.
core/providers/azure.go (1)
320-393: LGTM: Standard provider surfaces for Responses and Embedding. The additions correctly:

- Switch to the specialized `handleOpenAIChatCompletionStreaming` for chat completions
- Implement `Responses` by delegating to `ChatCompletion` and formatting the result
- Implement `ResponsesStream` via `ChatCompletionStream` with a post-hook converter
- Implement `Embedding` using the OpenAI converter and Azure's request infrastructure

All methods properly populate `ExtraFields` (Provider, RequestType, ModelRequested, Latency, RawResponse).
102-142: LGTM: Standard provider surfaces added. The changes correctly:

- Switch to the specialized `handleOpenAIChatCompletionStreaming`
- Add `Responses` and `ResponsesStream` following the standard pattern
- Return an unsupported operation error for `Embedding` (Parasail doesn't support embeddings)

core/providers/cerebras.go (1)
120-160: LGTM: Standard provider surfaces added. The changes correctly:

- Switch to the specialized `handleOpenAIChatCompletionStreaming`
- Add `Responses` and `ResponsesStream` following the standard pattern
- Return an unsupported operation error for `Embedding` (Cerebras doesn't support embeddings)

core/providers/sgl.go (3)
95-95: LGTM: Authorization handling aligned with provider requirements. The change to pass `nil` for the Authorization header is consistent with how `ChatCompletionStream` handles authentication (line 130) and appears intentional for the SGL provider.

119-137: LGTM: Clean streaming implementation. The `ChatCompletionStream` method correctly delegates to the shared `handleOpenAIChatCompletionStreaming` handler with all required parameters. The implementation follows the established pattern used across other providers.

153-160: LGTM: Correct responses streaming delegation. The `ResponsesStream` implementation correctly delegates through `ChatCompletionStream` using the combined post-hook runner pattern. This approach converts chat streaming responses into the Responses API format, which is the expected pattern for non-OpenAI providers.

core/providers/groq.go (4)
160-172: LGTM: Refactored to use shared streaming handler. The change to use `handleOpenAIChatCompletionStreaming` follows the established pattern and improves code maintainability by leveraging shared streaming logic.

174-186: LGTM: Proper Responses API delegation. The `Responses` method correctly converts the Responses request to a chat request, processes it, and transforms the response using `ToResponsesOnly()`. The metadata fields are properly populated.

188-195: LGTM: Consistent responses streaming pattern. The `ResponsesStream` implementation follows the same delegation pattern used in other providers, correctly routing through `ChatCompletionStream` with the converter post-hook runner.

197-200: LGTM: Proper unsupported operation handling. The `Embedding` method correctly returns an unsupported operation error for Groq, maintaining interface compliance while clearly communicating the limitation.

core/schemas/responses.go (2)
88-88: LGTM: Error field addition for response error handling. The new `Error` field in `ResponsesResponse` appropriately handles error scenarios in the Responses API format.

1635-1643: Verify dual event type constants for streaming. The code now includes both `.added` and `.delta` variants for output text and function call arguments:

- `ResponsesStreamResponseTypeOutputTextAdded` = "response.output_text.added" (line 1635)
- `ResponsesStreamResponseTypeOutputTextDelta` = "response.output_text.delta" (line 1636)
- `ResponsesStreamResponseTypeFunctionCallArgumentsAdded` = "response.function_call_arguments.added" (line 1642)
- `ResponsesStreamResponseTypeFunctionCallArgumentsDelta` = "response.function_call_arguments.delta" (line 1643)

Previous review discussions suggested replacing `.added` with `.delta`, but both variants now exist. Please confirm:

- Are both event types emitted by the OpenAI Responses API?
- If only `.delta` events are emitted for incremental updates, should the `.added` constants be removed?

Based on past review discussions, please verify the actual OpenAI Responses API event names to ensure these constants match the streamed events.
core/schemas/bifrost.go (1)
253-253: LGTM: Embedded field for responses streaming support. The addition of `*ResponsesStreamResponse` to `BifrostResponse` properly integrates the new responses streaming surface into the unified response structure.

core/providers/openai.go (9)
132-134: LGTM: Conditional authorization header handling. The addition of `if key.Value != ""` checks before setting the Authorization header across all request methods properly handles scenarios where API keys may not be required (e.g., local deployments, authenticated proxies).

Also applies to: 475-477, 836-838, 1162-1164

227-230: LGTM: Safe authHeader handling in streaming methods. The conditional check for `authHeader != nil` before copying headers prevents nil pointer issues and properly supports providers that don't require authentication.

Also applies to: 573-576, 928-931

264-267: LGTM: Comprehensive timeout error handling. The addition of checks for both `http.ErrHandlerTimeout` and `context.DeadlineExceeded` ensures proper timeout detection across different failure scenarios in all streaming methods.

Also applies to: 596-599, 623-626, 951-954, 978-981, 1318-1321, 1345-1348, 1597-1600, 1624-1627
308-313: LGTM: Improved SSE parsing with CutPrefix. The switch to `strings.CutPrefix` provides clearer semantics and better performance. The additional filtering for "event:" lines in responses streaming (lines 1020-1026) correctly handles the Responses API's event-based SSE format.

Also applies to: 668-673, 1018-1026
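The CutPrefix-based scan can be illustrated with a minimal line classifier; `parseSSELine` is a sketch, not the provider's function, and assumes a single space after the colon:

```go
package main

import (
	"fmt"
	"strings"
)

// parseSSELine classifies one SSE line: "data:" lines yield a payload,
// "event:" lines yield an event name, everything else is ignored.
// strings.CutPrefix returns the remainder plus a found flag in one call.
func parseSSELine(line string) (kind, value string) {
	if payload, ok := strings.CutPrefix(line, "data: "); ok {
		return "data", payload
	}
	if ev, ok := strings.CutPrefix(line, "event: "); ok {
		return "event", ev
	}
	return "", ""
}

func main() {
	fmt.Println(parseSSELine("event: response.output_text.delta"))
	fmt.Println(parseSSELine(`data: {"delta":"hi"}`))
}
```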
329-336: LGTM: Proper error handling in streaming responses. The error handling correctly parses stream errors, marks the stream as ending with `BifrostContextKeyStreamEndIndicator`, and propagates the error through the response channel.

Also applies to: 690-697

747-750: LGTM: Nil-check prevents potential panic. The added `choice.BifrostStreamResponseChoice.Delta != nil` guard prevents nil pointer dereferences when processing chunks that contain only role information or tool calls without delta content.
1044-1069: LGTM: Complete error handling for responses streaming. The error handling properly detects error events, populates all necessary fields including `ExtraFields` with request metadata, and correctly propagates error codes, messages, and parameters. This addresses previous review feedback.

1080-1085: LGTM: Proper completion event handling. The detection of `ResponsesStreamResponseTypeCompleted` events and the subsequent stream termination with final latency calculation correctly implement the terminal-state handling for responses streaming.

1102-1201: LGTM: Complete embedding support implementation. The new `Embedding` method and `handleOpenAIEmbeddingRequest` handler follow the established patterns for request handling, including proper error handling, response parsing, and metadata population. The implementation is consistent with other request types in the provider.

Summary
Refactored streaming handlers to improve code organization by separating chat completion streaming from responses streaming functionality. This PR introduces dedicated streaming handlers for different API types to enhance maintainability and support for the OpenAI Responses API streaming format.
Changes
Renamed `handleOpenAIStreaming` to `handleOpenAIChatCompletionStreaming` to better reflect its purpose

Type of change
Affected areas
How to test
Test streaming functionality with different providers to ensure it works correctly:
Breaking changes
Security considerations
No security implications as this is a refactoring of existing functionality.
Checklist
Read docs/contributing/README.md and followed the guidelines