
Conversation

@Pratham-Mishra04 (Collaborator)

Summary

Refactored the streaming handlers to improve code organization by separating chat completion streaming from Responses API streaming. This PR introduces dedicated streaming handlers for the different API types to improve maintainability and to support the OpenAI Responses API streaming format.

Changes

  • Renamed handleOpenAIStreaming to handleOpenAIChatCompletionStreaming to better reflect its purpose
  • Updated all provider references to use the renamed function
  • Moved type definitions to their respective schema files for better organization
  • Added support for OpenAI Responses API streaming format with proper event types
  • Enhanced stream handling in HTTP transports to detect request type and format responses accordingly
  • Implemented proper event line handling for Responses API streaming

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test streaming functionality with different providers to ensure it works correctly:

# Core/Transports
go version
go test ./...

# Test chat completion streaming (-N disables curl buffering so chunks print as they arrive)
curl -N -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo", "messages":[{"role":"user","content":"Hello"}], "stream":true}'

# Test responses streaming (the Responses API takes "input" instead of "messages")
curl -N -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o", "input":[{"role":"user","content":"Hello"}], "stream":true}'

Breaking changes

  • Yes
  • No

Security considerations

No security implications as this is a refactoring of existing functionality.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

Pratham-Mishra04 (Collaborator, Author) commented Oct 11, 2025

@coderabbitai
coderabbitai bot (Contributor) commented Oct 11, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduces unified Responses streaming across major providers with event-based SSE.
    • Adds Embeddings support and expanded schema types (responses streaming, embeddings, speech, transcriptions).
    • Enables text and chat completion streaming with consistent metadata.
  • Improvements

    • Standardizes streaming handlers and timeout handling; aligns streaming endpoints and formatting.
    • Omits [DONE] marker for Responses streams; preserves per-chunk usage/metadata.
  • Tests

    • Adds end-to-end streaming tests for Responses and Text Completion.
    • Renames ChatCompletionStream flag to CompletionStream; adds TextCompletionStream.
  • Chores

    • Removes a debug log; adjusts telemetry to skip final-token latency emission.

Walkthrough

Unified streaming handlers and new streaming entry points added across providers; Responses and Embedding public surfaces reintroduced for many providers; schemas extended for Responses streaming, embeddings, speech, transcriptions, and chat choice/log-prob types; transports detect Responses streams and adjust SSE framing; tests updated/added for text and responses streaming.

Changes

  • Providers — unified streaming & responses/embedding surface
    Files: core/providers/*.go (e.g., core/providers/openai.go, core/providers/azure.go, core/providers/cerebras.go, core/providers/gemini.go, core/providers/groq.go, core/providers/mistral.go, core/providers/ollama.go, core/providers/openrouter.go, core/providers/parasail.go, core/providers/sgl.go, core/providers/vertex.go)
    Replaced the generic streaming helper with chat-specific streaming handlers; added or re-exposed public Responses, ResponsesStream, and Embedding methods where applicable; updated streaming call sites, headers/auth, and minor timeout checks.
  • OpenAI provider — new streaming entry points & flows
    Files: core/providers/openai.go
    Added specialized streaming handlers (handleOpenAIChatCompletionStreaming, handleOpenAIResponsesStreaming, handleOpenAITextCompletionStreaming), new public streaming entry points (ChatCompletionStream, ResponsesStream, Embedding), and centralized per-chunk metadata and unified error parsing.
  • Schemas — responses streaming, embeddings, chat choices, transcribe, speech, logprobs
    Files: core/schemas/responses.go, core/schemas/embedding.go, core/schemas/chatcompletions.go, core/schemas/textcompletions.go, core/schemas/transcriptions.go, core/schemas/speech.go, core/schemas/bifrost.go
    Added rich ResponsesStream types/constants and streaming payload structs; added BifrostEmbedding/BifrostEmbeddingResponse with custom JSON marshal/unmarshal; added chat choice/stream delta and TextCompletionLogProb; added transcribe and speech types; updated BifrostResponse to embed *ResponsesStreamResponse and removed some legacy types.
  • Schema mux & provider helpers
    Files: core/schemas/mux.go, core/schemas/providers/gemini/chat.go, core/schemas/providers/mistral/embedding.go, core/schemas/providers/mistral/types.go
    Added BifrostResponse.ToResponsesStream() conversion (duplicate blocks present in diff); simplified nil/len checks in the Gemini helper; removed the Mistral-specific embedding helper/type.
  • Provider utils — post-hook composition for Responses stream
    Files: core/providers/utils.go
    Added getResponsesChunkConverterCombinedPostHookRunner and responsesChunkConverter to wrap post-hook runners and convert chunks/errors to the Responses stream format.
  • Transports — SSE framing and request-type detection
    Files: transports/bifrost-http/handlers/inference.go, transports/bifrost-http/integrations/utils.go
    Detect the streaming request type from the first non-nil chunk; emit "event: <Type>\n data: <json>\n" and omit the final [DONE] for Responses streams; use standard "data:" framing and [DONE] for other streams; flush per-chunk writes.
  • Tests — flags rename, new streaming scenarios & wiring
    Files: tests/core-providers/*, tests/core-providers/config/*, tests/core-providers/scenarios/*.go
    Renamed ChatCompletionStream to CompletionStream, added a TextCompletionStream flag, and updated provider test configs; added streaming test scenarios and helpers RunTextCompletionStreamTest and RunResponsesStreamTest.
  • Telemetry & server minor edits
    Files: plugins/telemetry/main.go, transports/bifrost-http/handlers/server.go
    Stop emitting the final-token inter-token latency metric; removed a debug bootstrap print.
  • Timeout & error-check standardization
    Files: core/providers/*.go (e.g., anthropic.go, cohere.go, bedrock.go, gemini.go)
    Replaced fasthttp.ErrTimeout checks with http.ErrHandlerTimeout.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Provider as Provider (e.g., OpenAI)
  participant StreamHandler as Chat/Responses StreamHandler
  participant HTTP as HTTP Client
  participant Transport as Bifrost Transport

  Client->>Provider: ChatCompletionStream / ResponsesStream request
  Provider->>StreamHandler: call (http client, url, req, auth, extraHeaders, postHookRunner)
  StreamHandler->>HTTP: POST streaming endpoint (/v1/chat/completions or /v1/responses)
  HTTP-->>StreamHandler: SSE / chunked stream
  StreamHandler->>Transport: emit *schemas.BifrostStream chunks
  Transport->>Client: SSE frames per chunk
  alt First non-nil chunk indicates Responses stream
    note right of Transport #EEF6FF: emit "event: <Type>\n data: <json>\n" and omit final [DONE]
  else Other stream types
    note right of Transport #F6FFEE: emit "data: <json>\n" and final "data: [DONE]"
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I hopped through handlers, swapped streams with delight,
Responses wear event-tags and embeddings learned to write.
Chunks carry metadata, tests now chase the hum,
SSEs sing in order — hop, stream, and here I come.
🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 43.24%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check: ✅ Passed. The title succinctly describes the primary feature addition (support for streaming the Responses API) and clearly reflects the main change in the PR without unnecessary detail or ambiguity.
  • Description Check: ✅ Passed. The description closely follows the repository template by providing a clear summary, detailed change list, type of change, affected areas, testing instructions, breaking change declaration, security considerations, and a completion checklist; only the optional "Related issues" and "Screenshots/Recordings" sections are omitted, with screenshots not applicable for this non-UI refactor.

Pratham-Mishra04 marked this pull request as ready for review on October 11, 2025 11:36
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
core/providers/vertex.go (2)

585-601: Fix unused authHeader and avoid duplicate map literal (compile blocker)

authHeader is created but not used; the inline map literal is passed instead. This causes an unused variable compile error and duplicates logic. Pass authHeader to the helper.

Apply this diff:

-        authHeader := map[string]string{}
-        if key.Value != "" {
-            authHeader["Authorization"] = "Bearer " + key.Value
-        }
-        // Use shared OpenAI streaming logic
-        return handleOpenAIChatCompletionStreaming(
+        authHeader := map[string]string{}
+        if key.Value != "" {
+            authHeader["Authorization"] = "Bearer " + key.Value
+        }
+        // Use shared OpenAI streaming logic
+        return handleOpenAIChatCompletionStreaming(
             ctx,
             client,
             url,
             request,
-            map[string]string{"Authorization": "Bearer " + key.Value},
+            authHeader,
             provider.networkConfig.ExtraHeaders,
             schemas.Vertex,
             postHookRunner,
             provider.logger,
         )

225-235: Use net.Error Timeout() for http.Client timeouts (fasthttp.ErrTimeout won’t match)

fasthttp.ErrTimeout is not returned by net/http. You’ll misclassify timeouts as generic errors. Prefer context.DeadlineExceeded or net.Error.Timeout().

Apply these diffs to use a helper:

-        if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+        if isTimeout(err) {
             return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
         }

Repeat the same replacement at all occurrences shown in this comment.

Add this helper and import:

@@
 import (
+    "net"
@@
 )
@@
+func isTimeout(err error) bool {
+    if err == nil {
+        return false
+    }
+    if errors.Is(err, context.DeadlineExceeded) {
+        return true
+    }
+    var ne net.Error
+    return errors.As(err, &ne) && ne.Timeout()
+}

Also applies to: 262-268, 417-421, 449-454

🧹 Nitpick comments (8)
transports/bifrost-http/handlers/inference.go (1)

889-907: Stream Responses: send only the event payload JSON, not the entire BifrostStream

For OpenAI Responses streaming, you compute eventType from ResponsesStreamResponse but still write data as the whole BifrostStream. Clients expect the event payload object. Marshal ResponsesStreamResponse (fallback to data if nil).

Apply this diff:

-            if requestType == schemas.ResponsesStreamRequest {
-                // For responses API, use OpenAI-compatible format with event line
-                eventType := ""
-                if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
-                    eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
-                }
-
-                if eventType != "" {
-                    if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
-                        h.logger.Warn(fmt.Sprintf("Failed to write SSE event: %v", err))
-                        break
-                    }
-                }
-
-                if _, err := fmt.Fprintf(w, "data: %s\n\n", responseJSON); err != nil {
-                    h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
-                    break
-                }
-            } else {
+            if requestType == schemas.ResponsesStreamRequest {
+                // For responses API, use OpenAI-compatible format with event line + payload-only JSON
+                eventType := ""
+                if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
+                    eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
+                }
+                if eventType != "" {
+                    if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
+                        h.logger.Warn(fmt.Sprintf("Failed to write SSE event: %v", err))
+                        break
+                    }
+                }
+                // Marshal only the stream payload
+                var payload interface{} = data
+                if response.BifrostResponse != nil && response.BifrostResponse.ResponsesStreamResponse != nil {
+                    payload = response.BifrostResponse.ResponsesStreamResponse
+                }
+                b, err := sonic.Marshal(payload)
+                if err != nil {
+                    h.logger.Warn(fmt.Sprintf("Failed to marshal streaming response: %v", err))
+                    break
+                }
+                if _, err := fmt.Fprintf(w, "data: %s\n\n", b); err != nil {
+                    h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
+                    break
+                }
+            } else {
                 // For other APIs, use standard format
                 if _, err := fmt.Fprintf(w, "data: %s\n\n", responseJSON); err != nil {
                     h.logger.Warn(fmt.Sprintf("Failed to write SSE data: %v", err))
                     break
                 }
             }
core/schemas/responses.go (2)

88-114: Error shape in ResponsesResponse may be too narrow

OpenAI Responses failures often include type/param; consider extending to capture them for parity with streamed error payloads.

You can augment the struct (outside this hunk) like:

type ResponsesResponseError struct {
  Type    *string `json:"type,omitempty"`
  Code    string  `json:"code"`
  Message string  `json:"message"`
  Param   *string `json:"param,omitempty"`
}

Please confirm the exact non-streaming error shape you expect from providers to avoid losing fields.


1686-1715: Consider a small helper for terminal/error detection

Downstream code must special-case event types to end streams. Add a method:

  • IsTerminal() bool for completed/failed/incomplete
  • ToBifrostError() for error-ish events mapping Message/Param/Code

This keeps providers/transports simpler and consistent.
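
A minimal sketch of the suggested helper, assuming the ResponsesStreamResponse type from core/schemas/responses.go and constants for the completed/failed/incomplete events (the failed/incomplete names are assumptions, not confirmed against the final schema):

func (r *ResponsesStreamResponse) IsTerminal() bool {
	switch r.Type {
	case ResponsesStreamResponseTypeCompleted,
		ResponsesStreamResponseTypeFailed,
		ResponsesStreamResponseTypeIncomplete:
		return true
	default:
		return false
	}
}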

core/providers/openai.go (5)

193-205: Stale comment refers to wrong function

Comment says handleOpenAIChatCompletionStreaming but the function is handleOpenAITextCompletionStreaming. Please update.


526-744: Improve scanner robustness for large SSE lines; clean up TODO

  • SSE lines (tool-call args, long deltas) can exceed bufio.Scanner’s ~64KB default. Increase the buffer to avoid truncated tokens.
  • Remove/resolve the TODO “FIX THIS WITHING RESPONSE STRUCT” (typo) or link to a tracking issue.

Apply near scanner creation:

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		// Allow larger SSE lines (up to 10MB)
+		scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)

Also update the TODO line or delete it.


746-763: Copy/paste nit in comment

“Check if chat completion is allowed” — this handler is for Responses; update the comment for clarity.


861-1038: Responses streaming: handle failed/incomplete as errors and mark stream end on [DONE]

  • Currently only type=="error" is converted to BifrostError. Treat response.failed (and likely response.incomplete) as terminal errors to propagate proper failure state.
  • If an upstream sends [DONE] without a response.completed event, the code breaks out without setting the stream end indicator. Set it to avoid dangling streams.
  • Increase scanner buffer like chat for long lines.

Apply these diffs:

  1. Increase scanner buffer
-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		// Allow larger SSE lines (up to 10MB)
+		scanner.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)
  2. Mark end-of-stream on [DONE]
-			if line == "data: [DONE]" {
-				break
-			}
+			if line == "data: [DONE]" {
+				// Some providers may not emit response.completed; ensure downstream sees end.
+				ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
+				return
+			}
  3. Treat response.failed/incomplete as errors
-			if response.ResponsesStreamResponse.Type == schemas.ResponsesStreamResponseTypeCompleted {
+			switch response.ResponsesStreamResponse.Type {
+			case schemas.ResponsesStreamResponseTypeCompleted:
 				ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
 				processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
 				return
-			}
+			case schemas.ResponsesStreamResponseTypeFailed, schemas.ResponsesStreamResponseTypeIncomplete:
+				// Build a BifrostError from stream payload
+				be := &schemas.BifrostError{
+					IsBifrostError: false,
+					Type:           schemas.Ptr(string(response.ResponsesStreamResponse.Type)),
+					Error:          &schemas.ErrorField{},
+				}
+				if response.ResponsesStreamResponse.Message != nil {
+					be.Error.Message = *response.ResponsesStreamResponse.Message
+				} else if response.ResponsesStreamResponse.Response != nil && response.ResponsesStreamResponse.Response.Error != nil {
+					be.Error.Message = response.ResponsesStreamResponse.Response.Error.Message
+					be.Error.Code = response.ResponsesStreamResponse.Response.Error.Code
+				}
+				if response.ResponsesStreamResponse.Param != nil {
+					be.Error.Param = *response.ResponsesStreamResponse.Param
+				}
+				ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
+				processAndSendBifrostError(ctx, postHookRunner, be, responseChan, logger)
+				return
+			}
  • Verify if upstream always emits response.completed and never [DONE] for Responses API. If [DONE] can occur, the above change is necessary to reliably end streams.

861-895: Stream client timeout for long-lived SSE

The stream http.Client uses a global Timeout. For SSE, this can abort healthy long streams. Consider a dedicated stream client with Timeout=0 (no overall deadline) and per-connection timeouts at transport level.

Confirm your DefaultRequestTimeoutInSeconds is large enough for typical streams, or switch to a separate no-timeout stream client.
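
A minimal sketch of such a dedicated client (imports: net/http, net, time); the values are illustrative, not Bifrost defaults:

// newStreamClient returns an http.Client suited to long-lived SSE reads:
// no overall deadline, with timeouts pushed down to the transport.
func newStreamClient() *http.Client {
	return &http.Client{
		Timeout: 0, // SSE streams may legitimately outlive any fixed request timeout
		Transport: &http.Transport{
			DialContext:           (&net.Dialer{Timeout: 10 * time.Second}).DialContext,
			TLSHandshakeTimeout:   10 * time.Second,
			ResponseHeaderTimeout: 30 * time.Second, // bounds time to first byte, not total stream duration
			IdleConnTimeout:       90 * time.Second,
		},
	}
}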

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 38338a2 and d0f4414.

📒 Files selected for processing (20)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (8 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (8)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (339-344)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (97-97)
  • BifrostResponse (236-252)
  • BifrostError (392-401)
  • ResponsesStreamRequest (105-105)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (302-307)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (339-344)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (97-97)
  • BifrostResponse (236-252)
  • BifrostError (392-401)
  • ResponsesStreamRequest (105-105)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/providers/openai.go (6)
core/schemas/provider.go (3)
  • PostHookRunner (191-191)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (13)
  • BifrostChatRequest (187-193)
  • BifrostStream (381-384)
  • BifrostError (392-401)
  • OpenAI (53-53)
  • ModelProvider (50-50)
  • ErrorField (409-416)
  • LLMUsage (294-300)
  • BifrostContextKeyStreamEndIndicator (123-123)
  • BifrostResponse (236-252)
  • BifrostResponsesRequest (195-201)
  • ResponsesRequest (104-104)
  • BifrostEmbeddingRequest (203-209)
  • EmbeddingRequest (106-106)
core/utils.go (1)
  • Ptr (12-14)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (18)
core/providers/openrouter.go (1)

131-131: LGTM: Streaming handler updated for clarity.

The refactor from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming improves code clarity by explicitly indicating this handler is for chat completion streaming. All arguments and behavior are preserved.

core/providers/cerebras.go (1)

138-138: LGTM: Consistent streaming handler refactor.

The update to handleOpenAIChatCompletionStreaming maintains consistency with the broader refactor across all OpenAI-compatible providers.

core/providers/gemini.go (1)

180-180: LGTM: Streaming handler aligned with chat completion semantics.

The handler rename correctly reflects that this path handles chat completion streaming specifically, improving code maintainability.

core/providers/azure.go (1)

371-371: LGTM: Azure streaming handler updated consistently.

The handler rename preserves Azure's specific deployment-based URL construction and authentication logic while aligning with the unified chat streaming pathway.

core/providers/groq.go (1)

179-179: LGTM: Groq streaming handler refactored consistently.

The update to handleOpenAIChatCompletionStreaming maintains consistency across all OpenAI-compatible providers without altering functionality.

core/providers/parasail.go (1)

121-121: LGTM: Parasail streaming handler updated.

The handler rename improves clarity by explicitly indicating this is for chat completion streaming.

core/providers/ollama.go (1)

151-151: LGTM: Ollama streaming handler refactored.

The update to handleOpenAIChatCompletionStreaming aligns Ollama with the unified chat streaming approach across providers.

core/providers/sgl.go (1)

153-153: LGTM: SGL streaming handler consistently updated.

The final provider update completes the consistent refactor across all OpenAI-compatible providers, improving code clarity and separation between chat completion and responses streaming pathways.

core/providers/mistral.go (1)

133-144: Rename to chat-specific streaming helper looks good

Call switch to handleOpenAIChatCompletionStreaming aligns with the refactor. No issues spotted.

core/providers/vertex.go (1)

590-600: Chat streaming path switch LGTM (once header fix above is applied)

Switch to handleOpenAIChatCompletionStreaming matches the refactor. With the authHeader fix, this is good.

Ensure handleOpenAIChatCompletionStreaming accepts the same header semantics as prior handleOpenAIStreaming for Vertex.

transports/bifrost-http/handlers/inference.go (2)

867-873: Request type detection on first chunk looks correct

Deriving requestType from ExtraFields on first non-nil item is fine and keeps the stream formatting consistent.


922-929: DONE sentinel gating matches Responses API behavior

[DONE] only for non-responses streams is correct per OpenAI Responses spec.

transports/bifrost-http/integrations/utils.go (3)

456-472: Request type capture for stream-format branching LGTM

Capturing RequestType once and branching format accordingly is clean and avoids per-chunk checks.


568-609: Responses streaming event/data formatting is correct

Emitting event: and then data: for ResponsesStreamRequest matches OpenAI Responses streaming. Fallback to standard SSE for others is good.

Confirm DeriveOpenAIStreamFromBifrostResponse returns the event payload object (not a pre-wrapped SSE string) for Responses streams.


619-626: Conditional [DONE] emission is correct

No [DONE] for Responses API; emit for others. Matches expected client behavior.

core/schemas/bifrost.go (1)

250-252: Embedding ResponsesStreamResponse pointer is appropriate

Keeps stream payload accessible without bloating non-stream responses. Watch for field-name collisions across embeddings (none apparent).

Please ensure any JSON schema/contract tests cover presence/absence of these fields across streaming vs non-streaming responses.

core/providers/openai.go (2)

500-521: ChatCompletionStream entrypoint: LGTM

New entrypoint and routing to shared chat streaming handler looks consistent with the Provider interface.

Confirm all other providers in this stack updated to call handleOpenAIChatCompletionStreaming (as per PR summary).


1040-1136: Embedding request path: LGTM

Consistent with other non-streaming handlers; proper extra fields set.

Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from d0f4414 to 2fe27b0 on October 11, 2025 11:53
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
core/schemas/responses.go (1)

1623-1684: Fix streaming event type constant values per previous review.

Several streaming event constant values remain incorrect and do not match the OpenAI Responses API specification. This is a duplicate of the previous review comment:

  • Line 1635: ResponsesStreamResponseTypeOutputTextAdded value should be "response.output_text.delta" (not .added)
  • Line 1641: ResponsesStreamResponseTypeFunctionCallArgumentsAdded value should be "response.function_call_arguments.delta" (not .added)
  • Line 1673: ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta value should be "response.code_interpreter_call.code.delta" (missing dot between call and code)
  • Line 1674: ResponsesStreamResponseTypeCodeInterpreterCallCodeDone value should be "response.code_interpreter_call.code.done" (missing dot between call and code)
  • Line 1676: ResponsesStreamResponseTypeOutputTextAnnotationAdded value should be "response.output_text.annotation.added" (missing dot before annotation)
  • Missing constants: "response.cancelled" (terminal event) and "done" (final SSE sentinel)

Based on learnings

Apply these corrections:

-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
 	ResponsesStreamResponseTypeOutputTextDone  ResponsesStreamResponseType = "response.output_text.done"

 	ResponsesStreamResponseTypeRefusalDelta ResponsesStreamResponseType = "response.refusal.delta"
 	ResponsesStreamResponseTypeRefusalDone  ResponsesStreamResponseType = "response.refusal.done"

-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.delta"
 	ResponsesStreamResponseTypeFunctionCallArgumentsDone      ResponsesStreamResponseType = "response.function_call_arguments.done"
 	ResponsesStreamResponseTypeCodeInterpreterCallInProgress   ResponsesStreamResponseType = "response.code_interpreter_call.in_progress"
 	ResponsesStreamResponseTypeCodeInterpreterCallInterpreting ResponsesStreamResponseType = "response.code_interpreter_call.interpreting"
 	ResponsesStreamResponseTypeCodeInterpreterCallCompleted    ResponsesStreamResponseType = "response.code_interpreter_call.completed"
-	ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta    ResponsesStreamResponseType = "response.code_interpreter_call_code.delta"
-	ResponsesStreamResponseTypeCodeInterpreterCallCodeDone     ResponsesStreamResponseType = "response.code_interpreter_call_code.done"
+	ResponsesStreamResponseTypeCodeInterpreterCallCodeDelta    ResponsesStreamResponseType = "response.code_interpreter_call.code.delta"
+	ResponsesStreamResponseTypeCodeInterpreterCallCodeDone     ResponsesStreamResponseType = "response.code_interpreter_call.code.done"

-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"

Add missing terminal events:

 	ResponsesStreamResponseTypeCustomToolCallInputDelta ResponsesStreamResponseType = "response.custom_tool_call_input.delta"
 	ResponsesStreamResponseTypeCustomToolCallInputDone  ResponsesStreamResponseType = "response.custom_tool_call_input.done"

+	ResponsesStreamResponseTypeCancelled ResponsesStreamResponseType = "response.cancelled"
+	ResponsesStreamResponseTypeDone      ResponsesStreamResponseType = "done"
+
 	ResponsesStreamResponseTypeError ResponsesStreamResponseType = "error"
 )
🧹 Nitpick comments (3)
core/schemas/transcriptions.go (1)

60-66: Consider omitting empty text in streams

Without omitempty, streaming deltas that only deliver delta events will still serialize "text": "", which deviates from the OpenAI payloads we mirror. Making text optional will trim that noise while retaining the field whenever it’s populated.
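
A minimal sketch of the suggested change; the struct and field names here are stand-ins, not the actual types in core/schemas/transcriptions.go:

type transcriptionStreamChunk struct {
	// omitempty keeps delta-only chunks from serializing "text": "".
	Text  string  `json:"text,omitempty"`
	Delta *string `json:"delta,omitempty"`
}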

core/providers/openai.go (2)

567-582: Consider extracting duplicated error handling logic.

The error handling for http.NewRequestWithContext (lines 567-582) and client.Do (lines 594-609) contains identical logic for context cancellation, timeout, and generic errors. This pattern appears multiple times across different streaming handlers in this file.

Extract the error handling into a helper function:

func handleStreamingRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}

Then use it:

 	req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
 	if err != nil {
-		if errors.Is(err, context.Canceled) {
-			return nil, &schemas.BifrostError{
-				IsBifrostError: false,
-				Error: &schemas.ErrorField{
-					Type:    schemas.Ptr(schemas.RequestCancelled),
-					Message: schemas.ErrRequestCancelled,
-					Error:   err,
-				},
-			}
-		}
-		if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-			return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-		}
-		return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+		return nil, handleStreamingRequestError(err, providerName)
 	}

Also applies to: 594-609


970-991: Event line filtering logic is sound but could be simplified.

The logic to skip "event:" lines when parsing SSE data is correct for the Responses API streaming format. However, the nested conditions could be more readable.

Consider simplifying:

 		var jsonData string

 		// Parse SSE data
 		if after, ok := strings.CutPrefix(line, "data: "); ok {
 			jsonData = after
-		} else if !strings.HasPrefix(line, "event:") {
-			// Handle raw JSON errors (without "data: " prefix) but skip event lines
-			jsonData = line
-		} else {
-			// This is an event line, skip it
+		} else if strings.HasPrefix(line, "event:") {
+			// Skip event lines
 			continue
+		} else {
+			// Handle raw JSON errors (without "data: " prefix)
+			jsonData = line
 		}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0f4414 and 2fe27b0.

📒 Files selected for processing (20)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (8 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/providers/sgl.go
  • core/providers/vertex.go
  • core/schemas/textcompletions.go
  • core/providers/azure.go
  • core/providers/groq.go
  • core/schemas/embedding.go
  • core/providers/gemini.go
🧰 Additional context used
🧬 Code graph analysis (8)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (339-344)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (339-344)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (302-307)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (97-97)
  • BifrostResponse (236-252)
  • BifrostError (407-416)
  • ResponsesStreamRequest (105-105)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (97-97)
  • BifrostResponse (236-252)
  • BifrostError (407-416)
  • ResponsesStreamRequest (105-105)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (2)
plugins/mocker/main.go (1)
  • Response (97-103)
ui/lib/types/logs.ts (1)
  • Annotation (236-239)
core/providers/openai.go (4)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (14)
  • BifrostChatRequest (187-193)
  • BifrostStream (381-384)
  • BifrostError (407-416)
  • OpenAI (53-53)
  • ModelProvider (50-50)
  • ErrorField (424-431)
  • RequestCancelled (376-376)
  • LLMUsage (294-300)
  • BifrostContextKeyStreamEndIndicator (123-123)
  • BifrostResponse (236-252)
  • RequestType (97-97)
  • BifrostResponsesRequest (195-201)
  • ResponsesRequest (104-104)
  • ResponsesStreamRequest (105-105)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
🔇 Additional comments (10)
core/providers/mistral.go (1)

133-143: Streaming handler swap looks good

The migration to handleOpenAIChatCompletionStreaming keeps the wiring identical while opting into the new chat-specific path. ✅

core/schemas/bifrost.go (1)

251-251: LGTM!

The embedded *ResponsesStreamResponse pointer enables BifrostResponse to carry Responses API streaming data alongside existing fields. The pointer semantics ensure backward compatibility.

core/schemas/responses.go (2)

88-88: LGTM!

The Error field addition allows ResponsesResponse to capture error states directly in the response structure, which is appropriate for the Responses API.


1686-1715: LGTM with dependency on event constant fixes.

The ResponsesStreamResponse struct provides appropriate fields to carry granular streaming event payloads. The optional pointer fields support the union-style type pattern needed for different event types. However, ensure the event type constants (lines 1623-1684) are corrected before this struct is used in production.

core/providers/openai.go (6)

508-526: LGTM!

The ChatCompletionStream method properly delegates to the new shared handleOpenAIChatCompletionStreaming handler. The operation check and parameter passing are correct.


757-850: LGTM!

The Responses method and handleOpenAIResponsesRequest function correctly handle non-streaming Responses API requests. The request type is properly set to schemas.ResponsesRequest and the endpoint is correctly formed.


852-870: LGTM!

The ResponsesStream method properly delegates to handleOpenAIResponsesStreaming. The operation check for ResponsesStreamRequest is correct.


1054-1072: LGTM!

The Embedding method properly delegates to the shared handleOpenAIEmbeddingRequest handler with correct parameters.


1076-1147: LGTM!

The handleOpenAIEmbeddingRequest function correctly handles non-streaming embedding requests. Request construction, error handling, and response processing are all implemented properly.


1033-1046: Ensure final response is emitted on stream end

The handler sends a final response when Type == ResponsesStreamResponseTypeCompleted, but if the scanner finishes without that event (no scanner.Err()), no response is sent. Confirm whether the OpenAI Responses API always emits a completed event before closing; if it may close without one, add an else block after if err := scanner.Err() to emit a final chunk (mirroring chat completion streaming).
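
A minimal sketch of the suggested fallback, assuming the loop tracks whether a completed event was seen and keeps the last decoded chunk; the helper names mirror those cited elsewhere in this review, but the exact signatures are assumptions:

if err := scanner.Err(); err != nil {
	// existing scanner-error handling stays as-is
} else if !sawCompleted && lastResponse != nil {
	// Upstream closed without response.completed: still signal end-of-stream.
	ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
	processAndSendResponse(ctx, postHookRunner, lastResponse, responseChan, logger)
}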

Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 2fe27b0 to cc59aff on October 11, 2025 12:39
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)

585-599: Do not override Vertex’s OAuth header on streaming calls.

authHeader is populated to skip the Authorization header when key.Value is empty, yet the call still hardcodes map[string]string{"Authorization": "Bearer " + key.Value}. With OAuth-based keys (key.Value == "") this sends an Authorization header with an empty bearer token and prevents the Google client from attaching its own token, resulting in 401s on streaming. Pass the authHeader map you just built instead.

-	return handleOpenAIChatCompletionStreaming(
+	return handleOpenAIChatCompletionStreaming(
 		ctx,
 		client,
 		url,
 		request,
-		map[string]string{"Authorization": "Bearer " + key.Value},
+		authHeader,
 		provider.networkConfig.ExtraHeaders,
 		schemas.Vertex,
 		postHookRunner,
 		provider.logger,
 	)
♻️ Duplicate comments (2)
core/schemas/embedding.go (1)

122-145: Reset embedding union before decoding

UnmarshalJSON never clears the other shapes before trying the next decode, so reusing the same BifrostEmbeddingResponse instance can leak the previous branch (e.g., prior string sticks around when the next payload is a vector). Nil out every variant up front so each decode starts clean.

 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
 		be.EmbeddingStr = &stringContent
 		return nil
core/schemas/responses.go (1)

1635-1677: Align Responses stream event names with OpenAI spec

Several event literals don’t match the Responses API stream (response.output_text.delta, response.function_call_arguments.delta, response.output_text.annotation.added). With the current strings we’ll never classify those chunks correctly.

-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
@@
-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.delta"
@@
-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
🧹 Nitpick comments (3)
core/providers/openai.go (3)

568-582: Deduplicate error handling logic.

The error handling blocks after http.NewRequestWithContext (lines 568-582) and client.Do (lines 594-609) are identical. Consider extracting this into a helper function to reduce duplication and improve maintainability.

Example refactor:

func handleHTTPError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}

Then use it as:

if err != nil {
	return nil, handleHTTPError(err, providerName)
}

Also applies to: 594-609


973-994: Simplify event line filtering logic.

The event line filtering has redundant checks:

  1. Line 974 skips all lines starting with "event:"
  2. Line 988's condition !strings.HasPrefix(line, "event:") and the else block at lines 991-994 are unreachable because event lines are already filtered at line 974

Apply this diff to simplify the logic:

 		// Skip empty lines, comments, and event lines
 		if line == "" || strings.HasPrefix(line, ":") || strings.HasPrefix(line, "event:") {
 			continue
 		}

 		// Check for end of stream
 		if line == "data: [DONE]" {
 			break
 		}

 		var jsonData string

 		// Parse SSE data
 		if after, ok := strings.CutPrefix(line, "data: "); ok {
 			jsonData = after
-		} else if !strings.HasPrefix(line, "event:") {
-			// Handle raw JSON errors (without "data: " prefix) but skip event lines
+		} else {
+			// Handle raw JSON errors (without "data: " prefix)
 			jsonData = line
-		} else {
-			// This is an event line, skip it
-			continue
 		}

872-1056: Consider adding usage tracking for consistency.

The handleOpenAIResponsesStreaming function does not accumulate usage information like handleOpenAIChatCompletionStreaming does (lines 690-709). If the Responses API returns usage data in stream chunks, consider adding similar usage tracking logic for consistency and completeness.

Review the Responses API specification to determine if usage information should be collected and returned in the final response. If so, implement similar logic to the chat completion streaming handler.
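
If usage should be collected, a minimal sketch of the accumulation pattern, using an illustrative struct rather than Bifrost's actual LLMUsage fields:

// usageTotals accumulates token counts reported across stream chunks so the
// final chunk can carry aggregate usage, mirroring the chat streaming handler.
type usageTotals struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

func (u *usageTotals) add(prompt, completion int) {
	u.PromptTokens += prompt
	u.CompletionTokens += completion
	u.TotalTokens = u.PromptTokens + u.CompletionTokens
}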

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe27b0 and cc59aff.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (8 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • transports/bifrost-http/handlers/server.go
  • core/schemas/providers/mistral/types.go
  • core/schemas/providers/mistral/embedding.go
  • plugins/telemetry/main.go
✅ Files skipped from review due to trivial changes (1)
  • core/schemas/providers/gemini/chat.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/providers/cerebras.go
  • core/providers/azure.go
  • core/providers/openrouter.go
  • core/providers/mistral.go
  • core/providers/parasail.go
  • core/providers/groq.go
  • core/schemas/speech.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (2)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/providers/openai.go (5)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (16)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (5)
core/providers/openai.go (5)

505-526: LGTM! Clean refactor to shared streaming handler.

The refactored ChatCompletionStream method properly delegates to the shared handleOpenAIChatCompletionStreaming function with correct parameters and authorization header construction.


757-774: LGTM! Well-structured Responses API method.

The Responses method follows the established pattern and properly delegates to the shared handler with correct parameters.


776-850: LGTM! Consistent implementation.

The handleOpenAIResponsesRequest function follows the established pattern for non-streaming requests with proper error handling and response construction.


852-870: LGTM! Properly delegates to streaming handler.

The ResponsesStream method correctly delegates to the shared streaming handler with appropriate parameters.


1058-1154: LGTM! Well-implemented embedding support.

The Embedding method and handleOpenAIEmbeddingRequest function follow the established patterns for non-streaming requests with proper error handling, authorization, and response construction.

Pratham-Mishra04 changed the base branch from main to graphite-base/598 on October 11, 2025 14:09
Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from cc59aff to 4a84b84 on October 11, 2025 14:09
Pratham-Mishra04 changed the base branch from graphite-base/598 to 10-11-enhancement_added_anthropic_stream_chunk_convertor on October 11, 2025 14:09
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
core/schemas/chatcompletions.go (1)

351-363: Update comment to reflect all three embedded pointer types.

The comment at lines 353-354 states "IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil at a time" but the struct embeds three pointer types: BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice, and BifrostStreamResponseChoice. The comment should mention all three to accurately document the constraint.

Apply this diff to update the comment:

 // BifrostChatResponseChoice represents a choice in the completion result.
 // This struct can represent either a streaming or non-streaming response choice.
-// IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice
-// should be non-nil at a time.
+// IMPORTANT: Only one of BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice,
+// or BifrostStreamResponseChoice should be non-nil at a time.
 type BifrostChatResponseChoice struct {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc59aff and 4a84b84.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (8 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • core/schemas/providers/mistral/types.go
  • core/schemas/providers/mistral/embedding.go
  • plugins/telemetry/main.go
  • transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • core/providers/cerebras.go
  • core/providers/openrouter.go
  • core/providers/azure.go
  • core/providers/gemini.go
  • core/providers/ollama.go
  • core/schemas/textcompletions.go
  • core/schemas/speech.go
  • core/providers/sgl.go
  • core/providers/vertex.go
  • core/providers/parasail.go
🧰 Additional context used
🧬 Code graph analysis (6)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/providers/openai.go (4)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (16)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (7)
core/schemas/transcriptions.go (1)

56-81: LGTM!

The transcription types follow a clean embedded pointer pattern that clearly separates streaming and non-streaming fields. The structure is consistent with the broader changes in this PR.

core/schemas/embedding.go (1)

97-145: LGTM!

The embedding types and custom marshaling logic are well-implemented. The union-type pattern with exclusive field usage is clear and consistent with similar patterns elsewhere in the codebase.

core/schemas/providers/gemini/chat.go (1)

638-641: LGTM!

Removing the explicit nil check is correct since len(nil) returns 0 for slices in Go. This simplification makes the code more idiomatic while preserving behavior.
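For reference, a minimal standalone snippet (not taken from the Gemini converter itself) showing the slice behavior this relies on:

package main

import "fmt"

func main() {
	var parts []string // nil slice: no explicit nil check is needed before len
	fmt.Println(parts == nil) // true
	fmt.Println(len(parts))   // 0 — len on a nil slice is always safe
	if len(parts) == 0 {      // covers both nil and empty slices
		fmt.Println("no parts to convert")
	}
}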

core/schemas/responses.go (1)

1618-1715: LGTM!

The streaming types comprehensively model the OpenAI Responses API streaming events. The ResponsesStreamResponse struct with its optional pointer fields appropriately handles the union-type nature of different streaming event payloads.

core/providers/mistral.go (1)

133-143: LGTM!

The rename to handleOpenAIChatCompletionStreaming clarifies that this is specifically for chat completion streaming, distinguishing it from the new Responses API streaming path. This aligns with the broader PR refactoring.

core/providers/groq.go (1)

179-189: LGTM!

Consistent with the Mistral provider changes, this rename clarifies the streaming handler's purpose and aligns with the separation of chat completion and Responses API streaming paths.

transports/bifrost-http/handlers/inference.go (1)

828-895: LGTM!

The dynamic SSE format detection elegantly handles both standard and Responses API streaming:

  • requestType is determined from the first non-nil chunk's metadata, ensuring consistent formatting throughout the stream
  • For Responses API (ResponsesStreamRequest), optional event: lines are emitted before data: lines, matching OpenAI's SSE format
  • The conditional [DONE] marker correctly omits the marker for Responses API streams (which signal completion by closing the stream) while preserving it for other streaming APIs

The implementation correctly handles the format differences between streaming API types.
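For illustration, a minimal sketch of the per-chunk framing and terminator behavior described above. The helper names, the bufio.Writer plumbing, the package name, and the schemas import path are assumptions for the sketch, not the handler's actual code:

package handlers

import (
	"bufio"
	"fmt"

	"github.com/maximhq/bifrost/core/schemas" // assumed import path for the schemas package
)

// writeStreamChunk is a hypothetical helper: Responses API chunks get an optional
// "event:" line before the "data:" line; other stream types are data-only.
func writeStreamChunk(w *bufio.Writer, requestType schemas.RequestType, eventType string, payload []byte) error {
	if requestType == schemas.ResponsesStreamRequest && eventType != "" {
		if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
			return err
		}
	}
	if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
		return err
	}
	return w.Flush()
}

// finishStream emits the [DONE] sentinel only for non-Responses streams;
// the Responses API signals completion by closing the stream.
func finishStream(w *bufio.Writer, requestType schemas.RequestType) error {
	if requestType != schemas.ResponsesStreamRequest {
		if _, err := w.WriteString("data: [DONE]\n\n"); err != nil {
			return err
		}
	}
	return w.Flush()
}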

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
transports/bifrost-http/handlers/inference.go (1)

828-846: Populate RequestType on error chunks
BifrostError.ExtraFields.RequestType isn’t set in any streaming error path, so if the first chunk is an error the handler’s requestType remains empty and SSE framing/[DONE] emission will fail. Add ExtraFields.RequestType when constructing BifrostError in streaming flows (e.g. in core/bifrost.go and provider error wrappers).

🧹 Nitpick comments (6)
transports/bifrost-http/handlers/inference.go (1)

856-880: LGTM with minor observation.

The conditional SSE formatting correctly distinguishes between Responses API streaming (with event lines) and standard streaming (data-only). The logic appropriately handles both formats.

Note: This logic is duplicated in transports/bifrost-http/integrations/utils.go. Consider extracting to a shared helper function to maintain consistency and reduce maintenance burden.

transports/bifrost-http/integrations/utils.go (2)

456-471: Verify request type population across all streaming paths.

The runtime detection of requestType from the first chunk's ExtraFields.RequestType is identical to the logic in inference.go. Ensure all streaming code paths consistently populate this field.

Note: This detection logic is duplicated between handlers/inference.go and integrations/utils.go. Consider extracting to a shared utility function.
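One possible shape for such a shared helper, sketched under the assumption that BifrostStream exposes BifrostResponse/BifrostError and ExtraFields.RequestType exactly as referenced above (the function name is hypothetical):

// detectRequestType resolves the request type from the first non-nil chunk's metadata,
// so both handlers/inference.go and integrations/utils.go could call it.
func detectRequestType(chunk *schemas.BifrostStream) schemas.RequestType {
	if chunk == nil {
		return ""
	}
	if chunk.BifrostResponse != nil {
		return chunk.BifrostResponse.ExtraFields.RequestType
	}
	if chunk.BifrostError != nil {
		return chunk.BifrostError.ExtraFields.RequestType
	}
	return "" // caller falls back to standard SSE formatting
}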


568-609: LGTM with refactoring suggestion.

The conditional SSE formatting correctly handles both Responses API (with event lines) and standard streaming (data-only) formats based on the detected request type.

The logic here closely mirrors handlers/inference.go lines 856-880. Consider extracting the common SSE formatting logic into a shared helper function to reduce duplication and ensure consistency.

core/schemas/chatcompletions.go (1)

351-363: Consider adding custom JSON marshaling to enforce the one-of constraint.

The comment states that only one of the three embedded pointer structs should be non-nil at a time, but there's no runtime enforcement of this invariant. Other similar types in the codebase (e.g., ChatMessageContent and ChatToolChoice in this same file) implement custom MarshalJSON and UnmarshalJSON methods to validate and enforce one-of semantics.

Without validation, callers could accidentally set multiple fields, leading to ambiguous JSON output when the embedded struct fields are flattened.

Consider implementing custom JSON marshaling similar to the pattern used for ChatToolChoice (lines 128-165):

+// MarshalJSON implements custom JSON marshalling for BifrostChatResponseChoice.
+func (bc BifrostChatResponseChoice) MarshalJSON() ([]byte, error) {
+	// Count how many embedded structs are set
+	set := 0
+	if bc.BifrostTextCompletionResponseChoice != nil {
+		set++
+	}
+	if bc.BifrostNonStreamResponseChoice != nil {
+		set++
+	}
+	if bc.BifrostStreamResponseChoice != nil {
+		set++
+	}
+	if set > 1 {
+		return nil, fmt.Errorf("multiple embedded response types are set; only one should be non-nil")
+	}
+	// Proceed with default marshaling if validation passes
+	type Alias BifrostChatResponseChoice
+	return sonic.Marshal((*Alias)(&bc))
+}
core/schemas/transcriptions.go (1)

56-66: Consider adding custom JSON marshaling to enforce mutual exclusivity.

BifrostTranscribe embeds two optional pointer structs (*BifrostTranscribeNonStreamResponse and *BifrostTranscribeStreamResponse). Similar to the pattern in core/schemas/chatcompletions.go (see BifrostChatResponseChoice), without custom marshaling, callers could set both fields, leading to ambiguous JSON output when the embedded struct fields are flattened.

Consider implementing validation to ensure only one embedded struct is non-nil at a time:

+// MarshalJSON implements custom JSON marshalling for BifrostTranscribe.
+func (bt BifrostTranscribe) MarshalJSON() ([]byte, error) {
+	if bt.BifrostTranscribeNonStreamResponse != nil && bt.BifrostTranscribeStreamResponse != nil {
+		return nil, fmt.Errorf("both non-stream and stream response types are set; only one should be non-nil")
+	}
+	type Alias BifrostTranscribe
+	return sonic.Marshal((*Alias)(&bt))
+}
core/providers/openai.go (1)

593-609: Remove redundant context cancellation check after client.Do.

The context cancellation check on lines 595-604 is redundant: if the context is canceled after the request is created but before client.Do completes, the HTTP client immediately returns an error wrapping context.Canceled, so the errors.Is(err, context.Canceled) branch already covers that case and no separate explicit check is needed.

The same pattern appears in several streaming functions (handleOpenAIResponsesStreaming, SpeechStream, TranscriptionStream), suggesting this refactor could be applied consistently.

Apply this diff to simplify the error handling:

 	// Make the request
 	resp, err := client.Do(req)
 	if err != nil {
-		if errors.Is(err, context.Canceled) {
-			return nil, &schemas.BifrostError{
-				IsBifrostError: false,
-				Error: &schemas.ErrorField{
-					Type:    schemas.Ptr(schemas.RequestCancelled),
-					Message: schemas.ErrRequestCancelled,
-					Error:   err,
-				},
-			}
-		}
-		if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-			return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-		}
-		return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+		return nil, handleHTTPClientError(err, providerName)
 	}

And define a helper function to reduce duplication:

func handleHTTPClientError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc59aff and 4a84b84.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (8 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • core/schemas/providers/mistral/types.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/embedding.go
  • transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/providers/sgl.go
  • core/providers/vertex.go
  • core/schemas/speech.go
  • core/providers/groq.go
  • core/providers/cerebras.go
  • core/providers/gemini.go
  • core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/providers/openai.go (5)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (16)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/utils.go (1)
  • Ptr (12-14)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (16)
core/providers/parasail.go (1)

121-121: LGTM!

The function rename from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming aligns with the PR's objective to separate chat completion streaming from Responses API streaming. The change is consistent with similar updates across other providers.

core/providers/openrouter.go (1)

131-131: LGTM!

The function rename is consistent with the broader refactoring across providers to use specialized chat completion streaming handlers.

core/providers/ollama.go (1)

151-151: LGTM!

Consistent with the streaming handler refactoring across all providers.

core/providers/azure.go (1)

371-371: LGTM!

The function rename maintains consistency with the chat-centric streaming refactoring.

core/providers/mistral.go (1)

133-133: LGTM!

Completes the consistent refactoring of streaming handlers across all providers.

transports/bifrost-http/handlers/inference.go (1)

889-895: LGTM!

Correctly omits the [DONE] marker for OpenAI Responses API, which ends when the stream closes rather than with an explicit marker.

transports/bifrost-http/integrations/utils.go (1)

619-625: LGTM!

Correctly omits the [DONE] marker for Responses API streaming, which terminates on stream closure per OpenAI's specification.

core/schemas/textcompletions.go (1)

72-78: Fix JSON tag formatting error.

Line 76 has an extra space in the JSON tag: json:" Tokens" should be json:"tokens" to match the Go naming convention and avoid serialization issues.

Apply this diff:

-	Tokens        []string             `json:" Tokens"`
+	Tokens        []string             `json:"tokens"`

Likely an incorrect or invalid review comment.

core/schemas/embedding.go (1)

97-145: LGTM!

The custom JSON marshaling logic correctly handles the three possible embedding formats (string, 1D array, 2D array). While the UnmarshalJSON method doesn't reset fields before decoding (as noted in a previous review), the author confirmed that instances are not reused in their usage patterns, making the implementation safe for the current use case.

core/schemas/bifrost.go (1)

253-253: LGTM!

The addition of *ResponsesStreamResponse as an embedded pointer field is consistent with the existing pattern used in BifrostResponse (e.g., *ResponsesResponse on line 252). This enables support for the new Responses API streaming format.

core/schemas/responses.go (2)

88-88: LGTM!

The addition of the Error field to ResponsesResponse enables proper error handling for the Responses API. The field is optional (pointer type) and uses the well-defined ResponsesResponseError type.


1618-1715: LGTM!

The streaming types introduced here provide comprehensive coverage of the OpenAI Responses API streaming events. The ResponsesStreamResponseType constants define a wide range of event types, and the ResponsesStreamResponse struct uses optional fields to accommodate the various event payload structures.

Based on the previous review discussion, the event type string literals have been verified against the official OpenAI documentation.

core/providers/openai.go (4)

757-870: LGTM!

The Responses and ResponsesStream methods follow consistent patterns with the existing chat completion implementations. The URL endpoints (/v1/responses) are correct for the OpenAI Responses API, and error handling is properly implemented.


1058-1154: LGTM!

The Embedding method and its handler handleOpenAIEmbeddingRequest follow the established request/response pattern used throughout the provider implementations. The endpoint (/v1/embeddings) is correct, and error handling is properly implemented.


1008-1029: LGTM!

The error handling for ResponsesStreamResponseTypeError properly constructs a BifrostError from the stream response fields and sets the stream end indicator before propagating the error through the channel.


973-994: SSE event lines can be safely skipped – the type field is mandatory in the JSON payload (ResponsesStreamResponse.Type is non-omitempty), so event lines need not be parsed separately.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-enhancement_added_anthropic_stream_chunk_convertor branch from f9dda5e to fbe1158 Compare October 11, 2025 14:29
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 4a84b84 to bb24cd9 Compare October 11, 2025 14:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
core/schemas/responses.go (1)

1618-1684: Fix event type string values to match OpenAI Responses API specification.

The event type constants still contain incorrect values that were flagged in a previous review. These must be corrected to ensure compatibility with the OpenAI Responses API streaming format:

  1. Line 1635: Change "response.output_text.added" to "response.output_text.delta"
  2. Line 1641: Change "response.function_call_arguments.added" to "response.function_call_arguments.delta"
  3. Line 1676: Change "response.output_text_annotation.added" to "response.output_text.annotation.added" (replace the underscore before "annotation" with a dot)

Apply this diff to fix the event type values:

-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"
 	ResponsesStreamResponseTypeOutputTextDone  ResponsesStreamResponseType = "response.output_text.done"
 
 	ResponsesStreamResponseTypeRefusalDelta ResponsesStreamResponseType = "response.refusal.delta"
 	ResponsesStreamResponseTypeRefusalDone  ResponsesStreamResponseType = "response.refusal.done"
 
-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.delta"
 	ResponsesStreamResponseTypeFunctionCallArgumentsDone      ResponsesStreamResponseType = "response.function_call_arguments.done"
-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"

These corrections are based on the official OpenAI Responses API streaming documentation.

core/providers/openai.go (2)

1026-1043: Preserve Responses API error events instead of downgrading to generic errors.

When encountering ResponsesStreamResponseTypeError, the code converts it to a generic BifrostError via processAndSendBifrostError. This breaks compatibility with the OpenAI Responses API streaming specification, where clients expect structured error events with the event: error line and the full error schema (type/code/message/param fields).

Impact: Clients consuming the Responses API will not receive properly formatted error events, breaking their error handling logic.

Solution: Forward the error as a normal responses-stream event (preserving all fields from ResponsesStreamResponse) so the transport layer can emit the correct event: error line with the structured payload. Only set the stream-end indicator and terminate after sending that structured event.

Consider this approach:

 if response.ResponsesStreamResponse.Type == schemas.ResponsesStreamResponseTypeError {
-	bifrostErr := &schemas.BifrostError{
-		Type:           schemas.Ptr(string(schemas.ResponsesStreamResponseTypeError)),
-		IsBifrostError: false,
-		Error:          &schemas.ErrorField{},
-	}
-
-	if response.ResponsesStreamResponse.Message != nil {
-		bifrostErr.Error.Message = *response.ResponsesStreamResponse.Message
-	}
-	if response.ResponsesStreamResponse.Param != nil {
-		bifrostErr.Error.Param = *response.ResponsesStreamResponse.Param
-	}
-
+	// Send error as a structured responses-stream event, then terminate
+	response.ExtraFields.RequestType = schemas.ResponsesStreamRequest
+	response.ExtraFields.Provider = providerName
+	response.ExtraFields.ModelRequested = request.Model
+	response.ExtraFields.ChunkIndex = response.ResponsesStreamResponse.SequenceNumber
+	response.ExtraFields.Latency = time.Since(startTime).Milliseconds()
 	ctx = context.WithValue(ctx, schemas.BifrostContextKeyStreamEndIndicator, true)
-	processAndSendBifrostError(ctx, postHookRunner, bifrostErr, responseChan, logger)
+	processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
 	return
 }

This preserves the OpenAI error event structure for proper client handling.


671-671: Fix TODO comment typo and address the underlying issue.

The TODO comment contains a typo: "WITHING" should be "WITHIN". More importantly, this comment was marked as addressed in commit cc59aff in a previous review, but the typo is still present. This suggests the underlying issue about error response handling within the response struct may not have been fully resolved either.

Apply this diff to fix the typo:

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			//TODO: FIX THIS WITHIN RESPONSE STRUCT

Additionally, consider addressing or clarifying what needs to be fixed regarding error response handling, or remove the TODO if the issue has been resolved.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a84b84 and bb24cd9.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
  • transports/bifrost-http/handlers/server.go
  • core/schemas/providers/mistral/embedding.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • core/schemas/providers/gemini/chat.go
  • core/providers/parasail.go
  • core/providers/ollama.go
  • core/providers/gemini.go
  • core/providers/openrouter.go
  • transports/bifrost-http/integrations/utils.go
  • core/providers/vertex.go
  • core/providers/cerebras.go
  • core/providers/sgl.go
  • core/schemas/embedding.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/providers/openai.go (5)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (14)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
core/schemas/textcompletions.go (1)

72-78: LGTM!

The TextCompletionLogProb type is well-structured and follows Go conventions. The JSON tags are properly defined, and the type aligns with similar log probability structures in the codebase.

core/schemas/speech.go (1)

86-95: LGTM!

The BifrostSpeech and BifrostSpeechStreamResponse types follow the same embedding pattern used throughout this PR for streaming support. The structure allows for conditional streaming fields while maintaining non-streaming fields.

core/schemas/chatcompletions.go (1)

389-402: LGTM!

The LogProb and ContentLogProb types are well-structured and consistent with log probability representations used elsewhere in the codebase.

core/schemas/transcriptions.go (1)

56-81: LGTM!

The BifrostTranscribe structure follows the consistent embedding pattern used throughout this PR. The separation of common fields, non-streaming fields, and streaming fields is clear and maintainable.

core/providers/groq.go (1)

179-179: LGTM!

The switch from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming.

core/providers/azure.go (1)

372-372: LGTM!

Consistent with the broader refactoring to distinguish chat completion streaming from other streaming types.

core/providers/mistral.go (1)

133-133: LGTM!

Consistent with the streaming handler refactoring across providers.

transports/bifrost-http/handlers/inference.go (3)

828-846: LGTM!

The requestType extraction logic correctly captures the request type from the first non-nil chunk, handling both success and error cases. The empty string default ensures the code falls back to standard SSE formatting when no chunks are received.


856-880: LGTM!

The conditional SSE formatting correctly distinguishes between Responses API streams (with optional event lines) and standard chat completion streams. The logic properly extracts the event type and formats the output accordingly.


889-896: LGTM!

The conditional [DONE] marker emission correctly follows OpenAI's Responses API specification, which ends streams by closing the connection rather than sending a [DONE] signal. The explanatory comment is helpful.

core/schemas/bifrost.go (1)

253-253: LGTM!

The addition of *ResponsesStreamResponse is consistent with the existing pattern of embedded pointer fields in BifrostResponse. The placement alongside *ResponsesResponse is logical, and the custom MarshalJSON on BifrostStream will handle any potential conflicts between embedded fields.

core/schemas/responses.go (1)

88-88: LGTM!

The addition of the Error field to ResponsesResponse provides proper error handling support for the Responses API. The field is appropriately optional (pointer type) and follows the established pattern for error fields in the codebase.

core/providers/openai.go (2)

987-1008: LGTM! Proper SSE event line handling.

The implementation correctly handles Server-Sent Events (SSE) format by:

  1. Skipping event: lines in the initial check (line 988)
  2. Filtering out event lines when parsing data (lines 1002-1008)
  3. Only processing actual data: lines or raw JSON

This defensive approach ensures clean parsing of the Responses API streaming format.
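A rough sketch of that per-line handling, written as a hypothetical standalone function (the name and the scanner plumbing are illustrative; the provider's actual code may differ):

// parseResponsesStreamLine skips "event:" lines and strips the optional "data: "
// prefix before decoding the JSON payload into a ResponsesStreamResponse.
func parseResponsesStreamLine(line string) (*schemas.ResponsesStreamResponse, bool, error) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "event:") {
		return nil, false, nil // no payload; the JSON "type" field carries the event type
	}
	payload := strings.TrimPrefix(line, "data: ")
	var chunk schemas.ResponsesStreamResponse
	if err := sonic.Unmarshal([]byte(payload), &chunk); err != nil {
		return nil, false, err
	}
	return &chunk, true, nil
}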


1072-1168: LGTM! Clean implementation of embedding support.

The new Embedding method and handleOpenAIEmbeddingRequest helper follow the established patterns in this file:

  • Proper authorization via Bearer token
  • Consistent error handling using parseOpenAIError
  • Use of shared converter function ToOpenAIEmbeddingRequest
  • Appropriate metadata in ExtraFields

The implementation is well-structured and maintainable.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
core/schemas/transcriptions.go (1)

55-81: Document or enforce mutual exclusivity of embedded streaming fields.

The embedding of both *BifrostTranscribeNonStreamResponse and *BifrostTranscribeStreamResponse allows both to be non-nil simultaneously, which may not reflect the intended usage (one for streaming, one for non-streaming). Consider:

  1. Adding a comment clarifying that exactly one should be set based on context
  2. Adding a validation method to ensure mutual exclusivity at runtime

Example validation:

func (t *BifrostTranscribe) Validate() error {
    if t.BifrostTranscribeNonStreamResponse != nil && t.BifrostTranscribeStreamResponse != nil {
        return fmt.Errorf("only one of non-stream or stream response should be set")
    }
    return nil
}
transports/bifrost-http/handlers/inference.go (1)

840-846: Verify edge case: requestType remains empty if first chunk lacks RequestType.

Similar to the logic in transports/bifrost-http/integrations/utils.go, if the first chunk is an error without ExtraFields.RequestType set, requestType remains empty and the code falls through to standard SSE format. Ensure this is the intended fallback.

Additionally, note that this logic is duplicated between handlers/inference.go and integrations/utils.go. Consider extracting to a shared helper function to reduce duplication.

core/schemas/chatcompletions.go (1)

351-363: Update comment to reflect all three embedded pointer types.

The comment states "Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil at a time," but the struct actually has three embedded pointer types: BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice, and BifrostStreamResponseChoice.

Apply this diff to clarify the comment:

 // BifrostChatResponseChoice represents a choice in the completion result.
 // This struct can represent either a streaming or non-streaming response choice.
-// IMPORTANT: Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice
-// should be non-nil at a time.
+// IMPORTANT: Only one of BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice,
+// or BifrostStreamResponseChoice should be non-nil at a time.
 type BifrostChatResponseChoice struct {
core/providers/openai.go (1)

573-617: Consider extracting duplicate error-handling logic.

The error-handling blocks for NewRequestWithContext (lines 573-590) and client.Do (lines 601-617) are nearly identical, checking for context.Canceled, fasthttp.ErrTimeout, and context.DeadlineExceeded. While this duplication provides clarity at each call site, extracting it into a helper function would reduce repetition.

Example helper function:

func handleRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}

Then use it at both call sites:

req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
if err != nil {
	return nil, handleRequestError(err, providerName)
}

// ...

resp, err := client.Do(req)
if err != nil {
	return nil, handleRequestError(err, providerName)
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a84b84 and bb24cd9.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • core/schemas/providers/mistral/embedding.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
  • transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • core/providers/ollama.go
  • core/providers/gemini.go
  • core/providers/sgl.go
  • core/schemas/providers/gemini/chat.go
  • core/providers/vertex.go
  • core/providers/parasail.go
  • core/providers/mistral.go
  • core/schemas/responses.go
  • core/schemas/speech.go
  • core/providers/azure.go
🧰 Additional context used
🧬 Code graph analysis (5)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/providers/openai.go (6)
core/schemas/provider.go (3)
  • PostHookRunner (191-191)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (12)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
core/utils.go (1)
  • Ptr (12-14)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (14)
core/providers/groq.go (1)

179-179: LGTM!

The function rename to handleOpenAIChatCompletionStreaming clearly distinguishes chat completion streaming from other streaming types, aligning with the PR's goal to separate streaming handlers.

core/providers/openrouter.go (1)

132-132: LGTM!

Consistent with other providers switching to handleOpenAIChatCompletionStreaming.

core/providers/cerebras.go (1)

139-139: LGTM!

Consistent refactor aligning with the chat completion streaming handler separation.

transports/bifrost-http/integrations/utils.go (3)

456-471: Verify edge case: requestType remains empty if first chunk lacks RequestType.

If the first chunk is an error without ExtraFields.RequestType set, requestType remains empty and the code falls through to standard SSE format. Ensure this is the intended fallback behavior.

Consider adding a log or default value if requestType is still empty after the first chunk:

 if requestType == "" {
     if response.BifrostResponse != nil {
         requestType = response.BifrostResponse.ExtraFields.RequestType
     } else if response.BifrostError != nil {
         requestType = response.BifrostError.ExtraFields.RequestType
     }
+    if requestType == "" {
+        // Default to standard streaming format if type cannot be determined
+        log.Printf("Warning: Unable to determine request type from first chunk, defaulting to standard format")
+    }
 }

568-609: Consider skipping event line when eventType is empty.

At lines 577-581, if eventType is empty, the code writes event: \n which is technically valid SSE but unusual. Consider skipping the event line entirely when the type is unavailable:

-if eventType != "" {
-    if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
-        return // Network error, stop streaming
-    }
-}
+// Only write event line if we have a non-empty event type
+if eventType != "" {
+    if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
+        return // Network error, stop streaming
+    }
+}

The current check already does this, so the logic appears correct. However, ensure that ResponsesStreamResponse.Type is always populated when requestType == schemas.ResponsesStreamRequest.


619-626: LGTM!

The conditional [DONE] marker emission correctly omits it for the Responses API, aligning with OpenAI's API specification where the Responses API stream ends without a [DONE] sentinel.

core/schemas/textcompletions.go (1)

71-78: LGTM!

The TextCompletionLogProb type provides a clean structure for log probability data, consistent with similar types elsewhere in the codebase (e.g., LogProb in chat completions).

transports/bifrost-http/handlers/inference.go (2)

856-880: LGTM!

The streaming format branching correctly distinguishes between Responses API (event+data format) and other APIs (data-only format), with proper handling of empty event types.


889-895: LGTM!

The conditional [DONE] marker emission correctly aligns with OpenAI's API specification, omitting it for the Responses API.

core/schemas/embedding.go (1)

109-120: LGTM!

The marshaling logic correctly selects between string, float32 array, or 2D array based on which field is non-nil, with appropriate error handling when all fields are nil.
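For context, the general shape of such a one-of marshaler, sketched from the field names referenced in this PR (not the file's literal implementation; the exact array types are assumed):

// Emit whichever embedding variant is populated, and error when none is set.
func (be BifrostEmbeddingResponse) MarshalJSON() ([]byte, error) {
	switch {
	case be.EmbeddingStr != nil:
		return sonic.Marshal(*be.EmbeddingStr) // string form
	case be.EmbeddingArray != nil:
		return sonic.Marshal(be.EmbeddingArray) // 1D float vector
	case be.Embedding2DArray != nil:
		return sonic.Marshal(be.Embedding2DArray) // 2D float matrix
	default:
		return nil, fmt.Errorf("no embedding variant is set")
	}
}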

core/schemas/bifrost.go (1)

253-253: LGTM!

The addition of the embedded *ResponsesStreamResponse field properly extends BifrostResponse to support Responses API streaming, consistent with the PR's objective.

core/providers/openai.go (3)

511-533: LGTM!

The refactoring to use the dedicated handleOpenAIChatCompletionStreaming handler improves code organization by separating chat completion streaming from other streaming types (Responses, TextCompletion).


885-1070: LGTM!

The Responses API streaming handler correctly:

  • Skips event: lines (lines 987-990, 1002-1008) which is specific to the Responses streaming format
  • Handles error responses by constructing and sending a BifrostError (lines 1026-1043)
  • Sets the stream-end indicator for completion events (lines 1050-1055)
  • Tracks latency per chunk and for the overall stream

The implementation properly differentiates Responses streaming from chat completion streaming.


1072-1168: LGTM!

The embedding implementation follows the established patterns for OpenAI request handling:

  • Proper operation allowance checking
  • Centralized request body conversion via ToOpenAIEmbeddingRequest
  • Consistent error handling and response parsing
  • Appropriate metadata population in ExtraFields

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from bb24cd9 to b5451ac Compare October 11, 2025 15:48
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-enhancement_added_anthropic_stream_chunk_convertor branch from fbe1158 to 7ceab9c Compare October 12, 2025 11:17
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from b5451ac to a5f25b3 Compare October 12, 2025 11:17
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-enhancement_added_anthropic_stream_chunk_convertor branch from 7ceab9c to a55ad25 Compare October 12, 2025 11:21
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from a5f25b3 to 1ba34d3 Compare October 12, 2025 11:21
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
core/schemas/embedding.go (1)

122-144: Reset embedding fields before decoding.

UnmarshalJSON never clears EmbeddingStr, EmbeddingArray, or Embedding2DArray before decoding. If the same BifrostEmbeddingResponse instance is reused (common when unmarshalling into preallocated structs), stale state survives: e.g., a prior 1-D decode leaves EmbeddingArray non-nil, then decoding a string populates EmbeddingStr but the old array remains and MarshalJSON serializes the wrong shape. Nil out every variant up front so each decode starts from a clean slate.

 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
core/schemas/responses.go (3)

1635-1635: Fix event type string: should be "delta" not "added".

Line 1635 defines ResponsesStreamResponseTypeOutputTextAdded but the string value should be "response.output_text.delta" to match the OpenAI Responses API streaming specification.

Apply this diff:

-	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.added"
+	ResponsesStreamResponseTypeOutputTextAdded ResponsesStreamResponseType = "response.output_text.delta"

1641-1641: Fix event type string: should be "delta" not "added".

Line 1641 defines ResponsesStreamResponseTypeFunctionCallArgumentsAdded but the string value should be "response.function_call_arguments.delta" to match the OpenAI Responses API streaming specification.

Apply this diff:

-	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.added"
+	ResponsesStreamResponseTypeFunctionCallArgumentsAdded     ResponsesStreamResponseType = "response.function_call_arguments.delta"

1676-1676: Fix event type string: the underscore before "annotation" should be a dot.

Line 1676 defines ResponsesStreamResponseTypeOutputTextAnnotationAdded but the string value should be "response.output_text.annotation.added" (note the dot, not an underscore, between "text" and "annotation").

Apply this diff:

-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"
core/providers/openai.go (1)

671-671: Fix TODO comment typo.

Line 671 contains a typo: "WITHING" should be "WITHIN".

Apply this diff:

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			//TODO: FIX THIS WITHIN RESPONSE STRUCT
🧹 Nitpick comments (1)
core/schemas/chatcompletions.go (1)

351-363: Clarify the mutually exclusive choice comment.

The struct now embeds three shapes (BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice, BifrostStreamResponseChoice), yet the comment still says only the last two are mutually exclusive. Please update the doc to spell out that exactly one of the three embeds should be populated and give a short note on when each variant is used. Keeps downstream implementers from assuming the text-completion pointer can coexist with the others.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5451ac and 1ba34d3.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • transports/bifrost-http/handlers/server.go
  • core/schemas/providers/mistral/embedding.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (14)
  • core/providers/groq.go
  • core/providers/openrouter.go
  • core/providers/ollama.go
  • core/providers/sgl.go
  • core/schemas/providers/gemini/chat.go
  • core/providers/gemini.go
  • core/providers/parasail.go
  • core/providers/vertex.go
  • core/schemas/textcompletions.go
  • core/providers/azure.go
  • core/providers/mistral.go
  • core/schemas/bifrost.go
  • core/schemas/transcriptions.go
  • core/providers/cerebras.go
🧰 Additional context used
🧬 Code graph analysis (6)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/providers/openai.go (5)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (17)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
  • GitHub Check: Graphite / mergeability_check
🔇 Additional comments (5)
core/schemas/responses.go (1)

88-88: LGTM! Error field addition is properly structured.

The addition of the Error field to ResponsesResponse follows Go conventions with a pointer type for optionality and proper JSON tagging. The field integrates well with the existing struct definition.

core/providers/openai.go (4)

511-533: LGTM! Clean delegation to shared streaming logic.

The ChatCompletionStream method properly delegates to the shared handleOpenAIChatCompletionStreaming function, reducing code duplication. Authorization headers and configuration are correctly passed through.


769-862: LGTM! Responses method implementation is solid.

The Responses method and handleOpenAIResponsesRequest follow the established pattern used for chat completions. Proper authorization, error handling, and metadata population are all in place.


864-1070: LGTM! Responses streaming implementation handles SSE format correctly.

The handleOpenAIResponsesStreaming function properly handles the Responses API's SSE format:

  • Correctly skips event: lines (lines 988, 1002-1007) since responses streaming uses typed events
  • Defensive nil-check for ResponsesStreamResponse (line 1022)
  • Proper terminal event handling for both error (lines 1026-1043) and completion (lines 1050-1055) states
  • Stream end context marker set appropriately for terminal events

1072-1168: LGTM! Embedding implementation follows established patterns.

The Embedding method and handleOpenAIEmbeddingRequest follow the same solid patterns used throughout the provider:

  • Proper operation validation
  • Standard error handling and resource management
  • Correct metadata population with EmbeddingRequest type
  • Consistent with other non-streaming methods

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/providers/vertex.go (1)

585-602: Fix the Authorization header regression.

Line 596 currently hardcodes Authorization: Bearer even when key.Value is empty because the computed authHeader map is never used. That overwrites the service-account token that the OAuth client injects and turns every default-credential stream into a 401. Pass through the conditional header map instead so we only set the header when a direct key exists.

-		authHeader := map[string]string{}
-		if key.Value != "" {
-			authHeader["Authorization"] = "Bearer " + key.Value
-		}
+		var authHeader map[string]string
+		if key.Value != "" {
+			authHeader = map[string]string{"Authorization": "Bearer " + key.Value}
+		}
 		// Use shared OpenAI streaming logic
 		return handleOpenAIChatCompletionStreaming(
 			ctx,
 			client,
 			url,
 			request,
-			map[string]string{"Authorization": "Bearer " + key.Value},
+			authHeader,
 			provider.networkConfig.ExtraHeaders,
 			provider.sendBackRawResponse,
 			schemas.Vertex,
 			postHookRunner,
 			provider.logger,
 		)
♻️ Duplicate comments (3)
core/schemas/chatcompletions.go (1)

351-363: Clarify which embedded pointers are mutually exclusive.

The comment states "Only one of BifrostNonStreamResponseChoice or BifrostStreamResponseChoice should be non-nil," but BifrostChatResponseChoice embeds three pointers. The documentation should clarify which combinations are valid (e.g., only one of the three should be non-nil at any time, or specific pairings are allowed).

core/schemas/responses.go (1)

1617-1684: Event type constants previously flagged for verification.

A previous review identified potential issues with some streaming event type string constants (e.g., ResponsesStreamResponseTypeOutputTextAdded, ResponsesStreamResponseTypeFunctionCallArgumentsAdded, ResponsesStreamResponseTypeOutputTextAnnotationAdded) and requested verification against OpenAI's official Responses API documentation to ensure correctness.

core/schemas/embedding.go (1)

122-145: Reset embedding variants before decoding.

Reusing a pooled BifrostEmbeddingResponse will leave whichever variant was populated first, because Line 126 sets (say) EmbeddingStr but later decode paths never clear it. On the next unmarshal, the string pointer stays non-nil, so MarshalJSON returns the old string instead of the newly decoded vector. Clear every variant up front so each decode starts from a clean slate.

 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
 		be.EmbeddingStr = &stringContent
 		return nil
🧹 Nitpick comments (1)
core/providers/openai.go (1)

574-617: Consider extracting duplicate error handling into a helper function.

The error handling logic for context.Canceled, fasthttp.ErrTimeout, and context.DeadlineExceeded is duplicated in two places (lines 576-590 after NewRequestWithContext and lines 603-617 after client.Do). This duplication makes the code harder to maintain.

Consider extracting this into a helper function:

func handleStreamRequestError(err error, providerName schemas.ModelProvider) *schemas.BifrostError {
	if errors.Is(err, context.Canceled) {
		return &schemas.BifrostError{
			IsBifrostError: false,
			Error: &schemas.ErrorField{
				Type:    schemas.Ptr(schemas.RequestCancelled),
				Message: schemas.ErrRequestCancelled,
				Error:   err,
			},
		}
	}
	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
		return newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
	}
	return newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
}

Then use it in both places:

 req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(jsonBody))
 if err != nil {
-	if errors.Is(err, context.Canceled) {
-		return nil, &schemas.BifrostError{
-			IsBifrostError: false,
-			Error: &schemas.ErrorField{
-				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: schemas.ErrRequestCancelled,
-				Error:   err,
-			},
-		}
-	}
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-	}
-	return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+	return nil, handleStreamRequestError(err, providerName)
 }

 // ... later ...

 resp, err := client.Do(req)
 if err != nil {
-	if errors.Is(err, context.Canceled) {
-		return nil, &schemas.BifrostError{
-			IsBifrostError: false,
-			Error: &schemas.ErrorField{
-				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: schemas.ErrRequestCancelled,
-				Error:   err,
-			},
-		}
-	}
-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
-		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, providerName)
-	}
-	return nil, newBifrostOperationError(schemas.ErrProviderRequest, err, providerName)
+	return nil, handleStreamRequestError(err, providerName)
 }

This same pattern appears in other streaming functions in this file (handleOpenAIResponsesStreaming, SpeechStream, TranscriptionStream), so the helper would reduce duplication across all of them.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5451ac and 1ba34d3.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • core/schemas/providers/mistral/embedding.go
  • transports/bifrost-http/handlers/server.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • core/providers/sgl.go
  • core/providers/groq.go
  • core/providers/cerebras.go
  • core/providers/mistral.go
  • core/providers/azure.go
  • core/providers/gemini.go
  • core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (7)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/providers/openai.go (5)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (14)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: Graphite / mergeability_check (this identical check is listed 21 times)
🔇 Additional comments (12)
core/schemas/speech.go (2)

16-18: LGTM: Formatting improvements.

The whitespace alignment changes improve readability without altering functionality.


86-95: LGTM: New streaming speech types.

The new BifrostSpeech and BifrostSpeechStreamResponse types follow the established pattern of embedding pointers for streaming variants. The structure aligns with similar types introduced in this PR (e.g., BifrostTranscribe*, BifrostChatResponseChoice).

core/schemas/textcompletions.go (1)

72-78: LGTM: Log probability support for text completions.

The new TextCompletionLogProb type provides token-level log probability information for text completions, complementing the similar log-probability structures added in core/schemas/chatcompletions.go (LogProb, ContentLogProb).

core/schemas/transcriptions.go (1)

56-81: LGTM: Transcription response types.

The new BifrostTranscribe* types follow the established pattern of embedding variant-specific pointers to distinguish streaming from non-streaming fields. The structure is consistent with other Bifrost response types introduced in this PR.

core/schemas/chatcompletions.go (1)

365-402: LGTM: Chat completion streaming types.

The new types (BifrostTextCompletionResponseChoice, BifrostNonStreamResponseChoice, BifrostStreamResponseChoice, BifrostStreamDelta, LogProb, ContentLogProb) provide comprehensive support for both streaming and non-streaming chat completion responses with token-level log probabilities.

core/schemas/responses.go (2)

88-88: LGTM: Error field addition.

The new Error field enables structured error reporting in responses, following the standard pattern of optional pointer fields (omitempty).


1686-1720: LGTM: Streaming response structures.

The ResponsesStreamResponse and ResponsesStreamResponseStruct types provide a comprehensive structure for handling various streaming event payloads. The use of optional pointer fields allows different event types to carry their specific data.

core/providers/parasail.go (1)

121-121: LGTM: Streaming handler refactor.

The switch from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming improves code clarity by distinguishing chat completion streaming from other streaming types (e.g., Responses API streaming). This change is consistent with the same refactor applied across other providers in this PR.

core/providers/openrouter.go (1)

132-132: LGTM: Streaming handler refactor.

The switch to handleOpenAIChatCompletionStreaming aligns with the streaming handler refactor applied across all providers in this PR, improving code organization by separating chat completion streaming from other streaming paths.

core/providers/ollama.go (1)

152-152: LGTM: Streaming handler refactor.

The switch to handleOpenAIChatCompletionStreaming completes the consistent refactor across all providers, distinguishing chat completion streaming from other streaming types (e.g., Responses API streaming).

core/providers/openai.go (2)

769-862: LGTM! Clean implementation of non-streaming Responses API.

The implementation follows the same proven pattern as ChatCompletion, with appropriate error handling, latency tracking, and metadata population. The use of the centralized ToOpenAIResponsesRequest converter maintains consistency.


1072-1168: LGTM! Clean implementation of Embedding API.

The embedding request handler follows the established pattern used for chat completions and responses. The implementation is straightforward with appropriate error handling and metadata tracking.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 1ba34d3 to 791aa46 Compare October 12, 2025 12:29
@akshaydeo akshaydeo changed the base branch from 10-11-enhancement_added_anthropic_stream_chunk_convertor to graphite-base/598 October 12, 2025 12:35
@akshaydeo akshaydeo force-pushed the 10-11-feat_responses_streaming_added branch from 791aa46 to fe69930 Compare October 12, 2025 12:36
@graphite-app graphite-app bot changed the base branch from graphite-base/598 to main October 12, 2025 12:37
@akshaydeo akshaydeo force-pushed the 10-11-feat_responses_streaming_added branch from fe69930 to ea5a3e4 Compare October 12, 2025 12:37
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba34d3 and ea5a3e4.

📒 Files selected for processing (25)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (4)
  • core/schemas/providers/mistral/embedding.go
  • transports/bifrost-http/handlers/server.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (11)
  • core/providers/groq.go
  • core/providers/cerebras.go
  • core/providers/azure.go
  • core/schemas/speech.go
  • core/schemas/textcompletions.go
  • core/providers/vertex.go
  • core/providers/gemini.go
  • core/schemas/transcriptions.go
  • core/providers/sgl.go
  • core/schemas/providers/gemini/chat.go
  • core/schemas/chatcompletions.go
🧰 Additional context used
🧬 Code graph analysis (5)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1686-1715)
core/providers/openai.go (4)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (17)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1686-1715)
  • ResponsesStreamResponseTypeError (1683-1683)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (13)
core/schemas/embedding.go (1)

97-145: LGTM! Type definitions and marshaling logic are correct.

The custom marshaling and unmarshaling logic properly handles the three embedding format variants (string, float32 array, 2D float32 array) and provides clear error messages when none match.
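
The union the comment describes (string, vector, or matrix, with exactly one populated) follows the usual try-each-variant-in-order pattern. The standalone sketch below uses encoding/json and stand-in names; the actual type uses sonic, but the marshal/unmarshal ordering, and the reset-before-decode step flagged elsewhere in this review, are the same idea.

package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// embeddingValue is an illustrative stand-in for BifrostEmbeddingResponse.
type embeddingValue struct {
	Str  *string
	Vec  *[]float32
	Vecs *[][]float32
}

func (e embeddingValue) MarshalJSON() ([]byte, error) {
	switch {
	case e.Str != nil:
		return json.Marshal(*e.Str)
	case e.Vec != nil:
		return json.Marshal(*e.Vec)
	case e.Vecs != nil:
		return json.Marshal(*e.Vecs)
	}
	return nil, errors.New("no embedding variant set")
}

func (e *embeddingValue) UnmarshalJSON(data []byte) error {
	// Reset every variant so a reused value cannot leak a stale shape.
	e.Str, e.Vec, e.Vecs = nil, nil, nil
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		e.Str = &s
		return nil
	}
	var v []float32
	if err := json.Unmarshal(data, &v); err == nil {
		e.Vec = &v
		return nil
	}
	var vv [][]float32
	if err := json.Unmarshal(data, &vv); err == nil {
		e.Vecs = &vv
		return nil
	}
	return errors.New("embedding is neither a string, a vector, nor a matrix")
}

func main() {
	var e embeddingValue
	_ = json.Unmarshal([]byte(`[0.1, 0.2]`), &e)
	out, _ := json.Marshal(e)
	fmt.Println(string(out)) // [0.1,0.2]
}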

core/providers/parasail.go (1)

121-121: LGTM! Streaming handler update aligns with PR refactoring objectives.

The switch to handleOpenAIChatCompletionStreaming is consistent with the broader refactor to separate chat completion streaming from Responses API streaming across all providers.

core/providers/openrouter.go (1)

132-132: LGTM! Consistent streaming handler refactor.

The change mirrors the pattern applied across other providers in this PR, maintaining argument compatibility while routing through the chat-specific streaming handler.

core/providers/ollama.go (1)

152-152: LGTM! Handler rename aligns with project-wide refactor.

core/providers/mistral.go (1)

133-133: LGTM! Streaming handler update matches PR scope.

transports/bifrost-http/handlers/inference.go (4)

828-829: LGTM! Request type detection enables format-specific streaming.

The local requestType variable properly tracks the API type from the first chunk, enabling the correct SSE format selection downstream.


840-846: LGTM! Safe extraction of request type with defensive nil checks.

The code correctly checks both BifrostResponse and BifrostError for the request type, ensuring the value is captured from whichever appears first in the stream.


856-880: LGTM! Proper SSE format selection based on API type.

The branching logic correctly:

  • Emits event: lines for Responses API streaming (OpenAI Responses format)
  • Uses standard data: only format for other APIs (chat completions, etc.)
  • Safely handles nil ResponsesStreamResponse with the check at line 859

889-895: LGTM! Correct end-of-stream signaling per API spec.

The conditional [DONE] emission aligns with the OpenAI API specification (a writer-side sketch follows this list):

  • Chat completions and other APIs: send [DONE] marker
  • Responses API: no [DONE] marker (stream close indicates end)
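
A writer-side sketch of the branching described in the last two comments: the request type of the first chunk decides whether the stream is framed as event: plus data: lines with no [DONE], or as plain data: lines terminated by [DONE]. The chunk struct and writeSSE helper are illustrative only; the real handler writes to the fasthttp stream writer and reads the type from ExtraFields.RequestType.

package main

import (
	"fmt"
	"io"
	"os"
)

// chunk is a stand-in for the streamed BifrostResponse payload.
type chunk struct {
	isResponsesStream bool   // derived from ExtraFields.RequestType on the first chunk
	eventType         string // ResponsesStreamResponse.Type, empty for chat chunks
	json              string // already-marshalled chunk body
}

// writeSSE frames a stream of chunks the way the handler does: Responses API
// streams get "event:" lines and no [DONE] marker, everything else gets plain
// "data:" lines terminated by [DONE].
func writeSSE(w io.Writer, chunks []chunk) error {
	responses := len(chunks) > 0 && chunks[0].isResponsesStream
	for _, c := range chunks {
		if responses && c.eventType != "" {
			if _, err := fmt.Fprintf(w, "event: %s\n", c.eventType); err != nil {
				return err
			}
		}
		if _, err := fmt.Fprintf(w, "data: %s\n\n", c.json); err != nil {
			return err
		}
	}
	if !responses {
		_, err := fmt.Fprint(w, "data: [DONE]\n\n")
		return err
	}
	return nil
}

func main() {
	_ = writeSSE(os.Stdout, []chunk{
		{isResponsesStream: true, eventType: "response.output_text.delta", json: `{"delta":"Hi"}`},
		{isResponsesStream: true, eventType: "response.completed", json: `{"type":"response.completed"}`},
	})
}
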
transports/bifrost-http/integrations/utils.go (3)

456-471: LGTM! Request type detection matches the pattern in handlers/inference.go.

The implementation correctly initializes the requestType variable and extracts it from the first streaming chunk with proper nil checks.


569-609: LGTM! Streaming format logic correctly distinguishes API types.

The branching between Responses API format (with event: lines) and standard SSE format (data-only) is implemented correctly:

  • Line 572: Safe nil check for ResponsesStreamResponse before accessing Type
  • Lines 577-581: Event line only written when eventType is non-empty
  • Lines 595-608: Standard SSE format for other request types

619-625: LGTM! End-of-stream marker correctly omitted for Responses API.

The conditional [DONE] logic matches the OpenAI specification and is consistent with the implementation in handlers/inference.go.

core/schemas/bifrost.go (1)

252-253: LGTM! Embedded streaming response field complements existing response structure.

The *ResponsesStreamResponse embedded field is correctly positioned alongside *ResponsesResponse. These fields are mutually exclusive in practice (one for non-streaming, one for streaming responses), so no JSON marshaling conflicts are expected.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from ea5a3e4 to 3f77d16 Compare October 13, 2025 05:23
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (6)
core/schemas/embedding.go (1)

122-144: Reset embedding fields before decoding to avoid stale data

UnmarshalJSON never clears the other shape fields before decoding. If the same BifrostEmbeddingResponse instance is reused (common when unmarshalling successive SSE chunks), decoding a later payload with a different shape leaves the previously populated fields intact. Example: decode a base64 string first (sets EmbeddingStr), then decode a float array; EmbeddingStr remains non-nil, so MarshalJSON (or any caller that checks the fields) will still see the old string instead of the new array. Clear every variant up front so each branch assigns a clean shape.

 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	// Reset all shapes so repeated unmarshals don't leak previous values
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
 	if err := sonic.Unmarshal(data, &stringContent); err == nil {
core/providers/vertex.go (3)

262-264: Remove incorrect fasthttp.ErrTimeout check.

Same issue as Line 225: fasthttp.ErrTimeout will never match when using net/http.

Apply this diff:

-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
	}

417-419: Remove incorrect fasthttp.ErrTimeout check.

Same issue as Lines 225 and 262: fasthttp.ErrTimeout will never match when using net/http.

Apply this diff:

-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
	}

448-450: Remove incorrect fasthttp.ErrTimeout check.

Same issue as Lines 225, 262, and 417: fasthttp.ErrTimeout will never match when using net/http.

Apply this diff:

-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
	}
core/schemas/responses.go (1)

1677-1677: Verify annotation event type string format.

Line 1677 defines ResponsesStreamResponseTypeOutputTextAnnotationAdded with an underscore:

ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"

Based on past review comments and the OpenAI Responses API SSE format, this should use a dot separator for consistency with other hierarchical event types (e.g., response.output_text.delta). The correct format appears to be:

"response.output_text.annotation.added"

Please verify whether the current underscore format (output_text_annotation) matches the actual SSE events emitted by the OpenAI Responses API, or if it should use dot notation (output_text.annotation) for proper event matching.

core/providers/openai.go (1)

671-671: Fix TODO comment typo.

The TODO comment contains a typo: "WITHING" should be "WITHIN". This issue was supposedly addressed in a previous commit (cc59aff) according to past review comments, but the typo remains in the code.

Apply this diff to fix the typo:

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			//TODO: FIX THIS WITHIN RESPONSE STRUCT

Additionally, consider clarifying what needs to be fixed in the response struct or creating a tracked issue if this represents ongoing work.

🧹 Nitpick comments (5)
tests/core-providers/scenarios/responses_stream.go (3)

17-21: Consider adding a dedicated ResponsesStream flag for clarity.

The test uses the generic CompletionStream flag (line 18) to gate Responses API streaming tests. While this works, it would be clearer to have a dedicated ResponsesStream flag in TestScenarios, similar to how TextCompletionStream was separated from CompletionStream. This would allow finer-grained control over which streaming tests run for each provider.

If you agree, consider adding a ResponsesStream bool field to TestScenarios in tests/core-providers/config/account.go and updating the test configurations accordingly.
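
If the suggestion is taken up, the shape of the change would be roughly the following (abridged and hypothetical; the real TestScenarios struct carries many more flags):

package config

// TestScenarios (abridged): only the streaming-related flags are shown, and
// ResponsesStream is the hypothetical flag proposed in the comment above.
type TestScenarios struct {
	CompletionStream     bool
	TextCompletionStream bool
	ResponsesStream      bool
}

// shouldRunResponsesStream is how a scenario would gate itself on the new flag.
func shouldRunResponsesStream(s TestScenarios) bool {
	return s.ResponsesStream
}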


279-283: Consider using retry wrapper for consistency.

Unlike the main test (line 61), this tool calling test doesn't use the WithStreamRetry wrapper. For consistency and to handle transient provider issues, consider wrapping this call as well:

Apply this diff to add retry wrapper:

-		responseChannel, err := client.ResponsesStreamRequest(ctx, request)
-		RequireNoError(t, err, "Responses stream with tools failed")
-		if responseChannel == nil {
-			t.Fatal("Response channel should not be nil")
-		}
+		retryConfig := StreamingRetryConfig()
+		retryContext := TestRetryContext{
+			ScenarioName: "ResponsesStreamWithTools",
+			ExpectedBehavior: map[string]interface{}{
+				"should_call_tool": true,
+			},
+			TestMetadata: map[string]interface{}{
+				"provider": testConfig.Provider,
+				"model":    testConfig.ChatModel,
+			},
+		}
+
+		responseChannel, err := WithStreamRetry(t, retryConfig, retryContext, func() (chan *schemas.BifrostStream, *schemas.BifrostError) {
+			return client.ResponsesStreamRequest(ctx, request)
+		})
+		RequireNoError(t, err, "Responses stream with tools failed")
+		if responseChannel == nil {
+			t.Fatal("Response channel should not be nil")
+		}

388-392: Consider using retry wrapper for consistency.

Similar to the tool calling test, this reasoning test also lacks retry wrapping. Consider adding WithStreamRetry here for consistent error handling across all test scenarios.

core/schemas/bifrost.go (1)

237-254: Document the union constraint for ResponsesResponse and ResponsesStreamResponse.

The BifrostResponse struct now embeds both *ResponsesResponse and *ResponsesStreamResponse, following a union-type pattern similar to other structs in the codebase. However, the struct lacks documentation clarifying that only one of these embedded pointers should be non-nil at a time.

Add a comment above the struct or in the struct documentation to clarify the usage:

 // BifrostResponse represents the complete result from any bifrost request.
+// 
+// For Responses API requests: Only one of ResponsesResponse (non-stream) or 
+// ResponsesStreamResponse (stream) should be non-nil at a time.
 type BifrostResponse struct {
 	ID                string                      `json:"id,omitempty"`
tests/core-providers/scenarios/text_completion_stream.go (1)

146-148: Consider documenting or extracting magic numbers for streaming chunk limits.

The test uses different maximum chunk counts across sub-tests (500, 100, 150) without clear documentation of why these specific values were chosen.

Consider either:

  1. Extracting these as named constants with explanatory comments:
const (
    maxChunksMainTest = 500  // High limit for full story generation
    maxChunksVariedPrompts = 100  // Lower limit for short prompt completions
    maxChunksParameterTest = 150  // Mid-range for parameter variations
)
  2. Or adding inline comments explaining the rationale:
if responseCount > 500 {
    // Safety limit: story generation should complete well under 500 chunks
    t.Fatal("Received too many streaming chunks, something might be wrong")
}

This improves test maintainability and makes the test's assumptions explicit.

Also applies to: 279-281, 381-383

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1ba34d3 and 3f77d16.

📒 Files selected for processing (46)
  • core/providers/azure.go (1 hunks)
  • core/providers/cerebras.go (1 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (1 hunks)
  • core/providers/mistral.go (1 hunks)
  • core/providers/ollama.go (1 hunks)
  • core/providers/openai.go (7 hunks)
  • core/providers/openrouter.go (1 hunks)
  • core/providers/parasail.go (1 hunks)
  • core/providers/sgl.go (1 hunks)
  • core/providers/vertex.go (5 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • plugins/telemetry/main.go
  • tests/core-providers/scenarios/utils.go
  • transports/bifrost-http/handlers/server.go
  • core/schemas/providers/mistral/embedding.go
  • core/schemas/providers/mistral/types.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • core/providers/cerebras.go
  • core/providers/sgl.go
  • transports/bifrost-http/handlers/inference.go
  • core/providers/parasail.go
  • core/providers/openrouter.go
  • core/schemas/providers/gemini/chat.go
  • core/providers/gemini.go
  • core/providers/groq.go
  • core/providers/mistral.go
🧰 Additional context used
🧬 Code graph analysis (10)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
tests/core-providers/groq_test.go (1)
tests/core-providers/config/account.go (1)
  • TestScenarios (20-40)
tests/core-providers/scenarios/responses_stream.go (3)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/responses.go (23)
  • ResponsesMessage (280-292)
  • ResponsesInputMessageRoleUser (567-567)
  • ResponsesMessageContent (573-576)
  • ResponsesParameters (33-60)
  • ResponsesStreamResponseType (1620-1620)
  • ResponsesStreamResponse (1687-1716)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesStreamResponseTypeContentPartAdded (1632-1632)
  • ResponsesStreamResponseTypeCreated (1623-1623)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeError (1684-1684)
  • ResponsesTool (1341-1356)
  • ResponsesToolFunction (1359-1362)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesParametersReasoning (210-214)
  • ResponsesStreamResponseTypeReasoningSummaryPartAdded (1652-1652)
  • ResponsesStreamResponseTypeReasoningSummaryTextDelta (1654-1654)
  • ResponsesMessageTypeReasoning (273-273)
  • ResponsesResponse (85-114)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • SGL (64-64)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1687-1716)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1687-1716)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
tests/core-providers/scenarios/text_completion_stream.go (4)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/textcompletions.go (2)
  • TextCompletionInput (9-12)
  • TextCompletionParameters (50-70)
core/schemas/bifrost.go (9)
  • BifrostTextCompletionRequest (134-140)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • TextCompletionStreamRequest (103-103)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • RequestType (99-99)
  • TextCompletionRequest (102-102)
core/schemas/chatcompletions.go (3)
  • BifrostStreamResponseChoice (376-378)
  • BifrostTextCompletionResponseChoice (365-367)
  • BifrostChatResponseChoice (355-363)
core/providers/openai.go (4)
core/schemas/provider.go (6)
  • PostHookRunner (191-191)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/bifrost.go (14)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1687-1716)
  • ResponsesStreamResponseTypeError (1684-1684)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
tests/core-providers/tests.go (3)
tests/core-providers/scenarios/text_completion_stream.go (1)
  • RunTextCompletionStreamTest (16-399)
tests/core-providers/scenarios/chat_completion_stream.go (1)
  • RunChatCompletionStreamTest (16-298)
tests/core-providers/scenarios/responses_stream.go (1)
  • RunResponsesStreamTest (17-467)
🔇 Additional comments (22)
tests/core-providers/custom_test.go (1)

31-31: CompletionStream rename is consistent
Verified that TestScenarios defines CompletionStream and all test files reference it.

tests/core-providers/gemini_test.go (1)

35-35: LGTM: Test configuration updated for streaming refactor.

The field rename from ChatCompletionStream to CompletionStream aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming. The change maintains the same test coverage.

tests/core-providers/sgl_test.go (1)

28-28: LGTM: Test configuration updated for streaming refactor.

Consistent with the PR-wide refactoring to rename streaming flags for clarity.

tests/core-providers/bedrock_test.go (1)

33-33: LGTM: Test configuration updated for streaming refactor.

The field rename maintains test coverage while improving naming clarity.

core/providers/azure.go (1)

372-372: LGTM: Provider updated to use chat-specific streaming handler.

The function call change aligns with the PR's refactoring to separate chat completion streaming from Responses API streaming. The call site preserves all arguments and behavior.

tests/core-providers/azure_test.go (1)

28-28: LGTM: Test configuration updated for streaming refactor.

Consistent field rename maintaining test coverage.

tests/core-providers/scenarios/chat_completion_stream.go (1)

17-17: LGTM: Test scenario gating updated for config refactor.

The condition now references CompletionStream instead of ChatCompletionStream, aligning with the updated test configuration structure. The test logic remains unchanged.

core/providers/ollama.go (1)

152-152: LGTM: Verified handler signature and call arguments align.

tests/core-providers/tests.go (2)

29-32: LGTM! New streaming test scenarios properly wired.

The addition of RunTextCompletionStreamTest and RunResponsesStreamTest correctly extends the test suite to cover the new streaming capabilities introduced in this PR.


71-71: LGTM! Field rename aligns with broader refactoring.

The rename from "ChatCompletionStream" to "CompletionStream" provides clearer semantics and is consistently applied across all test configurations.

tests/core-providers/vertex_test.go (1)

28-28: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

tests/core-providers/ollama_test.go (1)

27-27: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

core/providers/vertex.go (1)

591-603: LGTM! Function rename aligns with streaming refactoring.

The rename from handleOpenAIStreaming to handleOpenAIChatCompletionStreaming properly reflects the chat-specific streaming behavior and aligns with the PR's goal of separating chat completion streaming from Responses API streaming.

tests/core-providers/anthropic_test.go (1)

28-28: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

tests/core-providers/cohere_test.go (1)

28-28: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

tests/core-providers/groq_test.go (2)

22-26: LGTM! Text completion enabled via chat conversion.

Enabling text completion and text completion streaming for Groq by leveraging chat completion conversion is a reasonable approach. The inline comments clearly document this implementation strategy.


28-28: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

tests/core-providers/mistral_test.go (1)

28-28: LGTM! Field rename is consistent.

The update from ChatCompletionStream to CompletionStream aligns with the broader test configuration refactoring.

tests/core-providers/scenarios/responses_stream.go (1)

541-544: LGTM - StreamingValidationResult type.

The StreamingValidationResult type provides a clean way to collect and report validation issues without failing tests immediately. This approach is appropriate for streaming tests where some provider variations are expected.

transports/bifrost-http/integrations/utils.go (1)

568-609: Add nil check for ResponsesStreamResponse to prevent potential panic.

When requestType == schemas.ResponsesStreamRequest, the code assumes response.BifrostResponse.ResponsesStreamResponse is non-nil (line 572), but there's no explicit nil check. While unlikely, if a response has requestType set to ResponsesStreamRequest but ResponsesStreamResponse is nil, line 572 will panic when accessing .Type.

Apply this diff to add a defensive nil check:

 				} else {
 					// Handle different streaming formats based on request type
 					if requestType == schemas.ResponsesStreamRequest {
+						// Verify ResponsesStreamResponse is present
+						if response.BifrostResponse.ResponsesStreamResponse == nil {
+							log.Printf("Warning: ResponsesStreamResponse is nil for ResponsesStreamRequest")
+							continue
+						}
+
 						// OPENAI RESPONSES FORMAT: Use event: and data: lines for OpenAI responses API compatibility
 						eventType := ""
-						if response.BifrostResponse.ResponsesStreamResponse != nil {
-							eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
-						}
+						eventType = string(response.BifrostResponse.ResponsesStreamResponse.Type)
 
 						// Send event line if available
 						if eventType != "" {

Likely an incorrect or invalid review comment.

core/providers/openai.go (2)

987-1008: LGTM: Proper event-line filtering for Responses API streaming.

The event-line handling correctly:

  • Skips SSE event metadata lines (lines starting with event:) at lines 988 and 1002
  • Extracts only the data: payload for JSON parsing
  • Prevents event-type strings from being parsed as JSON data

This ensures clean SSE parsing for the Responses API streaming format.


1026-1043: LGTM: Error event handling follows the correct pattern.

The error handling for ResponsesStreamResponseTypeError:

  • Properly constructs a BifrostError from the error event fields
  • Sets the stream-end indicator in the context (line 1040)
  • Routes through processAndSendBifrostError for consistent error propagation

This ensures that Responses API streaming errors are handled consistently with the rest of the streaming infrastructure.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch 2 times, most recently from bf5a247 to 43efca8 Compare October 13, 2025 06:20
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
transports/bifrost-http/integrations/utils.go (2)

629-652: Honor custom SSE strings in initial stream error too (parity with in-loop errors).

sendStreamError always JSON-wraps the error and ignores preformatted SSE strings from StreamErrorConverter. Mirror handleStreaming's behavior: if the converter returns a preformatted SSE string, write it verbatim; otherwise marshal the error and send it as a data: line.

 func (g *GenericRouter) sendStreamError(ctx *fasthttp.RequestCtx, config RouteConfig, bifrostErr *schemas.BifrostError) {
   var errorResponse interface{}
@@
-  errorJSON, err := json.Marshal(map[string]interface{}{
-    "error": errorResponse,
-  })
-  if err != nil {
-    log.Printf("Failed to marshal error for SSE: %v", err)
-    ctx.SetStatusCode(fasthttp.StatusInternalServerError)
-    return
-  }
-
-  if _, err := fmt.Fprintf(ctx, "data: %s\n\n", errorJSON); err != nil {
-    log.Printf("Failed to write SSE error: %v", err)
-  }
+  // If converter returned a raw SSE string, send as-is
+  if sseString, ok := errorResponse.(string); ok {
+    if _, err := fmt.Fprint(ctx, sseString); err != nil {
+      log.Printf("Failed to write SSE error: %v", err)
+    }
+    return
+  }
+  // Default: wrap as data: {json}
+  errorJSON, err := json.Marshal(errorResponse)
+  if err != nil {
+    log.Printf("Failed to marshal error for SSE: %v", err)
+    ctx.SetStatusCode(fasthttp.StatusInternalServerError)
+    return
+  }
+  if _, err := fmt.Fprintf(ctx, "data: %s\n\n", errorJSON); err != nil {
+    log.Printf("Failed to write SSE error: %v", err)
+  }
 }

792-812: Bug: reflection looks up the lowercase field name "fallbacks".

The exported struct field is named "Fallbacks", so FieldByName("fallbacks") never matches and fallbacks are silently dropped. Look up "Fallbacks" instead, or resolve the field by its JSON tag.

-  // Look for the "fallbacks" field
-  fallbacksField := reqValue.FieldByName("fallbacks")
+  // Look for the exported "Fallbacks" field
+  fallbacksField := reqValue.FieldByName("Fallbacks")
   if !fallbacksField.IsValid() {
-    return nil, nil // No fallbacks field found
+    return nil, nil // No fallbacks field found
   }

Optional: iterate struct fields and match json:"fallbacks" tag for maximum robustness.
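
A sketch of the "match by JSON tag" option: walk the struct's fields and compare the first segment of each json tag, falling back to a case-insensitive match on the exported name. This is generic reflection code under assumed field names, not the router's actual helper.

package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldByJSONTag returns the value of the struct field whose `json` tag
// (or exported name, case-insensitively) matches name.
func fieldByJSONTag(v reflect.Value, name string) (reflect.Value, bool) {
	v = reflect.Indirect(v)
	if v.Kind() != reflect.Struct {
		return reflect.Value{}, false
	}
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag := strings.Split(f.Tag.Get("json"), ",")[0]
		if tag == name || strings.EqualFold(f.Name, name) {
			return v.Field(i), true
		}
	}
	return reflect.Value{}, false
}

func main() {
	type request struct {
		Model     string   `json:"model"`
		Fallbacks []string `json:"fallbacks,omitempty"`
	}
	r := request{Fallbacks: []string{"openai/gpt-4o-mini"}}
	if fv, ok := fieldByJSONTag(reflect.ValueOf(&r), "fallbacks"); ok {
		fmt.Println(fv.Interface()) // [openai/gpt-4o-mini]
	}
}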

core/providers/openai.go (2)

278-286: Increase Scanner buffer to avoid token-too-long on SSE lines

Default bufio.Scanner buffer (64K) can fail on large SSE chunks. Set a larger max buffer.

-        scanner := bufio.NewScanner(resp.Body)
+        scanner := bufio.NewScanner(resp.Body)
+        // Increase max token size for large SSE payloads
+        scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)

1331-1336: Speech/Transcription streams: enlarge Scanner buffers

Prevent scan failures on large audio/transcription SSE chunks.

-        scanner := bufio.NewScanner(resp.Body)
+        scanner := bufio.NewScanner(resp.Body)
+        scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)

Also applies to: 1610-1616

♻️ Duplicate comments (2)
core/schemas/embedding.go (1)

122-144: Clear previous embedding shape before decoding

UnmarshalJSON never resets EmbeddingStr, EmbeddingArray, or Embedding2DArray. If this struct is reused (common with pooling or external consumers), decoding a new payload after a prior shape leaves the old field populated. Because MarshalJSON prefers the string branch, a prior string decode will still be emitted even after successfully decoding an array. Please nil out every shape up front to avoid stale state.

 func (be *BifrostEmbeddingResponse) UnmarshalJSON(data []byte) error {
+	be.EmbeddingStr = nil
+	be.EmbeddingArray = nil
+	be.Embedding2DArray = nil
+
 	// First, try to unmarshal as a direct string
 	var stringContent string
core/providers/openai.go (1)

671-671: Fix TODO typo (already noted previously)

WITHING → WITHIN.

-            //TODO: FIX THIS WITHING RESPONSE STRUCT
+            //TODO: FIX THIS WITHIN RESPONSE STRUCT
🧹 Nitpick comments (6)
transports/bifrost-http/integrations/utils.go (2)

619-625: Prevent stray [DONE] on Responses streams.

If ExtraFields.RequestType isn't set, [DONE] may be emitted for Responses streams, which is spec-incompatible. The detection fix above mitigates this; ensure post-hooks set RequestType, or keep the heuristic.

Would you also confirm all Responses streaming chunks have ExtraFields.RequestType=responses_stream? See verification script below.


368-373: SSE proxying hardening (optional): disable proxy buffering.

Consider adding X-Accel-Buffering: no to improve SSE behavior behind nginx.

   ctx.Response.Header.Set("Cache-Control", "no-cache")
   ctx.Response.Header.Set("Connection", "keep-alive")
   ctx.Response.Header.Set("Access-Control-Allow-Origin", "*")
+  ctx.Response.Header.Set("X-Accel-Buffering", "no")
core/providers/ollama.go (1)

152-160: ResponsesStream delegation: verify RequestType propagation.

Same concern as for the other providers: ensure the post-hook or conversion sets ExtraFields.RequestType=responses_stream so the HTTP layer emits event: lines and omits [DONE].

core/providers/mistral.go (1)

130-137: ResponsesStream delegation: verify RequestType propagation.

Ensure the combined post-hook sets ExtraFields.RequestType=responses_stream; otherwise the SSE formatting will be wrong.
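
Concretely, "propagating the request type" means every chunk leaving the provider should carry the responses-stream request type in its extra fields so the HTTP layer can choose the right SSE framing. The snippet below is a schematic stand-in with stub types and a hypothetical tagRequestType helper, not the provider's post-hook.

package main

import "fmt"

// Minimal stand-ins for the schema types referenced in the comment.
type extraFields struct{ RequestType string }
type bifrostResponse struct{ ExtraFields extraFields }

const responsesStreamRequest = "responses_stream"

// tagRequestType stamps each outgoing chunk before it is sent downstream,
// mirroring what a post-hook (or the conversion layer) is expected to do.
func tagRequestType(chunks <-chan *bifrostResponse, out chan<- *bifrostResponse) {
	for c := range chunks {
		c.ExtraFields.RequestType = responsesStreamRequest
		out <- c
	}
	close(out)
}

func main() {
	in := make(chan *bifrostResponse, 1)
	out := make(chan *bifrostResponse, 1)
	in <- &bifrostResponse{}
	close(in)
	tagRequestType(in, out)
	fmt.Println((<-out).ExtraFields.RequestType) // responses_stream
}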

tests/core-providers/scenarios/text_completion_stream.go (1)

413-417: Use stream request type in consolidated response

The consolidated response represents a streamed scenario. Set RequestType to TextCompletionStreamRequest for accuracy.

-        ExtraFields: schemas.BifrostResponseExtraFields{
-            Provider:    provider,
-            RequestType: schemas.TextCompletionRequest,
-        },
+        ExtraFields: schemas.BifrostResponseExtraFields{
+            Provider:    provider,
+            RequestType: schemas.TextCompletionStreamRequest,
+        },
core/providers/openai.go (1)

1057-1061: Responses streaming: include RawResponse when enabled

Parity with chat/text streaming; attach raw JSON to aid debugging/telemetry.

-            processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
+            if sendBackRawResponse {
+                response.ExtraFields.RawResponse = jsonData
+            }
+            processAndSendResponse(ctx, postHookRunner, &response, responseChan, logger)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f77d16 and 43efca8.

📒 Files selected for processing (48)
  • core/providers/azure.go (2 hunks)
  • core/providers/cerebras.go (2 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (2 hunks)
  • core/providers/mistral.go (2 hunks)
  • core/providers/ollama.go (2 hunks)
  • core/providers/openai.go (9 hunks)
  • core/providers/openrouter.go (2 hunks)
  • core/providers/parasail.go (2 hunks)
  • core/providers/sgl.go (2 hunks)
  • core/providers/utils.go (1 hunks)
  • core/providers/vertex.go (6 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/mux.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • tests/core-providers/scenarios/utils.go
  • core/schemas/providers/mistral/embedding.go
  • core/schemas/providers/mistral/types.go
  • transports/bifrost-http/handlers/server.go
  • plugins/telemetry/main.go
🚧 Files skipped from review as they are similar to previous changes (18)
  • core/providers/gemini.go
  • tests/core-providers/azure_test.go
  • tests/core-providers/groq_test.go
  • tests/core-providers/anthropic_test.go
  • tests/core-providers/parasail_test.go
  • tests/core-providers/mistral_test.go
  • core/schemas/speech.go
  • core/schemas/chatcompletions.go
  • core/schemas/providers/gemini/chat.go
  • core/providers/sgl.go
  • core/schemas/textcompletions.go
  • tests/core-providers/bedrock_test.go
  • tests/core-providers/gemini_test.go
  • tests/core-providers/cohere_test.go
  • core/providers/vertex.go
  • tests/core-providers/scenarios/responses_stream.go
  • tests/core-providers/tests.go
  • tests/core-providers/scenarios/chat_completion_stream.go
🧰 Additional context used
🧬 Code graph analysis (15)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/groq.go (2)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/utils.go (2)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (4)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Ollama (62-62)
  • BifrostResponsesRequest (197-203)
tests/core-providers/scenarios/text_completion_stream.go (4)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/textcompletions.go (2)
  • TextCompletionInput (9-12)
  • TextCompletionParameters (50-70)
core/schemas/bifrost.go (9)
  • BifrostTextCompletionRequest (134-140)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • TextCompletionStreamRequest (103-103)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • RequestType (99-99)
  • TextCompletionRequest (102-102)
core/schemas/chatcompletions.go (3)
  • BifrostStreamResponseChoice (376-378)
  • BifrostTextCompletionResponseChoice (365-367)
  • BifrostChatResponseChoice (355-363)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/mux.go (3)
core/schemas/bifrost.go (1)
  • BifrostResponse (238-254)
core/schemas/responses.go (16)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesMessage (280-292)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeRefusalDelta (1639-1639)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesToolMessage (692-715)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeIncomplete (1627-1627)
  • ResponsesStreamResponseTypeOutputItemDone (1630-1630)
  • ResponsesStreamResponseStruct (1719-1722)
  • ResponsesResponseUsage (236-239)
  • ResponsesExtendedResponseUsage (229-234)
core/schemas/chatcompletions.go (1)
  • BifrostStreamResponseChoice (376-378)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/azure.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (9)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
  • Azure (56-56)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/parasail.go (3)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
transports/bifrost-http/handlers/inference.go (1)
  • ResponsesRequest (195-199)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/openrouter.go (2)
core/schemas/bifrost.go (4)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • BifrostStream (383-386)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Mistral (61-61)
  • BifrostResponsesRequest (197-203)
core/providers/openai.go (4)
core/schemas/bifrost.go (16)
  • TextCompletionStreamRequest (103-103)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (3)
  • PostHookRunner (191-191)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
🔇 Additional comments (19)
tests/core-providers/config/account.go (1)

22-24: LGTM: Field rename executed consistently.

The renaming of ChatCompletionStream to CompletionStream and addition of TextCompletionStream properly separates text completion streaming from chat completion streaming. This change is applied consistently across all provider configurations in the file.

core/schemas/transcriptions.go (1)

56-81: LGTM: Well-structured transcription response types.

The new BifrostTranscribe type with embedded pointers to streaming and non-streaming variants follows the established pattern in the codebase. This design allows a single type to flexibly represent both response formats.
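
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a union-style response type built from embedded pointers; the type and field names (Transcribe, NonStreamPart, StreamPart) are illustrative stand-ins, not the actual Bifrost schema.

package main

import (
	"encoding/json"
	"fmt"
)

// NonStreamPart and StreamPart stand in for the non-streaming and
// streaming halves of a transcription response.
type NonStreamPart struct {
	Text string `json:"text,omitempty"`
}

type StreamPart struct {
	Delta string `json:"delta,omitempty"`
}

// Transcribe embeds both halves as pointers; only one is populated per
// response, and the nil half contributes no JSON fields.
type Transcribe struct {
	Task string `json:"task,omitempty"`
	*NonStreamPart
	*StreamPart
}

func main() {
	full := Transcribe{Task: "transcribe", NonStreamPart: &NonStreamPart{Text: "hello"}}
	chunk := Transcribe{StreamPart: &StreamPart{Delta: "he"}}

	a, _ := json.Marshal(full)  // {"task":"transcribe","text":"hello"}
	b, _ := json.Marshal(chunk) // {"delta":"he"}
	fmt.Println(string(a), string(b))
}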

core/schemas/bifrost.go (1)

253-253: LGTM: Responses streaming support added.

The embedded *ResponsesStreamResponse pointer enables streaming support for the Responses API while maintaining backward compatibility through nil-safe access.

core/schemas/responses.go (2)

88-88: LGTM: Error field added to response.

The addition of the Error field enables proper error reporting in responses while maintaining backward compatibility through the pointer type.


1617-1722: LGTM: Comprehensive streaming support added.

The streaming event types and response structures are now properly defined. The event type names appear to have been corrected from earlier review feedback:

  • Both .added and .delta variants exist where needed (e.g., response.output_text.added and response.output_text.delta)
  • Annotation events use proper dot notation (response.output_text.annotation.added)
  • Code interpreter events use correct underscore format (response.code_interpreter_call_code.delta)

The extensive set of constants covers the full spectrum of streaming events for the Responses API.

tests/core-providers/sgl_test.go (1)

28-28: LGTM: Test configuration updated for streaming refactor.

The field rename from ChatCompletionStream to CompletionStream is consistent with the broader refactoring effort.

tests/core-providers/ollama_test.go (1)

27-27: LGTM: Test configuration updated for streaming refactor.

The field rename is consistent with the broader refactoring effort.

tests/core-providers/openrouter_test.go (1)

32-32: LGTM: Test configuration updated for streaming refactor.

The field rename is consistent with the broader refactoring effort and was properly addressed in the previous review cycle.

tests/core-providers/vertex_test.go (1)

28-28: LGTM: Test configuration updated for streaming refactor.

The field rename from ChatCompletionStream to CompletionStream is consistent with the broader refactoring effort across all provider tests.

tests/core-providers/custom_test.go (1)

31-31: Rename to CompletionStream looks correct

Matches the new scenario flag naming and unified streaming path.

tests/core-providers/openai_test.go (1)

30-33: Streaming flags update LGTM

Enabling TextCompletionStream and CompletionStream aligns with the refactor.

tests/core-providers/cerebras_test.go (1)

26-29: Streaming flags update LGTM

Using TextCompletionStream and CompletionStream is consistent with the unified streaming setup.

transports/bifrost-http/handlers/inference.go (1)

828-896: Verify error chunks carry the correct RequestType
Confirm that every error emitted into the stream (i.e., via processAndSendError or post-hook) has its ExtraFields.RequestType set to ResponsesStreamRequest. Without that, the first-chunk–error edge case will misclassify the stream and emit a stray [DONE].
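
A minimal sketch of the kind of tagging being asked for; the helper name and call site are hypothetical, and the real code may set this inline in processAndSendError or in the combined post-hook instead.

// Hypothetical helper, not the actual implementation.
func tagResponsesStreamError(bifrostErr *schemas.BifrostError) *schemas.BifrostError {
	if bifrostErr == nil {
		return nil
	}
	// Even an error emitted as the very first chunk must carry the request
	// type, otherwise the SSE writer falls back to chat formatting and
	// appends a stray [DONE].
	bifrostErr.ExtraFields.RequestType = schemas.ResponsesStreamRequest
	return bifrostErr
}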

core/providers/openrouter.go (3)

118-130: LGTM: switched to chat-specific streaming handler.

Correct use of handleOpenAIChatCompletionStreaming with OpenRouter provider key.


146-153: RequestType is already set in the converter
getResponsesChunkConverterCombinedPostHookRunner invokes result.ToResponsesStream() and then sets result.ExtraFields.RequestType = schemas.ResponsesStreamRequest for every non-nil chunk. No changes needed.


132-145: No changes needed for the OpenRouter Responses endpoint. The current provider.networkConfig.BaseURL + "/alpha/responses" correctly yields https://openrouter.ai/api/alpha/responses per the OpenRouter docs.

core/providers/ollama.go (1)

118-136: LGTM: added chat-specific streaming with shared handler.

Consistent with other providers and uses correct provider key.

core/providers/mistral.go (1)

96-114: LGTM: added chat-specific streaming with shared handler.

Matches provider conventions and uses correct provider key.

transports/bifrost-http/integrations/utils.go (1)

456-625: No action needed: ExtraFields.RequestType is correctly set for responses streams

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 43efca8 to d4f79e1 on October 13, 2025 07:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
transports/bifrost-http/handlers/inference.go (1)

695-699: Bug: multipart file may be partially read. Use io.ReadAll/io.ReadFull.

Single Read doesn’t guarantee full file contents. This can truncate audio and break transcription.

Apply this diff:

-  // Read file data
-  fileData := make([]byte, fileHeader.Size)
-  if _, err := file.Read(fileData); err != nil {
-    SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
-    return
-  }
+  // Read file data (ensure full read)
+  fileData, err := io.ReadAll(file)
+  if err != nil {
+    SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
+    return
+  }
♻️ Duplicate comments (6)
core/schemas/responses.go (1)

1635-1645: Streaming event string literals still incorrect.

The past review identified that the OpenAI Responses API uses response.output_text.delta and response.function_call_arguments.delta for streaming content, not .added variants. While Delta constants exist (lines 1636, 1644), the Added constants at lines 1635 and 1643 don't match the official API event names and could cause event mismatches.

Additionally, line 1678 uses an underscore in "response.output_text_annotation.added" but should use a dot: "response.output_text.annotation.added".

Apply this diff to fix the annotation event:

-	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text_annotation.added"
+	ResponsesStreamResponseTypeOutputTextAnnotationAdded ResponsesStreamResponseType = "response.output_text.annotation.added"

For the .added constants, either remove them if unused or verify they match actual API events. Based on official OpenAI documentation, only .delta variants exist for output_text and function_call_arguments.

Also applies to: 1678-1678

core/providers/utils.go (1)

552-575: Resolved: set RequestType on error chunks for responses streams

This implements the earlier feedback to tag error-first chunks, preventing SSE misclassification.

core/providers/vertex.go (1)

225-227: Remove incorrect fasthttp.ErrTimeout check.

This is the same issue flagged in previous reviews: Vertex provider uses net/http (not fasthttp), so the fasthttp.ErrTimeout check will never match and should be removed.

Apply this diff:

-	if errors.Is(err, fasthttp.ErrTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
	}

The same issue exists at lines 262, 516, and 547.

core/providers/openai.go (3)

632-632: Enlarge Scanner buffer to handle large streaming chunks.

The default bufio.Scanner buffer (64KB max) can fail on large SSE chunks containing extensive logprobs or deltas. A past review comment flagged this issue, but it remains unaddressed.

Apply this diff:

 		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
 		chunkIndex := -1

671-671: Fix TODO comment typo.

The TODO comment contains a typo: "WITHING" should be "WITHIN". A past review flagged this, but it remains unfixed.

Apply this diff:

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			//TODO: FIX THIS WITHIN RESPONSE STRUCT

982-982: Enlarge Scanner buffer for Responses streaming.

Same issue as chat streaming: the default scanner buffer can fail on long response events.

Apply this diff:

 		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
🧹 Nitpick comments (5)
tests/core-providers/tests.go (1)

71-87: Add TextCompletionStream to the printed summary

The runner exists, but the summary omits it. Add a row for visibility.

  testScenarios := []struct {
     name      string
     supported bool
   }{
     {"TextCompletion", testConfig.Scenarios.TextCompletion && testConfig.TextModel != ""},
+    {"TextCompletionStream", testConfig.Scenarios.TextCompletionStream},
     {"SimpleChat", testConfig.Scenarios.SimpleChat},
     {"CompletionStream", testConfig.Scenarios.CompletionStream},
core/providers/gemini.go (1)

194-201: Fix docstring provider name

Comment says “Anthropic’s API” in Gemini provider.

-// Responses performs a chat completion request to Anthropic's API.
+// Responses performs a chat completion request using Gemini's OpenAI‑compatible API.
transports/bifrost-http/handlers/inference.go (1)

840-846: Make requestType inference robust (fallback to response shape).

If ExtraFields.RequestType is missing on early chunks, Responses stream formatting won’t trigger. Fallback to detect Responses stream by presence of ResponsesStreamResponse.

Apply this diff:

-      if requestType == "" {
-        if chunk.BifrostResponse != nil {
-          requestType = chunk.BifrostResponse.ExtraFields.RequestType
-        } else if chunk.BifrostError != nil {
-          requestType = chunk.BifrostError.ExtraFields.RequestType
-        }
-      }
+      if requestType == "" {
+        if chunk.BifrostResponse != nil {
+          // Prefer explicit request type
+          requestType = chunk.BifrostResponse.ExtraFields.RequestType
+          // Fallback: infer from shape (Responses stream emits ResponsesStreamResponse)
+          if requestType == "" && chunk.BifrostResponse.ResponsesStreamResponse != nil {
+            requestType = schemas.ResponsesStreamRequest
+          }
+        } else if chunk.BifrostError != nil {
+          requestType = chunk.BifrostError.ExtraFields.RequestType
+        }
+      }
core/providers/sgl.go (1)

119-137: Prefer provider.GetProviderKey() for consistency.

Use the instance method for provider identity, matching the rest of the file.

Apply this diff:

-    schemas.SGL,
+    provider.GetProviderKey(),
core/providers/ollama.go (1)

118-136: Use provider.GetProviderKey() for consistency.

Minor consistency tweak across methods.

Apply this diff:

-    schemas.Ollama,
+    provider.GetProviderKey(),
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 43efca8 and d4f79e1.

📒 Files selected for processing (48)
  • core/providers/azure.go (2 hunks)
  • core/providers/cerebras.go (2 hunks)
  • core/providers/gemini.go (1 hunks)
  • core/providers/groq.go (2 hunks)
  • core/providers/mistral.go (2 hunks)
  • core/providers/ollama.go (2 hunks)
  • core/providers/openai.go (9 hunks)
  • core/providers/openrouter.go (2 hunks)
  • core/providers/parasail.go (2 hunks)
  • core/providers/sgl.go (2 hunks)
  • core/providers/utils.go (1 hunks)
  • core/providers/vertex.go (6 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/mux.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • transports/bifrost-http/handlers/server.go
  • plugins/telemetry/main.go
  • core/schemas/providers/mistral/types.go
  • core/schemas/providers/mistral/embedding.go
  • tests/core-providers/scenarios/utils.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • tests/core-providers/gemini_test.go
  • tests/core-providers/config/account.go
  • tests/core-providers/cerebras_test.go
  • tests/core-providers/cohere_test.go
  • tests/core-providers/custom_test.go
  • tests/core-providers/mistral_test.go
  • core/schemas/providers/gemini/chat.go
  • tests/core-providers/bedrock_test.go
  • core/schemas/embedding.go
  • tests/core-providers/groq_test.go
🧰 Additional context used
🧬 Code graph analysis (21)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/providers/openrouter.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (4)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • BifrostStream (383-386)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/providers/utils.go (2)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (4)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (16)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesMessage (280-292)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeRefusalDelta (1639-1639)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesToolMessage (692-715)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeIncomplete (1627-1627)
  • ResponsesStreamResponseTypeOutputItemDone (1630-1630)
  • ResponsesStreamResponseStruct (1719-1722)
  • ResponsesResponseUsage (236-239)
  • ResponsesExtendedResponseUsage (229-234)
core/schemas/chatcompletions.go (1)
  • BifrostStreamResponseChoice (376-378)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (2)
  • Key (8-17)
  • VertexKeyConfig (35-39)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Vertex (60-60)
  • BifrostResponsesRequest (197-203)
core/schemas/providers/anthropic/chat.go (1)
  • ToAnthropicChatCompletionRequest (365-596)
core/utils.go (1)
  • Ptr (12-14)
tests/core-providers/scenarios/responses_stream.go (4)
core/bifrost.go (1)
  • Bifrost (33-50)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/responses.go (21)
  • ResponsesMessage (280-292)
  • ResponsesMessageContent (573-576)
  • ResponsesParameters (33-60)
  • ResponsesStreamResponseType (1620-1620)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesStreamResponseTypeContentPartAdded (1632-1632)
  • ResponsesStreamResponseTypeCreated (1623-1623)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesTool (1341-1356)
  • ResponsesToolFunction (1359-1362)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesParametersReasoning (210-214)
  • ResponsesStreamResponseTypeReasoningSummaryPartAdded (1653-1653)
  • ResponsesStreamResponseTypeReasoningSummaryTextDelta (1655-1655)
  • ResponsesMessageTypeReasoning (273-273)
  • ResponsesResponse (85-114)
  • ResponsesMessageTypeMessage (256-256)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
tests/core-providers/scenarios/text_completion_stream.go (5)
core/bifrost.go (1)
  • Bifrost (33-50)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/textcompletions.go (2)
  • TextCompletionInput (9-12)
  • TextCompletionParameters (50-70)
core/schemas/bifrost.go (9)
  • BifrostTextCompletionRequest (134-140)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • TextCompletionStreamRequest (103-103)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • RequestType (99-99)
  • TextCompletionRequest (102-102)
core/schemas/chatcompletions.go (3)
  • BifrostStreamResponseChoice (376-378)
  • BifrostTextCompletionResponseChoice (365-367)
  • BifrostChatResponseChoice (355-363)
core/providers/azure.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (8)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
  • Azure (56-56)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/sgl.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • SGL (64-64)
  • BifrostResponsesRequest (197-203)
core/providers/groq.go (2)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Ollama (62-62)
  • BifrostResponsesRequest (197-203)
core/providers/parasail.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
tests/core-providers/tests.go (3)
tests/core-providers/scenarios/text_completion_stream.go (1)
  • RunTextCompletionStreamTest (16-399)
tests/core-providers/scenarios/chat_completion_stream.go (1)
  • RunChatCompletionStreamTest (16-298)
tests/core-providers/scenarios/responses_stream.go (1)
  • RunResponsesStreamTest (17-467)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/openai.go (5)
core/schemas/bifrost.go (19)
  • TextCompletionStreamRequest (103-103)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostErrorExtraFields (435-439)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (7)
  • PostHookRunner (191-191)
  • ErrProviderJSONMarshaling (26-26)
  • ErrRequestCancelled (23-23)
  • ErrProviderRequestTimedOut (22-22)
  • ErrProviderRequest (24-24)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Mistral (61-61)
  • BifrostResponsesRequest (197-203)
🔇 Additional comments (44)
core/schemas/speech.go (1)

86-91: Pointer embedding is safe; nil-safety checks are in place.

Code consistently checks BifrostSpeechStreamResponse for nil before accessing Type, preventing runtime panics—no changes needed.

core/schemas/textcompletions.go (1)

71-78: LGTM! Clean schema addition for log probability support.

The new TextCompletionLogProb type is well-structured with proper JSON tags and aligns with the PR's broader streaming schema enhancements.

core/schemas/transcriptions.go (1)

55-81: LGTM! Well-designed transcription response structure.

The separation of common fields from streaming/non-streaming specifics via embedded pointers is clean and follows the patterns established in other schema files in this PR.

core/schemas/responses.go (1)

88-88: LGTM! Error field addition is correct.

The Error field properly uses omitempty and follows the established pattern.

tests/core-providers/scenarios/chat_completion_stream.go (1)

17-17: LGTM! Test config field rename aligns with streaming API refactor.

The change from ChatCompletionStream to CompletionStream is part of the broader streaming API standardization across the codebase.

tests/core-providers/sgl_test.go (1)

28-28: LGTM! Consistent test config update.

The field rename matches the streaming API standardization across all test configurations.

tests/core-providers/parasail_test.go (1)

27-27: LGTM! Test config updated consistently.

The field rename is part of the coordinated streaming API refactor across all provider tests.

tests/core-providers/openrouter_test.go (1)

32-32: LGTM! Test config field rename completed.

The field rename from ChatCompletionStream to CompletionStream aligns with the streaming API standardization. This was previously reviewed and addressed.

tests/core-providers/ollama_test.go (1)

27-27: LGTM! Final test config update looks good.

The field rename completes the streaming API standardization across all provider test configurations.

tests/core-providers/anthropic_test.go (1)

28-28: LGTM: migrated to CompletionStream flag

Matches the new unified streaming gating across scenarios.

tests/core-providers/tests.go (1)

29-33: Good: added text and responses stream scenario runners

New runners are properly wired into the comprehensive suite.

tests/core-providers/azure_test.go (1)

28-28: LGTM: migrated to CompletionStream flag

Consistent with other providers and updated scenarios.

tests/core-providers/vertex_test.go (1)

28-28: LGTM: migrated to CompletionStream flag

Aligned with the new streaming architecture.

tests/core-providers/openai_test.go (1)

30-33: LGTM: added TextCompletionStream and unified CompletionStream

Covers both text and chat streaming paths for OpenAI.

core/providers/gemini.go (2)

180-192: LGTM: switched to chat-specific streaming handler

handleOpenAIChatCompletionStreaming is the right path for OpenAI‑compatible chat SSE.


919-921: Intentional: ResponsesStream unsupported for Gemini

If this is expected, consider noting it in provider capabilities/config to avoid accidental test invocation.

transports/bifrost-http/integrations/utils.go (2)

456-472: Detect request type from first chunk

Solid approach; keeps formatting decisions consistent throughout the stream.

Please confirm all providers’ responses/responses_stream post-hooks propagate ExtraFields.RequestType on both success and error chunks (using the new combined post-hook), so detection never falls back ambiguously.


568-609: Correct Responses API SSE: event lines + no [DONE]

  • Emits event: lines and data: payload for Responses streams.
  • Suppresses [DONE] for responses API.

Looks correct per OpenAI Responses streaming semantics.

Also applies to: 619-626
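
For reference, a minimal sketch of the framing those bullets describe, written against a plain bufio.Writer rather than the transport's actual writer; the function name is an assumption.

import (
	"bufio"
	"fmt"
)

// writeResponsesEvent is an illustrative helper, not the transport's actual
// implementation: it pairs an "event:" line with a "data:" payload and never
// writes the "[DONE]" sentinel used by chat-completion streams.
func writeResponsesEvent(w *bufio.Writer, eventType string, payload []byte) error {
	if _, err := fmt.Fprintf(w, "event: %s\n", eventType); err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "data: %s\n\n", payload); err != nil {
		return err
	}
	return w.Flush()
}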

core/providers/sgl.go (1)

153-160: Confirm ResponsesStream sets requestType=responses_stream on first chunk.

HTTP SSE formatting depends on this to emit event lines and omit [DONE]. Verify getResponsesChunkConverterCombinedPostHookRunner sets ExtraFields.RequestType accordingly for the initial chunk (including error paths).

core/schemas/chatcompletions.go (1)

351-402: LGTM: union choice types and deltas are clear and align with streaming needs.

Structure and tags look good.

core/providers/openrouter.go (3)

118-129: LGTM: shared chat streaming handler usage.

Consistent with other providers.


146-153: Confirm ResponsesStream requestType and consider direct Responses streaming.

  • Ensure the post-hook sets requestType=responses_stream on the first chunk (to drive SSE event lines and omit [DONE]).
  • If OpenRouter supports direct Responses streaming, consider using it for parity (event fidelity, server-side behavior).

132-144: No changes needed for OpenRouter Responses endpoint.
‘/alpha/responses’ correctly maps to https://openrouter.ai/api/alpha/responses and supports both non-streaming (default) and streaming via stream: true.

core/providers/ollama.go (1)

152-159: Confirm ResponsesStream sets requestType=responses_stream on first chunk.

Needed by HTTP SSE layer to format events and avoid [DONE].

core/providers/mistral.go (2)

96-114: LGTM!

The streaming implementation correctly uses handleOpenAIChatCompletionStreaming with proper endpoint, headers, and parameters.


130-137: LGTM!

The ResponsesStream implementation properly delegates to ChatCompletionStream with the responses-specific post-hook converter, following the established pattern across providers.

core/providers/vertex.go (2)

349-437: LGTM!

The streaming implementation properly handles both Claude (Anthropic-style) and non-Claude (OpenAI-style) models with appropriate validation, authentication, and streaming paths.


453-460: LGTM!

The ResponsesStream implementation correctly delegates to ChatCompletionStream with the responses-specific post-hook converter.

core/schemas/mux.go (1)

885-1027: LGTM! Previous issue has been resolved.

The ToResponsesStream method now correctly sets br.ExtraFields.RequestType = ResponsesStreamRequest at line 1024, addressing the concern raised in previous reviews. The implementation comprehensively handles various streaming delta types (role, content, thought, refusal, tool calls) and properly maps finish reasons to completion states.
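
As a rough illustration of that mapping (not the actual mux.go code), a simplified delta with hypothetical fields could be routed to the existing event-type constants like this:

// chatDelta is a simplified stand-in for the chat streaming delta; the real
// conversion works on the full Bifrost types and also handles role, thought,
// and finish-reason mapping.
type chatDelta struct {
	Content  *string
	Refusal  *string
	ToolCall bool
}

func eventTypeFor(d chatDelta) schemas.ResponsesStreamResponseType {
	switch {
	case d.ToolCall:
		return schemas.ResponsesStreamResponseTypeFunctionCallArgumentsAdded
	case d.Refusal != nil:
		return schemas.ResponsesStreamResponseTypeRefusalDelta
	case d.Content != nil:
		return schemas.ResponsesStreamResponseTypeOutputTextDelta
	default:
		// e.g. a role-only first delta opens a new output item
		return schemas.ResponsesStreamResponseTypeOutputItemAdded
	}
}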

tests/core-providers/scenarios/responses_stream.go (2)

17-467: LGTM!

The test implementation is comprehensive and well-structured, covering:

  • Basic streaming with proper timeout handling and context cancellation
  • Tool calls validation when supported
  • Reasoning scenarios with appropriate event tracking
  • Sequence number validation and event type tracking
  • Good error handling and diagnostic logging

The streaming validation logic properly handles the Responses API streaming format with appropriate checks for content, events, and metadata.


470-637: LGTM!

The helper functions provide clear validation logic for responses streaming:

  • validateResponsesStreamingStructure: validates sequence ordering and event distribution
  • createConsolidatedResponsesResponse: constructs consolidated response for validation
  • validateResponsesStreamingResponse: performs comprehensive streaming validation

The validation approach is thorough and handles provider-specific variations appropriately.

core/providers/cerebras.go (2)

120-132: LGTM!

The streaming implementation correctly uses handleOpenAIChatCompletionStreaming for Cerebras's OpenAI-compatible streaming format.


134-160: LGTM!

The Responses and ResponsesStream implementations follow the established pattern:

  • Responses: properly converts chat response to responses format with correct metadata
  • ResponsesStream: correctly delegates through ChatCompletionStream with the responses-specific post-hook converter
  • Embedding: appropriately returns unsupported error for Cerebras
core/providers/parasail.go (1)

102-142: LGTM! Clean implementation of Responses API for Parasail provider.

The changes correctly implement the Responses and ResponsesStream methods by routing through chat completion endpoints with appropriate conversions. The use of getResponsesChunkConverterCombinedPostHookRunner ensures streaming chunks are properly transformed to the Responses format.

core/providers/groq.go (1)

160-200: LGTM! Consistent implementation across providers.

The Groq provider changes mirror the Parasail implementation, maintaining consistency across the codebase. The routing of Responses API calls through chat completion endpoints is the correct approach for providers without native Responses support.

core/providers/azure.go (1)

320-393: LGTM! Azure provider fully implements Responses and Embedding APIs.

The changes correctly implement the Responses streaming architecture while also providing full Embedding support (unlike other providers which stub it out). The implementation properly leverages Azure's deployment-based URL construction and maintains consistency with Azure's authentication patterns.

core/schemas/bifrost.go (1)

253-253: LGTM! Embedded pointer enables Responses streaming support.

Adding *ResponsesStreamResponse as an embedded pointer allows BifrostResponse to carry streaming-specific data for the Responses API. This follows the established pattern of using embedded pointers (like *ResponsesResponse on line 252) to extend the response structure.

tests/core-providers/scenarios/text_completion_stream.go (3)

114-121: LGTM! Nil Delta check properly implemented.

The code now correctly guards against nil Delta before accessing Delta.Content (line 117: if delta != nil && delta.Content != nil). This addresses the past review comment.


269-276: LGTM! Nil Delta check properly implemented in varied prompts loop.

The nil check for Delta is correctly implemented here as well (line 272: if delta != nil && delta.Content != nil), addressing the past review concern.


402-433: LGTM! Well-structured consolidation helper.

The createConsolidatedTextCompletionResponse function properly constructs a synthetic response for validation, correctly copying metadata from the last streaming response when available.

core/providers/openai.go (4)

737-740: LGTM! Nil Delta guard prevents potential panic.

The code now properly checks for nil Delta before accessing its fields, addressing the past review comment.


990-1011: LGTM! Event line handling is correct for Responses API format.

The special handling for event: lines (skipping them) is intentional and correct for the OpenAI Responses API streaming format, which uses SSE with separate event and data lines.
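
A compact sketch of that parsing behaviour, independent of the provider code (reader handling and function name are assumptions):

import (
	"bufio"
	"io"
	"strings"
)

// parseResponsesSSE skips "event:" lines and hands each "data:" payload to
// the caller; the event name is recovered from the JSON body instead.
func parseResponsesSSE(body io.Reader, onData func(payload string)) error {
	scanner := bufio.NewScanner(body)
	// enlarge the buffer so large structured events fit in a single token
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if line == "" || strings.HasPrefix(line, "event:") {
			continue
		}
		if strings.HasPrefix(line, "data:") {
			onData(strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		}
	}
	return scanner.Err()
}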


1029-1054: LGTM! Error handling properly sets metadata for transport layer.

The error handling correctly populates ExtraFields with RequestType, Provider, and ModelRequested (lines 1034-1038), and properly extracts the error code (lines 1047-1049). This ensures the transport layer can emit correct SSE events.


1083-1179: LGTM! Embedding implementation follows established patterns.

The Embedding method and handleOpenAIEmbeddingRequest handler are well-structured and consistent with other request handlers in the provider.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch 2 times, most recently from 99da354 to 39465aa on October 13, 2025 08:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
core/schemas/speech.go (1)

56-84: Bug: UnmarshalJSON doesn’t handle single VoiceConfig object (only string or array).

Comment says “object/array,” but a single object isn’t parsed. Add object handling.

Apply this diff:

 func (vi *SpeechVoiceInput) UnmarshalJSON(data []byte) error {
   // Reset receiver state before attempting any decode to avoid stale data
   vi.Voice = nil
   vi.MultiVoiceConfig = nil

   // First, try to unmarshal as a direct string
   var stringContent string
   if err := sonic.Unmarshal(data, &stringContent); err == nil {
     vi.Voice = &stringContent
     return nil
   }

+  // Try to unmarshal as a single VoiceConfig object
+  var single VoiceConfig
+  if err := sonic.Unmarshal(data, &single); err == nil {
+    if single.Voice == "" {
+      return fmt.Errorf("voice config has empty voice field")
+    }
+    vi.MultiVoiceConfig = []VoiceConfig{single}
+    return nil
+  }
+
   // Try to unmarshal as an array of VoiceConfig objects
   var voiceConfigs []VoiceConfig
   if err := sonic.Unmarshal(data, &voiceConfigs); err == nil {
     // Validate each VoiceConfig and build a new slice deterministically
     validConfigs := make([]VoiceConfig, 0, len(voiceConfigs))
     for _, config := range voiceConfigs {
       if config.Voice == "" {
         return fmt.Errorf("voice config has empty voice field")
       }
       validConfigs = append(validConfigs, config)
     }
     vi.MultiVoiceConfig = validConfigs
     return nil
   }

-  return fmt.Errorf("voice field is neither a string, nor an array of VoiceConfig objects")
+  return fmt.Errorf("voice field is neither a string, a VoiceConfig object, nor an array of VoiceConfig objects")
 }
core/providers/groq.go (1)

36-39: http.Client.Timeout will prematurely cut off SSE streams.

A finite Timeout aborts long-running streams. Use 0 (no overall timeout) or a dedicated streaming timeout plus per-phase timeouts (Dial/TLS/ResponseHeaderTimeout) on Transport.

Apply this minimal safe change:

- streamClient := &http.Client{
-   Timeout: time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
- }
+ streamClient := &http.Client{
+   // Do not set an overall timeout for streaming; rely on transport-level timeouts.
+   Timeout: 0,
+ }

If you have a stream-specific timeout in config, prefer that instead.
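
If helpful, one possible shape for that alternative, with per-phase limits instead of a whole-request deadline (assumes net/http, net, and time are imported; the values are placeholders, not recommendations):

streamClient := &http.Client{
	// No overall deadline: SSE streams can legitimately stay open for minutes.
	Timeout: 0,
	Transport: &http.Transport{
		Proxy:                 http.ProxyFromEnvironment,
		DialContext:           (&net.Dialer{Timeout: 10 * time.Second}).DialContext,
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 30 * time.Second, // bounds time-to-first-byte only
		IdleConnTimeout:       90 * time.Second,
	},
}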

core/providers/openrouter.go (1)

36-39: http.Client.Timeout will prematurely cut off SSE streams.

Same as Groq: avoid overall Timeout for streaming.

- streamClient := &http.Client{
-   Timeout: time.Second * time.Duration(config.NetworkConfig.DefaultRequestTimeoutInSeconds),
- }
+ streamClient := &http.Client{
+   Timeout: 0,
+ }
♻️ Duplicate comments (4)
core/schemas/responses.go (1)

1617-1722: Note: Streaming event types have open discussion from past reviews.

The streaming response types and constants are comprehensive. However, past review comments raised concerns about specific event type string literals (lines 1635, 1643) using "added" vs "delta" suffixes, with extensive discussion between reviewer and author. This remains an open item from previous reviews.

core/providers/openai.go (3)

671-671: Fix TODO comment typo.

The TODO comment contains a typo: "WITHING" should be "WITHIN". This was previously flagged and marked as addressed but the typo remains.

Apply this diff:

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			//TODO: FIX THIS WITHIN RESPONSE STRUCT

632-632: Enlarge scanner buffer to prevent errors on large chunks.

The bufio.NewScanner uses a default 64KB max buffer, which can cause scan errors when the API returns large chunks (e.g., extensive logprobs, large tool call arguments, or verbose content). This was previously flagged and marked as addressed but the fix is not present in the current code.

Apply this diff to enlarge the buffer:

 		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
 		chunkIndex := -1

982-982: Enlarge scanner buffer for Responses streaming.

Similar to chat streaming, the Responses API scanner needs a larger buffer to handle long response events. The default 64KB max can cause scan errors on large structured payloads.

Apply this diff to enlarge the buffer:

 		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
🧹 Nitpick comments (8)
core/providers/mistral.go (1)

100-114: Avoid client-wide Timeout for SSE; rely on ctx instead.

A non-zero http.Client.Timeout can abort long-lived SSE. Prefer Timeout=0 on the streaming client and enforce limits via context deadlines per request.

core/providers/vertex.go (4)

224-227: Remove http.ErrHandlerTimeout check; use context deadline.

http.ErrHandlerTimeout is server-side; client timeouts surface as context.DeadlineExceeded (or net.Error Timeout). Keep only the context check.

Apply this diff:

-        if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+        if errors.Is(err, context.DeadlineExceeded) {
             return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
         }
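
If the broader net.Error timeout test mentioned above is preferred over the bare context check, a sketch of that variant (same surrounding code as the diff; assumes the net package is imported):

		var netErr net.Error
		if errors.Is(err, context.DeadlineExceeded) || (errors.As(err, &netErr) && netErr.Timeout()) {
			return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
		}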

261-263: Same here: drop http.ErrHandlerTimeout check.

Use only context.DeadlineExceeded (or a net.Error timeout test).

-        if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+        if errors.Is(err, context.DeadlineExceeded) {
             return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
         }

515-517: Timeout check: remove http.ErrHandlerTimeout here too.

-        if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+        if errors.Is(err, context.DeadlineExceeded) {
             return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
         }

546-548: Timeout check: same fix for Do error branch.

-        if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+        if errors.Is(err, context.DeadlineExceeded) {
             return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
         }
core/providers/groq.go (2)

41-49: Proxy not applied to streamClient (net/http).

Only the fasthttp client is proxied; streaming requests may bypass proxies. Mirror proxy settings on streamClient.Transport.

Example:

 streamClient := &http.Client{
-  Timeout: 0,
+  Timeout: 0,
+  Transport: &http.Transport{
+    Proxy: http.ProxyFromEnvironment, // or build from config.ProxyConfig
+    // set Dial/TLS/Idle timeouts as needed
+  },
 }

119-136: Avoid shadowing channel var in range.

for response := range response { … } is confusing. Rename the loop variable.

- for response := range response {
-   if response.BifrostError != nil {
-     responseChan <- response
+ for chunk := range response {
+   if chunk.BifrostError != nil {
+     responseChan <- chunk
      continue
    }
-   response.ToTextCompletionResponse()
-   if response.BifrostResponse != nil {
-     response.BifrostResponse.ExtraFields.RequestType = schemas.TextCompletionRequest
-     response.BifrostResponse.ExtraFields.Provider = provider.GetProviderKey()
-     response.BifrostResponse.ExtraFields.ModelRequested = request.Model
+   chunk.ToTextCompletionResponse()
+   if chunk.BifrostResponse != nil {
+     chunk.BifrostResponse.ExtraFields.RequestType = schemas.TextCompletionRequest
+     chunk.BifrostResponse.ExtraFields.Provider = provider.GetProviderKey()
+     chunk.BifrostResponse.ExtraFields.ModelRequested = request.Model
    }
-   responseChan <- response
+   responseChan <- chunk
 }
core/providers/openrouter.go (1)

41-43: Apply proxy config to streamClient as well.

Streaming requests should honor the same proxy settings.

 streamClient := &http.Client{
   Timeout: 0,
+  Transport: &http.Transport{
+    Proxy: http.ProxyFromEnvironment, // or from config.ProxyConfig
+  },
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d4f79e1 and 39465aa.

📒 Files selected for processing (51)
  • core/providers/anthropic.go (2 hunks)
  • core/providers/azure.go (2 hunks)
  • core/providers/bedrock.go (2 hunks)
  • core/providers/cerebras.go (2 hunks)
  • core/providers/cohere.go (2 hunks)
  • core/providers/gemini.go (5 hunks)
  • core/providers/groq.go (2 hunks)
  • core/providers/mistral.go (2 hunks)
  • core/providers/ollama.go (2 hunks)
  • core/providers/openai.go (16 hunks)
  • core/providers/openrouter.go (2 hunks)
  • core/providers/parasail.go (2 hunks)
  • core/providers/sgl.go (2 hunks)
  • core/providers/utils.go (1 hunks)
  • core/providers/vertex.go (6 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/mux.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • core/schemas/providers/mistral/types.go
  • core/schemas/providers/mistral/embedding.go
  • tests/core-providers/scenarios/utils.go
  • plugins/telemetry/main.go
  • transports/bifrost-http/handlers/server.go
🚧 Files skipped from review as they are similar to previous changes (15)
  • core/schemas/providers/gemini/chat.go
  • tests/core-providers/tests.go
  • tests/core-providers/scenarios/chat_completion_stream.go
  • tests/core-providers/gemini_test.go
  • core/providers/gemini.go
  • tests/core-providers/openrouter_test.go
  • core/providers/ollama.go
  • tests/core-providers/openai_test.go
  • core/providers/sgl.go
  • core/schemas/chatcompletions.go
  • tests/core-providers/groq_test.go
  • transports/bifrost-http/integrations/utils.go
  • tests/core-providers/parasail_test.go
  • tests/core-providers/scenarios/text_completion_stream.go
  • tests/core-providers/vertex_test.go
🧰 Additional context used
🧬 Code graph analysis (16)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (2)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
core/providers/utils.go (2)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (4)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
tests/core-providers/scenarios/responses_stream.go (3)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/responses.go (20)
  • ResponsesMessage (280-292)
  • ResponsesMessageContent (573-576)
  • ResponsesParameters (33-60)
  • ResponsesStreamResponseType (1620-1620)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesStreamResponseTypeContentPartAdded (1632-1632)
  • ResponsesStreamResponseTypeCreated (1623-1623)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesTool (1341-1356)
  • ResponsesToolFunction (1359-1362)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesParametersReasoning (210-214)
  • ResponsesStreamResponseTypeReasoningSummaryPartAdded (1653-1653)
  • ResponsesStreamResponseTypeReasoningSummaryTextDelta (1655-1655)
  • ResponsesMessageTypeReasoning (273-273)
  • ResponsesResponse (85-114)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • SGL (64-64)
core/providers/openrouter.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (4)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • BifrostStream (383-386)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (2)
  • Key (8-17)
  • VertexKeyConfig (35-39)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Vertex (60-60)
  • BifrostResponsesRequest (197-203)
core/schemas/providers/anthropic/chat.go (1)
  • ToAnthropicChatCompletionRequest (365-596)
core/utils.go (1)
  • Ptr (12-14)
core/providers/parasail.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Mistral (61-61)
  • BifrostResponsesRequest (197-203)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/providers/groq.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/openai.go (5)
core/schemas/bifrost.go (17)
  • TextCompletionStreamRequest (103-103)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostErrorExtraFields (435-439)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • PostHookRunner (191-191)
  • Provider (194-219)
core/schemas/chatcompletions.go (1)
  • ChatStreamOptions (43-46)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (16)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesMessage (280-292)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeRefusalDelta (1639-1639)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesToolMessage (692-715)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeIncomplete (1627-1627)
  • ResponsesStreamResponseTypeOutputItemDone (1630-1630)
  • ResponsesStreamResponseStruct (1719-1722)
  • ResponsesResponseUsage (236-239)
  • ResponsesExtendedResponseUsage (229-234)
core/schemas/chatcompletions.go (1)
  • BifrostStreamResponseChoice (376-378)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
core/providers/azure.go (2)
core/schemas/bifrost.go (9)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
  • Azure (56-56)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/responses.go (2)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
🔇 Additional comments (40)
tests/core-providers/sgl_test.go (1)

28-28: Flag rename is correct
The CompletionStream flag consistently replaces ChatCompletionStream across tests and matches the TestScenarios definition for both chat and responses streaming; no change needed.

core/providers/anthropic.go (1)

398-398: LGTM: Timeout handling updated to standard library.

The migration from fasthttp.ErrTimeout to http.ErrHandlerTimeout aligns with the repository-wide timeout handling standardization. The logic remains unchanged—only the error sentinel has been swapped.

Also applies to: 425-425

core/providers/cohere.go (1)

380-380: LGTM: Timeout handling migrated to http package.

The replacement of fasthttp.ErrTimeout with http.ErrHandlerTimeout is consistent with the timeout handling standardization across providers.

Also applies to: 408-408

core/providers/bedrock.go (1)

158-158: LGTM: Complete migration from fasthttp timeout handling.

The removal of the fasthttp import and migration to http.ErrHandlerTimeout completes the timeout handling standardization for this provider.

Also applies to: 473-473

tests/core-providers/config/account.go (2)

22-24: LGTM: Test configuration fields updated consistently.

The addition of TextCompletionStream and CompletionStream (replacing ChatCompletionStream) aligns with the broader streaming surface expansion across the test suite.


466-768: LGTM: Provider configurations updated consistently.

All provider test configurations have been consistently updated to use CompletionStream instead of ChatCompletionStream, with appropriate values for each provider's streaming capabilities.

core/schemas/embedding.go (1)

97-145: LGTM: Embedding types and JSON handling implemented correctly.

The BifrostEmbedding and BifrostEmbeddingResponse types with their JSON marshaling methods provide flexible representation for embedding data (string, 1D array, or 2D array).

Note: A previous review raised a concern about UnmarshalJSON not resetting fields before decoding, which could cause stale state if instances are reused. The maintainer confirmed that instances are not reused in the current codebase, which addresses the concern.
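As an aside, the string / 1D array / 2D array union can be handled with a try-in-order UnmarshalJSON. The sketch below uses a hypothetical local type (flexEmbedding), not Bifrost's actual BifrostEmbedding, and deliberately skips field resetting since the review notes instances are not reused.

package main

import (
	"encoding/json"
	"fmt"
)

// flexEmbedding is a hypothetical stand-in for the union the review describes:
// the payload may be a base64 string, a 1D vector, or a 2D batch of vectors.
type flexEmbedding struct {
	Str *string
	Vec []float64
	Mat [][]float64
}

// UnmarshalJSON tries each shape in order. Fields are intentionally not reset,
// mirroring the note above that instances are not reused.
func (e *flexEmbedding) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		e.Str = &s
		return nil
	}
	var v []float64
	if err := json.Unmarshal(data, &v); err == nil {
		e.Vec = v
		return nil
	}
	return json.Unmarshal(data, &e.Mat)
}

func main() {
	var a, b flexEmbedding
	_ = json.Unmarshal([]byte(`[0.1, 0.2]`), &a)
	_ = json.Unmarshal([]byte(`[[0.1], [0.2]]`), &b)
	fmt.Println(a.Vec, b.Mat) // [0.1 0.2] [[0.1] [0.2]]
}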

tests/core-providers/ollama_test.go (1)

27-27: LGTM: Test configuration field updated.

The change from ChatCompletionStream to CompletionStream is consistent with the test configuration refactoring across all provider tests.

tests/core-providers/cohere_test.go (1)

28-28: LGTM: Test configuration field updated.

The field rename from ChatCompletionStream to CompletionStream matches the unified streaming flag naming across the test suite.

tests/core-providers/bedrock_test.go (1)

33-33: LGTM: Test configuration field updated.

The change from ChatCompletionStream to CompletionStream completes the consistent test configuration migration across all provider tests.

tests/core-providers/mistral_test.go (1)

28-28: LGTM on field rename.

Consistent with the new TestScenarios API.

core/schemas/speech.go (2)

16-19: No semantic change.

Whitespace-only adjustments.


86-95: BifrostSpeech struct verified; no duplicates found
Audio ([]byte) base64-encodes as expected; embedded *BifrostSpeechStreamResponse omits nil fields.
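Both behaviors are standard encoding/json; a quick standalone check with hypothetical stand-in types (Speech, StreamPart), not the real schema:

package main

import (
	"encoding/json"
	"fmt"
)

// StreamPart and Speech are stand-ins for the pattern under review:
// []byte marshals as base64, and a nil embedded pointer contributes no fields.
type StreamPart struct {
	Delta string `json:"delta,omitempty"`
}

type Speech struct {
	Audio []byte `json:"audio,omitempty"`
	*StreamPart
}

func main() {
	nonStream := Speech{Audio: []byte("hi")}
	stream := Speech{StreamPart: &StreamPart{Delta: "partial"}}

	a, _ := json.Marshal(nonStream)
	b, _ := json.Marshal(stream)
	fmt.Println(string(a)) // {"audio":"aGk="}
	fmt.Println(string(b)) // {"delta":"partial"}
}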

tests/core-providers/anthropic_test.go (1)

28-28: LGTM on field rename.

Matches updated TestScenarios schema.

tests/core-providers/custom_test.go (1)

31-31: LGTM on CompletionStream usage.

Consistent with other providers’ tests.

core/providers/mistral.go (1)

130-137: ResponsesStream wiring looks good.

Delegation via getResponsesChunkConverterCombinedPostHookRunner is consistent with other providers.

tests/core-providers/scenarios/responses_stream.go (1)

318-325: Verify tool call name path.

Does ResponsesMessage expose Name at the top-level? If Name is nested (e.g., via ResponsesToolMessage), adjust access accordingly to avoid nil/compile issues.

core/providers/vertex.go (1)

452-459: ResponsesStream wiring looks good.

Delegation via combined post-hook is consistent with other providers.

core/providers/parasail.go (2)

101-114: LGTM on streaming handler swap.

OpenAI-compatible streaming path and headers are correct.


116-128: Responses/ResponsesStream/Embedding surfaces look consistent.

Responses adapts Chat->Responses; ResponsesStream delegates with post-hook; Embedding correctly returns unsupported.

Also applies to: 130-137, 139-142

core/providers/cerebras.go (2)

120-132: LGTM on streaming handler swap.

Switch to handleOpenAIChatCompletionStreaming is consistent.


134-146: Responses/ResponsesStream/Embedding surfaces look consistent.

Adapters mirror other providers; unsupported Embedding is explicit.

Also applies to: 148-155, 157-160

core/providers/azure.go (3)

320-331: LGTM on chat streaming refactor.

Handler swap preserves Azure auth header handling and deployment URL.


334-349: Responses and ResponsesStream wiring look good.

Uses Chat path conversion and combined post-hook runner as elsewhere.

Also applies to: 351-358


360-394: Embedding implementation looks correct.

Conversion, completeRequest reuse, metadata, and optional RawResponse handled properly.

core/schemas/bifrost.go (1)

253-253: LGTM: Clean addition for streaming responses support.

The embedded *ResponsesStreamResponse pointer follows the existing pattern and enables the Responses API streaming surface without breaking changes. The nil-gated pointer ensures backward compatibility.

core/schemas/responses.go (1)

88-88: LGTM: Error field addition is well-structured.

Adding Error *ResponsesResponseError to ResponsesResponse enables proper error handling in non-streaming responses. The pointer type correctly allows optional presence.

core/providers/openai.go (3)

259-259: LGTM: Comprehensive timeout error handling.

The additions of http.ErrHandlerTimeout checks alongside context.DeadlineExceeded provide proper coverage for both context-based and HTTP client timeouts across all streaming handlers. This prevents silent failures and ensures consistent error reporting.

Also applies to: 587-587, 936-936, 963-963, 1296-1296, 1323-1323, 1575-1575, 1602-1602


303-305: LGTM: Modern Go idiom for prefix handling.

Using strings.CutPrefix with the ok-check is the preferred Go 1.20+ idiom. It's more efficient and explicit than TrimPrefix and properly distinguishes between "prefix present" and "prefix absent" cases.

Also applies to: 658-663
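A tiny standalone illustration of the difference (not the provider code): TrimPrefix cannot tell "prefix absent" from "prefix present with an empty remainder", while CutPrefix reports it explicitly.

package main

import (
	"fmt"
	"strings"
)

func main() {
	lines := []string{`data: {"delta":"hi"}`, "event: response.completed", "data: "}
	for _, line := range lines {
		if payload, ok := strings.CutPrefix(line, "data: "); ok {
			fmt.Printf("data line, payload=%q\n", payload)
		} else {
			fmt.Printf("not a data line: %q\n", line)
		}
	}
	// strings.TrimPrefix("event: x", "data: ") would return the input unchanged,
	// forcing an extra HasPrefix check to distinguish the two cases.
}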


511-533: LGTM: Well-structured new public API methods.

The new and refactored methods follow consistent patterns:

  • ChatCompletionStream properly delegates to shared handleOpenAIChatCompletionStreaming
  • Responses and ResponsesStream follow the same patterns as chat methods
  • Embedding uses the standard request handler pattern
  • All methods properly validate operations and set metadata

The refactoring to shared handlers reduces duplication and improves maintainability.

Also applies to: 772-789, 867-886, 1083-1104

core/providers/groq.go (4)

160-171: Streaming handler switch looks good.

Using handleOpenAIChatCompletionStreaming aligns with the unified streaming path.


174-186: Responses path conversion is correct.

Chat→Responses conversion with metadata set (RequestType/Provider/Model) is consistent.


188-195: ResponsesStream delegation is appropriate.

Reusing Chat stream with the combined post-hook converter is the right pattern for non-OpenAI providers.


30-34: Confirm config field for MaxConnsPerHost. core/providers/groq.go:33 uses BufferSize, while providers like OpenRouter use Concurrency; ensure this matches the intended connection limit.

core/providers/openrouter.go (3)

118-130: Streaming handler switch looks good.

handleOpenAIChatCompletionStreaming usage aligns with unified streaming logic.


146-153: ResponsesStream delegation is appropriate.

Reusing Chat stream with responses converter matches the non-OpenAI pattern.


132-144: No change needed: OpenRouter’s Responses endpoint should use “/alpha/responses” as per the official docs.

core/schemas/transcriptions.go (1)

56-81: New transcription schemas look consistent.

Composition of common/non-stream/stream fields is clean and aligns with existing schema patterns.

tests/core-providers/cerebras_test.go (1)

26-29: Test flags updated appropriately.

Switch to TextCompletionStream and unified CompletionStream looks correct.

core/schemas/mux.go (1)

885-1027: Approve ToResponsesStream conversion. Covers role/text/refusal/tool-call deltas, finish reasons, and usage mapping; sets RequestType and clears chat fields. No duplicate definitions found.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 39465aa to 19cab6c Compare October 13, 2025 08:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
transports/bifrost-http/handlers/inference.go (1)

694-701: Fix partial file read and enforce size guard.

Single Read may return fewer bytes; also no size guard is enforced. Use io.ReadAll and reject oversize files.

-	// Read file data
-	fileData := make([]byte, fileHeader.Size)
-	if _, err := file.Read(fileData); err != nil {
-		SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
-		return
-	}
+	// Enforce size limit and read file data safely
+	if fileHeader.Size > MaxFileSize {
+		SendError(ctx, fasthttp.StatusBadRequest, fmt.Sprintf("File size exceeds maximum limit of %d MB", MaxFileSize/1024/1024), h.logger)
+		return
+	}
+	fileData, err := io.ReadAll(file)
+	if err != nil {
+		SendError(ctx, fasthttp.StatusInternalServerError, fmt.Sprintf("Failed to read uploaded file: %v", err), h.logger)
+		return
+	}
core/providers/openai.go (3)

278-283: Increase Scanner buffer to avoid truncated SSE chunks.

Default 64KB limit can fail on large deltas/logprobs. Set a larger buffer.

Apply:

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)

1342-1346: SpeechStream: enlarge Scanner buffer.

Audio SSE chunks can be large; increase max token size.

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)

1621-1625: TranscriptionStream: enlarge Scanner buffer.

Avoid scan errors on long chunks.

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
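For context, bufio.Scanner rejects any single line longer than its 64KB default cap; the standalone snippet below reproduces the failure and the fix suggested in these diffs.

package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// One SSE "data:" line larger than bufio's default 64KB token limit.
	long := "data: " + strings.Repeat("x", 80*1024)

	s := bufio.NewScanner(strings.NewReader(long))
	if !s.Scan() {
		fmt.Println("default buffer:", s.Err()) // bufio.Scanner: token too long
	}

	s = bufio.NewScanner(strings.NewReader(long))
	s.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // raise max token size to 4MB
	if s.Scan() {
		fmt.Println("enlarged buffer read", len(s.Text()), "bytes")
	}
}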
♻️ Duplicate comments (8)
tests/core-providers/azure_test.go (1)

28-28: Rename to CompletionStream looks good.

Matches the config struct rename; no issues spotted here.

tests/core-providers/anthropic_test.go (1)

28-28: LGTM on CompletionStream flag.

Consistent with the field rename across tests.

tests/core-providers/config/account.go (1)

20-25: Struct rename/additions look good (CompletionStream, TextCompletionStream)

Ensure all scenario helpers reference Scenarios.CompletionStream (not ChatCompletionStream).

#!/bin/bash
# Confirm no lingering ChatCompletionStream references
rg -nP '\bChatCompletionStream\b' -S -g '!**/vendor/**'
tests/core-providers/scenarios/responses_stream.go (1)

529-535: Fix dereference: read fields from lastResponse.BifrostResponse.

BifrostStream wraps BifrostResponse. Accessing fields directly on lastResponse won’t compile.

-	if lastResponse != nil && lastResponse.BifrostResponse != nil {
-		consolidatedResponse.Usage = lastResponse.Usage
-		consolidatedResponse.Model = lastResponse.Model
-		consolidatedResponse.ID = lastResponse.ID
-		consolidatedResponse.Created = lastResponse.Created
-	}
+	if lastResponse != nil && lastResponse.BifrostResponse != nil {
+		consolidatedResponse.Usage = lastResponse.BifrostResponse.Usage
+		consolidatedResponse.Model = lastResponse.BifrostResponse.Model
+		consolidatedResponse.ID = lastResponse.BifrostResponse.ID
+		consolidatedResponse.Created = lastResponse.BifrostResponse.Created
+	}
core/providers/openai.go (4)

1029-1054: Good: Responses stream error metadata and code forwarded.

ExtraFields (RequestType/Provider/Model) and Error.Code are now included; enables correct SSE emission and error handling.


671-672: Fix TODO typo and clarify/remove TODO.

Typo “WITHING” → “WITHIN”. Consider replacing with actionable comment or removing.

-			//TODO: FIX THIS WITHING RESPONSE STRUCT
+			// TODO: Fix this within response struct (clarify what needs fixing or remove)

632-636: Same: enlarge Scanner buffer for chat streaming.

Prevent scan errors on big tool-call/logprob chunks.

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)

982-986: Responses streaming: enlarge Scanner buffer.

Large response events can exceed default buffer.

-		scanner := bufio.NewScanner(resp.Body)
+		scanner := bufio.NewScanner(resp.Body)
+		scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
🧹 Nitpick comments (4)
transports/bifrost-http/handlers/inference.go (1)

603-606: Remove duplicate initialization of SpeechParameters.

Lines 603-606 repeat the same initialization done at Lines 599-601. Drop the second block to avoid confusion.

-	// Extract extra params
-	if req.SpeechParameters == nil {
-		req.SpeechParameters = &schemas.SpeechParameters{}
-	}
core/providers/mistral.go (1)

96-114: Guard Authorization header on presence of key.

Safer and consistent with other providers; omit the header if key is empty.

-	return handleOpenAIChatCompletionStreaming(
+	authHeader := map[string]string{}
+	if key.Value != "" {
+		authHeader["Authorization"] = "Bearer " + key.Value
+	}
+	return handleOpenAIChatCompletionStreaming(
 		ctx,
 		provider.streamClient,
 		provider.networkConfig.BaseURL+"/v1/chat/completions",
 		request,
-		map[string]string{"Authorization": "Bearer " + key.Value},
+		authHeader,
 		provider.networkConfig.ExtraHeaders,
 		provider.sendBackRawResponse,
 		schemas.Mistral,
 		postHookRunner,
 		provider.logger,
 	)
core/providers/vertex.go (1)

224-227: Prefer context/net timeout checks over http.ErrHandlerTimeout.

http.ErrHandlerTimeout is rarely surfaced here. Use context.DeadlineExceeded or net.Error Timeout().

-	if errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded) {
+	if errors.Is(err, context.DeadlineExceeded) {
 		return nil, newBifrostOperationError(schemas.ErrProviderRequestTimedOut, err, schemas.Vertex)
 	}

Optionally:

var nerr net.Error
if errors.As(err, &nerr) && nerr.Timeout() { /* treat as timeout */ }

Apply similarly to the other occurrences noted in the ranges.

Also applies to: 261-263, 515-518, 546-548
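A standalone sketch of the suggested detection (isTimeout is a hypothetical helper; the point is checking context.DeadlineExceeded and net.Error.Timeout() rather than http.ErrHandlerTimeout):

package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// isTimeout treats a context deadline or any net.Error reporting Timeout() as a timeout.
func isTimeout(err error) bool {
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var nerr net.Error
	return errors.As(err, &nerr) && nerr.Timeout()
}

func main() {
	fmt.Println(isTimeout(context.DeadlineExceeded)) // true
	fmt.Println(isTimeout(errors.New("boom")))       // false
}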

core/providers/parasail.go (1)

102-114: Set Authorization header only when key is provided.

Avoid sending "Bearer " when no key is configured.

-	return handleOpenAIChatCompletionStreaming(
+	authHeader := map[string]string{}
+	if key.Value != "" {
+		authHeader["Authorization"] = "Bearer " + key.Value
+	}
+	return handleOpenAIChatCompletionStreaming(
 		ctx,
 		provider.streamClient,
 		provider.networkConfig.BaseURL+"/v1/chat/completions",
 		request,
-		map[string]string{"Authorization": "Bearer " + key.Value},
+		authHeader,
 		provider.networkConfig.ExtraHeaders,
 		provider.sendBackRawResponse,
 		schemas.Parasail,
 		postHookRunner,
 		provider.logger,
 	)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39465aa and 19cab6c.

📒 Files selected for processing (51)
  • core/providers/anthropic.go (2 hunks)
  • core/providers/azure.go (2 hunks)
  • core/providers/bedrock.go (2 hunks)
  • core/providers/cerebras.go (2 hunks)
  • core/providers/cohere.go (2 hunks)
  • core/providers/gemini.go (5 hunks)
  • core/providers/groq.go (2 hunks)
  • core/providers/mistral.go (2 hunks)
  • core/providers/ollama.go (2 hunks)
  • core/providers/openai.go (16 hunks)
  • core/providers/openrouter.go (2 hunks)
  • core/providers/parasail.go (2 hunks)
  • core/providers/sgl.go (2 hunks)
  • core/providers/utils.go (1 hunks)
  • core/providers/vertex.go (6 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/mux.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • plugins/telemetry/main.go
  • transports/bifrost-http/handlers/server.go
  • tests/core-providers/scenarios/utils.go
  • core/schemas/providers/mistral/types.go
  • core/schemas/providers/mistral/embedding.go
🚧 Files skipped from review as they are similar to previous changes (21)
  • core/schemas/textcompletions.go
  • core/schemas/chatcompletions.go
  • tests/core-providers/custom_test.go
  • tests/core-providers/scenarios/chat_completion_stream.go
  • tests/core-providers/vertex_test.go
  • core/providers/bedrock.go
  • tests/core-providers/gemini_test.go
  • core/providers/utils.go
  • tests/core-providers/openai_test.go
  • core/providers/sgl.go
  • core/schemas/embedding.go
  • tests/core-providers/mistral_test.go
  • tests/core-providers/scenarios/text_completion_stream.go
  • tests/core-providers/tests.go
  • core/providers/gemini.go
  • core/providers/cohere.go
  • tests/core-providers/groq_test.go
  • core/providers/groq.go
  • core/schemas/transcriptions.go
  • tests/core-providers/cerebras_test.go
  • core/schemas/providers/gemini/chat.go
🧰 Additional context used
🧬 Code graph analysis (15)
core/providers/openrouter.go (2)
core/schemas/bifrost.go (4)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • BifrostStream (383-386)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Ollama (62-62)
  • BifrostResponsesRequest (197-203)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (2)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
core/providers/mistral.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Mistral (61-61)
  • BifrostResponsesRequest (197-203)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
tests/core-providers/scenarios/responses_stream.go (3)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/responses.go (22)
  • ResponsesMessage (280-292)
  • ResponsesInputMessageRoleUser (567-567)
  • ResponsesMessageContent (573-576)
  • ResponsesParameters (33-60)
  • ResponsesStreamResponseType (1620-1620)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesStreamResponseTypeContentPartAdded (1632-1632)
  • ResponsesStreamResponseTypeCreated (1623-1623)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesTool (1341-1356)
  • ResponsesToolFunction (1359-1362)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesParametersReasoning (210-214)
  • ResponsesStreamResponseTypeReasoningSummaryPartAdded (1653-1653)
  • ResponsesStreamResponseTypeReasoningSummaryTextDelta (1655-1655)
  • ResponsesMessageTypeReasoning (273-273)
  • ResponsesResponse (85-114)
  • ResponsesMessageTypeMessage (256-256)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • SGL (64-64)
core/schemas/mux.go (3)
core/schemas/bifrost.go (3)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (16)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesMessage (280-292)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeRefusalDelta (1639-1639)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesToolMessage (692-715)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeIncomplete (1627-1627)
  • ResponsesStreamResponseTypeOutputItemDone (1630-1630)
  • ResponsesStreamResponseStruct (1719-1722)
  • ResponsesResponseUsage (236-239)
  • ResponsesExtendedResponseUsage (229-234)
core/schemas/chatcompletions.go (1)
  • BifrostStreamResponseChoice (376-378)
core/providers/cerebras.go (2)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/providers/azure.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (8)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
  • Azure (56-56)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (2)
  • Key (8-17)
  • VertexKeyConfig (35-39)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Vertex (60-60)
  • BifrostResponsesRequest (197-203)
core/schemas/providers/anthropic/chat.go (1)
  • ToAnthropicChatCompletionRequest (365-596)
core/utils.go (1)
  • Ptr (12-14)
core/providers/parasail.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
core/providers/openai.go (5)
core/schemas/bifrost.go (19)
  • TextCompletionStreamRequest (103-103)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • RequestCancelled (378-378)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostErrorExtraFields (435-439)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (3)
  • PostHookRunner (191-191)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/utils.go (1)
  • Ptr (12-14)
core/schemas/providers/openai/responses.go (1)
  • ToOpenAIResponsesRequest (30-51)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
🔇 Additional comments (32)
core/providers/anthropic.go (1)

398-398: LGTM! Timeout error handling correctly aligned with http.Client usage.

The changes from fasthttp.ErrTimeout to http.ErrHandlerTimeout are correct since the handleAnthropicStreaming function uses http.Client for streaming requests, not fasthttp.Client. The timeout handling remains robust with the existing context.DeadlineExceeded checks.

Also applies to: 425-425

core/schemas/speech.go (1)

86-95: LGTM: Speech streaming types added.

The new BifrostSpeech and BifrostSpeechStreamResponse types extend the speech API to support streaming. The pointer embedding of BifrostSpeechStreamResponse follows Go conventions for optional fields and aligns with the broader streaming infrastructure additions across the PR.

core/schemas/responses.go (3)

88-88: Error field addition looks good.

The new Error field in ResponsesResponse appropriately extends error handling capabilities for the Responses API.


1635-1637: Verify necessity of .added event type constants.

The code defines both .added and .delta variants for output text and function call arguments:

  • ResponsesStreamResponseTypeOutputTextAdded (.added)
  • ResponsesStreamResponseTypeOutputTextDelta (.delta)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (.added)
  • ResponsesStreamResponseTypeFunctionCallArgumentsDelta (.delta)

Based on past review discussions, OpenAI's Responses API uses .delta for streaming chunks. Having both variants could be correct if the API emits distinct event types at different stages, but it may also indicate that the .added variants were supposed to be removed when .delta was introduced.

Confirm whether both event types are emitted by the OpenAI Responses API or if only .delta variants should be retained for streaming events.

Also applies to: 1643-1645


1688-1722: Streaming response structures look comprehensive.

The ResponsesStreamResponse and ResponsesStreamResponseStruct types provide a well-structured foundation for handling streaming events with appropriate fields for different event types (deltas, items, parts, annotations, etc.).

tests/core-providers/ollama_test.go (1)

27-27: Field rename aligns with streaming refactor.

The rename from ChatCompletionStream to CompletionStream is consistent with the broader refactoring across all provider tests to unify streaming flag naming.

tests/core-providers/openrouter_test.go (1)

32-32: Consistent with streaming flag refactor.

The CompletionStream rename maintains consistency across all provider test configurations.

tests/core-providers/parasail_test.go (1)

27-27: Test configuration updated consistently.

The streaming flag rename follows the same pattern applied across all provider tests.

tests/core-providers/cohere_test.go (1)

28-28: Cohere test config aligned with refactor.

The field rename maintains consistency with the unified CompletionStream naming convention.

tests/core-providers/sgl_test.go (1)

28-28: SGL test scenarios updated.

The streaming flag rename is consistent with the broader test suite refactoring.

tests/core-providers/bedrock_test.go (1)

33-33: Bedrock test config updated consistently.

The rename completes the unified CompletionStream naming convention across all provider test configurations.

transports/bifrost-http/integrations/utils.go (1)

568-609: SSE formatting by request type and DONE gating look correct

  • Emits event: lines for Responses streams and standard data: lines otherwise.
  • Suppresses [DONE] for Responses streaming; uses stream close. Good.

Also applies to: 619-626
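For illustration, the framing rule amounts to the sketch below (writeSSEChunk and its parameters are hypothetical; the actual transport works on BifrostStream chunks): Responses streams get an event: line and never a [DONE] marker, everything else gets plain data: lines plus a trailing data: [DONE].

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// writeSSEChunk frames one chunk according to the rule described above.
func writeSSEChunk(w *bufio.Writer, isResponsesStream bool, eventType string, payload any) error {
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	if isResponsesStream {
		fmt.Fprintf(w, "event: %s\n", eventType)
	}
	fmt.Fprintf(w, "data: %s\n\n", body)
	return w.Flush()
}

func main() {
	var buf bytes.Buffer
	w := bufio.NewWriter(&buf)
	_ = writeSSEChunk(w, true, "response.output_text.delta", map[string]string{"delta": "Hel"})
	_ = writeSSEChunk(w, false, "", map[string]string{"content": "Hello"})
	fmt.Fprint(w, "data: [DONE]\n\n") // emitted only on the non-Responses path
	w.Flush()
	fmt.Print(buf.String())
}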

core/schemas/mux.go (1)

885-1028: Single ToResponsesStream implementation confirmed. Only one ToResponsesStream definition exists across the codebase.

transports/bifrost-http/handlers/inference.go (3)

631-634: Speech SSE gating: LGTM.

Streaming only when stream_format == "sse" is clear and safe.


891-898: Correct: don’t emit [DONE] for Responses streams.

This matches OpenAI Responses API behavior.


840-846: LGTM: RequestType set on first chunk. The utils.go post-hook converter and openai.go’s manual assignments ensure ExtraFields.RequestType is initialized on the initial chunk for all streaming providers.

core/providers/openrouter.go (3)

118-130: Chat streaming migration: LGTM.

Uses shared handleOpenAIChatCompletionStreaming with proper headers and provider key.


146-154: Verified: converter sets request type and stream response
getResponsesChunkConverterCombinedPostHookRunner calls ToResponsesStream() to populate ResponsesStreamResponse and assigns ExtraFields.RequestType = ResponsesStreamRequest on both results and errors.


132-144: Responses endpoint path is correct. The default BaseURL (https://openrouter.ai/api) plus /alpha/responses yields https://openrouter.ai/api/alpha/responses, matching the official API.

core/providers/ollama.go (1)

152-160: ResponsesStream delegation: LGTM.

Post-hook reuse keeps behavior consistent.

core/providers/mistral.go (1)

130-138: ResponsesStream delegation: LGTM.

Combined post-hook approach is consistent.

core/providers/vertex.go (1)

418-435: Auth header usage: LGTM.

Uses computed authHeader allowing OAuth client to manage Authorization when key.Value is empty.

core/providers/parasail.go (1)

116-138: Responses methods: LGTM.

Non-streaming wraps chat; streaming uses combined post-hook.

core/providers/cerebras.go (4)

120-131: Good switch to specialized chat streaming wrapper.

Delegation and params look correct.


134-146: Responses path wiring LGTM.

Conversion + ToResponsesOnly + ExtraFields set correctly.


148-155: ResponsesStream delegation LGTM.

Using combined post-hook converter is the right approach.


157-161: Clear unsupported Embedding surface.

Appropriate explicit error for Cerebras.

core/providers/azure.go (4)

320-332: Chat streaming wrapper usage LGTM.

Headers/auth preserved; consistent with other providers.


334-349: Responses path LGTM.

Correct conversion and metadata population.


351-358: ResponsesStream delegation LGTM.

Combined post-hook converter is appropriate.


360-393: Embedding implementation LGTM.

Correct URL construction via completeRequest; ExtraFields set; RawResponse gated by flag.

core/schemas/bifrost.go (1)

253-254: Schema extension for Responses streaming LGTM.

Clean embedding of stream variant into BifrostResponse.

@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 19cab6c to 4daa291 Compare October 13, 2025 09:09
@Pratham-Mishra04 Pratham-Mishra04 force-pushed the 10-11-feat_responses_streaming_added branch from 4daa291 to 301ef3b Compare October 13, 2025 09:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
core/schemas/mux.go (1)

918-935: Avoid fallthrough; handle role+content explicitly.

fallthrough executes the next case body without condition re-checks, which can misclassify role-only chunks and preclude handling of other fields in the same chunk. Replace with an explicit nested content check.

-  case delta.Role != nil:
+  case delta.Role != nil:
     // Role initialization - typically the first chunk
     streamResp.Type = ResponsesStreamResponseTypeOutputItemAdded
     streamResp.Item = &ResponsesMessage{
       Type: Ptr(ResponsesMessageTypeMessage),
       Role: Ptr(ResponsesInputMessageRoleAssistant),
     }
     if *delta.Role == "assistant" {
       streamResp.Item.Role = Ptr(ResponsesInputMessageRoleAssistant)
     }
-    fallthrough
-
-  case delta.Content != nil && *delta.Content != "":
-    if delta.Content != nil && *delta.Content != "" { // Need this check again because of the fallthrough
-      // Text content delta
-      streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
-      streamResp.Delta = delta.Content
-    }
+    // If content is also present in this chunk, emit text delta
+    if delta.Content != nil && *delta.Content != "" {
+      streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
+      streamResp.Delta = delta.Content
+    }
+    // No implicit fallthrough
+
+  case delta.Content != nil && *delta.Content != "":
+    // Text content delta
+    streamResp.Type = ResponsesStreamResponseTypeOutputTextDelta
+    streamResp.Delta = delta.Content
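As background on why the re-check is needed: fallthrough in Go transfers control to the next case body without evaluating its condition. A minimal standalone demonstration:

package main

import "fmt"

func main() {
	x := 1
	switch {
	case x == 1:
		fmt.Println("case x == 1 runs")
		fallthrough
	case x == 2:
		// Runs despite x != 2: fallthrough skips the condition entirely.
		fmt.Println("case x == 2 body also runs")
	}
}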
core/schemas/textcompletions.go (1)

72-78: Allow nulls in logprobs arrays.

OpenAI can return nulls for token_logprobs and elements of top_logprobs; current types will fail to unmarshal.

 type TextCompletionLogProb struct {
   TextOffset    []int                `json:"text_offset"`
-  TokenLogProbs []float64            `json:"token_logprobs"`
+  TokenLogProbs []*float64           `json:"token_logprobs"`
   Tokens        []string             `json:"tokens"`
-  TopLogProbs   []map[string]float64 `json:"top_logprobs"`
+  TopLogProbs   []*map[string]float64 `json:"top_logprobs"`
 }

Ensure readers handle nil entries safely.
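For illustration, pointer elements let a consumer distinguish a JSON null from an actual 0.0 logprob; a standalone check (the field name mirrors the struct above, the wrapper type is hypothetical):

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	raw := []byte(`{"token_logprobs":[-0.1,null,-0.3]}`)

	var out struct {
		TokenLogProbs []*float64 `json:"token_logprobs"`
	}
	if err := json.Unmarshal(raw, &out); err != nil {
		panic(err)
	}
	for i, p := range out.TokenLogProbs {
		if p == nil {
			fmt.Printf("index %d: null (no logprob reported)\n", i)
		} else {
			fmt.Printf("index %d: %v\n", i, *p)
		}
	}
}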

tests/core-providers/scenarios/responses_stream.go (1)

529-535: Use explicit field access through BifrostResponse.

While Go's embedded pointer allows lastResponse.Usage as shorthand, the explicit path is clearer and avoids potential confusion. This mirrors the past review concern at the same lines.

Apply this diff for clarity:

-    consolidatedResponse.Usage = lastResponse.Usage
-    consolidatedResponse.Model = lastResponse.Model
-    consolidatedResponse.ID = lastResponse.ID
-    consolidatedResponse.Created = lastResponse.Created
+    consolidatedResponse.Usage = lastResponse.BifrostResponse.Usage
+    consolidatedResponse.Model = lastResponse.BifrostResponse.Model
+    consolidatedResponse.ID = lastResponse.BifrostResponse.ID
+    consolidatedResponse.Created = lastResponse.BifrostResponse.Created
🧹 Nitpick comments (4)
tests/core-providers/scenarios/chat_completion_stream.go (1)

17-20: Align log text with renamed flag.

Message still says “Chat completion stream...”. Update to “Completion stream...” for consistency.

-    t.Logf("Chat completion stream not supported for provider %s", testConfig.Provider)
+    t.Logf("Completion stream not supported for provider %s", testConfig.Provider)
core/schemas/mux.go (1)

893-896: Also set RequestType when early-returning.

If ResponsesStreamResponse is already present, mark the request type before returning to ensure transports format SSE correctly.

-  if br.ResponsesStreamResponse != nil {
-    br.Choices = nil
-    return
-  }
+  if br.ResponsesStreamResponse != nil {
+    br.ExtraFields.RequestType = ResponsesStreamRequest
+    br.Choices = nil
+    return
+  }
core/schemas/chatcompletions.go (1)

351-363: Consider adding runtime validation for union-style embedded pointers.

The comment correctly states that only one of the three embedded pointers should be non-nil at a time, but there's no runtime enforcement of this invariant. If multiple pointers are set simultaneously, the JSON marshaling could produce unexpected results due to field overlaps or overwriting.

Consider adding a validation method:

// Validate ensures only one embedded pointer is non-nil
func (c *BifrostChatResponseChoice) Validate() error {
    count := 0
    if c.BifrostTextCompletionResponseChoice != nil {
        count++
    }
    if c.BifrostNonStreamResponseChoice != nil {
        count++
    }
    if c.BifrostStreamResponseChoice != nil {
        count++
    }
    if count > 1 {
        return fmt.Errorf("multiple embedded pointers set; only one should be non-nil")
    }
    return nil
}
core/schemas/transcriptions.go (1)

56-66: Consider adding runtime validation for union-style embedded pointers.

Similar to BifrostChatResponseChoice, this struct uses a union pattern with embedded pointers where only one should be non-nil at a time. Without runtime validation, setting both BifrostTranscribeNonStreamResponse and BifrostTranscribeStreamResponse simultaneously could lead to unexpected JSON serialization behavior.

Consider adding a validation method similar to the one suggested for BifrostChatResponseChoice:

// Validate ensures only one embedded pointer is non-nil
func (t *BifrostTranscribe) Validate() error {
    count := 0
    if t.BifrostTranscribeNonStreamResponse != nil {
        count++
    }
    if t.BifrostTranscribeStreamResponse != nil {
        count++
    }
    if count > 1 {
        return fmt.Errorf("both streaming and non-streaming pointers set; only one should be non-nil")
    }
    return nil
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19cab6c and 301ef3b.

📒 Files selected for processing (51)
  • core/providers/anthropic.go (2 hunks)
  • core/providers/azure.go (2 hunks)
  • core/providers/bedrock.go (2 hunks)
  • core/providers/cerebras.go (2 hunks)
  • core/providers/cohere.go (2 hunks)
  • core/providers/gemini.go (5 hunks)
  • core/providers/groq.go (2 hunks)
  • core/providers/mistral.go (2 hunks)
  • core/providers/ollama.go (3 hunks)
  • core/providers/openai.go (21 hunks)
  • core/providers/openrouter.go (2 hunks)
  • core/providers/parasail.go (2 hunks)
  • core/providers/sgl.go (3 hunks)
  • core/providers/utils.go (1 hunks)
  • core/providers/vertex.go (6 hunks)
  • core/schemas/bifrost.go (1 hunks)
  • core/schemas/chatcompletions.go (1 hunks)
  • core/schemas/embedding.go (1 hunks)
  • core/schemas/mux.go (1 hunks)
  • core/schemas/providers/gemini/chat.go (1 hunks)
  • core/schemas/providers/mistral/embedding.go (0 hunks)
  • core/schemas/providers/mistral/types.go (0 hunks)
  • core/schemas/responses.go (2 hunks)
  • core/schemas/speech.go (2 hunks)
  • core/schemas/textcompletions.go (1 hunks)
  • core/schemas/transcriptions.go (1 hunks)
  • plugins/telemetry/main.go (0 hunks)
  • tests/core-providers/anthropic_test.go (1 hunks)
  • tests/core-providers/azure_test.go (1 hunks)
  • tests/core-providers/bedrock_test.go (1 hunks)
  • tests/core-providers/cerebras_test.go (1 hunks)
  • tests/core-providers/cohere_test.go (1 hunks)
  • tests/core-providers/config/account.go (12 hunks)
  • tests/core-providers/custom_test.go (1 hunks)
  • tests/core-providers/gemini_test.go (1 hunks)
  • tests/core-providers/groq_test.go (1 hunks)
  • tests/core-providers/mistral_test.go (1 hunks)
  • tests/core-providers/ollama_test.go (1 hunks)
  • tests/core-providers/openai_test.go (1 hunks)
  • tests/core-providers/openrouter_test.go (1 hunks)
  • tests/core-providers/parasail_test.go (1 hunks)
  • tests/core-providers/scenarios/chat_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/responses_stream.go (1 hunks)
  • tests/core-providers/scenarios/text_completion_stream.go (1 hunks)
  • tests/core-providers/scenarios/utils.go (0 hunks)
  • tests/core-providers/sgl_test.go (1 hunks)
  • tests/core-providers/tests.go (2 hunks)
  • tests/core-providers/vertex_test.go (1 hunks)
  • transports/bifrost-http/handlers/inference.go (4 hunks)
  • transports/bifrost-http/handlers/server.go (0 hunks)
  • transports/bifrost-http/integrations/utils.go (3 hunks)
💤 Files with no reviewable changes (5)
  • transports/bifrost-http/handlers/server.go
  • core/schemas/providers/mistral/types.go
  • tests/core-providers/scenarios/utils.go
  • core/schemas/providers/mistral/embedding.go
  • plugins/telemetry/main.go
🚧 Files skipped from review as they are similar to previous changes (20)
  • tests/core-providers/groq_test.go
  • core/providers/anthropic.go
  • core/schemas/providers/gemini/chat.go
  • tests/core-providers/azure_test.go
  • core/providers/utils.go
  • core/providers/gemini.go
  • tests/core-providers/openrouter_test.go
  • tests/core-providers/custom_test.go
  • core/providers/mistral.go
  • core/providers/bedrock.go
  • tests/core-providers/bedrock_test.go
  • core/schemas/embedding.go
  • tests/core-providers/openai_test.go
  • core/providers/cohere.go
  • tests/core-providers/cohere_test.go
  • tests/core-providers/anthropic_test.go
  • tests/core-providers/scenarios/text_completion_stream.go
  • tests/core-providers/mistral_test.go
  • tests/core-providers/gemini_test.go
  • tests/core-providers/sgl_test.go
🧰 Additional context used
🧬 Code graph analysis (19)
transports/bifrost-http/integrations/utils.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
core/schemas/chatcompletions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/providers/vertex.go (5)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (2)
  • Key (8-17)
  • VertexKeyConfig (35-39)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Vertex (60-60)
  • BifrostResponsesRequest (197-203)
core/schemas/providers/anthropic/chat.go (1)
  • ToAnthropicChatCompletionRequest (365-596)
core/utils.go (1)
  • Ptr (12-14)
tests/core-providers/scenarios/responses_stream.go (4)
tests/core-providers/config/account.go (1)
  • ComprehensiveTestConfig (43-55)
core/schemas/responses.go (22)
  • ResponsesMessage (280-292)
  • ResponsesInputMessageRoleUser (567-567)
  • ResponsesMessageContent (573-576)
  • ResponsesParameters (33-60)
  • ResponsesStreamResponseType (1620-1620)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeOutputItemAdded (1629-1629)
  • ResponsesStreamResponseTypeContentPartAdded (1632-1632)
  • ResponsesStreamResponseTypeCreated (1623-1623)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesTool (1341-1356)
  • ResponsesToolFunction (1359-1362)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesParametersReasoning (210-214)
  • ResponsesStreamResponseTypeReasoningSummaryPartAdded (1653-1653)
  • ResponsesStreamResponseTypeReasoningSummaryTextDelta (1655-1655)
  • ResponsesMessageTypeReasoning (273-273)
  • ResponsesResponse (85-114)
  • ResponsesInputMessageRoleAssistant (566-566)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • BifrostResponse (238-254)
  • ModelProvider (52-52)
  • BifrostResponseExtraFields (349-358)
  • SGL (64-64)
core/schemas/chatcompletions.go (1)
  • ToolFunctionParameters (73-79)
core/providers/cerebras.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/providers/groq.go (2)
core/schemas/bifrost.go (6)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/transcriptions.go (1)
core/schemas/bifrost.go (1)
  • LogProbs (341-346)
core/schemas/speech.go (1)
core/schemas/bifrost.go (1)
  • AudioLLMUsage (304-309)
core/providers/azure.go (2)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
tests/core-providers/tests.go (3)
tests/core-providers/scenarios/text_completion_stream.go (1)
  • RunTextCompletionStreamTest (16-399)
tests/core-providers/scenarios/chat_completion_stream.go (1)
  • RunChatCompletionStreamTest (16-298)
tests/core-providers/scenarios/responses_stream.go (1)
  • RunResponsesStreamTest (17-467)
core/providers/sgl.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • SGL (64-64)
  • BifrostResponsesRequest (197-203)
core/providers/ollama.go (3)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (5)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • Ollama (62-62)
  • BifrostResponsesRequest (197-203)
core/schemas/responses.go (1)
plugins/mocker/main.go (1)
  • Response (97-103)
core/schemas/bifrost.go (1)
core/schemas/responses.go (1)
  • ResponsesStreamResponse (1688-1717)
transports/bifrost-http/handlers/inference.go (2)
core/schemas/bifrost.go (4)
  • RequestType (99-99)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (2)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
core/providers/openrouter.go (2)
core/schemas/bifrost.go (4)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • BifrostStream (383-386)
core/schemas/provider.go (1)
  • PostHookRunner (191-191)
core/providers/openai.go (4)
core/schemas/bifrost.go (18)
  • TextCompletionStreamRequest (103-103)
  • BifrostChatRequest (189-195)
  • BifrostStream (383-386)
  • BifrostError (409-418)
  • OpenAI (55-55)
  • ChatCompletionStreamRequest (105-105)
  • ModelProvider (52-52)
  • ErrorField (426-433)
  • LLMUsage (296-302)
  • BifrostContextKeyStreamEndIndicator (125-125)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • BifrostResponsesRequest (197-203)
  • ResponsesRequest (106-106)
  • ResponsesStreamRequest (107-107)
  • BifrostErrorExtraFields (435-439)
  • BifrostEmbeddingRequest (205-211)
  • EmbeddingRequest (108-108)
core/schemas/provider.go (3)
  • PostHookRunner (191-191)
  • DefaultStreamBufferSize (17-17)
  • Provider (194-219)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/responses.go (3)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesStreamResponseTypeError (1685-1685)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
core/providers/parasail.go (3)
core/schemas/account.go (1)
  • Key (8-17)
core/schemas/bifrost.go (7)
  • BifrostResponsesRequest (197-203)
  • BifrostResponse (238-254)
  • BifrostError (409-418)
  • RequestType (99-99)
  • ResponsesRequest (106-106)
  • BifrostStream (383-386)
  • BifrostEmbeddingRequest (205-211)
core/schemas/provider.go (2)
  • Provider (194-219)
  • PostHookRunner (191-191)
core/schemas/mux.go (4)
core/schemas/bifrost.go (3)
  • BifrostResponse (238-254)
  • RequestType (99-99)
  • ResponsesStreamRequest (107-107)
core/schemas/responses.go (15)
  • ResponsesStreamResponse (1688-1717)
  • ResponsesMessage (280-292)
  • ResponsesMessageTypeMessage (256-256)
  • ResponsesInputMessageRoleAssistant (566-566)
  • ResponsesStreamResponseTypeOutputTextDelta (1636-1636)
  • ResponsesStreamResponseTypeRefusalDelta (1639-1639)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded (1642-1642)
  • ResponsesMessageTypeFunctionCall (261-261)
  • ResponsesToolMessage (692-715)
  • ResponsesStreamResponseTypeCompleted (1625-1625)
  • ResponsesStreamResponseTypeIncomplete (1627-1627)
  • ResponsesStreamResponseTypeOutputItemDone (1630-1630)
  • ResponsesStreamResponseStruct (1719-1722)
  • ResponsesResponseUsage (236-239)
  • ResponsesExtendedResponseUsage (229-234)
core/schemas/chatcompletions.go (1)
  • BifrostStreamResponseChoice (376-378)
plugins/mocker/main.go (2)
  • Usage (138-142)
  • Response (97-103)
🔇 Additional comments (41)
core/schemas/speech.go (1)

86-95: New speech streaming surface looks good.

Types are minimal and compatible with existing BifrostResponse.Speech usage.

transports/bifrost-http/integrations/utils.go (1)

568-609: SSE branching by request type looks correct.

  • Uses Responses event line when type is responses_stream.
  • Falls back to standard SSE and emits [DONE] only for non-responses.

Confirm that all streaming responses set ExtraFields.RequestType; if any converter bypasses it, consider inferring from presence of ResponsesStreamResponse as a fallback.

tests/core-providers/ollama_test.go (1)

27-27: LGTM! Consistent field rename across test suite.

The rename from ChatCompletionStream to CompletionStream aligns with the broader refactoring effort to unify streaming terminology across all provider tests.

tests/core-providers/vertex_test.go (1)

28-28: LGTM! Consistent field rename across test suite.

The rename from ChatCompletionStream to CompletionStream maintains consistency with the broader test configuration refactoring.

tests/core-providers/parasail_test.go (1)

27-27: LGTM! Consistent field rename across test suite.

The rename from ChatCompletionStream to CompletionStream is consistent with the test configuration updates across all providers.

tests/core-providers/tests.go (2)

29-32: LGTM! New streaming test scenarios properly integrated.

The addition of RunTextCompletionStreamTest and RunResponsesStreamTest expands test coverage for different streaming formats. Based on the relevant code snippets, both test implementations properly handle streaming with timeouts, validation, and error handling.


71-71: LGTM! Summary updated to match field rename.

The test summary correctly references the renamed CompletionStream field, maintaining consistency with the test configuration changes.

tests/core-providers/cerebras_test.go (1)

26-28: LGTM! Text completion streaming enabled for Cerebras.

The addition of TextCompletionStream: true and the rename of ChatCompletionStream to CompletionStream enable comprehensive streaming test coverage for the Cerebras provider. This aligns with the broader test configuration refactoring.

tests/core-providers/config/account.go (3)

22-24: LGTM! Test configuration fields properly refactored.

The addition of TextCompletionStream and the rename of ChatCompletionStream to CompletionStream provide clearer separation between text completion streaming and chat completion streaming scenarios. This change is consistently propagated across all provider configurations in AllProviderConfigs.


466-468: LGTM! OpenAI configuration updated consistently.

The test configuration correctly disables TextCompletionStream (OpenAI doesn't support legacy text completions in newer models) and enables CompletionStream for chat completion streaming.


496-496: LGTM! All provider configurations updated consistently.

The CompletionStream field is consistently enabled across Anthropic, Bedrock, Cohere, Azure, Vertex, Ollama, Groq, OpenAI Custom, Gemini, and OpenRouter provider configurations, maintaining the same streaming capabilities as before the refactor.

Also applies to: 523-523, 550-550, 577-577, 604-604, 657-657, 684-684, 711-711, 741-741, 768-768

core/schemas/chatcompletions.go (1)

389-402: Drop duplication concern. LogProb and ContentLogProb are defined once in chatcompletions.go and merely referenced (not redefined) by LogProbs in bifrost.go.

Likely an incorrect or invalid review comment.

transports/bifrost-http/handlers/inference.go (1)

828-898: LGTM: Correct Responses API streaming detection and SSE formatting.

The implementation correctly:

  • Detects request type from the first chunk's ExtraFields
  • Emits event: <type> lines for Responses streaming (OpenAI Responses API format)
  • Uses standard data: ... format for other streaming APIs
  • Omits [DONE] marker for Responses API (per OpenAI spec)

Minor observation: If requestType is never populated (no chunks with BifrostResponse/BifrostError), it defaults to standard SSE format, which is a reasonable fallback.

core/providers/openrouter.go (1)

118-153: LGTM: Clean refactoring and new Responses API surfaces.

The changes correctly:

  • Switch to specialized handleOpenAIChatCompletionStreaming for chat completions
  • Add Responses method using OpenRouter's /alpha/responses endpoint
  • Add ResponsesStream method that delegates to ChatCompletionStream with a post-hook converter

The pattern is consistent with other providers in this PR.

core/providers/vertex.go (2)

348-436: LGTM: Correct streaming implementation with proper auth handling.

The implementation correctly:

  • Validates key config (projectID, region)
  • Routes Claude models to Anthropic streaming, others to OpenAI streaming
  • Sets Authorization header only when key.Value is non-empty (addresses past review concern about OAuth conflicts)
  • Handles auth client creation errors

The conditional auth header approach at lines 418-421 is appropriate: when using OAuth (via getAuthClient), key.Value would typically be empty, so authHeader remains empty and OAuth handles authentication.


452-459: LGTM: Standard ResponsesStream implementation.

Correctly delegates to ChatCompletionStream with a post-hook converter and request transformation.

core/providers/ollama.go (2)

94-136: LGTM: Correct auth handling for Ollama.

Passing nil as the auth header (lines 94, 129) is correct for Ollama, which typically doesn't require authentication for local deployments. This addresses the past review concern about sending "Authorization: Bearer " with an empty key causing 401 errors.


152-159: LGTM: Standard ResponsesStream implementation.

Correctly delegates to ChatCompletionStream with post-hook converter.

tests/core-providers/scenarios/responses_stream.go (1)

1-637: LGTM: Comprehensive Responses streaming test coverage.

The test file correctly implements:

  • Basic streaming with event type and sequence number tracking
  • Timeout handling with context cancellation
  • Tool calls streaming validation
  • Reasoning streaming validation
  • Proper event structure validation
  • Content accumulation across streaming chunks

The three test scenarios (basic, tools, reasoning) provide solid coverage for the Responses API streaming path.

core/providers/azure.go (1)

320-393: LGTM: Standard provider surfaces for Responses and Embedding.

The additions correctly:

  • Switch to specialized handleOpenAIChatCompletionStreaming for chat completions
  • Implement Responses by delegating to ChatCompletion and formatting the result
  • Implement ResponsesStream via ChatCompletionStream with post-hook converter
  • Implement Embedding using OpenAI converter and Azure's request infrastructure

All methods properly populate ExtraFields (Provider, RequestType, ModelRequested, Latency, RawResponse).

core/providers/parasail.go (1)

102-142: LGTM: Standard provider surfaces added.

The changes correctly:

  • Switch to specialized handleOpenAIChatCompletionStreaming
  • Add Responses and ResponsesStream following the standard pattern
  • Return unsupported operation error for Embedding (Parasail doesn't support embeddings)
core/providers/cerebras.go (1)

120-160: LGTM: Standard provider surfaces added.

The changes correctly:

  • Switch to specialized handleOpenAIChatCompletionStreaming
  • Add Responses and ResponsesStream following the standard pattern
  • Return unsupported operation error for Embedding (Cerebras doesn't support embeddings)
core/providers/sgl.go (3)

95-95: LGTM: Authorization handling aligned with provider requirements.

The change to pass nil for the Authorization header is consistent with how ChatCompletionStream handles authentication (line 130) and appears intentional for the SGL provider.


119-137: LGTM: Clean streaming implementation.

The ChatCompletionStream method correctly delegates to the shared handleOpenAIChatCompletionStreaming handler with all required parameters. The implementation follows the established pattern used across other providers.


153-160: LGTM: Correct responses streaming delegation.

The ResponsesStream implementation correctly delegates through ChatCompletionStream using the combined post-hook runner pattern. This approach converts chat streaming responses into the Responses API format, which is the expected pattern for non-OpenAI providers.

core/providers/groq.go (4)

160-172: LGTM: Refactored to use shared streaming handler.

The change to use handleOpenAIChatCompletionStreaming follows the established pattern and improves code maintainability by leveraging shared streaming logic.


174-186: LGTM: Proper Responses API delegation.

The Responses method correctly converts the Responses request to a chat request, processes it, and transforms the response using ToResponsesOnly(). The metadata fields are properly populated.
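A self-contained sketch of that convert → call → convert-back flow; every identifier below is a placeholder mirroring the shape the review describes:

```go
// Placeholder types to show the conversion round trip.
type responsesReq struct{ Model, Input string }
type chatReq struct{ Model, Prompt string }
type chatResp struct{ Model, Text string }

type responsesResp struct {
	Model, OutputText string
	ExtraFields       map[string]string
}

func (r *responsesReq) toChatRequest() *chatReq { return &chatReq{Model: r.Model, Prompt: r.Input} }

func (c *chatResp) toResponsesOnly() *responsesResp {
	return &responsesResp{Model: c.Model, OutputText: c.Text, ExtraFields: map[string]string{}}
}

// responses delegates to an existing chat-completion call, reshapes the
// result into the Responses format, and fills in request metadata.
func responses(call func(*chatReq) (*chatResp, error), req *responsesReq) (*responsesResp, error) {
	cr, err := call(req.toChatRequest())
	if err != nil {
		return nil, err
	}
	out := cr.toResponsesOnly()
	out.ExtraFields["provider"] = "groq"
	out.ExtraFields["request_type"] = "responses"
	return out, nil
}
```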


188-195: LGTM: Consistent responses streaming pattern.

The ResponsesStream implementation follows the same delegation pattern used in other providers, correctly routing through ChatCompletionStream with the converter post-hook runner.


197-200: LGTM: Proper unsupported operation handling.

The Embedding method correctly returns an unsupported operation error for Groq, maintaining interface compliance while clearly communicating the limitation.
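In outline, the unsupported-operation shape looks like the following; Bifrost has its own error constructor, so this is only the general idea:

```go
import "fmt"

// unsupportedOperation mimics the shape of such an error.
func unsupportedOperation(provider, operation string) error {
	return fmt.Errorf("%s: %s is not supported by this provider", provider, operation)
}

// An Embedding method on a provider without embedding support then reduces to:
//
//	return nil, unsupportedOperation("groq", "embedding")
```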

core/schemas/responses.go (2)

88-88: LGTM: Error field addition for response error handling.

The new Error field in ResponsesResponse appropriately handles error scenarios in the Responses API format.


1635-1643: Verify dual event type constants for streaming.

The code now includes both .added and .delta variants for output text and function call arguments:

  • ResponsesStreamResponseTypeOutputTextAdded = "response.output_text.added" (line 1635)
  • ResponsesStreamResponseTypeOutputTextDelta = "response.output_text.delta" (line 1636)
  • ResponsesStreamResponseTypeFunctionCallArgumentsAdded = "response.function_call_arguments.added" (line 1642)
  • ResponsesStreamResponseTypeFunctionCallArgumentsDelta = "response.function_call_arguments.delta" (line 1643)

Previous review discussions suggested replacing .added with .delta, but both variants now exist. Please confirm:

  1. Are both event types emitted by the OpenAI Responses API?
  2. If only .delta events are emitted for incremental updates, should the .added constants be removed?

Based on past review discussions, please verify the actual OpenAI Responses API event names to ensure these constants match the streamed events.

core/schemas/bifrost.go (1)

253-253: LGTM: Embedded field for responses streaming support.

The addition of *ResponsesStreamResponse to BifrostResponse properly integrates the new responses streaming surface into the unified response structure.
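Schematically, the unified-response shape looks like this; the abbreviated types are illustrative, not the actual schema definitions:

```go
// A unified response carries exactly one payload via embedded pointers.
type chatPayload struct{ Text string }

type responsesStreamPayload struct {
	Type           string
	SequenceNumber int
}

type unifiedResponse struct {
	*chatPayload            // populated for chat completion results
	*responsesStreamPayload // populated for Responses streaming events
}

// Callers branch on whichever embedded pointer is non-nil:
//
//	if r.responsesStreamPayload != nil {
//		// handle a Responses streaming event
//	}
```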

core/providers/openai.go (9)

132-134: LGTM: Conditional authorization header handling.

The addition of if key.Value != "" checks before setting the Authorization header across all request methods properly handles scenarios where API keys may not be required (e.g., local deployments, authenticated proxies).

Also applies to: 475-477, 836-838, 1162-1164


227-230: LGTM: Safe authHeader handling in streaming methods.

The conditional check for authHeader != nil before copying headers prevents nil pointer issues and properly supports providers that don't require authentication.

Also applies to: 573-576, 928-931


264-267: LGTM: Comprehensive timeout error handling.

The addition of checks for both http.ErrHandlerTimeout and context.DeadlineExceeded ensures proper timeout detection across different failure scenarios in all streaming methods.

Also applies to: 596-599, 623-626, 951-954, 978-981, 1318-1321, 1345-1348, 1597-1600, 1624-1627
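The check reduces to something like this (a sketch, not the provider code):

```go
import (
	"context"
	"errors"
	"net/http"
)

// isStreamTimeout reports whether a streaming read failed because of a
// timeout, covering both the server-side handler timeout and an expired
// or cancelled context deadline.
func isStreamTimeout(err error) bool {
	return errors.Is(err, http.ErrHandlerTimeout) || errors.Is(err, context.DeadlineExceeded)
}
```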


308-313: LGTM: Improved SSE parsing with CutPrefix.

The switch to strings.CutPrefix provides clearer semantics and better performance. The additional filtering for "event:" lines in responses streaming (lines 1020-1026) correctly handles the Responses API's event-based SSE format.

Also applies to: 668-673, 1018-1026
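In outline, the line handling looks roughly like this; the real handler also decodes the JSON payload, handles stream errors, and watches for the terminator:

```go
import (
	"bufio"
	"strings"
)

// scanSSE sketches the line handling: "data:" lines carry JSON payloads,
// "event:" lines (used by the Responses API SSE format) are skipped, and
// everything else is ignored.
func scanSSE(scanner *bufio.Scanner, handle func(payload string)) {
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "event:") {
			continue // event name lines; the payload type is read from the JSON instead
		}
		if payload, ok := strings.CutPrefix(line, "data: "); ok {
			handle(payload)
		}
	}
}
```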


329-336: LGTM: Proper error handling in streaming responses.

The error handling correctly parses stream errors, marks the stream as ending with BifrostContextKeyStreamEndIndicator, and propagates the error through the response channel.

Also applies to: 690-697


747-750: LGTM: Nil-check prevents potential panic.

The added choice.BifrostStreamResponseChoice.Delta != nil guard prevents nil pointer dereferences when processing chunks that carry only role information or tool calls without delta content.
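The guard amounts to something like the following, with invented wrapper types mirroring the field names above:

```go
import "strings"

// Invented types; the real schema types live in core/schemas.
type delta struct{ Content *string }

type streamChoice struct{ Delta *delta }

// appendContent only accumulates text when the delta and its content are
// present, skipping chunks that carry just role or tool-call information.
func appendContent(buf *strings.Builder, choice *streamChoice) {
	if choice == nil || choice.Delta == nil || choice.Delta.Content == nil {
		return
	}
	buf.WriteString(*choice.Delta.Content)
}
```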


1044-1069: LGTM: Complete error handling for responses streaming.

The error handling properly detects error events, populates all necessary fields including ExtraFields with request metadata, and correctly propagates error codes, messages, and parameters. This addresses previous review feedback.


1080-1085: LGTM: Proper completion event handling.

The detection of ResponsesStreamResponseTypeCompleted events and the subsequent stream termination with final latency calculation correctly implements the terminal state handling for responses streaming.
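Schematically (the event string below mirrors OpenAI's response.completed event; Bifrost references it through the ResponsesStreamResponseTypeCompleted constant):

```go
import "time"

// handleTerminalEvent sketches the completed-event branch: when the stream
// reports completion, record the final latency and stop reading.
func handleTerminalEvent(eventType string, start time.Time, finish func(latency time.Duration)) (done bool) {
	if eventType == "response.completed" {
		finish(time.Since(start))
		return true
	}
	return false
}
```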


1102-1201: LGTM: Complete embedding support implementation.

The new Embedding method and handleOpenAIEmbeddingRequest handler follow the established patterns for request handling, including proper error handling, response parsing, and metadata population. The implementation is consistent with other request types in the provider.
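As a generic sketch of that request/parse flow, assuming a hypothetical endpoint URL and simplified request/response types:

```go
import (
	"bytes"
	"context"
	"encoding/json"
	"net/http"
)

// Illustrative request/response shapes, not Bifrost's schema types.
type embeddingRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float64 `json:"embedding"`
	} `json:"data"`
}

// doEmbedding marshals the request body, POSTs it, and decodes the provider
// response; the real handler also attaches latency and provider metadata.
func doEmbedding(ctx context.Context, client *http.Client, url string, req embeddingRequest) (*embeddingResponse, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	httpReq.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(httpReq)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out embeddingResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}
```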

Copy link
Contributor

akshaydeo commented Oct 13, 2025

Merge activity

  • Oct 13, 9:55 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Oct 13, 9:55 AM UTC: @akshaydeo merged this pull request with Graphite.

@akshaydeo akshaydeo merged commit 18b7d89 into main Oct 13, 2025
3 checks passed
@akshaydeo akshaydeo deleted the 10-11-feat_responses_streaming_added branch October 13, 2025 09:55
@coderabbitai coderabbitai bot mentioned this pull request Oct 14, 2025
