Skip to content

Conversation

@JohnC0de
Copy link
Contributor

@JohnC0de JohnC0de commented Jan 7, 2026

Summary

Adds a lightweight, off-by-default tracing system to help diagnose Windows crashes that persist after PR #543.

Background

After merging #543 (which fixed the Bun ShellInterpreter GC crash), crashes continued on Windows. Without visibility into what happens before crashes, debugging was difficult. This PR adds structured tracing to capture high-signal events.

Features

  • Ring buffer tracer with JSONL output (survives most crashes)
  • Traces key operations:
    • All spawn operations (session-notification, LSP, subagents)
    • Background agent lifecycle
    • Crash handlers (uncaughtException, unhandledRejection, signals)
  • debug-bundle CLI command for diagnostic collection
  • Automatic sensitive data redaction
  • OFF by default - enable via OMO_DEBUG=1 environment variable

Usage

# Enable tracing
$env:OMO_DEBUG = "1"
opencode

# After crash, create diagnostic bundle
bunx oh-my-opencode debug-bundle

Log file location: %TEMP%\oh-my-opencode-trace.jsonl

Files Changed

Category Files
Core Tracer src/shared/debug-tracer/* (5 files)
CLI Command src/cli/debug-bundle/index.ts
Integrations session-notification, background-agent, LSP client, call-omo-agent
Documentation docs/DEBUG_TRACING.md

Testing

  • TypeScript typecheck passes
  • Build succeeds
  • Manual verification: tracing captures events correctly

Related


Summary by cubic

Adds an off-by-default crash tracing system for Windows to capture high-signal events and produce JSONL logs for faster diagnosis of crashes persisting after PR #543. It records key spawn and lifecycle events and sets up crash handlers to preserve trace history.

  • New Features

    • Ring buffer tracer with JSONL output, periodic/emergency flush, and automatic redaction.
    • Instrumented spawns and lifecycles across notifications, background agents, and LSP.
    • Crash handlers for uncaughtException, unhandledRejection, and SIGINT/SIGTERM/SIGBREAK.
    • Type-safe span API to measure durations and correlate events.
  • Migration

    • Enable tracing by setting OMO_DEBUG=1; default log path is %TEMP%\oh-my-opencode-trace.jsonl (override with OMO_DEBUG_LOG).
    • After a crash, run: bunx oh-my-opencode debug-bundle to generate a shareable diagnostic bundle.

Written for commit f07f100. Summary will update on new commits.

…ging

Adds a lightweight, off-by-default tracing system to help diagnose Windows crashes.

Features:
- Ring buffer-based tracer with JSONL output
- Traces all spawn operations (session-notification, LSP, subagents)
- Crash handlers (uncaughtException, unhandledRejection, signals)
- debug-bundle CLI command for diagnostic collection
- Automatic sensitive data redaction
- Enable via OMO_DEBUG=1 environment variable

This helps identify the root cause of crashes that persist after PR code-yeongyu#543.

Refs: oven-sh/bun#25798, anomalyco/opencode#3267
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 16 files

Confidence score: 2/5

  • The src/shared/debug-tracer/tracer.ts handler flushes logs but keeps running after an uncaught exception, leaving the process in an undefined state and risking corrupt behavior for every request that follows.
  • src/shared/debug-tracer/traced-spawn.ts resolves silently when a spawn fails, so callers can’t distinguish between a broken child process and a clean exit, which will mask failures in production pipelines.
  • Additional correctness gaps (unbalanced span handling, hardcoded CLI version) show instrumentation logic isn’t closing the loop yet, compounding the overall risk in this PR.
  • Pay close attention to src/shared/debug-tracer/tracer.ts, src/shared/debug-tracer/traced-spawn.ts - both contain regression-level reliability issues that need fixes before merging.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/cli/debug-bundle/index.ts">

<violation number="1" location="src/cli/debug-bundle/index.ts:39">
P2: Version is hardcoded instead of being dynamically retrieved. This will get out of sync with the actual package version. Consider using `getCachedVersion()` from the auto-update-checker like other parts of the codebase.</violation>
</file>

<file name="src/shared/debug-tracer/tracer.ts">

<violation number="1" location="src/shared/debug-tracer/tracer.ts:435">
P2: The redaction check `lowerKey.includes("key")` is overly broad and will redact legitimate non-sensitive fields like `primaryKey`, `sortKey`, `keyboardLayout`, `keyframe`, etc. Consider using more specific patterns like `apiKey`, `secretKey`, or checking for `key` as a suffix/word boundary.</violation>

<violation number="2" location="src/shared/debug-tracer/tracer.ts:489">
P1: The `uncaughtException` handler doesn't terminate the process after flushing. After an uncaught exception, Node.js considers the application to be in an undefined state, and continuing execution is dangerous. The handler should call `process.exit(1)` after the emergency flush to prevent undefined behavior.</violation>
</file>

<file name="src/tools/call-omo-agent/tools.ts">

<violation number="1" location="src/tools/call-omo-agent/tools.ts:43">
P2: Span is started but never ended. The `spanId` returned by `startSpan()` should be passed to `endSpan()` when the operation completes (or errors). Without this, the span remains orphaned in `activeSpans` and no duration/completion event is recorded, defeating the purpose of the tracing system.</violation>
</file>

<file name="docs/DEBUG_TRACING.md">

<violation number="1" location="docs/DEBUG_TRACING.md:172">
P3: Inconsistent path separator: earlier in the doc uses backslash (`\`) for Windows paths, but this table entry uses forward slash (`/`). Should use `%TEMP%\oh-my-opencode-trace.jsonl` for consistency with line 53.</violation>
</file>

<file name="src/shared/debug-tracer/traced-spawn.ts">

<violation number="1" location="src/shared/debug-tracer/traced-spawn.ts:118">
P1: Spawn errors are silently swallowed - the error handler discards the error and resolves with null values. The caller cannot distinguish between a failed spawn and a process that exited normally. Consider rejecting the promise on error so callers can handle spawn failures appropriately.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


private setupCrashHandlers(): void {
// uncaughtException
process.on("uncaughtException", (error) => {
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The uncaughtException handler doesn't terminate the process after flushing. After an uncaught exception, Node.js considers the application to be in an undefined state, and continuing execution is dangerous. The handler should call process.exit(1) after the emergency flush to prevent undefined behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/shared/debug-tracer/tracer.ts, line 489:

<comment>The `uncaughtException` handler doesn't terminate the process after flushing. After an uncaught exception, Node.js considers the application to be in an undefined state, and continuing execution is dangerous. The handler should call `process.exit(1)` after the emergency flush to prevent undefined behavior.</comment>

<file context>
@@ -0,0 +1,549 @@
+
+  private setupCrashHandlers(): void {
+    // uncaughtException
+    process.on("uncaughtException", (error) => {
+      this.trace("crash.handler", "uncaughtException", undefined, error)
+      this.emergencyFlush("uncaughtException")
</file context>
Fix with Cubic

resolve({ exitCode: code, signal })
})

proc.on("error", () => {
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Spawn errors are silently swallowed - the error handler discards the error and resolves with null values. The caller cannot distinguish between a failed spawn and a process that exited normally. Consider rejecting the promise on error so callers can handle spawn failures appropriately.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/shared/debug-tracer/traced-spawn.ts, line 118:

<comment>Spawn errors are silently swallowed - the error handler discards the error and resolves with null values. The caller cannot distinguish between a failed spawn and a process that exited normally. Consider rejecting the promise on error so callers can handle spawn failures appropriately.</comment>

<file context>
@@ -0,0 +1,237 @@
+      resolve({ exitCode: code, signal })
+    })
+    
+    proc.on("error", () => {
+      resolve({ exitCode: null, signal: null })
+    })
</file context>
Fix with Cubic

// Create bundle
const bundle: Record<string, unknown> = {
createdAt: new Date().toISOString(),
version: "2.14.1-debug.1",
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Version is hardcoded instead of being dynamically retrieved. This will get out of sync with the actual package version. Consider using getCachedVersion() from the auto-update-checker like other parts of the codebase.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/cli/debug-bundle/index.ts, line 39:

<comment>Version is hardcoded instead of being dynamically retrieved. This will get out of sync with the actual package version. Consider using `getCachedVersion()` from the auto-update-checker like other parts of the codebase.</comment>

<file context>
@@ -0,0 +1,178 @@
+  // Create bundle
+  const bundle: Record<string, unknown> = {
+    createdAt: new Date().toISOString(),
+    version: "2.14.1-debug.1",
+    tracing: {
+      wasEnabled: tracingWasEnabled,
</file context>
Fix with Cubic

lowerKey.includes("token") ||
lowerKey.includes("secret") ||
lowerKey.includes("password") ||
lowerKey.includes("key") ||
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The redaction check lowerKey.includes("key") is overly broad and will redact legitimate non-sensitive fields like primaryKey, sortKey, keyboardLayout, keyframe, etc. Consider using more specific patterns like apiKey, secretKey, or checking for key as a suffix/word boundary.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/shared/debug-tracer/tracer.ts, line 435:

<comment>The redaction check `lowerKey.includes("key")` is overly broad and will redact legitimate non-sensitive fields like `primaryKey`, `sortKey`, `keyboardLayout`, `keyframe`, etc. Consider using more specific patterns like `apiKey`, `secretKey`, or checking for `key` as a suffix/word boundary.</comment>

<file context>
@@ -0,0 +1,549 @@
+        lowerKey.includes("token") ||
+        lowerKey.includes("secret") ||
+        lowerKey.includes("password") ||
+        lowerKey.includes("key") ||
+        lowerKey.includes("auth") ||
+        lowerKey.includes("credential")
</file context>
Fix with Cubic

log(`[call_omo_agent] Starting with agent: ${args.subagent_type}, background: ${args.run_in_background}`)

// Trace agent invocation
const spanId = isTracingEnabled()
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Span is started but never ended. The spanId returned by startSpan() should be passed to endSpan() when the operation completes (or errors). Without this, the span remains orphaned in activeSpans and no duration/completion event is recorded, defeating the purpose of the tracing system.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/call-omo-agent/tools.ts, line 43:

<comment>Span is started but never ended. The `spanId` returned by `startSpan()` should be passed to `endSpan()` when the operation completes (or errors). Without this, the span remains orphaned in `activeSpans` and no duration/completion event is recorded, defeating the purpose of the tracing system.</comment>

<file context>
@@ -37,6 +38,16 @@ export function createCallOmoAgent(
       log(`[call_omo_agent] Starting with agent: ${args.subagent_type}, background: ${args.run_in_background}`)
+      
+      // Trace agent invocation
+      const spanId = isTracingEnabled() 
+        ? startSpan("subagent.start", `call_omo_agent.${args.subagent_type}`, {
+            agent: args.subagent_type,
</file context>
Fix with Cubic

| Variable | Description | Default |
|----------|-------------|---------|
| `OMO_DEBUG` | Enable tracing (`1` or `true`) | Not set (disabled) |
| `OMO_DEBUG_LOG` | Custom log file path | `%TEMP%/oh-my-opencode-trace.jsonl` |
Copy link

@cubic-dev-ai cubic-dev-ai bot Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Inconsistent path separator: earlier in the doc uses backslash (\) for Windows paths, but this table entry uses forward slash (/). Should use %TEMP%\oh-my-opencode-trace.jsonl for consistency with line 53.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/DEBUG_TRACING.md, line 172:

<comment>Inconsistent path separator: earlier in the doc uses backslash (`\`) for Windows paths, but this table entry uses forward slash (`/`). Should use `%TEMP%\oh-my-opencode-trace.jsonl` for consistency with line 53.</comment>

<file context>
@@ -0,0 +1,176 @@
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `OMO_DEBUG` | Enable tracing (`1` or `true`) | Not set (disabled) |
+| `OMO_DEBUG_LOG` | Custom log file path | `%TEMP%/oh-my-opencode-trace.jsonl` |
+
+## Version
</file context>
Fix with Cubic

JohnC0de added a commit to JohnC0de/oh-my-opencode that referenced this pull request Jan 7, 2026
- Add custom error classes (LSPServerUnavailableError, LSPServerExitedError) for graceful failure handling
- Track server availability with cooldown-based retry mechanism (5 min cooldown, 3 max retries)
- Wrap spawn() in try-catch to catch ENOENT/EACCES errors immediately
- Add safeWrite() method to handle stdin errors without crashing
- Track stdinWritable state to prevent writes to destroyed streams
- Handle warmup failures gracefully with console.warn instead of throwing
- Skip unavailable servers on subsequent requests to avoid repeated spawn failures

This prevents the crash discovered via debug tracing (PR code-yeongyu#571):
  spawn ENOENT -> uncaughtException -> ERR_STREAM_DESTROYED

Refs: code-yeongyu#571
@greptile-apps
Copy link

greptile-apps bot commented Jan 7, 2026

Greptile Summary

Adds a comprehensive off-by-default crash tracing system to help diagnose persistent Windows crashes. The implementation uses a ring buffer with JSONL output, automatic crash handlers, and sensitive data redaction.

Key changes:

  • Created src/shared/debug-tracer/ module with core tracer, spawn wrappers, and types
  • Integrated tracing into spawn operations, notifications, background agents, and LSP lifecycle
  • Added debug-bundle CLI command to collect diagnostics after crashes
  • Enabled via OMO_DEBUG=1 environment variable
  • Logs to %TEMP%\oh-my-opencode-trace.jsonl with periodic flush (5s) and emergency flush on crash

Architecture highlights:

  • Ring buffer (1000 events) survives most crashes
  • Crash handlers for uncaughtException, unhandledRejection, and signals
  • Automatic redaction of API keys, tokens, and home paths
  • Memory stats tracking on each event
  • Span-based duration tracking for operations

Issues found:

  • Memory leak in src/tools/call-omo-agent/tools.ts: span started but never ended, causing activeSpans Map to grow unbounded

Confidence Score: 4/5

  • This PR is generally safe to merge with one critical memory leak that needs fixing
  • The tracing system is well-designed with proper crash handlers, redaction, and off-by-default behavior. However, the memory leak in call-omo-agent where spans are never closed will cause the activeSpans Map to grow unbounded over long sessions, potentially contributing to the very crashes this PR aims to debug. The leak is localized and straightforward to fix.
  • Pay close attention to src/tools/call-omo-agent/tools.ts - the missing endSpan call must be fixed before merge

Important Files Changed

Filename Overview
src/shared/debug-tracer/tracer.ts Core tracer implementation with ring buffer, crash handlers, and JSONL logging - well-designed but has a minor memory leak risk in activeSpans
src/shared/debug-tracer/traced-spawn.ts Wrapper functions for spawn operations with automatic tracing and arg redaction - clean implementation with proper error handling
src/cli/debug-bundle/index.ts CLI command for generating diagnostic bundles - good redaction and user-friendly output
src/hooks/session-notification.ts Integrates tracing into notification system using tracedSpawnAsync wrapper - minimal changes with proper trace names
src/features/background-agent/manager.ts Adds tracing for background agent lifecycle events - clean integration with proper event tracking
src/tools/lsp/client.ts Adds tracing for LSP server lifecycle - captures spawn, exit, and error events with good detail
src/tools/call-omo-agent/tools.ts Adds span tracing for agent invocations - starts span but missing endSpan call on completion path

Sequence Diagram

sequenceDiagram
    participant App as oh-my-opencode
    participant Tracer as DebugTracer
    participant RingBuffer as Ring Buffer
    participant LogFile as trace.jsonl
    participant CrashHandler as Crash Handlers

    Note over App,Tracer: Initialization (OMO_DEBUG=1)
    App->>Tracer: initTracer()
    Tracer->>Tracer: Check OMO_DEBUG env var
    Tracer->>LogFile: Create WriteStream
    Tracer->>Tracer: Setup periodic flush (5s)
    Tracer->>CrashHandler: Register handlers (uncaughtException, etc)
    Tracer->>RingBuffer: trace("system.init")

    Note over App,LogFile: Normal Operation
    App->>Tracer: tracedSpawn(command, args)
    Tracer->>Tracer: startSpan("spawn.start")
    Tracer->>RingBuffer: Add event with memory stats
    App->>App: Execute spawn
    App->>Tracer: Process exits/errors
    Tracer->>Tracer: endSpan() with duration
    Tracer->>RingBuffer: Add completion event

    App->>Tracer: trace("notification.send")
    Tracer->>RingBuffer: Add event
    
    App->>Tracer: Background agent lifecycle
    Tracer->>Tracer: startSpan("subagent.start")
    Tracer->>RingBuffer: Add event
    
    loop Every 5 seconds
        Tracer->>Tracer: Periodic flush timer
        Tracer->>RingBuffer: Copy all events
        Tracer->>LogFile: Write JSONL (async)
        Tracer->>RingBuffer: Clear buffer
    end

    Note over App,CrashHandler: Crash Scenario
    App->>CrashHandler: uncaughtException/unhandledRejection
    CrashHandler->>Tracer: emergencyFlush()
    Tracer->>RingBuffer: Add crash marker
    Tracer->>LogFile: Synchronous writeFileSync
    CrashHandler->>App: Process exits

    Note over App,LogFile: Debug Bundle Creation
    App->>Tracer: debug-bundle command
    Tracer->>LogFile: Read JSONL
    Tracer->>Tracer: Parse last 500 events
    Tracer->>Tracer: Redact sensitive data
    Tracer->>App: Generate diagnostic JSON
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. src/tools/call-omo-agent/tools.ts, line 44-51 (link)

    logic: span started but never ended - causes memory leak in activeSpans Map

15 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@JohnC0de
Copy link
Contributor Author

JohnC0de commented Jan 7, 2026

Update: Root Cause Discovery

After further investigation, we discovered that the Windows crash was not caused by Bun's ShellInterpreter GC bug as originally suspected.

The Real Root Cause

The crash occurs when oh-my-opencode's session-notification runs alongside external notification plugins (e.g., @mohak34/opencode-notifier). Both plugins listen to session.idle and send concurrent notifications, causing resource contention on Windows.

Evidence:

  • User removed opencode-notifier → crashes stopped immediately
  • Release version (with original ctx.$) works fine when used alone
  • No widespread complaints from users without external notifiers

What This Means for This PR

This debug tracing system is still valuable for future debugging, but:

  1. The crash it was designed to diagnose had a different root cause
  2. The Bun ShellInterpreter GC issue referenced in this PR wasn't actually the culprit

The Proper Fix

PR #575 implements the actual fix:

See: #575 and docs/CRASH_INVESTIGATION_TIMELINE.md for full details.

Recommendation

This PR can still be merged if the tracing functionality is desired, but the context in the description should be updated to reflect that it's a general debugging tool, not a fix for the specific crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant