Skip to content

Conversation

@dsarno
Copy link
Collaborator

@dsarno dsarno commented Nov 1, 2025

Problem

The MCP server could become a zombie process when the client disconnected or the parent process exited, because it didn't reliably detect stdio detach or handle shutdown signals. (Noted in, and should fix #359)

Solution

Implemented robust shutdown coordination across all server entry points:

  • Signal handlers: handle SIGTERM, SIGINT, SIGPIPE (ignored), and SIGBREAK (Windows) to gracefully initiate shutdown
  • Stdin/parent monitor thread: background daemon thread that detects:
    • Stdin closure or invalid file descriptors
    • Parent process exit (with Windows-safe ValueError handling for unsupported os.kill(pid, 0))
  • Guarded exit timers: single threading.Event flag prevents multiple concurrent exit timers across shutdown paths
  • Force exit: simplified _force_exit() to direct os._exit() to bypass background threads and atexit handlers
  • Debug logging: added DEBUG-level logs in exception handlers to aid troubleshooting without impacting stability

Changes

  • Server/server.py: Added shutdown coordination with signal handlers, monitor thread, and guarded exit logic
  • UnityMcpBridge/UnityMcpServer~/src/server.py: Mirrored shutdown improvements
  • MCPForUnity/UnityMcpServer~/src/server.py: Mirrored shutdown improvements
  • tests/test_telemetry_server.py: Added autouse pytest fixture to chdir into UnityMcpServer~/src so telemetry can locate pyproject.toml for version detection

Testing

  • All existing tests pass (56 passed, 5 skipped, 7 xpassed)
  • Server now reliably exits when:
    • Client disconnects (stdin EOF)
    • Parent process terminates
    • SIGTERM/SIGINT received
    • Broken pipe occurs

Windows compatibility

  • Handles ValueError when os.kill(parent_pid, 0) is unsupported on Windows
  • Gracefully disables parent PID probing when platform doesn't support signal 0

This ensures the server exits cleanly in all scenarios, preventing zombie processes and improving resource cleanup.

Summary by CodeRabbit

  • New Features

    • Improved server shutdown mechanisms with enhanced signal handling for graceful termination.
    • Increased reliability of process lifecycle management across deployment environments.
  • Tests

    • Added comprehensive telemetry testing suite.
    • Removed legacy telemetry test module.

…r, forced exit)\nTests: move telemetry tests to tests/ and convert to asserts
…duplicates; fix Windows ValueError in parent monitor; tests: add autouse cwd fixture for telemetry to locate pyproject.toml
…guarded exit timers, and os._exit force-exit in UnityMcpServer~ entry points
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 1, 2025

Walkthrough

Signal handling and process lifecycle management have been added to multiple MCP server implementations. Three server.py files now include background monitors for stdin/parent-process detachment, signal handlers for SIGTERM/SIGINT, global shutdown coordination flags, and enhanced main() entry points. One telemetry test file has been removed and replaced with a new structured test module with enhanced fixture support.

Changes

Cohort / File(s) Summary
Signal handling & lifecycle enhancements
MCPForUnity/UnityMcpServer~/src/server.py, Server/server.py, UnityMcpBridge/UnityMcpServer~/src/server.py
Added global _shutdown_flag and _exit_timer_scheduled threading.Event primitives; introduced _force_exit(code: int = 0) for forceful process termination; added _signal_handler(signum, frame) to catch SIGTERM/SIGINT and initiate shutdown; added _monitor_stdin() daemon thread to detect parent process exit or stdin closure and trigger shutdown; enhanced main() to install signal handlers, start monitor thread, wrap mcp.run loop with exception handling (KeyboardInterrupt, SystemExit, BrokenPipeError), manage shutdown sequencing, and schedule forced exit as fallback.
Telemetry test migration
Server/test_telemetry.py (removed), tests/test_telemetry_server.py (added)
Removed Server/test_telemetry.py containing basic telemetry functional tests. Added tests/test_telemetry_server.py with test_telemetry_basic, test_telemetry_disabled, and test_data_storage functions; includes pyproject.toml fixture for workspace discovery and environment-aware telemetry reloading.

Sequence Diagram(s)

sequenceDiagram
    actor main as main()
    participant sig as Signal Handlers
    participant mon as Monitor Thread
    participant mcp as mcp.run()
    participant proc as OS

    main->>sig: Install signal handlers<br/>(SIGTERM, SIGINT, ...)
    main->>mon: Start _monitor_stdin()<br/>(daemon thread)
    par Monitor Loop
        mon->>proc: Poll stdin/parent
        Note over mon: Check for EOF or<br/>parent detachment
    and
        main->>mcp: Enter mcp.run(transport='stdio')
        proc->>sig: Send signal (SIGTERM/SIGINT)
        sig->>sig: _signal_handler triggered
        sig->>mon: Set _shutdown_flag
    end
    
    alt Graceful Path
        mon->>mcp: Detect shutdown_flag or stdin EOF
        mon->>mcp: Trigger controlled shutdown
        mcp->>main: Return normally
        main->>main: Schedule _force_exit() with short delay
    else Exception Path
        mcp->>main: Raise (KeyboardInterrupt, BrokenPipeError)
        main->>main: Catch exception, log, set _shutdown_flag
        main->>main: Call _force_exit(1) after delay
    else Timeout Path
        alt Not already scheduled
            main->>main: Schedule _force_exit via Timer
            Note over main: Ensures process exits<br/>if graceful path stalls
        end
    end
    
    main->>proc: os._exit(code)
    Note over proc: Forced abrupt termination
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • Key areas requiring attention:
    • Thread safety of global _shutdown_flag and _exit_timer_scheduled across three distinct server implementations—verify consistent usage patterns and edge cases around multiple signal arrivals
    • Signal handler registration differences per platform (SIGPIPE, SIGBREAK on Windows/POSIX systems); ensure portability
    • Monitor thread polling logic and stdin/parent process detection mechanism—subtle differences between Server/server.py and the other two implementations may exist
    • Integration with existing mcp.run() loop and telemetry timing; verify shutdown sequence doesn't interrupt active telemetry or logging
    • Redundancy and consistency: the same feature is applied to three server.py files in different paths—confirm all three are aligned and whether consolidation is planned

Possibly related PRs

Poem

🐰 Signal bells ring, and we listen with care,
Graceful goodbyes float on the air,
Monitors watching where parents may roam,
Timeout timers ensure we get home,
Clean exits promised, no stalling affair! 🛑✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning While the core shutdown signal handling changes across Server/server.py and its mirrored variants are squarely in scope, the PR includes test infrastructure changes that appear tangential to the primary objective. Specifically, the removal of Server/test_telemetry.py and addition of tests/test_telemetry_server.py represent telemetry test reorganization with fixture changes for working directory handling, which are unrelated to shutdown signal handling. These test changes should be addressed in a separate infrastructure or testing-focused PR rather than bundled with core shutdown logic.
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Server: Robust shutdown on stdio detach (signals, stdin/parent monitor, forced exit)" directly and accurately summarizes the main changes in the PR. It clearly identifies the three primary mechanisms introduced to address the shutdown problem: signal handlers, stdin/parent process monitoring, and forced exit logic. The title is concise, specific, and provides meaningful context for someone reviewing the commit history without overstating scope or including unnecessary noise.
Linked Issues Check ✅ Passed The PR implements robust shutdown coordination across multiple server implementations to address unreliable detection of stdio detach, which is noted as related to issue #359. The code changes include signal handlers for SIGTERM/SIGINT, a background thread to monitor stdin closure and parent process status, and forced exit logic with proper coordination. Testing results indicate 56 tests pass with existing test suites unbroken, and the PR summary confirms the server now reliably exits when clients disconnect, parent processes terminate, or signals are received. These implementations directly support the stated objective of preventing zombie server processes caused by undetected stdio detach events.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 040eb6d and 32b35c9.

📒 Files selected for processing (5)
  • MCPForUnity/UnityMcpServer~/src/server.py (3 hunks)
  • Server/server.py (3 hunks)
  • Server/test_telemetry.py (0 hunks)
  • UnityMcpBridge/UnityMcpServer~/src/server.py (3 hunks)
  • tests/test_telemetry_server.py (1 hunks)
💤 Files with no reviewable changes (1)
  • Server/test_telemetry.py
🧰 Additional context used
🪛 Ruff (0.14.2)
UnityMcpBridge/UnityMcpServer~/src/server.py

204-204: Unused function argument: frame

(ARG001)


241-241: Do not catch blind exception: Exception

(BLE001)


251-251: Do not catch blind exception: Exception

(BLE001)


264-266: try-except-pass detected, consider logging the exception

(S110)


264-264: Do not catch blind exception: Exception

(BLE001)

MCPForUnity/UnityMcpServer~/src/server.py

201-201: Unused function argument: frame

(ARG001)


238-238: Do not catch blind exception: Exception

(BLE001)


248-248: Do not catch blind exception: Exception

(BLE001)


262-264: try-except-pass detected, consider logging the exception

(S110)


262-262: Do not catch blind exception: Exception

(BLE001)

Server/server.py

201-201: Unused function argument: frame

(ARG001)


238-238: Do not catch blind exception: Exception

(BLE001)


248-248: Do not catch blind exception: Exception

(BLE001)


262-264: try-except-pass detected, consider logging the exception

(S110)


262-262: Do not catch blind exception: Exception

(BLE001)

Comment on lines +217 to +236
if parent_pid is not None:
try:
os.kill(parent_pid, 0)
except ValueError:
# Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
parent_pid = None
except (ProcessLookupError, OSError):
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break

try:
if sys.stdin.closed:
logger.info("stdin.closed is True; client disconnected")
break
fd = sys.stdin.fileno()
if fd < 0:
logger.info("stdin fd invalid; client disconnected")
break
except (ValueError, OSError, AttributeError):
# Closed pipe or unavailable stdin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prevent false parent-death detection on Windows.

Identical to the bridge variant: os.kill(parent_pid, 0) throws PermissionError/EINVAL on Windows when signal 0 isn’t supported, yet this code treats every OSError as “parent vanished” and forces shutdown. Any Windows run will therefore exit immediately. Please gate on the errno, disable probing for the unsupported cases, and keep exiting only when we actually get ProcessLookupError.

@@
-import signal
-import sys
-import threading
+import errno
+import signal
+import sys
+import threading
@@
-                except ValueError:
-                    # Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
-                    parent_pid = None
-                except (ProcessLookupError, OSError):
+                except ValueError:
+                    parent_pid = None
+                except ProcessLookupError:
                     logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
                     break
+                except OSError as exc:
+                    if exc.errno in (errno.EPERM, errno.EACCES, errno.EINVAL, errno.ENOSYS):
+                        logger.debug("Parent probe unsupported; disabling parent monitoring", exc_info=True)
+                        parent_pid = None
+                        continue
+                    logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
+                    break
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if parent_pid is not None:
try:
os.kill(parent_pid, 0)
except ValueError:
# Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
parent_pid = None
except (ProcessLookupError, OSError):
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break
try:
if sys.stdin.closed:
logger.info("stdin.closed is True; client disconnected")
break
fd = sys.stdin.fileno()
if fd < 0:
logger.info("stdin fd invalid; client disconnected")
break
except (ValueError, OSError, AttributeError):
# Closed pipe or unavailable stdin
if parent_pid is not None:
try:
os.kill(parent_pid, 0)
except ValueError:
parent_pid = None
except ProcessLookupError:
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break
except OSError as exc:
if exc.errno in (errno.EPERM, errno.EACCES, errno.EINVAL, errno.ENOSYS):
logger.debug("Parent probe unsupported; disabling parent monitoring", exc_info=True)
parent_pid = None
continue
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break
try:
if sys.stdin.closed:
logger.info("stdin.closed is True; client disconnected")
break
fd = sys.stdin.fileno()
if fd < 0:
logger.info("stdin fd invalid; client disconnected")
break
except (ValueError, OSError, AttributeError):
# Closed pipe or unavailable stdin
🤖 Prompt for AI Agents
In MCPForUnity/UnityMcpServer~/src/server.py around lines 217-236, the
parent-probing code treats all OSErrors as "parent vanished" which causes
immediate exits on Windows; change the exception handling so PermissionError and
OSError with errno indicating signal-0 unsupported (e.g., errno.EINVAL or
errno.EPERM) do not trigger shutdown but instead disable parent probing by
setting parent_pid = None, while only ProcessLookupError should log "parent no
longer exists" and break; use the errno module to inspect e.errno and keep the
existing ValueError handling as-is.

Comment on lines +217 to +236
if parent_pid is not None:
try:
os.kill(parent_pid, 0)
except ValueError:
# Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
parent_pid = None
except (ProcessLookupError, OSError):
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break

try:
if sys.stdin.closed:
logger.info("stdin.closed is True; client disconnected")
break
fd = sys.stdin.fileno()
if fd < 0:
logger.info("stdin fd invalid; client disconnected")
break
except (ValueError, OSError, AttributeError):
# Closed pipe or unavailable stdin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard against unsupported signal 0 on Windows.

Here too, os.kill(parent_pid, 0) will raise PermissionError/EINVAL on Windows whenever signal 0 isn’t supported. Treating all OSError values like ProcessLookupError means the server kills itself immediately even though the parent is fine. Please special‑case those errnos, disable parent probing, and only shut down when we genuinely see ProcessLookupError.

@@
-import signal
-import sys
-import threading
+import errno
+import signal
+import sys
+import threading
@@
-                except ValueError:
-                    # Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
-                    parent_pid = None
-                except (ProcessLookupError, OSError):
+                except ValueError:
+                    parent_pid = None
+                except ProcessLookupError:
                     logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
                     break
+                except OSError as exc:
+                    if exc.errno in (errno.EPERM, errno.EACCES, errno.EINVAL, errno.ENOSYS):
+                        logger.debug("Parent probe unsupported; disabling parent monitoring", exc_info=True)
+                        parent_pid = None
+                        continue
+                    logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
+                    break

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In Server/server.py around lines 217-236, the except handling for
os.kill(parent_pid, 0) treats all OSError the same and can cause premature
shutdown on Windows; change it to special-case signal-not-supported errors by
importing errno and checking the exception errno: if the caught exception is
PermissionError or has errno in (errno.EINVAL, errno.EPERM) then set parent_pid
= None (disable parent probing) and continue, but only log and break for
ProcessLookupError or errors with errno == errno.ESRCH that indicate the parent
truly doesn't exist; re-raise or handle other unexpected errors appropriately.

Comment on lines +218 to +237
break

if parent_pid is not None:
try:
os.kill(parent_pid, 0)
except ValueError:
# Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
parent_pid = None
except (ProcessLookupError, OSError):
logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
break

try:
if sys.stdin.closed:
logger.info("stdin.closed is True; client disconnected")
break
fd = sys.stdin.fileno()
if fd < 0:
logger.info("stdin fd invalid; client disconnected")
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix parent-process probe on Windows to avoid false shutdowns.

On Windows, os.kill(parent_pid, 0) often raises PermissionError or OSError with errno=EINVAL/EPERM even when the parent is still alive (signal 0 simply isn’t supported). Because we route every OSError through the ProcessLookupError path, the server will immediately assume the parent died and tear itself down. That turns a healthy Windows deploy into an instant self‑shutdown. Please treat those “unsupported” errnos like the ValueError case and disable parent probing instead of exiting. Add the necessary errno import and keep shutting down only when we genuinely see ProcessLookupError (parent gone).

@@
-import signal
-import sys
-import threading
+import errno
+import signal
+import sys
+import threading
@@
-                except ValueError:
-                    # Signal 0 unsupported on this platform (e.g., Windows); disable parent probing
-                    parent_pid = None
-                except (ProcessLookupError, OSError):
+                except ValueError:
+                    parent_pid = None
+                except ProcessLookupError:
                     logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
                     break
+                except OSError as exc:
+                    if exc.errno in (errno.EPERM, errno.EACCES, errno.EINVAL, errno.ENOSYS):
+                        logger.debug("Parent probe unsupported; disabling parent monitoring", exc_info=True)
+                        parent_pid = None
+                        continue
+                    logger.info(f"Parent process {parent_pid} no longer exists; shutting down")
+                    break

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In UnityMcpBridge/UnityMcpServer~/src/server.py around lines 218 to 237, the
parent-process probe treats all OSError from os.kill(parent_pid, 0) as
process-not-found and triggers shutdown; modify the exception handling to treat
Windows "unsupported" errors (PermissionError or OSError with errno
EINVAL/EPERM) the same as the existing ValueError case by disabling parent
probing instead of exiting, and only shut down when ProcessLookupError (parent
gone) is raised; add an import for errno at the top and check exception.errno
against errno.EINVAL and errno.EPERM (or handle PermissionError separately) to
decide to set parent_pid = None rather than logging shutdown and breaking.

@dsarno dsarno merged commit ca01fc7 into CoplayDev:main Nov 1, 2025
1 check passed
dsarno added a commit to dsarno/unity-mcp that referenced this pull request Nov 1, 2025
dsarno added a commit that referenced this pull request Nov 1, 2025
@dsarno dsarno mentioned this pull request Nov 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

initialization timeout

1 participant