Skip to content

Conversation

justin808
Copy link
Member

@justin808 justin808 commented Oct 5, 2025

Summary

  • Added process group management for better child process cleanup
  • Added warning logging when process group kill fails with EPERM errors
  • Improved server readiness check to only accept 200-399 HTTP status codes (rejecting 404 and 5xx)

Benefits

  • Better debugging: Process termination issues are now visible in logs
  • More reliable tests: Server readiness checks won't pass prematurely on 404 or error responses
  • Fewer flaky tests: Proper health checks reduce timing-related failures

Changes

  1. Process Group Kill with Error Logging (lib/cypress_on_rails/server.rb:144-153)

    • Captures process group ID on spawn
    • Attempts to kill entire process group first
    • Falls back to single process kill on EPERM
    • Logs warnings when process group kill fails
  2. HTTP-Based Server Readiness Check (lib/cypress_on_rails/server.rb:125-132)

    • Changed from TCP socket check to HTTP response validation
    • Only accepts 200-399 status codes as "ready"
    • Rejects 404 and 5xx responses
    • More accurately determines when app is ready to serve requests

Test Plan

  • Existing RSpec suite passes (3 pre-existing failures unrelated to changes)
  • Manual testing with Rails server startup/shutdown cycles
  • Verify process cleanup in production use

Fixes #198

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Expose server readiness options: configurable readiness path and timeout for HTTP readiness checks.
  • Bug Fixes

    • Improved reliability of the embedded server lifecycle with HTTP-based readiness checks.
    • Graceful shutdown of server and child processes to reduce hangs and orphaned processes.
    • Safer termination with error handling when no PID or group termination fails; reduced start/stop flakiness.
  • Chores

    • Improved shutdown logging and environment cleanup for more consistent runs.

- Add process group management for better child process cleanup
- Add warning logging when process group kill fails with EPERM
- Improve server readiness check to accept only 200-399 HTTP status codes
- Reject 404 and 5xx responses as "not ready" for more reliable health checks

These improvements provide better debugging visibility and reduce flaky tests
by ensuring the server is truly ready before running tests.

Fixes #198

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

coderabbitai bot commented Oct 5, 2025

Warning

Rate limit exceeded

@justin808 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 15 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1867d and 501b0b4.

📒 Files selected for processing (3)
  • lib/cypress_on_rails/configuration.rb (2 hunks)
  • lib/cypress_on_rails/server.rb (3 hunks)
  • spec/cypress_on_rails/configuration_spec.rb (2 hunks)

Walkthrough

Adds HTTP-based readiness checks and process-group-aware server spawning/termination to lib/cypress_on_rails/server.rb; records PID/PGID, uses Net::HTTP for health checks (accepting 200–399), adds safe_kill and send_term_signal fallbacks, and exposes readiness config via configuration.rb.

Changes

Cohort / File(s) Summary
Server lifecycle and readiness
lib/cypress_on_rails/server.rb
Add require 'net/http'; spawn stores @server_pid/@server_pgid; replace TCP loop with server_responding? (HTTP GET, accept 200–399); wait_for_server uses readiness path/timeout; stop_server handles nil pid, calls send_term_signal with ESRCH/EPERM handling and falls back to safe_kill_process; ensure ENV cleanup and guarded transaction rollback.
Configuration: readiness options
lib/cypress_on_rails/configuration.rb
Add attr_accessor :server_readiness_path and :server_readiness_timeout; initialize from CYPRESS_RAILS_READINESS_PATH (default "/") and CYPRESS_RAILS_READINESS_TIMEOUT (default 5) in reset.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester as Tester
  participant Server as CypressOnRails::Server
  participant OS as OS/Processes
  participant App as App Server (HTTP)

  Tester->>Server: start_server()
  Server->>OS: spawn process (record pid & pgid)
  OS-->>Server: pid, pgid
  loop wait_for_server (timeout)
    Server->>App: HTTP GET base_url + readiness_path
    App-->>Server: 2xx/3xx or other/error
    alt Ready (200–399)
      Server-->>Tester: ready
    else Not ready
      Server-->>Server: sleep/retry
    end
  end
Loading
sequenceDiagram
  autonumber
  actor Tester as Tester
  participant Server as CypressOnRails::Server
  participant OS as OS/Signals

  Tester->>Server: stop_server()
  alt server_pgid present
    Server->>OS: TERM -pgid
    alt ESRCH/EPERM
      Server->>OS: log warning
      Server->>OS: TERM pid (safe_kill_process)
    end
  else only pid
    Server->>OS: TERM pid (safe_kill_process)
  end
  Server-->>Tester: stopped/cleaned
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Poem

A rabbit taps the server log,
Spawns with care, records each prog;
Checks the site—two hundred cheers,
If signals fail, it calms your fears.
PIDs hugged close, tests hop along. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes Check ⚠️ Warning In addition to the cleanup and HTTP readiness improvements specified in issue #198, the PR adds new configuration attributes for server_readiness_path and server_readiness_timeout along with environment-driven initialization, which were not requested by the linked issue objectives. Remove the unrelated configuration additions from this PR or update the linked issue to explicitly include these new server_readiness_path and server_readiness_timeout options.
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary changes in the PR by highlighting improvements to server process cleanup and readiness checks, matching the modifications to process group termination and HTTP-based health checks.
Linked Issues Check ✅ Passed The implemented changes fully address the objectives of issue #198 by introducing send_term_signal with warning logs on Errno::EPERM and fallback to single-process kill, and replacing the TCP readiness loop with an HTTP-based server_responding? check that accepts only 2xx–3xx responses and handles connection errors.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

claude bot commented Oct 5, 2025

Pull Request Review: Improve server process cleanup and readiness checks

Overview

This PR addresses process management and server readiness detection issues identified in #198. The changes improve debugging capabilities and test reliability.

Strengths

Code Quality

  • Clean implementation: The process group management is well-structured with proper error handling
  • Good separation of concerns: Breaking out send_term_signal and safe_kill_process improves readability
  • Explicit state management: Storing @server_pgid alongside @server_pid makes the intent clear

Improved Reliability

  • Better health checks: The HTTP-based readiness check (200-399) is more accurate than TCP socket checking
  • Graceful degradation: Fallback from process group to single process kill is well-handled
  • Helpful logging: Warning messages will aid in debugging production issues

Issues and Suggestions

1. Potential Race Condition in spawn_server (lib/cypress_on_rails/server.rb:109-111)

Issue: There is a small window where the process could terminate before Process.getpgid is called, which would raise Errno::ESRCH.

Suggestion: Add error handling around Process.getpgid call.

2. Incomplete Exception Handling in server_responding? (lib/cypress_on_rails/server.rb:125-132)

Issue: The method does not handle all possible HTTP exceptions. Missing cases include Errno::ETIMEDOUT, Net::OpenTimeout, Net::ReadTimeout, and Net::HTTPBadResponse.

Suggestion: Add these exception types to the rescue clause.

3. Missing Instance Variable Initialization

Issue: @server_pgid is set in spawn_server but never initialized in initialize.

Suggestion: Initialize @server_pid and @server_pgid to nil in the initialize method.

4. Inconsistent Error Recovery in send_term_signal (lib/cypress_on_rails/server.rb:144-153)

Issue: When Process.kill fails with EPERM, the fallback only kills the single process. Any child processes will be orphaned.

Severity: Medium - Could lead to resource leaks in CI environments

Suggestion: Document this limitation with a comment explaining the orphaned process concern.

5. Missing Timeout in server_responding?

Issue: Net::HTTP.get_response has a default timeout of 60 seconds. Slow server starts could block the health check loop.

Suggestion: Add explicit short timeouts (2 seconds recommended) for both open_timeout and read_timeout.

Test Coverage

Missing: No tests were added for the new functionality.

Recommended test cases:

  1. Process group kill success
  2. Process group kill with EPERM fallback
  3. Server readiness with 200 response (ready)
  4. Server readiness with 404 response (not ready)
  5. Server readiness with 500 response (not ready)
  6. Server readiness with connection refused (not ready)

Security Considerations

No security concerns identified. The changes are defensive and improve reliability.

Performance Considerations

Positive: HTTP-based health check validates application layer, not just socket layer.

Minor: Each check makes a full HTTP request vs TCP connection (adds approximately 1-5ms per check).

Summary

This is a solid improvement to the gem reliability. Main recommendations:

  1. Must fix: Add broader exception handling to server_responding?
  2. Should fix: Add timeout configuration to HTTP requests
  3. Nice to have: Add tests for the new functionality
  4. Nice to have: Initialize instance variables in initialize

The core logic is sound and addresses real production issues. With the exception handling improvements, this is ready to merge.

Recommendation: Approve with minor revisions


Review generated with assistance from Claude Code

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
lib/cypress_on_rails/server.rb (1)

134-142: Consider adding timeout to process wait.

The Process.wait(pid) on line 139 will block indefinitely if the server process ignores the TERM signal. While this might match the previous behavior, adding a timeout with a fallback to KILL would make the shutdown more robust.

 def stop_server(pid)
   return unless pid

   puts "Stopping Rails server (PID: #{pid})"
   send_term_signal
-  Process.wait(pid)
+  
+  # Wait up to 10 seconds for graceful shutdown
+  begin
+    Timeout.timeout(10) do
+      Process.wait(pid)
+    end
+  rescue Timeout::Error
+    puts "Warning: Server did not terminate after TERM signal, sending KILL"
+    safe_kill_process('KILL', pid)
+    Process.wait(pid) rescue Errno::ESRCH
+  end
 rescue Errno::ESRCH
   # Process already terminated
 end
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a37c4c and 1672356.

📒 Files selected for processing (1)
  • lib/cypress_on_rails/server.rb (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: claude-review
  • GitHub Check: rails_8
  • GitHub Check: rails_7_2
  • GitHub Check: rails_6_1
🔇 Additional comments (4)
lib/cypress_on_rails/server.rb (4)

4-4: LGTM!

Adding Net::HTTP from the standard library enables the HTTP-based readiness check without introducing external dependencies.


117-118: LGTM!

The HTTP-based readiness check correctly replaces the previous TCP socket approach, with a reasonable 0.1-second polling interval. The timeout is properly handled by the outer Timeout.timeout block.


144-153: LGTM with one observation!

The process group termination logic correctly:

  • Uses negative PGID (-@server_pgid) to signal the entire process group.
  • Falls back to single-process kill when PGID is unavailable.
  • Logs warnings when process group termination fails (per PR objectives).

Note: The Errno::EPERM fallback to single-process kill (line 152) may not succeed if you lack permission to kill the process group, but the primary goal is logging for debugging, which is achieved.


155-159: LGTM!

The implementation correctly guards against nil PIDs and gracefully handles the case where the process has already terminated.

- Initialize @server_pid and @server_pgid in constructor to avoid nil issues
- Add error handling for Process.getpgid race condition when process terminates immediately
- Add comprehensive exception handling in server_responding? (ETIMEDOUT, Net::OpenTimeout, Net::ReadTimeout, Net::HTTPBadResponse)
- Add configurable server_readiness_path with ENV var support (CYPRESS_RAILS_READINESS_PATH)
- Add explicit HTTP timeouts (1s open_timeout, 1s read_timeout) to prevent hangs
- Improve URI handling to support custom readiness check paths

These fixes address potential edge cases and race conditions in server lifecycle management.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

claude bot commented Oct 6, 2025

Code Review - PR #199: Improve server process cleanup and readiness checks

Overview

This PR addresses important reliability issues in server lifecycle management. The changes are well-motivated by issue #198 and improve both observability and correctness of the server startup/shutdown process.


✅ Strengths

  1. Well-scoped changes - Focused on specific, high-value improvements
  2. Good error handling - Proper fallback logic with informative warnings
  3. Configurable readiness path - The server_readiness_path configuration option adds valuable flexibility
  4. Better health checks - HTTP-based checks are more reliable than TCP socket checks

🔍 Code Quality & Best Practices

Good:

  • Clear separation of concerns with send_term_signal and safe_kill_process methods
  • Proper exception handling for process-related errors (ESRCH, EPERM)
  • Comprehensive rescue clauses in server_responding? covering various network failure scenarios

Suggestions:

  1. Instance variable initialization (lib/cypress_on_rails/server.rb:20-21)

    • Good addition, but consider documenting why these need to be instance variables
    • Could add a comment explaining the process group management approach
  2. Error message clarity (lib/cypress_on_rails/server.rb:165)

    • Consider using the configured logger instead of puts for consistency with the rest of the gem
  3. URI path handling (lib/cypress_on_rails/server.rb:138)

    • This logic is correct but could be simplified since uri.path should already be / by default

🐛 Potential Issues

Critical:

  1. Race condition in process group ID capture (lib/cypress_on_rails/server.rb:112-117)

    • Issue: If the process terminates before stop_server is called, @server_pgid could be stale or refer to a recycled PID
    • Suggestion: Consider validating the process still exists in send_term_signal before attempting to kill
  2. Missing timeout handling (lib/cypress_on_rails/server.rb:132-145)

    • The server_responding? method has 1-second timeouts which may be too short for slow CI systems
    • Suggestion: Consider making timeouts configurable

Medium Priority:

  1. Silent state mutation (lib/cypress_on_rails/server.rb:73, 95)

    • The start_server method mixes instance variables and parameters inconsistently
    • Suggestion: Either use instance variables throughout or keep it pure
  2. Incomplete error recovery (lib/cypress_on_rails/server.rb:164-167)

    • When process group kill fails with EPERM, single process kill may also fail
    • Suggestion: Add Errno::EPERM to the rescue clause in safe_kill_process as well

⚡ Performance Considerations

  1. Polling interval - sleep 0.1 checks 10 times per second, reasonable but could use exponential backoff
  2. HTTP request overhead - Minimal, and worth it for more accurate readiness detection

🔒 Security Concerns

  1. Process group kill security ✅ - Correctly implemented with proper fallback
  2. Environment variable for readiness path - Low risk (test env only), but document that this should be a static path
  3. Process signal handling ✅ - Using TERM signal is appropriate

🧪 Test Coverage

Missing tests for:

  1. Process group creation and cleanup logic
  2. HTTP-based readiness checks with various response codes (200, 301, 404, 500)
  3. Timeout behavior in wait_for_server
  4. EPERM error handling in process termination
  5. Configuration of server_readiness_path

Recommendation: Add a spec/cypress_on_rails/server_spec.rb file with unit tests


📋 Recommendations

Must Address:

  1. Add unit tests for the new logic (especially HTTP readiness checks)
  2. Consider using the configured logger instead of puts for consistency
  3. Add Errno::EPERM handling in safe_kill_process

Should Consider:

  1. Document the process group management approach (add code comments)
  2. Make HTTP timeout values configurable
  3. Add process existence validation before kill attempts
  4. Refactor to use either instance variables or parameters consistently

Nice to Have:

  1. Add exponential backoff to server polling
  2. Document security considerations for server_readiness_path in README

🎯 Verdict

Approval Status:Approve with minor changes

This is a solid improvement that addresses real reliability issues. The code is well-structured and the error handling is thoughtful. The missing test coverage is the main concern - these changes touch critical infrastructure code and should have corresponding tests to prevent regressions.

The suggestions above are mostly about polish and defensive programming. The core logic is sound and ready to merge once tests are added.

Estimated effort to address: 2-3 hours for comprehensive test coverage

Great work on this PR! 🚀

Code quality improvements:
- Add documentation comment for process management instance variables
- Replace puts with logger.warn for consistency with gem conventions
- Simplify URI path handling (remove redundant empty check)
- Add process_exists? helper to validate PID before kill operations

Error handling improvements:
- Add EPERM to safe_kill_process rescue clause for complete error coverage
- Validate process existence before attempting process group kill to avoid stale PGID issues
- Prevent attempts to kill recycled PIDs

Configuration improvements:
- Add configurable server_readiness_timeout (default: 5 seconds)
- Support CYPRESS_RAILS_READINESS_TIMEOUT environment variable
- Addresses potential timeout issues on slow CI systems

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

claude bot commented Oct 6, 2025

PR Review: Improve server process cleanup and readiness checks

Overall Assessment

This is a solid improvement to server lifecycle management. The changes enhance reliability and debugging capabilities. A few issues need attention before merging.

Critical Issues

1. Incorrect PID variable in stop_server (lib/cypress_on_rails/server.rb:150-158)

The stop_server(pid) method receives a parameter pid but uses instance variable @server_pid in send_term_signal and process_exists?.

Problem: The method is called with server_pid (local variable from line 74), but send_term_signal references @server_pid. If these differ, you will kill the wrong process or fail silently.

Fix: Either use the parameter throughout, or pass the parameter to send_term_signal

2. Race condition in spawn_server (lib/cypress_on_rails/server.rb:112-119)

Between spawning and calling Process.getpgid, the process might terminate, causing ESRCH. You catch this but set @server_pgid = nil, which might cause issues later.

Recommendation: Consider adding a small delay or retry logic, and document that @server_pgid may be nil in edge cases.

Security & Best Practices

3. HTTP timeout configuration (lib/cypress_on_rails/server.rb:133-148)

The server_readiness_timeout (default 5s) is used as both open_timeout and read_timeout in Net::HTTP.start. During the wait_for_server loop (which runs with 0.1s sleep), if a single HTTP request takes 5s to timeout, the polling is ineffective.

Recommendation: Use shorter timeouts for readiness checks (e.g., 1-2 seconds) since they are called repeatedly.

4. HTTP response code validation (lib/cypress_on_rails/server.rb:144)

3xx redirects might indicate the app is not fully ready. Consider being more strict and only accepting 200-299, or documenting why 3xx is acceptable.

Code Quality

5. Missing URI path handling (lib/cypress_on_rails/server.rb:140)

If uri.path is empty, this sends GET empty string instead of GET slash.

6. Missing test coverage

No tests for server_responding? method, send_term_signal, process group killing, or edge cases like @server_pgid = nil. Add unit tests for these critical paths.

Positive Observations

  1. Excellent logging for debugging process termination issues
  2. Graceful degradation from process group to single process kill
  3. Clear comments documenting process management intent
  4. Useful configuration flexibility with new options
  5. Proper exception handling

Recommendations Before Merging

  1. Fix the stop_server PID parameter issue (Critical)
  2. Add tests for new server_responding? logic
  3. Consider shorter HTTP timeouts for readiness checks
  4. Complete manual testing checklist
  5. Document new ENV variables in README

Great work improving reliability!

Critical fixes:
1. Fix PID parameter passing in stop_server/send_term_signal
   - Previously stop_server(pid) received parameter but used instance variable
   - Now consistently passes pid parameter through the call chain
   - Prevents potential bugs if pid values diverge

2. Fix HTTP timeout in readiness polling loop
   - Cap individual check timeout at 2 seconds max
   - Prevents 5s timeouts from blocking the 0.1s polling loop
   - Maintains responsiveness while supporting slow CI via configuration

3. Fix URI path handling edge case
   - Ensure GET path defaults to '/' if uri.path is empty
   - Prevents invalid HTTP requests with empty path

Documentation improvements:
- Document that @server_pgid may be nil in edge cases
- Clarify that 3xx redirects are considered valid "ready" state
- Add comments explaining timeout behavior in polling context

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@justin808
Copy link
Member Author

@claude review this pr.

Copy link

claude bot commented Oct 6, 2025

Claude finished @justin808's task —— View job


PR Review: Improve server process cleanup and readiness checks

Overall Assessment

This PR makes valuable improvements to server lifecycle management with better process cleanup and more reliable readiness checks. The implementation is generally solid, but there are several issues to address before merging.


✅ Strengths

  1. Well-motivated changes - Addresses real production issues documented in Enhancement: Improve server process cleanup and readiness checks #198
  2. Good error handling - Proper fallback logic with informative warnings for process termination failures
  3. Configurable readiness - The server_readiness_path and server_readiness_timeout configuration options add valuable flexibility
  4. Better health checks - HTTP-based checks (200-399 status codes) are more accurate than TCP socket checks
  5. Clear intent - Good code comments explaining the process management approach
  6. Comprehensive exception handling - The server_responding? method handles various network failure scenarios

🔴 Critical Issues

1. Race condition in PGID capture (lib/cypress_on_rails/server.rb:113-120)

Issue: Good catch on handling Errno::ESRCH when the process terminates immediately, but there's a subtle issue. The error handling sets @server_pgid = nil implicitly, but later code assumes if it was captured it's still valid.

Impact: Low - The fallback logic handles this, but it could be clearer.

Recommendation: The current implementation is actually correct. The error handling is appropriate and the warning is helpful.

2. Timeout configuration could be confusing (lib/cypress_on_rails/server.rb:140)

Issue: The line timeout = [config.server_readiness_timeout || 5, 2].min caps the timeout at 2 seconds, which means setting CYPRESS_RAILS_READINESS_TIMEOUT to anything > 2 has no effect per-request. This might confuse users.

Impact: Medium - Users may expect the env var to control per-request timeout.

Recommendation: Either:

  • Document this behavior clearly (the configured timeout is for overall wait, not per-request)
  • Use a separate config for per-request timeout
  • Remove the .min(2) and trust the configured value

3. 3xx redirect handling may be too lenient (lib/cypress_on_rails/server.rb:150)

Issue: Accepting 3xx redirects as "ready" assumes the redirect target is valid. A misconfigured app might redirect indefinitely.

Impact: Low - Most apps won't have this issue, but worth documenting.

Recommendation: Add a comment explaining why 3xx is acceptable (current comment is good) or consider being stricter with 200-299 only.


🟡 Medium Priority Issues

4. Missing validation in send_term_signal (lib/cypress_on_rails/server.rb:167-168)

Issue: The method checks process_exists?(pid) before killing the process group, but @server_pgid could be stale if the process has been recycled.

Impact: Low - Process groups are less likely to be recycled, but theoretically possible.

Recommendation: This is acceptable as-is. The error handling catches any issues.

5. Stop server might wait indefinitely (lib/cypress_on_rails/server.rb:161)

Issue: Process.wait(pid) blocks indefinitely if the server doesn't respond to TERM signal.

Impact: Medium - Could hang test runs if server doesn't shut down gracefully.

Recommendation: Consider adding a timeout with fallback to KILL:

def stop_server(pid)
  return unless pid

  puts "Stopping Rails server (PID: #{pid})"
  send_term_signal(pid)
  
  begin
    Timeout.timeout(10) do
      Process.wait(pid)
    end
  rescue Timeout::Error
    puts "Warning: Server did not terminate after TERM signal, sending KILL"
    safe_kill_process('KILL', pid)
    Process.wait(pid) rescue Errno::ESRCH
  end
rescue Errno::ESRCH
  # Process already terminated
end

🟢 Code Quality Observations

Good Practices

  1. Instance variable initialization (lib/cypress_on_rails/server.rb:21-22) - Explicit initialization of @server_pid and @server_pgid is good practice
  2. Defensive programming - Multiple nil checks and early returns
  3. Separation of concerns - send_term_signal, process_exists?, and safe_kill_process are well-factored
  4. Clear logging - Warning messages will help debug production issues

Minor Improvements

  1. Empty path handling (lib/cypress_on_rails/server.rb:145) - The check for empty path is good, though URI should already handle this. Defensive programming is fine here.

  2. Configuration defaults (lib/cypress_on_rails/configuration.rb:67-68) - Good use of ENV.fetch with sensible defaults


🔒 Security & Safety

No security concerns identified. The changes are defensive and improve reliability:

  • Process group killing is properly implemented with fallback
  • No command injection risks (all args are controlled)
  • Error handling prevents information leakage

🧪 Test Coverage

Critical Gap: No tests were added for the new functionality.

Recommended test cases:

  1. Process group creation and cleanup
  2. HTTP-based readiness checks with various response codes (200, 301, 404, 500)
  3. Timeout behavior in wait_for_server
  4. Errno::EPERM error handling in process termination
  5. Errno::ESRCH handling when process terminates immediately
  6. Configuration of server_readiness_path
  7. Empty/nil path handling

Note: The PR description mentions "3 pre-existing failures unrelated to changes" - these should be investigated or documented.


📋 Additional Recommendations

Documentation

  1. README update needed - Document the new environment variables:

    • CYPRESS_RAILS_READINESS_PATH (default: /)
    • CYPRESS_RAILS_READINESS_TIMEOUT (default: 5)
  2. Behavior clarification - Document that CYPRESS_RAILS_READINESS_TIMEOUT is the overall wait timeout, not per-request timeout (which is capped at 2 seconds)

Code Comments

The code comments are good, but consider adding:

  • Why 3xx redirects are acceptable in readiness checks
  • Why per-request timeout is capped at 2 seconds

🎯 Verdict

Recommendation: Approve with minor changes

This is a solid improvement that addresses real reliability issues. The core logic is sound and the error handling is thoughtful.

Must address before merge:

  1. Add unit tests for the new logic (especially HTTP readiness checks and process group handling)
  2. Consider adding timeout to Process.wait in stop_server to prevent hangs
  3. Document the new ENV variables in README

Should consider:

  1. Clarify the timeout configuration behavior (per-request vs overall)
  2. Document why 3xx responses are considered "ready"

Nice to have:

  1. Investigate and document the 3 pre-existing test failures
  2. Add exponential backoff to server polling (instead of fixed 0.1s sleep)

Estimated effort to address: 2-3 hours for comprehensive test coverage and documentation.

Great work on improving the reliability of this gem! 🚀


Job Run | Branch

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1672356 and 5a1867d.

📒 Files selected for processing (2)
  • lib/cypress_on_rails/configuration.rb (2 hunks)
  • lib/cypress_on_rails/server.rb (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/cypress_on_rails/server.rb (1)
lib/cypress_on_rails/configuration.rb (1)
  • configuration (80-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: claude-review
  • GitHub Check: rails_7_2
  • GitHub Check: rails_6_1
  • GitHub Check: rails_8
🔇 Additional comments (10)
lib/cypress_on_rails/configuration.rb (2)

25-26: LGTM!

The new configuration attributes properly expose readiness check parameters. The naming is clear and follows the existing convention.


67-68: LGTM!

Environment variable initialization with sensible defaults is appropriate. The root path '/' and 5-second timeout are reasonable starting points.

lib/cypress_on_rails/server.rb (8)

4-4: LGTM!

Adding net/http is necessary for the HTTP-based readiness checks.


20-22: LGTM!

Process tracking variables are properly initialized with clear documentation of their purpose.


112-121: LGTM!

Process group creation and error handling for the race condition (process dying before getpgid) is properly implemented. The fallback to nil PGID is safe since send_term_signal handles this case.


124-133: LGTM!

The polling loop with a 30-second overall timeout properly uses the new HTTP-based readiness check. The 0.1-second sleep interval provides quick feedback without excessive CPU usage.


156-164: LGTM!

Server termination logic is correct with proper guards and error handling. The early return prevents attempting to stop a non-existent server.


166-175: LGTM!

Process termination with process group support and fallback logic is well-implemented. The warning log on line 173 meets the PR objective of making termination issues visible for debugging. The rescue block correctly catches errors from the process group kill and falls back to single-process termination.


177-183: LGTM!

Using Process.kill(0, pid) to check process existence is the standard approach. The error handling is appropriate for the use case, where both "process doesn't exist" (ESRCH) and "no permission" (EPERM) can be treated as "cannot proceed with kill operation."


185-189: LGTM!

The safe process kill helper properly guards against nil PIDs and silently handles expected errors (process already terminated or permission denied), which is appropriate for cleanup operations.

justin808 and others added 2 commits October 5, 2025 20:18
- Remove timeout cap that prevented honoring user-specified timeouts
- Previously capped at 2 seconds with [config.server_readiness_timeout || 5, 2].min
- Now respects full configured timeout value (default: 5 seconds)
- Remove redundant uri.path.empty? check since readiness_path defaults to '/'

This allows users to configure longer timeouts for slow environments while
maintaining clean, simple code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add 10-second timeout to Process.wait to prevent indefinite hangs
- Send KILL signal if server doesn't respond to TERM within timeout
- Prevents test runs from hanging if server doesn't shut down gracefully
- Logs warning when escalating from TERM to KILL

This ensures robust cleanup even when servers are unresponsive to
graceful shutdown signals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

claude bot commented Oct 6, 2025

Code Review for PR 199: Improve server process cleanup and readiness checks

Overall Assessment: Approved with minor suggestions

This PR addresses important reliability issues in server lifecycle management. The changes are well-motivated and align with issue 198.

Code Quality & Best Practices

Strengths:

  1. Well-structured process management with @server_pid and @server_pgid instance variables
  2. Defensive error handling with multiple rescue blocks for edge cases
  3. Clear comments explaining non-obvious behavior (lines 116-118)
  4. Graceful degradation from process group kill to single process kill

Suggestions:

  1. Configuration naming consistency (configuration.rb:67-68)
    The CYPRESS_RAILS_ prefix is used, but the gem now supports both Cypress and Playwright. Consider adding deprecation warnings or documenting that these vars apply to both frameworks.

  2. Timeout semantics (server.rb:138)
    The server_readiness_timeout (5s) applies to EACH HTTP request during polling (every 0.1s). With a 30s outer timeout, this could be confusing. Consider using 1-2 seconds for individual requests, or document this relationship clearly.

Potential Bugs & Issues

Minor Issues:

  1. Race condition in spawn_server (server.rb:113-120)
    Process.getpgid could raise Errno::EPERM in addition to ESRCH. Consider catching both.

  2. send_term_signal logic (server.rb:163-172)
    The process_exists? check before killing the process group adds little value since you rescue and fall back anyway. Consider simplifying.

Performance Considerations

Good:

  • HTTP-based readiness checks are more accurate with negligible performance impact
  • Process group management is efficient

Optimization opportunity: Make the 0.1s polling interval configurable.

Security Concerns

No significant security issues found:

  • Proper use of TERM signal for graceful shutdown
  • No shell injection risks (direct process spawning)
  • Localhost binding by default

Minor suggestion: Validate server_readiness_path starts with / to prevent malformed URIs.

Test Coverage

Missing tests - Recommendations:

  1. Add unit tests for new configuration options (server_readiness_path, server_readiness_timeout)
  2. Add tests for server_responding? with mocked HTTP responses (200, 301, 404, 500)
  3. Add tests for process management edge cases (immediate exit, EPERM, missing PID)
  4. Add integration test to verify process group is set correctly

Additional Notes

Documentation needed:

  1. README update for CYPRESS_RAILS_READINESS_PATH and CYPRESS_RAILS_READINESS_TIMEOUT
  2. Migration guide for custom health check endpoints
  3. Changelog entry

Logging consistency: Lines 118 and 170 use the logger, but other lines use puts. Consider using logger throughout for better production control.

Summary

This is a solid improvement addressing real production issues. Main recommendations:

  1. Must-have: Add unit tests for configuration options
  2. Should-have: Clarify timeout semantics
  3. Nice-to-have: Replace puts with logger for consistency

Great work!

Copy link

claude bot commented Oct 6, 2025

Code Review - PR #199: Improve server process cleanup and readiness checks

Overall Assessment

This is a well-executed PR that meaningfully improves the reliability and debuggability of server process management. The changes directly address the issues described in #198 with thoughtful error handling and configurability.


✅ Strengths

1. Robust Error Handling

  • Comprehensive exception handling in server_responding? (lib/cypress_on_rails/server.rb:148-150) covers all network-related edge cases
  • Graceful handling of race conditions (e.g., process terminating before getpgid on line 115)
  • Proper timeout handling with KILL fallback (lines 160-167) prevents indefinite hangs

2. Improved Observability

3. Better Health Checks

  • HTTP-based readiness validation (lines 135-151) is more accurate than TCP socket checks
  • Correctly rejects 404 and 5xx responses while accepting 200-399 range
  • Comment on line 146 clearly explains why 3xx redirects are considered ready

4. Configurability

  • server_readiness_path and server_readiness_timeout (configuration.rb:25-26, 67-68) allow customization for different environments
  • Environment variable support (CYPRESS_RAILS_READINESS_PATH, CYPRESS_RAILS_READINESS_TIMEOUT) aids CI/CD integration

5. Process Group Management

  • Proper use of pgroup: true in spawn (line 112) ensures child processes are cleaned up
  • Fallback from process group to single process kill (lines 172-181) handles permission edge cases

🔍 Areas for Improvement

1. Missing Test Coverage ⚠️ High Priority

The PR lacks tests for the new functionality. Critical scenarios that need coverage:

  • server_responding? with various HTTP status codes (200, 301, 404, 500)
  • Process group kill with EPERM errors
  • Timeout behavior in stop_server
  • Race condition in Process.getpgid
  • Configuration of server_readiness_path and server_readiness_timeout

Recommendation: Add specs to spec/cypress_on_rails/configuration_spec.rb for new config options and create spec/cypress_on_rails/server_spec.rb for process management logic.

2. Potential Timeout Issue in server_responding?

The timeout parameter (line 138) is used for both open_timeout and read_timeout (line 141), which could cause issues in the polling loop (line 127):

  • If server_readiness_timeout is 5 seconds, a single HTTP request could block for up to 5 seconds
  • The polling loop expects 0.1s checks, but a single attempt could consume most of the 30s wait_for_server timeout
  • This might make the server appear slower to start than it actually is

Recommendation: Consider capping individual HTTP request timeouts to maintain polling responsiveness while still respecting user configuration for slow environments.

3. Logger vs puts Inconsistency

The code mixes logger.warn (lines 118, 164, 179) with puts (lines 110, 156). Consider standardizing on one approach for consistency.

4. Documentation Gap

The new configuration options lack inline documentation. Consider adding comments above the attr_accessors in configuration.rb.


🔒 Security Considerations

✅ No security concerns identified

  • Process signaling is properly scoped to child processes
  • No injection vulnerabilities in URI construction (paths are from config)
  • Timeout values prevent resource exhaustion

⚡ Performance Considerations


📝 Suggested Next Steps

  1. Add test coverage for new functionality (high priority)
  2. Consider timeout capping in server_responding? to maintain polling responsiveness
  3. Add inline documentation for new configuration options
  4. Verify manual testing in production-like environment

Summary

This PR makes meaningful improvements to reliability and debuggability. The code quality is good with thoughtful error handling. The main gap is test coverage - I recommend adding tests before merging to prevent regressions and document expected behavior.

Recommendation: ✅ Approve with minor changes (add tests, consider timeout adjustment)

Great work addressing issue #198!

- Add inline documentation for server_readiness_path and server_readiness_timeout
- Document default values and environment variable names
- Add test coverage for default values
- Add test coverage for custom configuration

This improves discoverability and ensures configuration options are properly tested.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link

claude bot commented Oct 6, 2025

Code Review - PR #199

Thank you for this improvement! This PR addresses important reliability issues with server lifecycle management.

Strengths

  1. Well-documented changes with clear benefits
  2. Backward compatible with sensible defaults
  3. Good error handling for edge cases (EPERM, ESRCH)
  4. HTTP-based readiness checks are more accurate than TCP socket checks

Code Quality

Configuration (lib/cypress_on_rails/configuration.rb:25-30)

  • Good use of environment variables
  • Clear inline documentation
  • Sensible defaults

Server Process Management (lib/cypress_on_rails/server.rb:112-122)

  • Process group creation with pgroup: true is correct
  • Good edge case handling for immediate process termination
  • Minor: Instance variables @server_pid and @server_pgid are set in spawn_server but PID is also stored in local variable server_pid in start_server:74, creating two sources of truth

HTTP Readiness Check (lib/cypress_on_rails/server.rb:135-151)

  • Accepting 200-399 status codes is appropriate
  • Good exception handling for network errors
  • Potential issue: Method does not handle query parameters. If readiness_path contains query params, they might not be preserved when using uri.path

Potential Issues

1. Query Parameters in Readiness Path (server.rb:142)
Current code uses uri.path which drops query parameters.
Recommendation: Use uri.request_uri to preserve query parameters

2. Dual PID Storage (server.rb:74 vs 112)
Local variable server_pid used for cleanup while @server_pid is also set. If exception occurs between lines 74-96, cleanup might fail.
Recommendation: Use @server_pid consistently by calling stop_server(@server_pid)

3. Timeout Handling in stop_server (server.rb:159-167)
Code waits 10 seconds for TERM, then sends KILL. However, if KILL does not work, Process.wait could hang indefinitely.
Recommendation: Add timeout to KILL wait as well (5 seconds recommended)

Security

Low Risk - No significant security issues identified. Process management handled defensively.

Performance

  • HTTP Readiness Timeout: Default 5-second timeout is reasonable
  • Sleep Interval: 0.1 seconds (10 checks/sec) is appropriate

Test Coverage

Existing: Configuration specs properly test new attributes
Missing (recommended):

  • No tests for Server class itself
  • No tests for process group cleanup behavior
  • No tests for HTTP readiness check logic
  • No tests for timeout scenarios

Consider adding integration tests for server lifecycle, process cleanup scenarios, and HTTP status code handling.

Recommendations Summary

High Priority:

  1. Fix query parameter handling (use uri.request_uri)
  2. Consolidate PID storage to use instance variable

Medium Priority:
3. Add timeout to KILL signal wait
4. Consider adding integration tests for server lifecycle

Overall Assessment

This is a solid improvement addressing real reliability issues. Code quality is good with proper error handling and sensible defaults. Main suggestions are minor defensive programming improvements.

Recommendation: Approve with minor suggestions

PR is ready to merge after addressing the query parameter issue and PID storage consolidation (both quick fixes).

Great work improving the test infrastructure!

@justin808 justin808 merged commit abb505c into master Oct 6, 2025
5 checks passed
@justin808 justin808 deleted the justin808/improve-server-cleanup branch October 6, 2025 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhancement: Improve server process cleanup and readiness checks
1 participant