-
-
Notifications
You must be signed in to change notification settings - Fork 62
Improve server process cleanup and readiness checks #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add process group management for better child process cleanup - Add warning logging when process group kill fails with EPERM - Improve server readiness check to accept only 200-399 HTTP status codes - Reject 404 and 5xx responses as "not ready" for more reliable health checks These improvements provide better debugging visibility and reduce flaky tests by ensuring the server is truly ready before running tests. Fixes #198 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Warning Rate limit exceeded@justin808 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 19 minutes and 15 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (3)
WalkthroughAdds HTTP-based readiness checks and process-group-aware server spawning/termination to lib/cypress_on_rails/server.rb; records PID/PGID, uses Net::HTTP for health checks (accepting 200–399), adds safe_kill and send_term_signal fallbacks, and exposes readiness config via configuration.rb. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Tester as Tester
participant Server as CypressOnRails::Server
participant OS as OS/Processes
participant App as App Server (HTTP)
Tester->>Server: start_server()
Server->>OS: spawn process (record pid & pgid)
OS-->>Server: pid, pgid
loop wait_for_server (timeout)
Server->>App: HTTP GET base_url + readiness_path
App-->>Server: 2xx/3xx or other/error
alt Ready (200–399)
Server-->>Tester: ready
else Not ready
Server-->>Server: sleep/retry
end
end
sequenceDiagram
autonumber
actor Tester as Tester
participant Server as CypressOnRails::Server
participant OS as OS/Signals
Tester->>Server: stop_server()
alt server_pgid present
Server->>OS: TERM -pgid
alt ESRCH/EPERM
Server->>OS: log warning
Server->>OS: TERM pid (safe_kill_process)
end
else only pid
Server->>OS: TERM pid (safe_kill_process)
end
Server-->>Tester: stopped/cleaned
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Pull Request Review: Improve server process cleanup and readiness checksOverviewThis PR addresses process management and server readiness detection issues identified in #198. The changes improve debugging capabilities and test reliability. StrengthsCode Quality
Improved Reliability
Issues and Suggestions1. Potential Race Condition in spawn_server (lib/cypress_on_rails/server.rb:109-111)Issue: There is a small window where the process could terminate before Process.getpgid is called, which would raise Errno::ESRCH. Suggestion: Add error handling around Process.getpgid call. 2. Incomplete Exception Handling in server_responding? (lib/cypress_on_rails/server.rb:125-132)Issue: The method does not handle all possible HTTP exceptions. Missing cases include Errno::ETIMEDOUT, Net::OpenTimeout, Net::ReadTimeout, and Net::HTTPBadResponse. Suggestion: Add these exception types to the rescue clause. 3. Missing Instance Variable InitializationIssue: @server_pgid is set in spawn_server but never initialized in initialize. Suggestion: Initialize @server_pid and @server_pgid to nil in the initialize method. 4. Inconsistent Error Recovery in send_term_signal (lib/cypress_on_rails/server.rb:144-153)Issue: When Process.kill fails with EPERM, the fallback only kills the single process. Any child processes will be orphaned. Severity: Medium - Could lead to resource leaks in CI environments Suggestion: Document this limitation with a comment explaining the orphaned process concern. 5. Missing Timeout in server_responding?Issue: Net::HTTP.get_response has a default timeout of 60 seconds. Slow server starts could block the health check loop. Suggestion: Add explicit short timeouts (2 seconds recommended) for both open_timeout and read_timeout. Test CoverageMissing: No tests were added for the new functionality. Recommended test cases:
Security ConsiderationsNo security concerns identified. The changes are defensive and improve reliability. Performance ConsiderationsPositive: HTTP-based health check validates application layer, not just socket layer. Minor: Each check makes a full HTTP request vs TCP connection (adds approximately 1-5ms per check). SummaryThis is a solid improvement to the gem reliability. Main recommendations:
The core logic is sound and addresses real production issues. With the exception handling improvements, this is ready to merge. Recommendation: Approve with minor revisions Review generated with assistance from Claude Code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
lib/cypress_on_rails/server.rb (1)
134-142
: Consider adding timeout to process wait.The
Process.wait(pid)
on line 139 will block indefinitely if the server process ignores the TERM signal. While this might match the previous behavior, adding a timeout with a fallback to KILL would make the shutdown more robust.def stop_server(pid) return unless pid puts "Stopping Rails server (PID: #{pid})" send_term_signal - Process.wait(pid) + + # Wait up to 10 seconds for graceful shutdown + begin + Timeout.timeout(10) do + Process.wait(pid) + end + rescue Timeout::Error + puts "Warning: Server did not terminate after TERM signal, sending KILL" + safe_kill_process('KILL', pid) + Process.wait(pid) rescue Errno::ESRCH + end rescue Errno::ESRCH # Process already terminated end
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/cypress_on_rails/server.rb
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: claude-review
- GitHub Check: rails_8
- GitHub Check: rails_7_2
- GitHub Check: rails_6_1
🔇 Additional comments (4)
lib/cypress_on_rails/server.rb (4)
4-4
: LGTM!Adding
Net::HTTP
from the standard library enables the HTTP-based readiness check without introducing external dependencies.
117-118
: LGTM!The HTTP-based readiness check correctly replaces the previous TCP socket approach, with a reasonable 0.1-second polling interval. The timeout is properly handled by the outer
Timeout.timeout
block.
144-153
: LGTM with one observation!The process group termination logic correctly:
- Uses negative PGID (
-@server_pgid
) to signal the entire process group.- Falls back to single-process kill when PGID is unavailable.
- Logs warnings when process group termination fails (per PR objectives).
Note: The
Errno::EPERM
fallback to single-process kill (line 152) may not succeed if you lack permission to kill the process group, but the primary goal is logging for debugging, which is achieved.
155-159
: LGTM!The implementation correctly guards against nil PIDs and gracefully handles the case where the process has already terminated.
- Initialize @server_pid and @server_pgid in constructor to avoid nil issues - Add error handling for Process.getpgid race condition when process terminates immediately - Add comprehensive exception handling in server_responding? (ETIMEDOUT, Net::OpenTimeout, Net::ReadTimeout, Net::HTTPBadResponse) - Add configurable server_readiness_path with ENV var support (CYPRESS_RAILS_READINESS_PATH) - Add explicit HTTP timeouts (1s open_timeout, 1s read_timeout) to prevent hangs - Improve URI handling to support custom readiness check paths These fixes address potential edge cases and race conditions in server lifecycle management. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Code Review - PR #199: Improve server process cleanup and readiness checksOverviewThis PR addresses important reliability issues in server lifecycle management. The changes are well-motivated by issue #198 and improve both observability and correctness of the server startup/shutdown process. ✅ Strengths
🔍 Code Quality & Best PracticesGood:
Suggestions:
🐛 Potential IssuesCritical:
Medium Priority:
⚡ Performance Considerations
🔒 Security Concerns
🧪 Test CoverageMissing tests for:
Recommendation: Add a spec/cypress_on_rails/server_spec.rb file with unit tests 📋 RecommendationsMust Address:
Should Consider:
Nice to Have:
🎯 VerdictApproval Status: ✅ Approve with minor changes This is a solid improvement that addresses real reliability issues. The code is well-structured and the error handling is thoughtful. The missing test coverage is the main concern - these changes touch critical infrastructure code and should have corresponding tests to prevent regressions. The suggestions above are mostly about polish and defensive programming. The core logic is sound and ready to merge once tests are added. Estimated effort to address: 2-3 hours for comprehensive test coverage Great work on this PR! 🚀 |
Code quality improvements: - Add documentation comment for process management instance variables - Replace puts with logger.warn for consistency with gem conventions - Simplify URI path handling (remove redundant empty check) - Add process_exists? helper to validate PID before kill operations Error handling improvements: - Add EPERM to safe_kill_process rescue clause for complete error coverage - Validate process existence before attempting process group kill to avoid stale PGID issues - Prevent attempts to kill recycled PIDs Configuration improvements: - Add configurable server_readiness_timeout (default: 5 seconds) - Support CYPRESS_RAILS_READINESS_TIMEOUT environment variable - Addresses potential timeout issues on slow CI systems 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
PR Review: Improve server process cleanup and readiness checksOverall AssessmentThis is a solid improvement to server lifecycle management. The changes enhance reliability and debugging capabilities. A few issues need attention before merging. Critical Issues1. Incorrect PID variable in stop_server (lib/cypress_on_rails/server.rb:150-158)The stop_server(pid) method receives a parameter pid but uses instance variable @server_pid in send_term_signal and process_exists?. Problem: The method is called with server_pid (local variable from line 74), but send_term_signal references @server_pid. If these differ, you will kill the wrong process or fail silently. Fix: Either use the parameter throughout, or pass the parameter to send_term_signal 2. Race condition in spawn_server (lib/cypress_on_rails/server.rb:112-119)Between spawning and calling Process.getpgid, the process might terminate, causing ESRCH. You catch this but set @server_pgid = nil, which might cause issues later. Recommendation: Consider adding a small delay or retry logic, and document that @server_pgid may be nil in edge cases. Security & Best Practices3. HTTP timeout configuration (lib/cypress_on_rails/server.rb:133-148)The server_readiness_timeout (default 5s) is used as both open_timeout and read_timeout in Net::HTTP.start. During the wait_for_server loop (which runs with 0.1s sleep), if a single HTTP request takes 5s to timeout, the polling is ineffective. Recommendation: Use shorter timeouts for readiness checks (e.g., 1-2 seconds) since they are called repeatedly. 4. HTTP response code validation (lib/cypress_on_rails/server.rb:144)3xx redirects might indicate the app is not fully ready. Consider being more strict and only accepting 200-299, or documenting why 3xx is acceptable. Code Quality5. Missing URI path handling (lib/cypress_on_rails/server.rb:140)If uri.path is empty, this sends GET empty string instead of GET slash. 6. Missing test coverageNo tests for server_responding? method, send_term_signal, process group killing, or edge cases like @server_pgid = nil. Add unit tests for these critical paths. Positive Observations
Recommendations Before Merging
Great work improving reliability! |
Critical fixes: 1. Fix PID parameter passing in stop_server/send_term_signal - Previously stop_server(pid) received parameter but used instance variable - Now consistently passes pid parameter through the call chain - Prevents potential bugs if pid values diverge 2. Fix HTTP timeout in readiness polling loop - Cap individual check timeout at 2 seconds max - Prevents 5s timeouts from blocking the 0.1s polling loop - Maintains responsiveness while supporting slow CI via configuration 3. Fix URI path handling edge case - Ensure GET path defaults to '/' if uri.path is empty - Prevents invalid HTTP requests with empty path Documentation improvements: - Document that @server_pgid may be nil in edge cases - Clarify that 3xx redirects are considered valid "ready" state - Add comments explaining timeout behavior in polling context 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
@claude review this pr. |
Claude finished @justin808's task —— View job PR Review: Improve server process cleanup and readiness checksOverall AssessmentThis PR makes valuable improvements to server lifecycle management with better process cleanup and more reliable readiness checks. The implementation is generally solid, but there are several issues to address before merging. ✅ Strengths
🔴 Critical Issues1. Race condition in PGID capture (lib/cypress_on_rails/server.rb:113-120)Issue: Good catch on handling Impact: Low - The fallback logic handles this, but it could be clearer. Recommendation: The current implementation is actually correct. The error handling is appropriate and the warning is helpful. 2. Timeout configuration could be confusing (lib/cypress_on_rails/server.rb:140)Issue: The line Impact: Medium - Users may expect the env var to control per-request timeout. Recommendation: Either:
3. 3xx redirect handling may be too lenient (lib/cypress_on_rails/server.rb:150)Issue: Accepting 3xx redirects as "ready" assumes the redirect target is valid. A misconfigured app might redirect indefinitely. Impact: Low - Most apps won't have this issue, but worth documenting. Recommendation: Add a comment explaining why 3xx is acceptable (current comment is good) or consider being stricter with 200-299 only. 🟡 Medium Priority Issues4. Missing validation in send_term_signal (lib/cypress_on_rails/server.rb:167-168)Issue: The method checks Impact: Low - Process groups are less likely to be recycled, but theoretically possible. Recommendation: This is acceptable as-is. The error handling catches any issues. 5. Stop server might wait indefinitely (lib/cypress_on_rails/server.rb:161)Issue: Impact: Medium - Could hang test runs if server doesn't shut down gracefully. Recommendation: Consider adding a timeout with fallback to KILL: def stop_server(pid)
return unless pid
puts "Stopping Rails server (PID: #{pid})"
send_term_signal(pid)
begin
Timeout.timeout(10) do
Process.wait(pid)
end
rescue Timeout::Error
puts "Warning: Server did not terminate after TERM signal, sending KILL"
safe_kill_process('KILL', pid)
Process.wait(pid) rescue Errno::ESRCH
end
rescue Errno::ESRCH
# Process already terminated
end 🟢 Code Quality ObservationsGood Practices
Minor Improvements
🔒 Security & SafetyNo security concerns identified. The changes are defensive and improve reliability:
🧪 Test CoverageCritical Gap: No tests were added for the new functionality. Recommended test cases:
Note: The PR description mentions "3 pre-existing failures unrelated to changes" - these should be investigated or documented. 📋 Additional RecommendationsDocumentation
Code CommentsThe code comments are good, but consider adding:
🎯 VerdictRecommendation: Approve with minor changes This is a solid improvement that addresses real reliability issues. The core logic is sound and the error handling is thoughtful. Must address before merge:
Should consider:
Nice to have:
Estimated effort to address: 2-3 hours for comprehensive test coverage and documentation. Great work on improving the reliability of this gem! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/cypress_on_rails/configuration.rb
(2 hunks)lib/cypress_on_rails/server.rb
(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/cypress_on_rails/server.rb (1)
lib/cypress_on_rails/configuration.rb (1)
configuration
(80-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: claude-review
- GitHub Check: rails_7_2
- GitHub Check: rails_6_1
- GitHub Check: rails_8
🔇 Additional comments (10)
lib/cypress_on_rails/configuration.rb (2)
25-26
: LGTM!The new configuration attributes properly expose readiness check parameters. The naming is clear and follows the existing convention.
67-68
: LGTM!Environment variable initialization with sensible defaults is appropriate. The root path '/' and 5-second timeout are reasonable starting points.
lib/cypress_on_rails/server.rb (8)
4-4
: LGTM!Adding
net/http
is necessary for the HTTP-based readiness checks.
20-22
: LGTM!Process tracking variables are properly initialized with clear documentation of their purpose.
112-121
: LGTM!Process group creation and error handling for the race condition (process dying before getpgid) is properly implemented. The fallback to nil PGID is safe since
send_term_signal
handles this case.
124-133
: LGTM!The polling loop with a 30-second overall timeout properly uses the new HTTP-based readiness check. The 0.1-second sleep interval provides quick feedback without excessive CPU usage.
156-164
: LGTM!Server termination logic is correct with proper guards and error handling. The early return prevents attempting to stop a non-existent server.
166-175
: LGTM!Process termination with process group support and fallback logic is well-implemented. The warning log on line 173 meets the PR objective of making termination issues visible for debugging. The rescue block correctly catches errors from the process group kill and falls back to single-process termination.
177-183
: LGTM!Using
Process.kill(0, pid)
to check process existence is the standard approach. The error handling is appropriate for the use case, where both "process doesn't exist" (ESRCH) and "no permission" (EPERM) can be treated as "cannot proceed with kill operation."
185-189
: LGTM!The safe process kill helper properly guards against nil PIDs and silently handles expected errors (process already terminated or permission denied), which is appropriate for cleanup operations.
- Remove timeout cap that prevented honoring user-specified timeouts - Previously capped at 2 seconds with [config.server_readiness_timeout || 5, 2].min - Now respects full configured timeout value (default: 5 seconds) - Remove redundant uri.path.empty? check since readiness_path defaults to '/' This allows users to configure longer timeouts for slow environments while maintaining clean, simple code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add 10-second timeout to Process.wait to prevent indefinite hangs - Send KILL signal if server doesn't respond to TERM within timeout - Prevents test runs from hanging if server doesn't shut down gracefully - Logs warning when escalating from TERM to KILL This ensures robust cleanup even when servers are unresponsive to graceful shutdown signals. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Code Review for PR 199: Improve server process cleanup and readiness checks Overall Assessment: Approved with minor suggestions This PR addresses important reliability issues in server lifecycle management. The changes are well-motivated and align with issue 198. Code Quality & Best Practices Strengths:
Suggestions:
Potential Bugs & Issues Minor Issues:
Performance Considerations Good:
Optimization opportunity: Make the 0.1s polling interval configurable. Security Concerns No significant security issues found:
Minor suggestion: Validate server_readiness_path starts with / to prevent malformed URIs. Test Coverage Missing tests - Recommendations:
Additional Notes Documentation needed:
Logging consistency: Lines 118 and 170 use the logger, but other lines use puts. Consider using logger throughout for better production control. Summary This is a solid improvement addressing real production issues. Main recommendations:
Great work! |
Code Review - PR #199: Improve server process cleanup and readiness checksOverall AssessmentThis is a well-executed PR that meaningfully improves the reliability and debuggability of server process management. The changes directly address the issues described in #198 with thoughtful error handling and configurability. ✅ Strengths1. Robust Error Handling
2. Improved Observability
3. Better Health Checks
4. Configurability
5. Process Group Management
🔍 Areas for Improvement1. Missing Test Coverage The PR lacks tests for the new functionality. Critical scenarios that need coverage:
Recommendation: Add specs to spec/cypress_on_rails/configuration_spec.rb for new config options and create spec/cypress_on_rails/server_spec.rb for process management logic. 2. Potential Timeout Issue in server_responding? The timeout parameter (line 138) is used for both open_timeout and read_timeout (line 141), which could cause issues in the polling loop (line 127):
Recommendation: Consider capping individual HTTP request timeouts to maintain polling responsiveness while still respecting user configuration for slow environments. 3. Logger vs puts Inconsistency The code mixes logger.warn (lines 118, 164, 179) with puts (lines 110, 156). Consider standardizing on one approach for consistency. 4. Documentation Gap The new configuration options lack inline documentation. Consider adding comments above the attr_accessors in configuration.rb. 🔒 Security Considerations✅ No security concerns identified
⚡ Performance Considerations
📝 Suggested Next Steps
SummaryThis PR makes meaningful improvements to reliability and debuggability. The code quality is good with thoughtful error handling. The main gap is test coverage - I recommend adding tests before merging to prevent regressions and document expected behavior. Recommendation: ✅ Approve with minor changes (add tests, consider timeout adjustment) Great work addressing issue #198! |
- Add inline documentation for server_readiness_path and server_readiness_timeout - Document default values and environment variable names - Add test coverage for default values - Add test coverage for custom configuration This improves discoverability and ensures configuration options are properly tested. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Code Review - PR #199Thank you for this improvement! This PR addresses important reliability issues with server lifecycle management. Strengths
Code QualityConfiguration (lib/cypress_on_rails/configuration.rb:25-30)
Server Process Management (lib/cypress_on_rails/server.rb:112-122)
HTTP Readiness Check (lib/cypress_on_rails/server.rb:135-151)
Potential Issues1. Query Parameters in Readiness Path (server.rb:142) 2. Dual PID Storage (server.rb:74 vs 112) 3. Timeout Handling in stop_server (server.rb:159-167) SecurityLow Risk - No significant security issues identified. Process management handled defensively. Performance
Test CoverageExisting: Configuration specs properly test new attributes
Consider adding integration tests for server lifecycle, process cleanup scenarios, and HTTP status code handling. Recommendations SummaryHigh Priority:
Medium Priority: Overall AssessmentThis is a solid improvement addressing real reliability issues. Code quality is good with proper error handling and sensible defaults. Main suggestions are minor defensive programming improvements. Recommendation: Approve with minor suggestions PR is ready to merge after addressing the query parameter issue and PID storage consolidation (both quick fixes). Great work improving the test infrastructure! |
Summary
Benefits
Changes
Process Group Kill with Error Logging (lib/cypress_on_rails/server.rb:144-153)
HTTP-Based Server Readiness Check (lib/cypress_on_rails/server.rb:125-132)
Test Plan
Fixes #198
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Chores