-
Notifications
You must be signed in to change notification settings - Fork 760
feat: robust app deploy retry #536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughIntroduces a configurable deploy timeout across CLI and API layers, adds post-deploy status polling with a baseline snapshot, augments retry logic to infer success on network errors, and enables console output routing via an optional console override in UX utilities. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as CLI Deploy Command
participant UX as UX Console
participant API as MCP App API Client
participant Svc as MCP Service
User->>CLI: invoke deploy_config(..., deploy_timeout)
CLI->>UX: print_info("Starting deploy", console_override)
CLI->>API: deploy_app(app_id, metadata, timeout=deploy_timeout)
alt Network/Timeout error
CLI->>Svc: GET app status (poll x3 @2s)
alt Status indicates ONLINE (updatedAt/server changed)
CLI->>UX: print_success("Deployment inferred successful", console_override)
CLI-->>User: exit success
else Status inconclusive
CLI->>UX: print_warning("Status check failed", console_override)
CLI->>API: retry deploy_app(..., timeout)
end
else Deploy call returns
CLI->>Svc: GET app status (compare to baseline)
alt ONLINE and changed
CLI->>UX: print_success("Deployment successful", console_override)
CLI-->>User: exit success
else Not updated/online
CLI->>UX: print_error("Deployment not online", console_override)
CLI-->>User: exit failure
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/mcp_agent/cli/utils/ux.py (1)
40-46
: Document the new console override parameter.Line 45’s docstring still lists only the original arguments, so callers won’t discover the new
console_override
hook. Please add it to the documented parameters for consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/mcp_agent/cli/cloud/commands/deploy/main.py
(10 hunks)src/mcp_agent/cli/mcp_app/api_client.py
(2 hunks)src/mcp_agent/cli/utils/ux.py
(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/cloud/commands/deploy/main.py (4)
src/mcp_agent/cli/utils/ux.py (3)
print_warning
(70-83)print_success
(54-67)print_info
(32-51)src/mcp_agent/cli/mcp_app/api_client.py (2)
get_app
(166-203)deploy_app
(310-347)src/mcp_agent/cli/mcp_app/mock_client.py (1)
get_app
(47-84)src/mcp_agent/cli/exceptions.py (1)
CLIError
(4-10)
deploy_task, | ||
description=f"✅ MCP App deployed successfully{attempt_suffix}!", | ||
) | ||
return app | ||
except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e: | ||
progress.update( | ||
deploy_task, description=f"⚠️ Connection lost{attempt_suffix}" | ||
) | ||
error_type = type(e).__name__ | ||
print_warning( | ||
f"Lost connection to deployment API ({error_type}). " | ||
f"The deployment may still be in progress on the server.", | ||
console_override=log_console, | ||
) | ||
|
||
# Check deployment status 3 times with 2-second intervals | ||
print_info( | ||
"Checking deployment status (3 attempts with 2s intervals)...", | ||
console_override=log_console, | ||
) | ||
for check_attempt in range(1, 4): | ||
print_info( | ||
f"Status check {check_attempt}/3...", | ||
console_override=log_console, | ||
) | ||
status_app = await _check_deployment_status( | ||
log_console=log_console | ||
) | ||
if status_app: | ||
return status_app | ||
|
||
if check_attempt < 3: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update progress output after status-check success.
When the status probe detects a successful deploy after a connection loss (Line 549), we return immediately with the spinner still showing “
status_app = await _check_deployment_status(
log_console=log_console
)
if status_app:
+ progress.update(
+ deploy_task,
+ description=f"✅ MCP App deployed successfully{attempt_suffix}!",
+ )
return status_app
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
deploy_task, | |
description=f"✅ MCP App deployed successfully{attempt_suffix}!", | |
) | |
return app | |
except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e: | |
progress.update( | |
deploy_task, description=f"⚠️ Connection lost{attempt_suffix}" | |
) | |
error_type = type(e).__name__ | |
print_warning( | |
f"Lost connection to deployment API ({error_type}). " | |
f"The deployment may still be in progress on the server.", | |
console_override=log_console, | |
) | |
# Check deployment status 3 times with 2-second intervals | |
print_info( | |
"Checking deployment status (3 attempts with 2s intervals)...", | |
console_override=log_console, | |
) | |
for check_attempt in range(1, 4): | |
print_info( | |
f"Status check {check_attempt}/3...", | |
console_override=log_console, | |
) | |
status_app = await _check_deployment_status( | |
log_console=log_console | |
) | |
if status_app: | |
return status_app | |
if check_attempt < 3: | |
for check_attempt in range(1, 4): | |
print_info( | |
f"Status check {check_attempt}/3...", | |
console_override=log_console, | |
) | |
status_app = await _check_deployment_status( | |
log_console=log_console | |
) | |
if status_app: | |
progress.update( | |
deploy_task, | |
description=f"✅ MCP App deployed successfully{attempt_suffix}!", | |
) | |
return status_app | |
if check_attempt < 3: | |
... |
🤖 Prompt for AI Agents
In src/mcp_agent/cli/cloud/commands/deploy/main.py around lines 523-554 the code
returns status_app from the status-check loop after a connection loss without
updating the progress task description, leaving the spinner showing "⚠️
Connection lost…"; before returning when status_app is truthy, call
progress.update(deploy_task, description=f"✅ MCP App deployed
successfully{attempt_suffix}!") (and optionally refresh the progress display) so
the CLI shows the successful deployment message prior to returning.
TL;DR
Added robust error handling for deployment timeouts and connection issues, with automatic status verification to detect successful deployments despite connection failures.
What changed?
--deploy-timeout
parameter (30-1200 seconds, default 300) to customize HTTP timeout for deployment API callshttpx.ReadTimeout
,httpx.RemoteProtocolError
, andhttpx.ConnectError
deploy_app
method of the API clientHow to test?
Why make this change?
Deployment operations can sometimes take longer than the default timeout, especially for larger applications. When timeouts or connection issues occur, it's difficult to determine if the deployment actually succeeded or failed. This change improves reliability by:
These improvements reduce deployment failures and eliminate the need for users to manually check if their deployment succeeded when timeouts occur.
Summary by CodeRabbit