Skip to content

Conversation

alienzach
Copy link
Contributor

@alienzach alienzach commented Oct 1, 2025

TL;DR

Added robust error handling for deployment timeouts and connection issues, with automatic status verification to detect successful deployments despite connection failures.

What changed?

Screenshot 2025-10-01 at 5 37 47 PM
  • Added a new --deploy-timeout parameter (30-1200 seconds, default 300) to customize HTTP timeout for deployment API calls
  • Implemented connection error handling for httpx.ReadTimeout, httpx.RemoteProtocolError, and httpx.ConnectError
  • Added a deployment status verification mechanism that checks if a deployment succeeded even when the connection was lost
  • The system now captures a baseline app snapshot before deployment to detect changes
  • Added automatic retry logic with 3 status checks at 2-second intervals when connection issues occur
  • Made the timeout configurable in the deploy_app method of the API client

How to test?

  1. Deploy an app with the default timeout:
  2. Deploy with a custom timeout:
  3. Test error recovery by deploying a large app that might trigger timeouts or connection issues

Why make this change?

Deployment operations can sometimes take longer than the default timeout, especially for larger applications. When timeouts or connection issues occur, it's difficult to determine if the deployment actually succeeded or failed. This change improves reliability by:

  1. Allowing users to customize the timeout based on their app's needs
  2. Automatically detecting successful deployments even when the connection was lost
  3. Providing clearer feedback about deployment status during connection issues

These improvements reduce deployment failures and eliminate the need for users to manually check if their deployment succeeded when timeouts occur.

Summary by CodeRabbit

  • New Features
    • Configurable deployment timeout for CLI/API (default 5 minutes).
    • More resilient deployments with automatic status checks after network timeouts to confirm success and reduce false failures.
    • Baseline snapshot and post-deploy verification to ensure the app is truly online before reporting.
    • Clearer progress and status messages during deployment.
    • Optional console override to direct CLI messages to a specific console, improving output control.

Copy link

coderabbitai bot commented Oct 1, 2025

Walkthrough

Introduces a configurable deploy timeout across CLI and API layers, adds post-deploy status polling with a baseline snapshot, augments retry logic to infer success on network errors, and enables console output routing via an optional console override in UX utilities.

Changes

Cohort / File(s) Summary
Deploy CLI flow (timeout, status checks, retries)
src/mcp_agent/cli/cloud/commands/deploy/main.py
Adds deploy_timeout parameter to deploy_config and _deploy_with_retry; passes timeout to deploy calls; captures baseline state; implements _check_deployment_status polling; on HTTPX timeouts/connect/protocol errors performs multi-check inference before retries; final status check after retries; integrates explicit console for progress output; preserves tags/secrets behavior.
MCP App API client (timeout propagation)
src/mcp_agent/cli/mcp_app/api_client.py
Updates deploy_app signature to accept timeout and passes it to HTTP request; updates docstring; deploymentMetadata handling unchanged.
UX console overrides
src/mcp_agent/cli/utils/ux.py
Adds console_override to print_info/print_success/print_warning/print_error; when console_output is True, routes to override console if provided; logging behavior unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI Deploy Command
  participant UX as UX Console
  participant API as MCP App API Client
  participant Svc as MCP Service

  User->>CLI: invoke deploy_config(..., deploy_timeout)
  CLI->>UX: print_info("Starting deploy", console_override)
  CLI->>API: deploy_app(app_id, metadata, timeout=deploy_timeout)
  alt Network/Timeout error
    CLI->>Svc: GET app status (poll x3 @2s)
    alt Status indicates ONLINE (updatedAt/server changed)
      CLI->>UX: print_success("Deployment inferred successful", console_override)
      CLI-->>User: exit success
    else Status inconclusive
      CLI->>UX: print_warning("Status check failed", console_override)
      CLI->>API: retry deploy_app(..., timeout)
    end
  else Deploy call returns
    CLI->>Svc: GET app status (compare to baseline)
    alt ONLINE and changed
      CLI->>UX: print_success("Deployment successful", console_override)
      CLI-->>User: exit success
    else Not updated/online
      CLI->>UX: print_error("Deployment not online", console_override)
      CLI-->>User: exit failure
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • jtcorbett
  • rholinshead
  • saqadri

Poem

I tapped my paws and set the clock, ⏱️
A timeout guard for every block.
I sniffed the wind—status online?
Three hops to check, and all is fine.
Console shines with gentle cheer—
Deploy is up; the path is clear! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “feat: robust app deploy retry” concisely conveys the primary enhancement of the pull request, namely the addition of more resilient retry logic for application deployments. It highlights the main focus—making deployments more robust—without extraneous detail or vagueness. The phrasing is clear and specific enough for team members to understand that improved retry behavior is the key change.
Docstring Coverage ✅ Passed Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 10-01-feat_robust_app_deploy_retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@alienzach alienzach marked this pull request as ready for review October 1, 2025 21:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/mcp_agent/cli/utils/ux.py (1)

40-46: Document the new console override parameter.

Line 45’s docstring still lists only the original arguments, so callers won’t discover the new console_override hook. Please add it to the documented parameters for consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78f9a88 and b12619f.

📒 Files selected for processing (3)
  • src/mcp_agent/cli/cloud/commands/deploy/main.py (10 hunks)
  • src/mcp_agent/cli/mcp_app/api_client.py (2 hunks)
  • src/mcp_agent/cli/utils/ux.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/mcp_agent/cli/cloud/commands/deploy/main.py (4)
src/mcp_agent/cli/utils/ux.py (3)
  • print_warning (70-83)
  • print_success (54-67)
  • print_info (32-51)
src/mcp_agent/cli/mcp_app/api_client.py (2)
  • get_app (166-203)
  • deploy_app (310-347)
src/mcp_agent/cli/mcp_app/mock_client.py (1)
  • get_app (47-84)
src/mcp_agent/cli/exceptions.py (1)
  • CLIError (4-10)

Comment on lines 523 to +554
deploy_task,
description=f"✅ MCP App deployed successfully{attempt_suffix}!",
)
return app
except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e:
progress.update(
deploy_task, description=f"⚠️ Connection lost{attempt_suffix}"
)
error_type = type(e).__name__
print_warning(
f"Lost connection to deployment API ({error_type}). "
f"The deployment may still be in progress on the server.",
console_override=log_console,
)

# Check deployment status 3 times with 2-second intervals
print_info(
"Checking deployment status (3 attempts with 2s intervals)...",
console_override=log_console,
)
for check_attempt in range(1, 4):
print_info(
f"Status check {check_attempt}/3...",
console_override=log_console,
)
status_app = await _check_deployment_status(
log_console=log_console
)
if status_app:
return status_app

if check_attempt < 3:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update progress output after status-check success.

When the status probe detects a successful deploy after a connection loss (Line 549), we return immediately with the spinner still showing “⚠️ Connection lost…”. Users get conflicting signals. Touching up the task description before returning keeps the CLI feedback accurate.

                     status_app = await _check_deployment_status(
                         log_console=log_console
                     )
                     if status_app:
+                        progress.update(
+                            deploy_task,
+                            description=f"✅ MCP App deployed successfully{attempt_suffix}!",
+                        )
                         return status_app
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
deploy_task,
description=f"✅ MCP App deployed successfully{attempt_suffix}!",
)
return app
except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e:
progress.update(
deploy_task, description=f"⚠️ Connection lost{attempt_suffix}"
)
error_type = type(e).__name__
print_warning(
f"Lost connection to deployment API ({error_type}). "
f"The deployment may still be in progress on the server.",
console_override=log_console,
)
# Check deployment status 3 times with 2-second intervals
print_info(
"Checking deployment status (3 attempts with 2s intervals)...",
console_override=log_console,
)
for check_attempt in range(1, 4):
print_info(
f"Status check {check_attempt}/3...",
console_override=log_console,
)
status_app = await _check_deployment_status(
log_console=log_console
)
if status_app:
return status_app
if check_attempt < 3:
for check_attempt in range(1, 4):
print_info(
f"Status check {check_attempt}/3...",
console_override=log_console,
)
status_app = await _check_deployment_status(
log_console=log_console
)
if status_app:
progress.update(
deploy_task,
description=f"✅ MCP App deployed successfully{attempt_suffix}!",
)
return status_app
if check_attempt < 3:
...
🤖 Prompt for AI Agents
In src/mcp_agent/cli/cloud/commands/deploy/main.py around lines 523-554 the code
returns status_app from the status-check loop after a connection loss without
updating the progress task description, leaving the spinner showing "⚠️
Connection lost…"; before returning when status_app is truthy, call
progress.update(deploy_task, description=f"✅ MCP App deployed
successfully{attempt_suffix}!") (and optionally refresh the progress display) so
the CLI shows the successful deployment message prior to returning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant