feat: robust app deploy retry #536

alienzach · 2025-10-01T20:48:41Z

TL;DR

Added robust error handling for deployment timeouts and connection issues, with automatic status verification to detect successful deployments despite connection failures.

What changed?

Added a new --deploy-timeout parameter (30-1200 seconds, default 300) to customize HTTP timeout for deployment API calls
Implemented connection error handling for httpx.ReadTimeout, httpx.RemoteProtocolError, and httpx.ConnectError
Added a deployment status verification mechanism that checks if a deployment succeeded even when the connection was lost
The system now captures a baseline app snapshot before deployment to detect changes
Added automatic retry logic with 3 status checks at 2-second intervals when connection issues occur
Made the timeout configurable in the deploy_app method of the API client

How to test?

Deploy an app with the default timeout:
Deploy with a custom timeout:
Test error recovery by deploying a large app that might trigger timeouts or connection issues

Why make this change?

Deployment operations can sometimes take longer than the default timeout, especially for larger applications. When timeouts or connection issues occur, it's difficult to determine if the deployment actually succeeded or failed. This change improves reliability by:

Allowing users to customize the timeout based on their app's needs
Automatically detecting successful deployments even when the connection was lost
Providing clearer feedback about deployment status during connection issues

These improvements reduce deployment failures and eliminate the need for users to manually check if their deployment succeeded when timeouts occur.

Summary by CodeRabbit

New Features
- Configurable deployment timeout for CLI/API (default 5 minutes).
- More resilient deployments with automatic status checks after network timeouts to confirm success and reduce false failures.
- Baseline snapshot and post-deploy verification to ensure the app is truly online before reporting.
- Clearer progress and status messages during deployment.
- Optional console override to direct CLI messages to a specific console, improving output control.

coderabbitai · 2025-10-01T20:48:49Z

Walkthrough

Introduces a configurable deploy timeout across CLI and API layers, adds post-deploy status polling with a baseline snapshot, augments retry logic to infer success on network errors, and enables console output routing via an optional console override in UX utilities.

Changes

Cohort / File(s)	Summary
Deploy CLI flow (timeout, status checks, retries) `src/mcp_agent/cli/cloud/commands/deploy/main.py`	Adds deploy_timeout parameter to deploy_config and _deploy_with_retry; passes timeout to deploy calls; captures baseline state; implements _check_deployment_status polling; on HTTPX timeouts/connect/protocol errors performs multi-check inference before retries; final status check after retries; integrates explicit console for progress output; preserves tags/secrets behavior.
MCP App API client (timeout propagation) `src/mcp_agent/cli/mcp_app/api_client.py`	Updates deploy_app signature to accept timeout and passes it to HTTP request; updates docstring; deploymentMetadata handling unchanged.
UX console overrides `src/mcp_agent/cli/utils/ux.py`	Adds console_override to print_info/print_success/print_warning/print_error; when console_output is True, routes to override console if provided; logging behavior unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI Deploy Command
  participant UX as UX Console
  participant API as MCP App API Client
  participant Svc as MCP Service

  User->>CLI: invoke deploy_config(..., deploy_timeout)
  CLI->>UX: print_info("Starting deploy", console_override)
  CLI->>API: deploy_app(app_id, metadata, timeout=deploy_timeout)
  alt Network/Timeout error
    CLI->>Svc: GET app status (poll x3 @2s)
    alt Status indicates ONLINE (updatedAt/server changed)
      CLI->>UX: print_success("Deployment inferred successful", console_override)
      CLI-->>User: exit success
    else Status inconclusive
      CLI->>UX: print_warning("Status check failed", console_override)
      CLI->>API: retry deploy_app(..., timeout)
    end
  else Deploy call returns
    CLI->>Svc: GET app status (compare to baseline)
    alt ONLINE and changed
      CLI->>UX: print_success("Deployment successful", console_override)
      CLI-->>User: exit success
    else Not updated/online
      CLI->>UX: print_error("Deployment not online", console_override)
      CLI-->>User: exit failure
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

update deploy to retry if deployment failed #477 — Also modifies deploy_config and _deploy_with_retry in the same file, focusing on retry behavior.
Allow specifying app name optionally in deploy command #527 — Updates deploy_config signature and deployment flow in the same module, overlapping with current changes.

Suggested reviewers

jtcorbett
rholinshead
saqadri

Poem

I tapped my paws and set the clock, ⏱️
A timeout guard for every block.
I sniffed the wind—status online?
Three hops to check, and all is fine.
Console shines with gentle cheer—
Deploy is up; the path is clear! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title “feat: robust app deploy retry” concisely conveys the primary enhancement of the pull request, namely the addition of more resilient retry logic for application deployments. It highlights the main focus—making deployments more robust—without extraneous detail or vagueness. The phrasing is clear and specific enough for team members to understand that improved retry behavior is the key change.
Docstring Coverage	✅ Passed	Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch 10-01-feat_robust_app_deploy_retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

alienzach · 2025-10-01T20:48:57Z

feat: robust app deploy retry #536 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/mcp_agent/cli/utils/ux.py (1)

40-46: Document the new console override parameter.

Line 45’s docstring still lists only the original arguments, so callers won’t discover the new console_override hook. Please add it to the documented parameters for consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78f9a88 and b12619f.

📒 Files selected for processing (3)

src/mcp_agent/cli/cloud/commands/deploy/main.py (10 hunks)
src/mcp_agent/cli/mcp_app/api_client.py (2 hunks)
src/mcp_agent/cli/utils/ux.py (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/mcp_agent/cli/cloud/commands/deploy/main.py (4)

src/mcp_agent/cli/utils/ux.py (3)

print_warning (70-83)

print_success (54-67)

print_info (32-51)

src/mcp_agent/cli/mcp_app/api_client.py (2)

get_app (166-203)

deploy_app (310-347)

src/mcp_agent/cli/mcp_app/mock_client.py (1)

get_app (47-84)

src/mcp_agent/cli/exceptions.py (1)

CLIError (4-10)

coderabbitai · 2025-10-01T21:47:26Z

src/mcp_agent/cli/cloud/commands/deploy/main.py

                    deploy_task,
                    description=f"✅ MCP App deployed successfully{attempt_suffix}!",
                )
                return app
+            except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e:
+                progress.update(
+                    deploy_task, description=f"⚠️  Connection lost{attempt_suffix}"
+                )
+                error_type = type(e).__name__
+                print_warning(
+                    f"Lost connection to deployment API ({error_type}). "
+                    f"The deployment may still be in progress on the server.",
+                    console_override=log_console,
+                )
+
+                # Check deployment status 3 times with 2-second intervals
+                print_info(
+                    "Checking deployment status (3 attempts with 2s intervals)...",
+                    console_override=log_console,
+                )
+                for check_attempt in range(1, 4):
+                    print_info(
+                        f"Status check {check_attempt}/3...",
+                        console_override=log_console,
+                    )
+                    status_app = await _check_deployment_status(
+                        log_console=log_console
+                    )
+                    if status_app:
+                        return status_app
+
+                    if check_attempt < 3:


⚠️ Potential issue | 🟡 Minor

Update progress output after status-check success.

When the status probe detects a successful deploy after a connection loss (Line 549), we return immediately with the spinner still showing “⚠️ Connection lost…”. Users get conflicting signals. Touching up the task description before returning keeps the CLI feedback accurate.

status_app = await _check_deployment_status( log_console=log_console ) if status_app: + progress.update( + deploy_task, + description=f"✅ MCP App deployed successfully{attempt_suffix}!", + ) return status_app

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

deploy_task,

description=f"✅ MCP App deployed successfully{attempt_suffix}!",

)

return app

except (httpx.ReadTimeout, httpx.RemoteProtocolError, httpx.ConnectError) as e:

progress.update(

deploy_task, description=f"⚠️ Connection lost{attempt_suffix}"

)

error_type = type(e).__name__

print_warning(

f"Lost connection to deployment API ({error_type}). "

f"The deployment may still be in progress on the server.",

console_override=log_console,

)

# Check deployment status 3 times with 2-second intervals

print_info(

"Checking deployment status (3 attempts with 2s intervals)...",

console_override=log_console,

)

for check_attempt in range(1, 4):

print_info(

f"Status check {check_attempt}/3...",

console_override=log_console,

)

status_app = await _check_deployment_status(

log_console=log_console

)

if status_app:

return status_app

if check_attempt < 3:

for check_attempt in range(1, 4):

print_info(

f"Status check {check_attempt}/3...",

console_override=log_console,

)

status_app = await _check_deployment_status(

log_console=log_console

)

if status_app:

progress.update(

deploy_task,

description=f"✅ MCP App deployed successfully{attempt_suffix}!",

)

return status_app

if check_attempt < 3:

...

🤖 Prompt for AI Agents

In src/mcp_agent/cli/cloud/commands/deploy/main.py around lines 523-554 the code returns status_app from the status-check loop after a connection loss without updating the progress task description, leaving the spinner showing "⚠️ Connection lost…"; before returning when status_app is truthy, call progress.update(deploy_task, description=f"✅ MCP App deployed successfully{attempt_suffix}!") (and optionally refresh the progress display) so the CLI shows the successful deployment message prior to returning.

feat: robust app deploy retry

d3ef0c5

fix: logging reformatting

b12619f

alienzach marked this pull request as ready for review October 1, 2025 21:40

alienzach requested review from rholinshead and saqadri October 1, 2025 21:40

coderabbitai bot reviewed Oct 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: robust app deploy retry #536

feat: robust app deploy retry #536

Uh oh!

alienzach commented Oct 1, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 1, 2025 •

edited

Loading

Uh oh!

alienzach commented Oct 1, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Oct 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

feat: robust app deploy retry #536

Are you sure you want to change the base?

feat: robust app deploy retry #536

Uh oh!

Conversation

alienzach commented Oct 1, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

What changed?

How to test?

Why make this change?

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

alienzach commented Oct 1, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 1, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

alienzach commented Oct 1, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 1, 2025 •

edited

Loading