Skip to content

Conversation

muhammad-ali-e
Copy link
Contributor

What

  • Added write_only=True parameter to KombuManager in log_consumer worker
  • Prevents creation of entries in Redis unacked hash for emit-only workers

Why

  • Redis unacked hash was consuming 145 MB with 110K entries in integration environment
  • Log consumer worker only emits events (never receives), but was tracking message delivery
  • Kombu creates unacked entries to track message acknowledgments, but has no TTL
  • Integration environment (110K entries) vs Staging (2.6K entries) due to higher activity
  • Memory leak caused by messages from disconnected clients never being cleaned up

How

  • Set write_only=True in socketio.KombuManager() constructor in workers/log_consumer/tasks.py
  • This tells the worker: "You only publish, don't track delivery"
  • Prevents future accumulation while maintaining message delivery to UI clients
  • Backend (which receives WebSocket connections) continues to use default write_only=False

Can this PR break any existing features. If yes, please list possible items. If no, please explain why.

No, this PR cannot break existing features:

  1. Message delivery unchanged: Log consumer emits to Redis, backend receives and forwards to UI
  2. Architecture preserved: write_only=True only affects the worker's internal tracking, not the message flow
  3. Backend unaffected: Backend SocketIO still receives and delivers messages normally
  4. WebSocket clients unaffected: UI continues to receive real-time logs via WebSocket
  5. Single-line configuration change: No logic changes, only optimization of message tracking
  6. Tested pattern: write_only=True is the recommended configuration for emit-only workers per python-socketio documentation

Database Migrations

  • None

Env Config

  • No new environment variables required
  • No changes to existing configuration

Relevant Docs

Related Issues or PRs

  • Addresses Redis memory leak issue with unacked hash (internal investigation)
  • Related to UN-2470: Remove Django dependency from Celery workers

Dependencies Versions

  • No dependency changes
  • Uses existing python-socketio library

Notes on Testing

Testing strategy:

  1. Deploy to staging environment first
  2. Verify UI still receives real-time logs in workflow execution
  3. Monitor Redis unacked hash size - should stop growing
  4. After 24-hour validation in staging, deploy to integration
  5. Monitor integration for 48 hours to confirm leak is stopped

Success criteria:

  • UI receives logs in real-time (no regression)
  • unacked hash stops growing beyond current 110K entries
  • Memory usage stabilizes

Rollback plan:

  • Remove write_only=True parameter if any issues detected
  • Single-line change makes rollback trivial

Screenshots

N/A - Backend optimization, no UI changes

Checklist

✅ I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

Added write_only=True parameter to KombuManager in log_consumer worker
to prevent accumulation of unacked messages in Redis.
Copy link
Contributor

coderabbitai bot commented Oct 8, 2025

Summary by CodeRabbit

  • Chores
    • Improved reliability and performance of background event delivery by streamlining how messages are written to the message broker.
    • Reduces overhead and connection churn on workers, leading to more consistent real-time updates under high traffic.
    • Minimizes transient delivery hiccups and lowers resource usage for steadier operation.
    • No changes to UI or workflows; behavior remains the same, with enhanced stability.

Walkthrough

Updated Socket.IO client manager initialization in workers/log_consumer/tasks.py to instantiate KombuManager with write_only=True. No other logic or control flow changes.

Changes

Cohort / File(s) Summary of Changes
Socket.IO KombuManager init
`workers/log_consumer/tasks.py`
Set KombuManager initialization to use write_only=True for broker interactions; no other code paths modified.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title directly reflects the main change—addressing the Redis unacked hash memory leak in the log consumer worker—and includes the related ticket ID. It is concise, specific, and avoids extraneous details, making it clear to reviewers what the pull request accomplishes.
Description Check ✅ Passed The pull request description adheres to the repository template by supplying complete sections for What, Why, How, impact on existing features, database migrations, environment configuration, relevant documentation, related issues, dependency versions, testing strategy, screenshots, and a checklist. Each section is populated with clear, detailed information and actionable criteria, ensuring reviewers have all necessary context.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/UN-2857-MISC_fix-redis-unacked-memory-leak

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
workers/log_consumer/tasks.py (2)

31-31: Consider clarifying the comment about configuration difference.

The comment states "uses same KombuManager as backend" but this worker now uses write_only=True while the backend uses the default write_only=False. Consider updating the comment to reflect this configuration difference.

Apply this diff to clarify the comment:

-# Socket.IO client for emitting events (uses same KombuManager as backend)
+# Socket.IO client for emitting events (uses KombuManager with write_only=True for emit-only worker)

40-40: Approve write-only KombuManager for emit-only worker

  • write_only=True is correct since this worker only emits events (no handlers) and prevents the Redis unacked hash memory leak.
  • Optional: update the module comment to reflect the write-only configuration.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to Reviews > Disable Cache setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 26f423c and 281de1d.

📒 Files selected for processing (1)
  • workers/log_consumer/tasks.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build

Copy link
Contributor

github-actions bot commented Oct 8, 2025

filepath function $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_logs}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_client\_init}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_run\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_for\_sidecar}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{runner/src/unstract/runner/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_sidecar\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{11}}$$ $$\textcolor{#23d18b}{\tt{11}}$$

Copy link

sonarqubecloud bot commented Oct 8, 2025

@chandrasekharan-zipstack
Copy link
Contributor

@muhammad-ali-e will this change cause messages to be dropped by any chance? Will simply adding a TTL for these messages help instead?

@muhammad-ali-e
Copy link
Contributor Author

@muhammad-ali-e will this change cause messages to be dropped by any chance? Will simply adding a TTL for these messages help instead?

Tested this with integration. Also it seems socketio creating single hashmap instaed of each key. So TTL might not work on each internal elements.

@ritwik-g ritwik-g merged commit 435759b into main Oct 10, 2025
6 checks passed
@ritwik-g ritwik-g deleted the feat/UN-2857-MISC_fix-redis-unacked-memory-leak branch October 10, 2025 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants