fix(sequencers/single): deterministic queue #2938
Conversation
Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where the persisted queue state could not be restored deterministically after a restart.
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Claude finished @alpe's task in 4m 6s

Code Review Complete

I've completed a comprehensive review of the deterministic queue implementation. This PR makes an important architectural improvement by switching from content-hash-based keys to sequence-number-based keys, ensuring deterministic queue restoration after restarts.

Summary

Overall Assessment: APPROVED with recommendations

The implementation is well-designed and addresses a critical issue with queue state persistence. The code is production-ready with good test coverage. My main recommendations focus on improving observability (structured logging), error handling resilience, and addressing a few edge cases.

1. Code Quality & Architecture ✅

Strengths:
Concerns:
2. Error Handling & Resilience
| Priority | Issue | Location | Recommendation |
|---|---|---|---|
| HIGH | Structured logging missing | Lines 160, 189, 199, 228, 234, 239 | Inject logger into BatchQueue |
| HIGH | Deletion failures cause state drift | Line 158-161 | Track failures or return error |
| HIGH | Breaking change not documented | PR description | Document migration behavior |
| MEDIUM | Migration failures silently drop data | Lines 233-240 | Return error or track failures |
| MEDIUM | Missing error path test coverage | N/A | Add datastore failure injection tests |
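For the HIGH "deletion failures cause state drift" item, here is a minimal sketch of what returning the deletion error from `Next()` could look like. The field names (`bq.mu`, `bq.db`, `queuedItem`) follow the diff quoted further down, but the actual method signature may differ:

```go
// Sketch only, not the package's actual implementation: Next returns the
// deletion error instead of printing it, and keeps the item queued so the
// in-memory queue and the datastore cannot drift apart.
func (bq *BatchQueue) Next(ctx context.Context) (*coresequencer.Batch, error) {
	bq.mu.Lock()
	defer bq.mu.Unlock()

	if len(bq.queue) == 0 {
		return nil, nil // empty queue; the real code may use a sentinel error instead
	}

	item := bq.queue[0]

	// Delete the persisted entry first; on failure, leave the item in place.
	if err := bq.db.Delete(ctx, ds.NewKey(item.Key)); err != nil {
		return nil, fmt.Errorf("delete batch %s from datastore: %w", item.Key, err)
	}

	bq.queue = bq.queue[1:]
	return &item.Batch, nil
}
```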
Minor Suggestions
- Line 186: Consider extracting the magic number `16` to a constant: `const seqKeyLength = 16 // hex-encoded uint64`
- Line 208: The validation logic is buried in a loop; consider extracting it (see the usage sketch after this list):

  ```go
  func isValidSeqKey(key string) (uint64, bool) {
  	if len(key) != seqKeyLength {
  		return 0, false
  	}
  	seq, err := strconv.ParseUint(key, 16, 64)
  	return seq, err == nil
  }
  ```

- Prepend comment (line 100): The comment says "NOTE: Prepend intentionally bypasses..." but doesn't explain why. Consider adding the rationale (it prevents transaction loss when batch size limits are hit).
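For illustration, this is roughly how the extracted constant and helper could slot into `Load`'s result loop; only the sequence-counter update is shown here, and the decode/enqueue and legacy-key handling from the diff below are omitted:

```go
// Sketch only: replacing the inline strconv.ParseUint check with the helper.
key := ds.NewKey(result.Key).Name()
if seq, ok := isValidSeqKey(key); ok {
	if seq >= bq.nextAddSeq {
		bq.nextAddSeq = seq + 1
	}
	if seq <= bq.nextPrependSeq {
		bq.nextPrependSeq = seq - 1
	}
}
```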
Migration Consideration (Response to PR Author's Question)
"I am not sure if we need a migration."
Answer: You already have automatic migration ✅
The code in Load() (lines 186-246) implements automatic migration:
- Detects legacy keys (non-16-char hex strings)
- Re-persists with new sequence-based keys
- Deletes old hash-based keys
- Updates sequence counters
This is the correct approach. Manual migration scripts would be more complex and error-prone.
However, consider adding:
- Migration observability: Emit a log/metric when migration happens with count of migrated items
- Migration validation: After migration, verify all legacy keys are gone (add to test)
- Rollback documentation: If migration fails and operators need to rollback, document that they'd lose the deterministic queue benefit
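To make the first two points concrete, here is a rough sketch of counting and logging migrated legacy entries, under the assumption that `Load` has access to the datastore and a logger as `bq.db` and `bq.logger` (both names are illustrative):

```go
// Sketch only: re-keying a legacy (hash-based) entry inside Load's result
// loop and emitting a single summary log once the loop has finished.
migrated := 0

// ... inside the loop, when key is not a valid 16-char hex sequence key:
newKey := fmt.Sprintf("%016x", bq.nextAddSeq) // zero-padded hex of a uint64
bq.nextAddSeq++
if err := bq.db.Put(ctx, ds.NewKey(newKey), result.Value); err != nil {
	return fmt.Errorf("migrate legacy batch %q: %w", key, err)
}
if err := bq.db.Delete(ctx, ds.NewKey(key)); err != nil {
	return fmt.Errorf("delete legacy batch %q: %w", key, err)
}
migrated++

// ... after the loop:
if migrated > 0 {
	bq.logger.Info("migrated legacy queue entries", "count", migrated)
}
```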
Reference: Issue #2795
Per @julienrbrt's comment, this relates to #2795 for tracking legacy code removal before v1. The legacy key detection code (lines 205-223) should be tracked for eventual removal once all deployments have migrated.
Final Recommendation
APPROVE with the following action items before merge:
- MUST: Add structured logging (addresses gemini-code-assist feedback + production needs)
- SHOULD: Improve deletion error handling in `Next()` to prevent state drift
- SHOULD: Document the breaking change and migration behavior for operators
- NICE-TO-HAVE: Add error injection tests to improve coverage to >90%
The core logic is sound and the tests demonstrate correctness. The issues identified are about production robustness and observability.
Great work on solving the deterministic queue restoration problem! The sequence number approach is much more robust than content hashing.
Code Review
This pull request refactors the persistent queue to use deterministic sequence numbers for keys instead of content-based hashes. This is a significant improvement for robustness, ensuring the queue state can be reliably restored after a restart. The changes are well-implemented, including updating the Load function to correctly handle the new key scheme and resume sequencing. A new test case has been added to validate this new logic. My main feedback is to replace fmt.Printf calls with a structured logger for better observability in production.
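For context on the key scheme, the sketch below illustrates the idea of deriving keys from monotonic counters instead of hashing batch contents; the counter fields match the diff below, while the method names are illustrative rather than taken from the PR:

```go
// Sketch only: deterministic, sortable keys derived from sequence counters.
// nextAddSeq grows upward for appends; nextPrependSeq shrinks downward so a
// prepended batch gets a key smaller than everything already in the queue.
func (bq *BatchQueue) nextAddKey() string {
	key := fmt.Sprintf("%016x", bq.nextAddSeq) // 16-char, zero-padded hex of a uint64
	bq.nextAddSeq++
	return key
}

func (bq *BatchQueue) nextPrependKey() string {
	key := fmt.Sprintf("%016x", bq.nextPrependSeq)
	bq.nextPrependSeq--
	return key
}
```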
```diff
 for result := range results.Next() {
 	if result.Error != nil {
 		fmt.Printf("Error reading entry from datastore: %v\n", result.Error)
 		continue
 	}
+	// We care about the last part of the key (the sequence number)
+	// ds.Key usually has a leading slash.
+	key := ds.NewKey(result.Key).Name()

 	pbBatch := &pb.Batch{}
 	err := proto.Unmarshal(result.Value, pbBatch)
 	if err != nil {
-		fmt.Printf("Error decoding batch for key '%s': %v. Skipping entry.\n", result.Key, err)
+		fmt.Printf("Error decoding batch for key '%s': %v. Skipping entry.\n", key, err)
 		continue
 	}
-	bq.queue = append(bq.queue, coresequencer.Batch{Transactions: pbBatch.Txs})
+	batch := coresequencer.Batch{Transactions: pbBatch.Txs}
+	bq.queue = append(bq.queue, queuedItem{Batch: batch, Key: key})
+
+	// Update sequences based on loaded keys to avoid collisions
+	if seq, err := strconv.ParseUint(key, 16, 64); err == nil {
+		if seq >= bq.nextAddSeq {
+			bq.nextAddSeq = seq + 1
+		}
+		if seq <= bq.nextPrependSeq {
+			bq.nextPrependSeq = seq - 1
+		}
+	}
 }
```
Throughout the Load and Next functions, errors are logged using fmt.Printf. In a library or production service, it's better to use a structured logging library (e.g., log/slog, zap, zerolog). This provides several benefits:
- Controllable Output: Consumers of the package can direct logs to files, network services, etc.
- Log Levels: You can assign severity levels (e.g., `error`, `warn`, `info`) to messages.
- Structured Data: You can include key-value pairs (like the `key` in this case) for easier parsing and filtering by log analysis tools.
Consider injecting a logger into the BatchQueue to handle these cases.
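A minimal sketch of such an injection using the standard library's `log/slog`; the constructor shape and field layout are illustrative rather than the package's actual API:

```go
// Sketch only: BatchQueue carrying an injected structured logger.
type BatchQueue struct {
	logger *slog.Logger
	// ... existing fields: queue, sequence counters, datastore ...
}

func NewBatchQueue(logger *slog.Logger /* , existing args ... */) *BatchQueue {
	if logger == nil {
		logger = slog.Default() // fall back to the process-wide default
	}
	return &BatchQueue{logger: logger}
}

// In Load/Next, instead of fmt.Printf:
//   bq.logger.Error("decoding batch", "key", key, "err", err)
//   bq.logger.Warn("skipping unreadable datastore entry", "err", result.Error)
```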
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2938      +/-   ##
==========================================
+ Coverage   59.06%   59.19%   +0.13%
==========================================
  Files          90       90
  Lines        8596     8627      +31
==========================================
+ Hits         5077     5107      +30
- Misses       2937     2938       +1
  Partials      582      582
```
Flags with carried forward coverage won't be shown.
Overview
State should be the same after a restart.
Note to reviewer:
This PR contains a breaking change to the queue store. I am not sure if we need a migration.
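As a sketch of how that could be asserted (constructor and method names here are illustrative and may not match the package exactly), a restart can be simulated by rebuilding the queue over the same datastore:

```go
// Sketch only: batches drained after a "restart" come back in the same order
// they were added, because keys are deterministic sequence numbers.
func TestQueueStateRestoredAfterRestart(t *testing.T) {
	ctx := context.Background()
	db := ds.NewMapDatastore() // in-memory go-datastore

	q1 := NewBatchQueue(db)
	for _, tx := range []string{"a", "b", "c"} {
		err := q1.AddBatch(ctx, coresequencer.Batch{Transactions: [][]byte{[]byte(tx)}})
		require.NoError(t, err)
	}

	// Simulate a restart: a fresh queue instance backed by the same datastore.
	q2 := NewBatchQueue(db)
	require.NoError(t, q2.Load(ctx))

	for _, want := range []string{"a", "b", "c"} {
		batch, err := q2.Next(ctx)
		require.NoError(t, err)
		require.Equal(t, want, string(batch.Transactions[0]))
	}
}
```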