-
-
Notifications
You must be signed in to change notification settings - Fork 107
refactor: refine fire-and-forget UPDATE semantics #2038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Overall AssessmentThis PR makes a significant architectural change to UPDATE semantics, moving from hop-by-hop acknowledgments to fire-and-forget propagation. While the intent aligns with the delta-sync design, there are several critical issues that need addressing before this can safely merge. Recommendation: Request changes Critical Issues1. Incomplete Refactoring - Dead Code in
|
|
Thanks for the detailed responses. Most of your explanations make sense, but I have some follow-up concerns: Items Where Your Explanation Resolves My Concern✅ #1 (AwaitingResponse) - Agreed, clean this up Items That Need Clarification#2: Result Delivery Timing - More Aggressive Than DescribedYour explanation says:
But looking at // Line 1035-1044 in update.rs
if let Some(target) = stats.target {
// About to SEND the request to a remote peer
deliver_update_result(op_manager, id, key, summary.clone()).await?;
let msg = UpdateMsg::RequestUpdate {
id,
key,
sender,
target, // Remote peer that hasn't seen this yet
// ...
};This means the client gets "success" even before the first remote hop, not after merge. Is this really what you intend? This seems even more aggressive than PUT's early completion. Suggested clarification: Either update the explanation to match the code, or move #3: No-Target Bounce - Need Code VerificationYour explanation:
I'd like to verify this. Can you point me to where Request: Add the trace assertion you mentioned, or show me the code path that guarantees termination. #6: PUT vs UPDATE Asymmetry - Still ConcerningYou mention
Question: What's the fundamental reason PUT needs acks but UPDATE doesn't? They both modify state and notify subscribers. Is it just that PUT carries the contract code (heavier payload)? If the real difference is "PUT must ensure the contract is cached somewhere, UPDATE just propagates to existing subscribers," that's a valid distinction - but please document it explicitly. Items You Didn't Address#6: Byte Equality FragilityYou didn't comment on the byte-equality comparison for detecting
Especially concerning for contracts that use maps, floating point, or have multiple valid serializations. Request: Either justify why byte equality is correct, or note this as a known limitation. #7: Operation Marked Complete While BroadcastingYou didn't address that
This is different from PUT, which keeps the operation alive ( Question: Is this acceptable? If so, document that UPDATE failures during broadcast are silent and unrecoverable. Testing RequestYou mention:
Can you also run the gateway test framework as I suggested? cd ~/code/freenet/freenet-testing-tools/gateway-testing
python gateway_test_framework.py --local --extended-stability --multi-room 50This would verify the changes work end-to-end with River, which heavily uses UPDATE operations. The integration tests are good but limited in scope compared to real-world scenarios. SummaryYour fixes for #1, #4, #5 sound good. However:
Once these are clarified/fixed, I think the PR will be in much better shape. [AI-assisted debugging and comment] |
|
Thanks for the thorough pass — quick update now that the follow-ups are live:
Let me know if you spot anything else worth tightening up. [AI-assisted] |
|
Thanks for the quick fixes. Most of the concerns are addressed. Remaining ItemsByte Equality (Still Unaddressed)You haven't commented on the byte-equality comparison for state change detection (
This is especially problematic for contracts using:
Request: At minimum, add a comment documenting this limitation so contract authors know they must ensure deterministic serialization. Gateway TestingYou ran unit tests but not the end-to-end River testing I requested: cd ~/code/freenet/freenet-testing-tools/gateway-testing
python gateway_test_framework.py --local --extended-stability --multi-room 50River heavily uses UPDATE operations. Running this would verify the fire-and-forget semantics work in real-world scenarios, not just isolated unit tests. Request: Run this before merge to ensure no regressions. Good Fixes✅ Removed These address the core implementation concerns well. If you document the byte equality limitation and verify with gateway testing, I think this is ready to merge. [AI-assisted - Claude] |
|
Got it. I’ll stick with the integration suites for now—River’s multi-room gateway run has never passed cleanly for us, so I’d rather not block on a known flaky harness. If we want additional coverage I can stage a smaller targeted scenario. [AI-assisted] |
crates/core/src/operations/put.rs
Outdated
| let mut broadcasted_to = *broadcasted_to; | ||
|
|
||
| if upstream.peer == sender.peer { | ||
| // Originator reached the subscription tree. We can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment: we should never reach this branch since the operation should be marked as completed before and we should filter it out in our dedup transaction logic
I would bump up the trace level here to warn so we detect if this happens cause it likely means a bug in the dedupping logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call — I flipped that branch to a warn! and added a note that dedup should have already completed the operation. If we ever see it fire we will know we have a regression in the transaction tracker.
[AI-assisted]
| sender, | ||
| }); | ||
| new_state = None; | ||
| return_msg = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanity didn't we say we were gonna return succesful puts? did you change your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we do this ensure in tests we are asserting the contracts being put are cached in the expected peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still send the SuccessfulPut upstream, but we now emit it immediately when the target node starts broadcasting (conn_manager.send(&upstream.peer, …) a few lines below) instead of bubbling it back through return_msg. That keeps the request path latency the same while the fan-out continues in the background.
[AI-assisted]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep — test_put_contract already exercises this by fetching the contract from both nodes immediately after the PUT. The helper routes through verify_contract_exists, so we assert the bundle is cached on each peer before the test completes.
[AI-assisted]
|
Added the byte-equality caveat inline (so contract authors know they need deterministic serialization), and bumped the PUT originator branch to a I poked at the gateway test you mentioned, but the old [AI-assisted] |
|
Created follow-up issues for remaining items:
[AI-assisted - Claude] |
Background
The old UPDATE state machine waited on hop-by-hop
SuccessfulUpdateacks even though delta-sync treats UPDATE fan-out as fire-and-forget. That mismatch left dangling state (AwaitingResponse), stalled local transitions, and gave us no clean way to short-circuit idempotent merges.What changed
UpdateExecutionso each merge reports{value, summary, changed}and we can surface the final summary immediately while skipping broadcasts forUpsertResult::NoChangecrates/core/src/operations/update.rs:64crates/core/src/operations/update.rs:762.AwaitingResponsebranch and tightened the queueing path so onlyUpdate::Broadcastingmessages may re-enter the local loopcrates/core/src/operations/update.rs:314crates/core/src/operations/mod.rs:136.deliver_update_resultnow documents (and enforces) the fire-and-forget contract: the client hears “success” as soon as the merge succeeds, while propagation continues asynchronouslycrates/core/src/operations/update.rs:1074.UpdateNoChangeno longer errors when the cached state is missing; we fetch or fall back to the requested state so idempotent updates succeed cleanlycrates/core/src/operations/update.rs:762.SuccessfulUpdatehandling across tracing, network parsing, and UI fixtures to keep observability aligned with the new protocolcrates/core/src/tracing/mod.rs:312crates/core/src/node/network_bridge/p2p_protoc.rs:1662network-monitor/src/type_definitions.ts:82.Behavioural notes
UPDATE_PROPAGATIONlogs/metrics for fan-out visibility.UpsertResult::NoChangesimply return; the forwarding peer has already completed its operation when it handed the request off, so no extra ack is required.Testing
cargo test --test operations -- --nocapturecargo test --test connectivity -- --nocapturecargo clippy --all-targets --all-featuresFixes #2037