PUT response delivery fails in 2-node networks with proximity cache broadcasts #1960

## Summary

The `test_gateway_reconnection` test fails when proximity cache broadcasts are enabled in a 2-node network. The PUT operation itself succeeds (both nodes cache the contract), but the PUT response never reaches the client, so the test times out after 60 seconds.

## Environment

- **Branch**: `pr-1853-clean-restart` (PR #1937)
- **Commit**: Current HEAD (workaround removed per @sanity feedback)
- **Affected Test**: `crates/core/tests/connectivity.rs::test_gateway_reconnection`

## Symptoms

```
test test_gateway_reconnection has been running for over 60 seconds
thread 'test_gateway_reconnection' panicked at crates/core/tests/connectivity.rs:304:13:
error: anyhow::Error - Timeout waiting for put response
```

**Timeline**:
1. Gateway and peer connect successfully
2. PUT operation initiated by client
3. Both nodes successfully cache the contract (logs show "Added contract to cache")
4. Proximity cache broadcast sent to peer
5. **PUT response never arrives at client**
6. Test times out after 60s

## Investigation Findings

### ✅ What's Working Correctly

1. **No message loops**: Proximity cache protocol is architecturally sound
   - `handle_message()` returns `None` for `CacheAnnounce` (mod.rs:1057-1078)
   - Broadcasts are one-way, responses are point-to-point only
   - Deduplication works via `cache.insert(hash)`

2. **No packet floods**: Unlike the original issue addressed by commit 16623695
   - Original: 1300+ dropped packets
   - Current: Only 1 dropped websocket connection (test cleanup)
   - The packet flood issue appears to have been resolved

3. **PUT operation succeeds**: 
   - Contract is successfully cached on both nodes
   - No errors in PUT operation logs
   - Proximity cache broadcast is sent correctly
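
The no-loop argument in item 1 can be illustrated with a minimal sketch. Only `handle_message`, `CacheAnnounce`, and the `cache.insert(hash)` deduplication come from the code referenced above. The `ProximityCacheMessage`, `CacheStateRequest`, `CacheStateResponse`, and `ProximityCache` names, the `on_contract_cached` method, and the `u64` hash type are assumptions made for illustration and may not match the real definitions in mod.rs.

```rust
use std::collections::HashSet;

/// Hypothetical message set; only `CacheAnnounce` is named in this issue.
#[derive(Debug, Clone, PartialEq)]
pub enum ProximityCacheMessage {
    CacheAnnounce { added: Vec<u64> },
    CacheStateRequest,
    CacheStateResponse { cached: Vec<u64> },
}

/// Hypothetical per-node state: locally cached contracts plus what a
/// neighbor is known to cache.
#[derive(Default)]
pub struct ProximityCache {
    cache: HashSet<u64>,
    neighbor: HashSet<u64>,
}

impl ProximityCache {
    /// Called when a contract is cached locally. `HashSet::insert` returns
    /// `false` for a duplicate, so a repeated hash produces no announcement.
    pub fn on_contract_cached(&mut self, hash: u64) -> Option<ProximityCacheMessage> {
        if self.cache.insert(hash) {
            Some(ProximityCacheMessage::CacheAnnounce { added: vec![hash] })
        } else {
            None
        }
    }

    /// Announcements and state responses are absorbed without a reply, so
    /// they cannot bounce between two peers. Only an explicit state request
    /// gets a point-to-point answer.
    pub fn handle_message(&mut self, msg: ProximityCacheMessage) -> Option<ProximityCacheMessage> {
        match msg {
            ProximityCacheMessage::CacheAnnounce { added } => {
                self.neighbor.extend(added);
                None
            }
            ProximityCacheMessage::CacheStateRequest => {
                let mut cached: Vec<u64> = self.cache.iter().copied().collect();
                cached.sort_unstable();
                Some(ProximityCacheMessage::CacheStateResponse { cached })
            }
            ProximityCacheMessage::CacheStateResponse { cached } => {
                self.neighbor.extend(cached);
                None
            }
        }
    }

    /// Whether the neighbor is known to cache `hash`.
    pub fn neighbor_has(&self, hash: u64) -> bool {
        self.neighbor.contains(&hash)
    }
}
```

In this sketch, a node that receives a `CacheAnnounce` records it and emits nothing, which is the property that rules out a broadcast loop between the gateway and its single peer.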

### ❌ The Problem

**PUT response fails to reach client specifically in 2-node networks when proximity broadcasts occur.**

Evidence from logs:
```
2025-10-19T01:32:09.913709Z INFO Created new operation - starting network request, transaction: 01K7X1GW9SSBGBY8R3RJN69M01
2025-10-19T01:32:09.941766Z INFO PROXIMITY_PROPAGATION: Added contract to cache
2025-10-19T01:32:10.075582Z INFO PROXIMITY_PROPAGATION: Added contract to cache
[60 seconds of silence]
error: anyhow::Error - Timeout waiting for put response
```

## Current Workaround

The workaround from commit 16623695 still fixes the issue:

```rust
// In crates/core/src/node/network_bridge/p2p_protoc.rs:705
NodeEvent::BroadcastProximityCache { from, message } => {
    // A node in a 2-node network has at most one connection (its only peer).
    if self.connections.len() <= 1 {
        tracing::debug!("Skipping broadcast in 2-node network");
        continue;
    }
    // ... broadcast logic
}
```

**Test Results**:
- ✅ With workaround: Test passes in 27.22s
- ❌ Without workaround: Test times out after 60s
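
The workaround condition relies on a node in a 2-node network holding exactly one connection. A minimal sketch of that predicate as a standalone function follows. `should_skip_proximity_broadcast` is a hypothetical helper, not something that exists in `p2p_protoc.rs`, and it assumes `connections.len()` counts only live peer connections.

```rust
/// Hypothetical helper mirroring the `connections.len() <= 1` check.
/// Returns `true` when the node has no peer or exactly one peer, which is
/// the situation every node is in within a 2-node network.
pub fn should_skip_proximity_broadcast(connection_count: usize) -> bool {
    connection_count <= 1
}
```

Extracting the check this way would let a unit test pin down the 2-node boundary without running the full `test_gateway_reconnection` integration test.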

## Code References

**Proximity cache broadcast sender** (p2p_protoc.rs:705-740):
- Event handler spawns tasks to send broadcasts to all peers
- Each broadcast is a `NetMessage::V1(NetMessageV1::ProximityCache)`

**Proximity cache message receiver** (mod.rs:1057-1078):
- Handles incoming `ProximityCache` messages
- Updates neighbor cache knowledge
- Optionally sends point-to-point responses (for state requests only)

**PUT response flow** (mod.rs:907-952):
- PUT operation completes successfully
- Response should be sent back to client
- **Delivery fails somewhere after this point in 2-node networks; the interfering step has not been identified**

## Investigation Approach Taken

1. ✅ Traced complete message flow (PUT → cache → broadcast → receive)
2. ✅ Verified no message loops exist in protocol
3. ✅ Confirmed PUT operation completes successfully
4. ✅ Checked for packet floods (none found)
5. ✅ Verified workaround still fixes the issue
6. ❌ **Not yet identified**: Why PUT responses don't reach clients

## Questions for Further Investigation

1. **Response routing**: Does the proximity broadcast somehow interfere with the PUT response channel?
2. **Timing issue**: Is there a race condition between broadcast completion and response delivery?
3. **2-node specific**: Why does this only affect 2-node networks? (3+ node tests pass)
4. **Client session state**: Does the broadcast affect client transaction tracking?

## Proposed Next Steps

**Option A** (Short-term): Restore workaround with improved documentation
- Re-add `connections.len() <= 1` check
- Update comment to reflect new understanding (not a packet flood, but a response delivery issue)
- Document as known limitation requiring investigation

**Option B** (Investigation): Debug PUT response delivery
- Add detailed tracing to response routing code
- Check if `result_router_tx` channel has issues in 2-node scenarios
- Examine `MessageProcessor` behavior during broadcasts
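
One way to start Option B is to make a missing response observable at the point where the client waits. The sketch below uses a standard-library channel as a stand-in for the real response path. `TransactionId`, `PutResponse`, `DeliveryError`, and `wait_for_put_response` are hypothetical names, and the actual `result_router_tx` channel is probably an async channel with a different message type, so this shows the diagnostic idea rather than the real code.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for the real transaction identifier type.
pub type TransactionId = u64;

/// Hypothetical stand-in for the PUT response delivered to the client.
#[derive(Debug, PartialEq)]
pub struct PutResponse {
    pub transaction: TransactionId,
}

/// Distinguishes the two ways a response can fail to arrive.
#[derive(Debug, PartialEq)]
pub enum DeliveryError {
    /// The sender is still alive but sent nothing within the timeout.
    TimedOut { transaction: TransactionId },
    /// Every sender was dropped before a response was sent.
    SenderDropped { transaction: TransactionId },
}

/// Waits for a response and reports which failure mode occurred, tagged
/// with the transaction that was being awaited.
pub fn wait_for_put_response(
    rx: &Receiver<PutResponse>,
    transaction: TransactionId,
    timeout: Duration,
) -> Result<PutResponse, DeliveryError> {
    match rx.recv_timeout(timeout) {
        Ok(response) => Ok(response),
        Err(RecvTimeoutError::Timeout) => Err(DeliveryError::TimedOut { transaction }),
        Err(RecvTimeoutError::Disconnected) => {
            Err(DeliveryError::SenderDropped { transaction })
        }
    }
}
```

A `TimedOut` result with the sender still alive would suggest the response was never routed, while `SenderDropped` would suggest the routing side exited early. Which of these applies to this failure is not yet known.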

**Option C** (Test modification): Disable proximity cache for 2-node tests
- Production networks are expected to have more than 2 nodes
- Keep proximity cache enabled for 3+ node tests
- Add separate test specifically for 2-node proximity cache behavior

## Related Context

- Original workaround: commit 16623695 (Oct 7, 2025)
- Stack overflow fix: commit 16623695 (spawning broadcast tasks)
- PR review feedback: @sanity requested workaround removal
- This investigation: Attempting to address root cause per user request

[AI-assisted debugging and comment]
