Conversation

@sanity sanity commented Oct 31, 2025

Summary

  • ensure gateways in early network formation immediately register accepted connections so later FindOptimalPeer requests have a viable candidate for peer↔peer links
  • harden test_three_node_network_connectivity to require public peer ports and fail fast if the full mesh never forms, logging the final topology snapshot for triage

Key Changes

Gateway bootstrap acceptance in forward_conn() (connect.rs):

if is_gateway
    && accepted
    && (num_connections == 0 || (num_connections < EARLY_NETWORK_THRESHOLD && has_capacity))
{
    let connectivity_info = ConnectivityInfo::new_bootstrap(joiner.clone(), 1);
    return Ok(Some(ConnectState::AwaitingConnectivity(connectivity_info)));
}

Hardened checks in test_three_node_network_connectivity:

let peer1_public_port = peer1.network_port.context(
    "peer1 missing network port; auto_connect_peers requires public_port for mesh connectivity",
)?;
// ...
if !mesh_established {
    bail!(
        "Failed to establish full mesh connectivity after {} attempts. Gateway peers: {}; peer1 peers: {}; peer2 peers: {}",
        MAX_RETRIES,
        last_snapshot.0,
        last_snapshot.1,
        last_snapshot.2
    );
}
  • The first snippet guarantees gateways keep their connection bookkeeping in sync for the first few peers, which unblocks routing recommendations between non-gateway nodes.
  • The second snippet prevents silent test passes when peers never advertise a dialable port and reports the final topology when the mesh fails to appear (a sketch of the surrounding retry loop follows below).
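
For orientation, here is a minimal sketch of how such a fail-fast retry loop can be structured. The peer_connections helper, RETRY_DELAY, and the value of MAX_RETRIES are hypothetical stand-ins; the actual test uses its own client API and timing.

// Minimal sketch of the fail-fast mesh check (illustrative only).
// `peer_connections` and RETRY_DELAY are hypothetical stand-ins; the real test
// queries each node through its own client API and uses its own timing.
use anyhow::bail;
use std::time::Duration;

const MAX_RETRIES: usize = 30;
const RETRY_DELAY: Duration = Duration::from_secs(1);

fn peer_connections(_label: &str) -> usize {
    // Hypothetical stand-in for querying a node's current peer count.
    0
}

fn wait_for_full_mesh() -> anyhow::Result<()> {
    // In a 3-node network, each node must report both other peers for the mesh to count as formed.
    let mut last_snapshot = (0, 0, 0);
    for _ in 0..MAX_RETRIES {
        last_snapshot = (
            peer_connections("gateway"),
            peer_connections("peer1"),
            peer_connections("peer2"),
        );
        if last_snapshot.0 >= 2 && last_snapshot.1 >= 2 && last_snapshot.2 >= 2 {
            return Ok(());
        }
        std::thread::sleep(RETRY_DELAY);
    }
    // Surface the final topology instead of failing silently or hanging.
    bail!(
        "Failed to establish full mesh connectivity after {} attempts. Gateway peers: {}; peer1 peers: {}; peer2 peers: {}",
        MAX_RETRIES,
        last_snapshot.0,
        last_snapshot.1,
        last_snapshot.2
    );
}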

Testing

  • cargo test --test connectivity test_three_node_network_connectivity -- --nocapture
  • cargo make test

Notes

  • The panic logged from op_state_manager.rs during shutdown reproduces on the base branch and is outside the scope of this fix.


Copilot AI left a comment

Pull Request Overview

This PR introduces a #[freenet_test] macro that dramatically simplifies integration test authoring for the Freenet project. The macro automates boilerplate setup including node configuration, event aggregation, TestLogger initialization, and comprehensive failure reporting.

Key Changes:

  • New freenet-macros crate with procedural macro for test setup automation
  • Comprehensive documentation in markdown files explaining macro usage and enhanced reporting
  • Conversion of existing integration tests to use the new macro, reducing test code by ~50+ lines per test
  • Addition of flush handle support in node building for proper event aggregation
  • Deletion of docs/TESTING.md file (315 lines removed)

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| crates/freenet-macros/Cargo.toml | New crate configuration with dependencies for procedural macro support |
| crates/freenet-macros/src/lib.rs | Main macro entry point and documentation |
| crates/freenet-macros/src/parser.rs | Attribute parsing logic for macro configuration |
| crates/freenet-macros/src/codegen.rs | Code generation for expanded test functions |
| crates/freenet-macros/README.md | Comprehensive documentation for the macro |
| docs/debugging/testing-logging-guide.md | Updated guide with macro usage and enhanced reporting features |
| docs/TESTING.md | Deleted; content likely consolidated elsewhere |
| crates/core/tests/operations.rs | Converted tests to use the new macro, significant code reduction |
| crates/core/tests/isolated_node_regression.rs | Converted tests to use the new macro |
| crates/core/tests/error_notification.rs | Converted tests to use the new macro |
| crates/core/tests/test_macro_example.rs | Example tests demonstrating macro features |
| crates/core/src/node/mod.rs | Added build_with_flush_handle method for event aggregation |
| crates/core/src/ring/mod.rs | Minor whitespace cleanup |

@sanity sanity force-pushed the fix/2039-peer-connectivity branch from e590035 to 33a7a18 on November 1, 2025 at 00:03
@sanity sanity changed the title from "Fix peer mesh bootstrap after aggregator" to "fix(connectivity): restore peer mesh after aggregator" on Nov 1, 2025
@iduartgomez

@claude will there be any issues with reservations from this change?


claude bot commented Nov 1, 2025

Claude finished @iduartgomez's task.


Analyzing reservation impact

  • Read the changed files to understand the modifications
  • Search for reservation-related code
  • Analyze potential conflicts between connectivity changes and reservations
  • Provide answer with findings

Analysis

Good question! I've analyzed the changes and the reservation system. The short answer is: No, there should not be any issues with reservations from this change. Here's why:

How Reservations Work

The reservation system in connection_manager.rs (should_accept() at line 143) works as follows; a simplified model is sketched after the list:

  1. When a peer receives a connection request, should_accept() is called
  2. It immediately increments reserved_connections (line 150) to reserve a slot
  3. If the connection is rejected, it decrements the reservation (lines 187-188)
  4. If accepted and the connection completes, add_connection() decrements the reservation (lines 257-259)
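
As a simplified model of that lifecycle (not the actual ConnectionManager code; the struct, field, and constant values here are illustrative), the counters behave roughly like this:

use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified model of the reservation bookkeeping described above.
struct Reservations {
    reserved: AtomicUsize,
    open: AtomicUsize,
    max_connections: usize,
}

impl Reservations {
    // Mirrors steps 2-3: reserve a slot up front, release it again on rejection.
    fn should_accept(&self) -> bool {
        let reserved = self.reserved.fetch_add(1, Ordering::SeqCst) + 1;
        let open = self.open.load(Ordering::SeqCst);
        if open + reserved > self.max_connections {
            // Rejected: give the reserved slot back immediately.
            self.reserved.fetch_sub(1, Ordering::SeqCst);
            return false;
        }
        true
    }

    // Mirrors step 4: a completed connection converts its reservation into an open connection.
    fn add_connection(&self, was_reserved: bool) {
        if was_reserved {
            self.reserved.fetch_sub(1, Ordering::SeqCst);
        }
        self.open.fetch_add(1, Ordering::SeqCst);
    }
}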

The Change in This PR

The key change in connect.rs:1109-1129 is in the forward_conn() function:

const EARLY_NETWORK_THRESHOLD: usize = 4;
let has_capacity = num_connections + num_reserved < max_connections;
if is_gateway
    && accepted
    && (num_connections == 0 || (num_connections < EARLY_NETWORK_THRESHOLD && has_capacity))
{
    // Accept directly and register immediately
    let connectivity_info = ConnectivityInfo::new_bootstrap(joiner.clone(), 1);
    return Ok(Some(ConnectState::AwaitingConnectivity(connectivity_info)));
}

Why This is Safe

  1. Reservation already happened: By the time forward_conn() is called, should_accept() has already been invoked (at connect.rs:324), which means a reservation slot was already incremented.

  2. The capacity check respects reservations: Notice line 1108: let has_capacity = num_connections + num_reserved < max_connections; — This correctly accounts for both active connections AND reserved slots.

  3. Proper cleanup path: When the connection is eventually registered via add_connection(), the was_reserved=true parameter ensures the reservation counter is properly decremented (see connection_manager.rs:256-268).

  4. Early network threshold prevents overload: The condition num_connections < EARLY_NETWORK_THRESHOLD && has_capacity ensures that even during early network formation, the gateway won't accept more connections than it has capacity for.

Trace Through Early Network Formation

Let's trace what happens when a gateway with 0 connections receives a join request (replayed in code after the list):

  1. CheckConnectivity arrives at gateway (has 0 connections)
  2. should_accept() is called → reserves slot (reserved=1)
  3. forward_conn() sees: num_connections=0, num_reserved=1, accepted=true
  4. Bootstrap path triggers → returns AwaitingConnectivity with is_bootstrap_acceptance=true
  5. Connection completes → add_connection(was_reserved=true) decrements reservation back to 0
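
Replaying that trace against the simplified model from the earlier sketch (illustrative only; the max_connections value is arbitrary and the production types differ):

// Continues the previous sketch: Reservations, AtomicUsize, and Ordering are already in scope.
fn trace_bootstrap() {
    // Gateway starts with 0 connections and 0 reservations.
    let gateway = Reservations {
        reserved: AtomicUsize::new(0),
        open: AtomicUsize::new(0),
        max_connections: 8,
    };

    // Steps 1-2: CheckConnectivity arrives and should_accept() reserves a slot.
    assert!(gateway.should_accept());
    assert_eq!(gateway.reserved.load(Ordering::SeqCst), 1);

    // Steps 3-4: forward_conn() sees num_connections = 0, num_reserved = 1, accepted = true,
    // and takes the bootstrap acceptance path (no counter changes at this point).

    // Step 5: the connection completes; the reservation is converted into an open connection.
    gateway.add_connection(true);
    assert_eq!(gateway.reserved.load(Ordering::SeqCst), 0);
    assert_eq!(gateway.open.load(Ordering::SeqCst), 1);
}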

Conclusion

The change is safe because it operates within the existing reservation framework rather than bypassing it. The early network acceptance logic only triggers when there's capacity (has_capacity check), and all connection completions still properly decrement the reservation counter.


@iduartgomez iduartgomez added this pull request to the merge queue Nov 1, 2025
Merged via the queue into main with commit abd0c7b Nov 1, 2025
27 of 28 checks passed
@iduartgomez iduartgomez deleted the fix/2039-peer-connectivity branch November 1, 2025 06:11