Skip to content

Conversation

@christophefontaine
Copy link
Collaborator

@christophefontaine christophefontaine commented Nov 24, 2025

EVPN use cases require to have a bridge as it pushes ip/mac tuples.
For VXLAN and SRv6 based EVPN, FRR will have to implement DPLANE_OP_MAC_INSTALL and DPLANE_OP_MAC_DELETE to populate 'external' Mac addresses.
This will require an additional flag to the mac table to signify the provenance, in addition to dynamic and static.

TODO:

  • ping from namespace to grout only works when grout pings first (arp resolution issue) (see "smoke/l2_bridge_test.sh" +61
  • Add IPv6 tests

Summary by CodeRabbit

Release Notes

  • New Features
    • L2 bridge domain creation and management with configurable aging and MAC count limits
    • Bridge member interface management for multi-port Layer 2 switching
    • MAC table operations including dynamic/static entry management with automatic aging
    • CLI commands for bridge and MAC address configuration and monitoring
    • DPDK timer library support

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 24, 2025

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive Layer 2 bridge functionality to the system. Changes include: adding bridge interface type (GR_IFACE_TYPE_BRIDGE) and L2 bridge mode to the interface subsystem; separating vrf_id and domain_id fields in the interface base structure; defining new L2 API primitives for bridge domains, members, and MAC table management; implementing bridge lifecycle, configuration, and member management in the control plane; adding MAC address learning, lookup, and aging in the MAC table subsystem; implementing L2 forwarding with MAC learning and flooding in the datapath; exposing CLI commands for bridge and MAC management; and providing a smoke test for L2 bridge operations.

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.35% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed Title accurately reflects the main change—adding L2 bridge functionality with comprehensive API, datapath, CLI, and control components.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/datapath/eth_output.c (1)

94-103: Critical: Label placement causes null pointer dereferences and uninitialized memory access.

Moving the next: label before the tracing block introduces two critical bugs when a traced mbuf hits an error path:

  1. Uninitialized/NULL eth access (lines 97-99): All error paths (goto next at lines 57, 61, 66, 74, 79, 84) jump here before eth is initialized at line 81, causing undefined behavior.

  2. NULL priv->iface dereference (line 101): When jumping from line 57 after iface_from_id() returns NULL, priv->iface->id dereferences NULL.

Apply this diff to move the label after the tracing block:

 		stats->tx_packets += 1;
 		stats->tx_bytes += rte_pktmbuf_pkt_len(mbuf);
 
-next:
 		if (gr_mbuf_is_traced(mbuf)) {
 			struct eth_trace_data *t = gr_mbuf_trace_add(mbuf, node, sizeof(*t));
 			t->eth.dst_addr = eth->dst_addr;
 			t->eth.src_addr = eth->src_addr;
 			t->eth.ether_type = vlan ? vlan->eth_proto : eth->ether_type;
 			t->vlan_id = rte_be_to_cpu_16(vlan ? vlan->vlan_tci : 0);
 			t->iface_id = priv->iface->id;
 		}
+next:
 		rte_node_enqueue_x1(graph, node, edge, mbuf);
 	}
🧹 Nitpick comments (8)
modules/infra/cli/iface.c (1)

276-276: Optional: Remove obsolete comment.

The // vrf comment appears to be a leftover from when VRF display was separate. Since VRF info is now handled in the switch statement above, this comment can be removed.

smoke/l2_bridge_test.sh (1)

18-20: Avoid hard‑coding BRIDGE_ID=1

Assuming bridge ID 1 will break if any bridge already exists. Prefer deriving the ID from grcli bridge add/bridge list output (e.g., parsing the created ID) instead of hard‑coding.

modules/infra/api/gr_infra.h (1)

56-73: Clarify vrf_id/domain_id semantics after union removal

Splitting the old union into separate vrf_id and domain_id fields changes the base struct layout and removes the strong “alias” semantics, while GR_IFACE_SET_VRF and GR_IFACE_SET_DOMAIN still share the same bit. That’s fine if only one of the two is ever meaningful for a given iface, but it’s worth:

  • Documenting that only one of vrf_id/domain_id should be set/used at a time.
  • Updating the domain_id comment (currently “L2 xconnect peer interface id”) to also cover bridge domain usage.

No functional issue, just tightening the API contract and docs.

modules/l2/cli/mac.c (1)

20-95: Zero‑init MAC add/del requests for future‑proofing

gr_l2_mac_add_req req; and gr_l2_mac_del_req req; are left uninitialized and then have only known fields filled in. That works with the current struct layouts, but it’s safer and clearer to zero‑init them (like you already do for list/flush) in case new fields are added later.

For example:

-static cmd_status_t mac_add(struct gr_api_client *c, const struct ec_pnode *p) {
-	struct gr_l2_mac_add_req req;
+static cmd_status_t mac_add(struct gr_api_client *c, const struct ec_pnode *p) {
+	struct gr_l2_mac_add_req req = {0};
@@
-static cmd_status_t mac_del(struct gr_api_client *c, const struct ec_pnode *p) {
-	struct gr_l2_mac_del_req req;
+static cmd_status_t mac_del(struct gr_api_client *c, const struct ec_pnode *p) {
+	struct gr_l2_mac_del_req req = {0};

Otherwise the CLI logic looks solid.

modules/l2/cli/bridge.c (1)

81-283: Consistently zero‑init bridge request structs

Several request structs are defined without initialization:

  • struct gr_l2_bridge_del_req req;
  • struct gr_l2_bridge_get_req req;
  • struct gr_l2_bridge_member_add_req req;
  • struct gr_l2_bridge_member_del_req req;
  • struct gr_l2_bridge_member_list_req req;

and then only known fields are assigned. This works today but is slightly fragile if those structs gain new fields.

Consider aligning them with the patterns you already use elsewhere (e.g., bridge_add, mac_list, mac_flush) by zero‑initializing:

-static cmd_status_t bridge_del(struct gr_api_client *c, const struct ec_pnode *p) {
-	struct gr_l2_bridge_del_req req;
+static cmd_status_t bridge_del(struct gr_api_client *c, const struct ec_pnode *p) {
+	struct gr_l2_bridge_del_req req = {0};
@@
-static cmd_status_t bridge_show_cmd(struct gr_api_client *c, const struct ec_pnode *p) {
-	struct gr_l2_bridge_get_req req;
+static cmd_status_t bridge_show_cmd(struct gr_api_client *c, const struct ec_pnode *p) {
+	struct gr_l2_bridge_get_req req = {0};
@@
-	struct gr_l2_bridge_member_add_req req;
+	struct gr_l2_bridge_member_add_req req = {0};
@@
-	struct gr_l2_bridge_member_del_req req;
+	struct gr_l2_bridge_member_del_req req = {0};
@@
-	struct gr_l2_bridge_member_list_req req;
+	struct gr_l2_bridge_member_list_req req = {0};

Purely a robustness/readability tweak.

modules/l2/control/mac_table.c (2)

42-48: max_count is effectively decoupled from bridge config’s max_mac_count

bridge_mac_table.max_count is set from the entries field in create_bridge_mac_table(), and mac_entry_add() enforces that limit. However, create_bridge_mac_table() is only ever called with max_entries == 0, so you always fall back to the hardcoded default of 1024 entries.

Given that struct gr_l2_bridge_config also has a max_mac_count field (used in bridge_add/bridge_config_set), it looks like the intended per‑bridge capacity knob is currently ignored by the MAC table.

You may want to pass the configured max_mac_count when first creating the table (and/or adjust table->max_count when the bridge config changes) so that mac_entry_add() enforces the configured limit instead of a fixed default.

Also applies to: 70-88, 183-186


42-48: bridge_info.mac_count is never updated; API will always report 0

Based on modules/l2/control/bridge.c: bridge_to_api(), the public bridge API exposes bridge->mac_count, but this file only tracks occupancy in bridge_mac_table.count. No code here updates bridge_info.mac_count on add/delete/flush/aging.

If mac_count is intended to reflect the current number of MAC entries per bridge in API/CLI output, consider wiring it to table->count (or increment/decrement the bridge’s counter in mac_entry_add, mac_entry_del, mac_table_flush, and mac_aging_callback). Right now that field will remain at its initial value.

Also applies to: 99-104, 223-227, 277-281, 363-365, 461-462

modules/l2/api/gr_l2.h (1)

114-119: Clarify semantics for “0 = all bridges / interfaces” in MAC flush

gr_l2_mac_flush_req documents bridge_id == 0 and iface_id == 0 as wildcards. Ensure the control‑plane handler and mac_table_flush() implementation actually implement this behavior, or tighten the comment if the wildcard is handled at a higher level.

No change strictly required here, but worth double‑checking for consistency.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ab85f5 and 546c4c8.

⛔ Files ignored due to path filters (1)
  • docs/graph.svg is excluded by !**/*.svg
📒 Files selected for processing (21)
  • meson.build (1 hunks)
  • modules/infra/api/gr_infra.h (5 hunks)
  • modules/infra/cli/iface.c (1 hunks)
  • modules/infra/datapath/eth_output.c (1 hunks)
  • modules/l2/api/gr_l2.h (1 hunks)
  • modules/l2/api/meson.build (1 hunks)
  • modules/l2/cli/bridge.c (1 hunks)
  • modules/l2/cli/mac.c (1 hunks)
  • modules/l2/cli/meson.build (1 hunks)
  • modules/l2/control/api_handlers.c (1 hunks)
  • modules/l2/control/bridge.c (1 hunks)
  • modules/l2/control/bridge_iface.c (1 hunks)
  • modules/l2/control/gr_l2_control.h (1 hunks)
  • modules/l2/control/mac_table.c (1 hunks)
  • modules/l2/control/meson.build (1 hunks)
  • modules/l2/datapath/l2_bridge.c (1 hunks)
  • modules/l2/datapath/l2_flood.c (1 hunks)
  • modules/l2/datapath/meson.build (1 hunks)
  • modules/l2/meson.build (1 hunks)
  • modules/meson.build (1 hunks)
  • smoke/l2_bridge_test.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Don't suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/l2/control/api_handlers.c
  • modules/l2/datapath/l2_flood.c
  • modules/l2/cli/bridge.c
  • modules/infra/api/gr_infra.h
  • modules/l2/cli/mac.c
  • modules/l2/control/gr_l2_control.h
  • modules/l2/datapath/l2_bridge.c
  • modules/l2/control/bridge_iface.c
  • modules/infra/cli/iface.c
  • modules/l2/control/bridge.c
  • modules/l2/control/mac_table.c
  • modules/l2/api/gr_l2.h
  • modules/infra/datapath/eth_output.c
**/*.sh

⚙️ CodeRabbit configuration file

**/*.sh: - Don't bother about unquoted shell variables.

Files:

  • smoke/l2_bridge_test.sh
🧠 Learnings (5)
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: In this project, cli_src is a global variable that gets populated by individual modules (like modules/policy/cli/meson.build) and is ultimately consumed by an executable() call in the top-level meson.build file. This creates a modular CLI build where each module contributes its CLI sources to the main CLI binary.

Applied to files:

  • modules/l2/meson.build
  • modules/l2/control/meson.build
  • modules/l2/cli/meson.build
  • modules/meson.build
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: The grout project uses a modular CLI build system where individual modules contribute their CLI sources to a global cli_src variable via `cli_src += files(...)`, and the top-level meson.build file builds the final grcli executable using all collected CLI sources. The grcli executable is defined in the top-level meson.build at lines 122-127.

Applied to files:

  • modules/l2/cli/meson.build
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/l2/datapath/l2_flood.c
  • modules/l2/datapath/l2_bridge.c
  • modules/l2/control/bridge.c
  • modules/l2/api/gr_l2.h
  • modules/infra/datapath/eth_output.c
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/l2/datapath/l2_flood.c
  • modules/l2/datapath/l2_bridge.c
  • modules/infra/datapath/eth_output.c
📚 Learning: 2025-11-05T13:55:26.189Z
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 372
File: smoke/cross_vrf_forward_test.sh:18-18
Timestamp: 2025-11-05T13:55:26.189Z
Learning: In the DPDK/grout codebase, VRF interfaces (named gr-vrf<id>) are automatically created when an interface is added to a non-existing VRF using port_add. The VRF creation is handled automatically by the event system in vrf_netlink.c, so no explicit VRF interface creation commands are needed in test scripts.

Applied to files:

  • modules/infra/api/gr_infra.h
  • modules/l2/control/bridge_iface.c
  • modules/l2/control/bridge.c
🧬 Code graph analysis (8)
modules/l2/control/api_handlers.c (4)
modules/l2/control/bridge.c (8)
  • bridge_add (97-142)
  • bridge_get (79-83)
  • bridge_to_api (291-298)
  • bridge_get_next (301-314)
  • bridge_member_add (184-223)
  • bridge_member_del (225-259)
  • bridge_config_get (277-289)
  • bridge_config_set (261-275)
main/gr_module.h (1)
  • api_out (20-23)
main/api.c (1)
  • api_send (242-258)
modules/l2/control/mac_table.c (5)
  • mac_entry_add (161-236)
  • mac_entry_del (238-286)
  • mac_entry_get_next (398-412)
  • mac_entry_to_api (383-396)
  • mac_table_flush (322-381)
modules/l2/datapath/l2_flood.c (4)
modules/l2/control/bridge.c (1)
  • bridge_get (79-83)
modules/infra/control/gr_iface.h (1)
  • iface_get_stats (102-104)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (592-619)
modules/l2/cli/bridge.c (8)
modules/l2/control/bridge.c (4)
  • bridge_add (97-142)
  • bridge_del (144-182)
  • bridge_member_add (184-223)
  • bridge_member_del (225-259)
cli/ecoli.c (2)
  • arg_str (99-109)
  • with_help (15-24)
cli/gr_cli.h (1)
  • arg_u32 (72-78)
api/gr_api.h (1)
  • gr_api_client_send_recv (41-52)
cli/table.c (1)
  • scols_line_sprintf (10-23)
modules/infra/cli/iface.c (2)
  • iface_from_name (93-106)
  • ctx_init (483-533)
modules/l2/cli/mac.c (1)
  • ctx_init (204-260)
cli/exec.c (1)
  • cli_context_register (16-18)
modules/l2/cli/mac.c (5)
cli/gr_cli.h (1)
  • arg_u32 (72-78)
cli/ecoli.c (2)
  • arg_eth_addr (164-178)
  • arg_str (99-109)
modules/infra/cli/iface.c (2)
  • iface_from_name (93-106)
  • iface_from_id (108-116)
api/gr_api.h (1)
  • gr_api_client_send_recv (41-52)
cli/table.c (1)
  • scols_line_sprintf (10-23)
smoke/l2_bridge_test.sh (2)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
smoke/_init.sh (1)
  • netns_add (113-121)
modules/l2/control/gr_l2_control.h (3)
modules/l2/control/bridge.c (11)
  • bridge_get (79-83)
  • bridge_get_by_name (85-95)
  • bridge_add (97-142)
  • bridge_del (144-182)
  • bridge_member_add (184-223)
  • bridge_member_del (225-259)
  • bridge_config_set (261-275)
  • bridge_config_get (277-289)
  • bridge_to_api (291-298)
  • bridge_get_next (301-314)
  • bridge_cleanup_interface_macs (317-343)
modules/l2/control/mac_table.c (9)
  • mac_table_fini (139-151)
  • mac_entry_add (161-236)
  • mac_entry_del (238-286)
  • mac_entry_lookup (288-320)
  • mac_table_flush (322-381)
  • mac_entry_to_api (383-396)
  • mac_entry_get_next (398-412)
  • mac_aging_timer_start (483-503)
  • mac_aging_timer_stop (505-512)
modules/l2/control/bridge_iface.c (3)
  • bridge_iface_create (166-202)
  • bridge_iface_destroy (204-222)
  • bridge_get_iface (224-231)
modules/l2/datapath/l2_bridge.c (9)
modules/infra/control/gr_iface.h (1)
  • iface (16-22)
modules/l2/control/bridge.c (1)
  • bridge_get (79-83)
modules/l2/control/mac_table.c (2)
  • mac_entry_lookup (288-320)
  • mac_entry_add (161-236)
modules/l2/control/bridge_iface.c (1)
  • bridge_get_iface (224-231)
modules/infra/control/iface.c (1)
  • iface_get_eth_addr (259-272)
modules/infra/datapath/gr_mbuf.h (1)
  • gr_mbuf_is_traced (52-54)
modules/infra/datapath/trace.c (1)
  • gr_mbuf_trace_add (592-619)
modules/infra/control/gr_graph.h (1)
  • rte_node_enqueue_x1 (22-25)
modules/infra/datapath/eth_output.c (1)
  • eth_output_register_interface_type (27-34)
modules/l2/control/bridge.c (4)
api/gr_errno.h (1)
  • errno_set (9-12)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
modules/infra/cli/iface.c (1)
  • iface_from_id (108-116)
modules/l2/control/mac_table.c (1)
  • mac_table_flush (322-381)
🪛 Shellcheck (0.11.0)
smoke/l2_bridge_test.sh

[warning] 5-5: Quote this to prevent word splitting.

(SC2046)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-cross-aarch64
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (19)
meson.build (1)

69-69: LGTM!

Adding the timer library to support MAC aging functionality is appropriate for the L2 bridge implementation.

modules/meson.build (1)

8-8: LGTM!

Standard subdir inclusion for the new L2 module.

modules/l2/cli/meson.build (1)

4-8: LGTM!

Standard CLI source registration pattern. Based on learnings, the modular CLI build system is correctly used here.

modules/l2/meson.build (1)

4-7: LGTM!

Standard submodule wiring for the L2 components.

modules/l2/control/meson.build (1)

4-10: LGTM!

Standard source file registration for L2 control components.

modules/l2/datapath/meson.build (1)

4-8: LGTM!

Standard source file registration for L2 datapath nodes.

modules/l2/api/meson.build (1)

4-5: LGTM!

Standard API header publication pattern.

modules/infra/cli/iface.c (1)

256-273: LGTM!

The conditional display logic correctly shows vrf_id for L3 mode and domain_id for L2_BRIDGE mode in the DOMAIN column. The switch statement handles all cases appropriately.

modules/l2/datapath/l2_flood.c (1)

32-149: Flood datapath logic looks sound

Per‑packet flooding, mbuf ownership (first dest vs clones), DROP path, TX stats, and trace filling are consistent and safe. No issues spotted.

modules/infra/api/gr_infra.h (1)

387-423: Enum→string helpers correctly extended

The new GR_IFACE_TYPE_BRIDGE and GR_IFACE_MODE_L2_BRIDGE cases in gr_iface_type_name() / gr_iface_mode_name() look correct and keep the default fallback as‑is.

modules/l2/cli/mac.c (1)

97-201: MAC list/flush CLI and output look correct

The list/flush paths correctly honor the optional BRIDGE_ID/IFACE/dynamic_only filters, resolve iface names safely, and format the libscols table sensibly. No issues spotted.

modules/l2/cli/bridge.c (1)

20-170: Bridge CLI and iface‑type wiring look coherent

Bridge domain add/del/show/list, member management, and bridge‑iface creation are wired cleanly through the GR_L2/GR_INFRA APIs, and the iface‑type hooks (bridge_show, bridge_list_info, register_iface_type) are consistent with the rest of the CLI framework. No functional problems noticed.

Also applies to: 285-441

modules/l2/control/bridge.c (5)

184-223: bridge_member_add logic looks solid

Membership checks (within the same bridge and across other bridges) and the updates to iface->mode and iface->domain_id are consistent with a single‑bridge membership model. No further issues spotted here.


261-289: Config get/set are straightforward

bridge_config_set/bridge_config_get validate arguments, reuse bridge_get, and operate on the stored gr_l2_bridge_config as expected. Behavior is clear and consistent with the API.


291-298: bridge_to_api: copy behavior is acceptable

Mapping internal bridge_info to gr_l2_bridge is direct and uses safe bounds for name and counters. Given the earlier name length checks, using strncpy without explicit NUL termination on the destination is still safe here.


316-343: MAC cleanup helper looks correct and efficient

Iterating active bridges and delegating to mac_table_flush(bridge_id, iface_id, false) matches the intended “remove all MACs for this iface on all bridges” behavior, and the aggregated logging is useful without being noisy.


345-365: Module wiring is consistent with the rest of the system

Module registration, dependency on "iface", and init/fini hooks look aligned, aside from the earlier bridge_init errno nit. The extra mac_table_fini() in bridge_module_fini matches the ownership implied by this module.

modules/l2/api/gr_l2.h (2)

24-141: Overall bridge/MAC API layout looks coherent

The bridge config, bridge descriptors, member and MAC request/response structs, and iface‑info base all line up cleanly with the control‑plane code and the described operations (add/del/list, member mgmt, MAC add/del/flush, config get/set). Field types and sizes look appropriate for the intended ranges.


18-21: This review comment is incorrect.

The project uses -std=gnu2x, which supports C23 fixed-width enums. The syntax enum : uint8_t is valid C23, not C++ syntax. This pattern is already used in the same codebase (e.g., modules/srv6/api/gr_srv6.h), and the code will compile successfully without modification.

Likely an incorrect or invalid review comment.

Comment on lines 16 to 30
static struct api_out l2_bridge_add_api(const void *request, struct api_ctx *ctx __rte_unused) {
const struct gr_l2_bridge_add_req *req = request;
struct bridge_info *bridge;
struct gr_l2_bridge *resp;
uint16_t bridge_id;

bridge_id = bridge_add(req->name, &req->config);
if (bridge_id == 0)
return api_out(errno, 0, NULL);

bridge = bridge_get(bridge_id);
if (bridge == NULL)
return api_out(ENOENT, 0, NULL);

resp = malloc(sizeof(*resp));
if (resp == NULL)
return api_out(ENOMEM, 0, NULL);

bridge_to_api(resp, bridge);
return api_out(0, sizeof(*resp), resp);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

l2_bridge_add_api may return bogus status because bridge_add doesn’t set errno

bridge_add() returns 0 on several failures (invalid name, duplicate name, alloc failure) but, per the bridge.c snippet, does not call errno_set on those paths. Here you translate bridge_id == 0 into api_out(errno, ...), which can end up sending status == 0 or some unrelated old errno even though the operation failed.

I’d recommend tightening this up by either:

  • Making bridge_add() consistently use errno_set(...) on every failure path and documenting the mapping, then keeping this handler as-is, or
  • Mapping bridge_id == 0 to an explicit error code here (e.g., EINVAL or a more specific status once bridge_add distinguishes reasons), rather than blindly forwarding errno.

Right now clients can’t reliably tell why a bridge add failed, and in some cases will see a “success” status with no payload.

🤖 Prompt for AI Agents
In modules/l2/control/api_handlers.c around lines 16-36, the handler currently
returns api_out(errno,...) when bridge_add() returns 0 but bridge_add() does not
set errno, so callers can receive a bogus status; change the handler to map a
zero return to an explicit error code instead of forwarding errno (for example
return api_out(EINVAL, 0, NULL) or a more specific code if you can detect
duplicate/invalid/name/alloc conditions), and add a brief TODO comment to switch
to propagated errno once bridge_add() is updated to set errno on all failure
paths.

Comment on lines +39 to +50
static int bridge_init(void) {
bridge_name_hash = rte_hash_create(&bridge_name_hash_params);
if (bridge_name_hash == NULL) {
LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(rte_errno));
return -rte_errno;
}

// Initialize bridge table
memset(bridges, 0, sizeof(bridges));

return 0;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

bridge_init doesn’t set errno on failure

bridge_init returns -rte_errno but never sets errno, yet bridge_module_init later prints strerror(errno). This can yield a misleading message.

Consider using the common helper so errno is consistent:

- bridge_name_hash = rte_hash_create(&bridge_name_hash_params);
- if (bridge_name_hash == NULL) {
-     LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(rte_errno));
-     return -rte_errno;
- }
+ bridge_name_hash = rte_hash_create(&bridge_name_hash_params);
+ if (bridge_name_hash == NULL) {
+     int err = rte_errno;
+     LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(err));
+     return errno_set(err);
+ }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 39-50, bridge_init returns
-rte_errno on failure but does not set errno, causing later strerror(errno) to
be misleading; fix by making errno consistent before returning — e.g., convert
or assign the DPDK rte_errno into the global errno (or call the project’s common
helper that maps/sets errno from rte_errno), then log/return as before so
bridge_module_init’s strerror(errno) reports the correct error.

Comment on lines +85 to +95
struct bridge_info *bridge_get_by_name(const char *name) {
void *data;
int ret;

ret = rte_hash_lookup_data(bridge_name_hash, name, &data);
if (ret < 0)
return NULL;

uint16_t bridge_id = (uintptr_t)data;
return bridge_get(bridge_id);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent hash key usage for bridge names (potential leak and delete failure)

bridge_add inserts into bridge_name_hash using the raw name pointer as key, while bridge_del deletes using bridge->name. The byte contents of these 32‑byte keys can differ (e.g., different padding after the NUL), so rte_hash_del_key() may fail and leave stale entries. It also means you depend on callers passing a buffer of at least GR_L2_BRIDGE_NAME_SIZE bytes to bridge_add, which isn’t enforced by the API.

Recommend canonicalizing keys via bridge->name everywhere and zero‑padding for lookups:

- // Add to name hash table
- ret = rte_hash_add_key_data(bridge_name_hash, name, (void *)(uintptr_t)bridge_id);
+ // Add to name hash table using canonical, zero‑padded name
+ ret = rte_hash_add_key_data(bridge_name_hash,
+                             bridge->name,
+                             (void *)(uintptr_t)bridge_id);

And in bridge_get_by_name:

- ret = rte_hash_lookup_data(bridge_name_hash, name, &data);
+ char key[GR_L2_BRIDGE_NAME_SIZE] = {0};
+ strncpy(key, name, sizeof(key) - 1);
+ ret = rte_hash_lookup_data(bridge_name_hash, key, &data);

Deletion already uses bridge->name, so after this change add/lookup/del all share the same canonical key. This also avoids reading past the end of arbitrary C strings.

Also applies to: 133-139, 159-165

🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 85-95 (and also apply same pattern
at 133-139 and 159-165), the hash is being looked up/inserted with raw caller
pointers while deletion uses bridge->name, which can differ in padding and cause
failed deletes/stale entries; fix by canonicalizing the hash key to always use
the bridge->name storage and by performing lookups with a fixed-size,
zero-filled key buffer of GR_L2_BRIDGE_NAME_SIZE: when adding, insert using
bridge->name (not the caller pointer); when looking up (bridge_get_by_name)
allocate a local char key[GR_L2_BRIDGE_NAME_SIZE] = {0}, strncpy or memcpy up to
GR_L2_BRIDGE_NAME_SIZE-1 from the provided name, then call rte_hash_lookup_data
with that key; ensure deletions already use bridge->name so add/lookup/del use
the exact same canonical, zero-padded key to avoid mismatches and out‑of‑bounds
reads.

Comment on lines +144 to +176
int bridge_del(uint16_t bridge_id) {
struct bridge_info *bridge;
int ret;

bridge = bridge_get(bridge_id);
if (bridge == NULL)
return errno_set(ENOENT);

// Remove all members first
while (gr_vec_len(bridge->members) > 0) {
uint16_t iface_id = bridge->members[0];
bridge_member_del(bridge_id, iface_id);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

bridge_del can loop forever if iface_from_id() fails in bridge_member_del

bridge_del repeatedly calls bridge_member_del(bridge_id, bridge->members[0]) until gr_vec_len(bridge->members) == 0. In bridge_member_del, if iface_from_id(iface_id) returns NULL, the function returns errno_set(ENODEV) without modifying bridge->members. Since bridge_del ignores the return value, this results in an infinite loop on that first element.

You should ensure that membership is removed from the vector even when the iface object no longer exists. For example:

 int bridge_member_del(uint16_t bridge_id, uint16_t iface_id) {
     struct bridge_info *bridge;
     struct iface *iface;
     uint16_t *member;
     size_t index = 0;

     bridge = bridge_get(bridge_id);
     if (bridge == NULL)
         return errno_set(ENOENT);

-    iface = iface_from_id(iface_id);
-    if (iface == NULL)
-        return errno_set(ENODEV);
+    iface = iface_from_id(iface_id);
@@
-    gr_vec_foreach_ref (member, bridge->members) {
+    gr_vec_foreach_ref (member, bridge->members) {
         if (*member == iface_id) {
             gr_vec_del(bridge->members, index);
-
-            // Reset interface mode to L3
-            iface->mode = GR_IFACE_MODE_L3;
-            iface->domain_id = 0;
+            // Reset interface mode only if iface still exists
+            if (iface != NULL) {
+                iface->mode = GR_IFACE_MODE_L3;
+                iface->domain_id = 0;
+            }
@@
-    return errno_set(ENOENT);
+    return errno_set(ENOENT);
 }

Alternatively, bridge_del could directly drop bridge->members without going through bridge_member_del, but the key point is to avoid a loop where the member list never shrinks.

Also applies to: 225-259

🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 144-156 (and similarly 225-259),
bridge_del currently loops calling bridge_member_del(bridge_id,
bridge->members[0]) until the member vector is empty but ignores
bridge_member_del's return value; if iface_from_id() fails inside
bridge_member_del the member isn't removed and bridge_del can loop forever. Fix
by ensuring the member is removed from bridge->members even when
bridge_member_del fails: call bridge_member_del and if it returns non-zero,
explicitly remove the element at index 0 from the vector (or otherwise drop the
entire members vector) before continuing; apply the same change to the other
affected range so the loop cannot hang when iface objects are missing.

Comment on lines +50 to +49
static struct bridge_mac_table mac_tables[GR_MAX_BRIDGE_DOMAINS];
static struct event *aging_event;
static struct event_base *ev_base;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Shared mac_tables state is unsynchronized; verify single-threaded usage

All the MAC table operations (mac_entry_add/del/lookup, mac_table_flush, and mac_aging_callback) mutate the same mac_tables[bridge_id] state (hash, entries list, count) with no visible synchronization.

If these functions can be called from multiple threads/lcores (e.g., datapath learning + control-plane CLI + aging timer), there’s a real risk of races and list/hash corruption.

If the design guarantees that:

  • All callers run on a single thread, or
  • Access is externally serialized,

it would be good to document that assumption clearly. Otherwise, consider adding per‑bridge locking or another concurrency mechanism around bridge_mac_table mutations so aging, flush, add, and delete can’t interleave unsafely.

Also applies to: 161-236, 238-286, 322-381, 414-481



---

🏁 Script executed:

```shell
#!/bin/bash
# Examine the full mac_table.c to understand function structure and synchronization
cat modules/l2/control/mac_table.c | head -100

Length of output: 2402


🏁 Script executed:

#!/bin/bash
# Check lines mentioned in review for mutation patterns
sed -n '50,52p; 161,236p; 238,286p; 322,381p; 414,481p' modules/l2/control/mac_table.c | grep -A2 -B2 'lock\|mutex\|rcu\|atomic\|spinlock' || echo "No sync primitives found in those ranges"

Length of output: 40


🏁 Script executed:

#!/bin/bash
# Search for where mac_tables is accessed/modified
rg -n 'mac_tables\[' modules/l2/control/mac_table.c -A2 -B2

Length of output: 1361


🏁 Script executed:

#!/bin/bash
# Check for calls to the functions mentioned
rg -n 'mac_entry_add|mac_entry_del|mac_table_flush|mac_aging_callback' modules/l2/ --type=c

Length of output: 1266


🏁 Script executed:

#!/bin/bash
# Look for threading/locking patterns elsewhere in l2 module
find modules/l2 -name '*.c' -o -name '*.h' | xargs grep -l 'pthread\|rte_spinlock\|rte_rwlock\|rte_atomic' | head -5

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Check l2_bridge.c to see execution context of mac_entry_add calls
rg -n 'mac_entry_add' modules/l2/datapath/l2_bridge.c -B10 -A2 | head -40

Length of output: 626


🏁 Script executed:

#!/bin/bash
# Look for threading model or worker thread documentation
grep -r 'single.*thread\|serialize\|lcore' modules/l2/control/ modules/l2/datapath/ --include='*.c' --include='*.h' | grep -i 'comment\|//\|/\*' | head -10

Length of output: 0


🏁 Script executed:

#!/bac
# Check how api_handlers are called (control plane threading)
rg -n 'api_handlers\|mac_entry_add' modules/l2/control/api_handlers.c -B5 | head -30

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Check header for any documented threading guarantee
cat modules/l2/control/gr_l2_control.h | grep -A15 'mac_entry_add\|mac_entry_del\|mac_table_flush'

Length of output: 820


🏁 Script executed:

#!/bin/bash
# Look for event_base initialization and callback context
rg -n 'event_base\|EV_PERSIST\|mac_aging_callback' modules/l2/control/mac_table.c -A3 -B3 | tail -40

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if there's a main event loop or worker model that might serialize
rg -n 'event_base_dispatch\|event_loop\|worker.*thread' modules/l2/ --type=c -B2 -A2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Full inspection of destroy and aging callback to understand mutation scope
sed -n '108,160p; 415,480p' modules/l2/control/mac_table.c

Length of output: 2604


Unsynchronized entries list and count confirmed; races between add/del/aging

The table->entries linked list and table->count field are unprotected. While the hash table uses lock-free concurrency, the list mutations in mac_entry_add (lines 161–236), mac_entry_del (238–286), mac_table_flush (322–381), and especially mac_aging_callback (415–480) can race when called from multiple contexts (datapath rx, control plane API, aging timer).

The aging callback specifically walks and removes entries without locking (lines 431–475), risking corruption if add/del execute concurrently. Add per-bridge synchronization (lock or RCU) around entries list and count mutations.

🤖 Prompt for AI Agents
In modules/l2/control/mac_table.c around lines 50-52, the mac_tables array
currently exposes unsynchronized mutations: the table->entries linked list and
table->count are accessed/modified without protection, causing races between
mac_entry_add (lines ~161–236), mac_entry_del (238–286), mac_table_flush
(322–381) and mac_aging_callback (415–480). Fix by adding a per-bridge
synchronization primitive (e.g., a pthread_mutex_t or rwlock in struct
bridge_mac_table) and initialize/destroy it when creating/tearing down tables;
then take the lock in write mode for all mutations (mac_entry_add,
mac_entry_del, mac_table_flush and any removal inside mac_aging_callback) and in
read mode (or drop to shared lock) for pure walkers; ensure the aging timer
callback acquires the same lock before traversing/removing entries and update
reference/count modifications under the lock to prevent list corruption and
stale counts.

Comment on lines 27 to 34
struct l2_bridge_trace {
uint16_t bridge_id;
uint16_t src_iface;
uint16_t dst_iface;
struct rte_ether_addr src_mac;
struct rte_ether_addr dst_mac;
uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Trace action encoding for DROP is inconsistent with formatter

trace->action uses 0–4 with 4 documented as DROP, but l2_bridge_trace_format only defines 4 strings and indexes them with t->action % 4. As a result, DROP (4) is formatted as "learn", and the "learn" state (0) is effectively unused.

Suggest aligning the encoding with the formatter, e.g.:

-struct l2_bridge_trace {
+struct l2_bridge_trace {
@@
-    uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop
+    uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop
@@
-static int l2_bridge_trace_format(char *buf, size_t len, const void *data, size_t data_len) {
-    const char *actions[] = {"learn", "forward", "flood", "to_bridge"};
+static int l2_bridge_trace_format(char *buf, size_t len, const void *data, size_t data_len) {
+    const char *actions[] = {"learn", "forward", "flood", "to_bridge_iface", "drop"};
@@
-        actions[t->action % 4],
+        actions[t->action],

This way DROP is rendered correctly and you still have room for a future “learn” state if you decide to use action == 0.

Also applies to: 147-156, 165-183

Comment on lines +53 to +59
// Get bridge information
bridge = bridge_get(bridge_id);
if (bridge == NULL) {
edge = DROP;
goto next;
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Uninitialized eth used in trace path when bridge_get fails

If bridge_get(bridge_id) returns NULL, execution jumps to next and the trace block still dereferences eth, which has never been assigned in that path. That’s undefined behavior and can corrupt traces or crash under tracing.

Consider ensuring eth is always initialized before any goto next that can reach the trace block, e.g. by moving the header extraction earlier:

-        mbuf = objs[i];
-        iface = (struct iface *)mbuf_data(mbuf)->iface;
-        bridge_id = iface->domain_id;
-
-        // Get bridge information
-        bridge = bridge_get(bridge_id);
+        mbuf = objs[i];
+        eth = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+        iface = (struct iface *)mbuf_data(mbuf)->iface;
+        bridge_id = iface->domain_id;
+
+        // Get bridge information
+        bridge = bridge_get(bridge_id);
@@
-        eth = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);

Alternatively, you could skip filling MAC fields in the trace when bridge == NULL, but the above keeps tracing consistent.

Also applies to: 135-157

🤖 Prompt for AI Agents
In modules/l2/datapath/l2_bridge.c around lines 53-59 (and similarly at
135-157), the variable `eth` can remain uninitialized when bridge_get(bridge_id)
returns NULL and execution jumps to the trace path, causing undefined behavior;
ensure `eth` is assigned before any goto that can reach the trace block (for
example, move the header extraction/eth = ... up before calling bridge_get or
otherwise initialize `eth` to a safe default), or guard the trace MAC field
population so it is skipped when bridge == NULL; update both occurrences so the
trace code never dereferences an uninitialized `eth`.

@christophefontaine christophefontaine marked this pull request as draft November 24, 2025 09:38
@christophefontaine christophefontaine force-pushed the l2_switch branch 3 times, most recently from a2b6fcb to a70c7c5 Compare November 24, 2025 12:42
@christophefontaine christophefontaine force-pushed the l2_switch branch 7 times, most recently from df3b54f to 8aa1e24 Compare November 25, 2025 15:41
@david-marchand
Copy link
Member

Good catch with the new trace copy helper...

I wonder if a more global helper for creating a packet copy should be added.
It would be responsible for copying the mbuf data, grout metadata and the traces.

This will avoid missing copying those.

@christophefontaine
Copy link
Collaborator Author

@david-marchand yes, definitely !
I'll push this patch in to a dedicated PR.

@christophefontaine christophefontaine force-pushed the l2_switch branch 10 times, most recently from ca03e32 to 576f560 Compare November 27, 2025 21:10
@christophefontaine christophefontaine force-pushed the l2_switch branch 6 times, most recently from 479653b to f543d56 Compare December 4, 2025 09:00
@christophefontaine christophefontaine force-pushed the l2_switch branch 4 times, most recently from e1c00b1 to b73345f Compare December 5, 2025 14:12
Bridge interfaces belong to both a VRF and a bridge domain,
and so can't be part of a union.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Grout is meant to be a router.  Yet, for some usecases (vxlan EVPN,
SRv6 L2 encap, ... ), it is required to be able to switch packets in
addition to route them.  Add a 'l2' module to add a mac learning switch
to grout.  Expiration is based on libevent.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add a new type of interface to be able to route the packets received on
the bridge.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add new events when a mac is added and removed to/from a bridge.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Automate the creation of a bridge interface when a bridge is created.
Instead of generating an interface name based on the bridge id
"br%u", use the bridge name directly.

Update l2 bridge test to use the proper name.
Add a helper to create/delete a bridge.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add bridge interface type support to FRR, and populate the fields
"bridge_ifindex" and "master_ifindex" to create the proper hierarchy in
frr.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Register the proper events when a mac is added/migrated/removed to a
bridge, and forward the info to zebra.

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants