-
Notifications
You must be signed in to change notification settings - Fork 23
module: add bridge #411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
module: add bridge #411
Conversation
📝 WalkthroughWalkthroughThis pull request introduces comprehensive Layer 2 bridge functionality to the system. Changes include: adding bridge interface type (GR_IFACE_TYPE_BRIDGE) and L2 bridge mode to the interface subsystem; separating vrf_id and domain_id fields in the interface base structure; defining new L2 API primitives for bridge domains, members, and MAC table management; implementing bridge lifecycle, configuration, and member management in the control plane; adding MAC address learning, lookup, and aging in the MAC table subsystem; implementing L2 forwarding with MAC learning and flooding in the datapath; exposing CLI commands for bridge and MAC management; and providing a smoke test for L2 bridge operations. Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/infra/datapath/eth_output.c (1)
94-103: Critical: Label placement causes null pointer dereferences and uninitialized memory access.Moving the
next:label before the tracing block introduces two critical bugs when a traced mbuf hits an error path:
Uninitialized/NULL
ethaccess (lines 97-99): All error paths (goto nextat lines 57, 61, 66, 74, 79, 84) jump here beforeethis initialized at line 81, causing undefined behavior.NULL
priv->ifacedereference (line 101): When jumping from line 57 afteriface_from_id()returns NULL,priv->iface->iddereferences NULL.Apply this diff to move the label after the tracing block:
stats->tx_packets += 1; stats->tx_bytes += rte_pktmbuf_pkt_len(mbuf); -next: if (gr_mbuf_is_traced(mbuf)) { struct eth_trace_data *t = gr_mbuf_trace_add(mbuf, node, sizeof(*t)); t->eth.dst_addr = eth->dst_addr; t->eth.src_addr = eth->src_addr; t->eth.ether_type = vlan ? vlan->eth_proto : eth->ether_type; t->vlan_id = rte_be_to_cpu_16(vlan ? vlan->vlan_tci : 0); t->iface_id = priv->iface->id; } +next: rte_node_enqueue_x1(graph, node, edge, mbuf); }
🧹 Nitpick comments (8)
modules/infra/cli/iface.c (1)
276-276: Optional: Remove obsolete comment.The
// vrfcomment appears to be a leftover from when VRF display was separate. Since VRF info is now handled in the switch statement above, this comment can be removed.smoke/l2_bridge_test.sh (1)
18-20: Avoid hard‑codingBRIDGE_ID=1Assuming bridge ID 1 will break if any bridge already exists. Prefer deriving the ID from
grcli bridge add/bridge listoutput (e.g., parsing the created ID) instead of hard‑coding.modules/infra/api/gr_infra.h (1)
56-73: Clarifyvrf_id/domain_idsemantics after union removalSplitting the old union into separate
vrf_idanddomain_idfields changes the base struct layout and removes the strong “alias” semantics, whileGR_IFACE_SET_VRFandGR_IFACE_SET_DOMAINstill share the same bit. That’s fine if only one of the two is ever meaningful for a given iface, but it’s worth:
- Documenting that only one of
vrf_id/domain_idshould be set/used at a time.- Updating the
domain_idcomment (currently “L2 xconnect peer interface id”) to also cover bridge domain usage.No functional issue, just tightening the API contract and docs.
modules/l2/cli/mac.c (1)
20-95: Zero‑init MAC add/del requests for future‑proofing
gr_l2_mac_add_req req;andgr_l2_mac_del_req req;are left uninitialized and then have only known fields filled in. That works with the current struct layouts, but it’s safer and clearer to zero‑init them (like you already do for list/flush) in case new fields are added later.For example:
-static cmd_status_t mac_add(struct gr_api_client *c, const struct ec_pnode *p) { - struct gr_l2_mac_add_req req; +static cmd_status_t mac_add(struct gr_api_client *c, const struct ec_pnode *p) { + struct gr_l2_mac_add_req req = {0}; @@ -static cmd_status_t mac_del(struct gr_api_client *c, const struct ec_pnode *p) { - struct gr_l2_mac_del_req req; +static cmd_status_t mac_del(struct gr_api_client *c, const struct ec_pnode *p) { + struct gr_l2_mac_del_req req = {0};Otherwise the CLI logic looks solid.
modules/l2/cli/bridge.c (1)
81-283: Consistently zero‑init bridge request structsSeveral request structs are defined without initialization:
struct gr_l2_bridge_del_req req;struct gr_l2_bridge_get_req req;struct gr_l2_bridge_member_add_req req;struct gr_l2_bridge_member_del_req req;struct gr_l2_bridge_member_list_req req;and then only known fields are assigned. This works today but is slightly fragile if those structs gain new fields.
Consider aligning them with the patterns you already use elsewhere (e.g.,
bridge_add,mac_list,mac_flush) by zero‑initializing:-static cmd_status_t bridge_del(struct gr_api_client *c, const struct ec_pnode *p) { - struct gr_l2_bridge_del_req req; +static cmd_status_t bridge_del(struct gr_api_client *c, const struct ec_pnode *p) { + struct gr_l2_bridge_del_req req = {0}; @@ -static cmd_status_t bridge_show_cmd(struct gr_api_client *c, const struct ec_pnode *p) { - struct gr_l2_bridge_get_req req; +static cmd_status_t bridge_show_cmd(struct gr_api_client *c, const struct ec_pnode *p) { + struct gr_l2_bridge_get_req req = {0}; @@ - struct gr_l2_bridge_member_add_req req; + struct gr_l2_bridge_member_add_req req = {0}; @@ - struct gr_l2_bridge_member_del_req req; + struct gr_l2_bridge_member_del_req req = {0}; @@ - struct gr_l2_bridge_member_list_req req; + struct gr_l2_bridge_member_list_req req = {0};Purely a robustness/readability tweak.
modules/l2/control/mac_table.c (2)
42-48:max_countis effectively decoupled from bridge config’smax_mac_count
bridge_mac_table.max_countis set from theentriesfield increate_bridge_mac_table(), andmac_entry_add()enforces that limit. However,create_bridge_mac_table()is only ever called withmax_entries == 0, so you always fall back to the hardcoded default of 1024 entries.Given that
struct gr_l2_bridge_configalso has amax_mac_countfield (used inbridge_add/bridge_config_set), it looks like the intended per‑bridge capacity knob is currently ignored by the MAC table.You may want to pass the configured
max_mac_countwhen first creating the table (and/or adjusttable->max_countwhen the bridge config changes) so thatmac_entry_add()enforces the configured limit instead of a fixed default.Also applies to: 70-88, 183-186
42-48:bridge_info.mac_countis never updated; API will always report 0Based on
modules/l2/control/bridge.c: bridge_to_api(), the public bridge API exposesbridge->mac_count, but this file only tracks occupancy inbridge_mac_table.count. No code here updatesbridge_info.mac_counton add/delete/flush/aging.If
mac_countis intended to reflect the current number of MAC entries per bridge in API/CLI output, consider wiring it totable->count(or increment/decrement the bridge’s counter inmac_entry_add,mac_entry_del,mac_table_flush, andmac_aging_callback). Right now that field will remain at its initial value.Also applies to: 99-104, 223-227, 277-281, 363-365, 461-462
modules/l2/api/gr_l2.h (1)
114-119: Clarify semantics for “0 = all bridges / interfaces” in MAC flush
gr_l2_mac_flush_reqdocumentsbridge_id == 0andiface_id == 0as wildcards. Ensure the control‑plane handler andmac_table_flush()implementation actually implement this behavior, or tighten the comment if the wildcard is handled at a higher level.No change strictly required here, but worth double‑checking for consistency.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/graph.svgis excluded by!**/*.svg
📒 Files selected for processing (21)
meson.build(1 hunks)modules/infra/api/gr_infra.h(5 hunks)modules/infra/cli/iface.c(1 hunks)modules/infra/datapath/eth_output.c(1 hunks)modules/l2/api/gr_l2.h(1 hunks)modules/l2/api/meson.build(1 hunks)modules/l2/cli/bridge.c(1 hunks)modules/l2/cli/mac.c(1 hunks)modules/l2/cli/meson.build(1 hunks)modules/l2/control/api_handlers.c(1 hunks)modules/l2/control/bridge.c(1 hunks)modules/l2/control/bridge_iface.c(1 hunks)modules/l2/control/gr_l2_control.h(1 hunks)modules/l2/control/mac_table.c(1 hunks)modules/l2/control/meson.build(1 hunks)modules/l2/datapath/l2_bridge.c(1 hunks)modules/l2/datapath/l2_flood.c(1 hunks)modules/l2/datapath/meson.build(1 hunks)modules/l2/meson.build(1 hunks)modules/meson.build(1 hunks)smoke/l2_bridge_test.sh(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{c,h}
⚙️ CodeRabbit configuration file
**/*.{c,h}: -gr_vec_*()functions cannot fail. No need to check their return value.
gr_vec_free(x)always setsx = NULL. There is no risk of double free.ec_node_*()functions consume theirec_nodearguments. No leaks on error.rte_node->ctxis an uint8_t array of size 16, not a pointer.- Don't suggest to replace
assert()with graceful error checking.- We compile with
-std=gnu2x. Unnamed parameters in function definitions are valid.
Files:
modules/l2/control/api_handlers.cmodules/l2/datapath/l2_flood.cmodules/l2/cli/bridge.cmodules/infra/api/gr_infra.hmodules/l2/cli/mac.cmodules/l2/control/gr_l2_control.hmodules/l2/datapath/l2_bridge.cmodules/l2/control/bridge_iface.cmodules/infra/cli/iface.cmodules/l2/control/bridge.cmodules/l2/control/mac_table.cmodules/l2/api/gr_l2.hmodules/infra/datapath/eth_output.c
**/*.sh
⚙️ CodeRabbit configuration file
**/*.sh: - Don't bother about unquoted shell variables.
Files:
smoke/l2_bridge_test.sh
🧠 Learnings (5)
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: In this project, cli_src is a global variable that gets populated by individual modules (like modules/policy/cli/meson.build) and is ultimately consumed by an executable() call in the top-level meson.build file. This creates a modular CLI build where each module contributes its CLI sources to the main CLI binary.
Applied to files:
modules/l2/meson.buildmodules/l2/control/meson.buildmodules/l2/cli/meson.buildmodules/meson.build
📚 Learning: 2025-09-05T07:06:51.554Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/cli/meson.build:4-8
Timestamp: 2025-09-05T07:06:51.554Z
Learning: The grout project uses a modular CLI build system where individual modules contribute their CLI sources to a global cli_src variable via `cli_src += files(...)`, and the top-level meson.build file builds the final grcli executable using all collected CLI sources. The grcli executable is defined in the top-level meson.build at lines 122-127.
Applied to files:
modules/l2/cli/meson.build
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.
Applied to files:
modules/l2/datapath/l2_flood.cmodules/l2/datapath/l2_bridge.cmodules/l2/control/bridge.cmodules/l2/api/gr_l2.hmodules/infra/datapath/eth_output.c
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).
Applied to files:
modules/l2/datapath/l2_flood.cmodules/l2/datapath/l2_bridge.cmodules/infra/datapath/eth_output.c
📚 Learning: 2025-11-05T13:55:26.189Z
Learnt from: maxime-leroy
Repo: DPDK/grout PR: 372
File: smoke/cross_vrf_forward_test.sh:18-18
Timestamp: 2025-11-05T13:55:26.189Z
Learning: In the DPDK/grout codebase, VRF interfaces (named gr-vrf<id>) are automatically created when an interface is added to a non-existing VRF using port_add. The VRF creation is handled automatically by the event system in vrf_netlink.c, so no explicit VRF interface creation commands are needed in test scripts.
Applied to files:
modules/infra/api/gr_infra.hmodules/l2/control/bridge_iface.cmodules/l2/control/bridge.c
🧬 Code graph analysis (8)
modules/l2/control/api_handlers.c (4)
modules/l2/control/bridge.c (8)
bridge_add(97-142)bridge_get(79-83)bridge_to_api(291-298)bridge_get_next(301-314)bridge_member_add(184-223)bridge_member_del(225-259)bridge_config_get(277-289)bridge_config_set(261-275)main/gr_module.h (1)
api_out(20-23)main/api.c (1)
api_send(242-258)modules/l2/control/mac_table.c (5)
mac_entry_add(161-236)mac_entry_del(238-286)mac_entry_get_next(398-412)mac_entry_to_api(383-396)mac_table_flush(322-381)
modules/l2/datapath/l2_flood.c (4)
modules/l2/control/bridge.c (1)
bridge_get(79-83)modules/infra/control/gr_iface.h (1)
iface_get_stats(102-104)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(592-619)
modules/l2/cli/bridge.c (8)
modules/l2/control/bridge.c (4)
bridge_add(97-142)bridge_del(144-182)bridge_member_add(184-223)bridge_member_del(225-259)cli/ecoli.c (2)
arg_str(99-109)with_help(15-24)cli/gr_cli.h (1)
arg_u32(72-78)api/gr_api.h (1)
gr_api_client_send_recv(41-52)cli/table.c (1)
scols_line_sprintf(10-23)modules/infra/cli/iface.c (2)
iface_from_name(93-106)ctx_init(483-533)modules/l2/cli/mac.c (1)
ctx_init(204-260)cli/exec.c (1)
cli_context_register(16-18)
modules/l2/cli/mac.c (5)
cli/gr_cli.h (1)
arg_u32(72-78)cli/ecoli.c (2)
arg_eth_addr(164-178)arg_str(99-109)modules/infra/cli/iface.c (2)
iface_from_name(93-106)iface_from_id(108-116)api/gr_api.h (1)
gr_api_client_send_recv(41-52)cli/table.c (1)
scols_line_sprintf(10-23)
smoke/l2_bridge_test.sh (2)
modules/infra/control/gr_iface.h (1)
iface(16-22)smoke/_init.sh (1)
netns_add(113-121)
modules/l2/control/gr_l2_control.h (3)
modules/l2/control/bridge.c (11)
bridge_get(79-83)bridge_get_by_name(85-95)bridge_add(97-142)bridge_del(144-182)bridge_member_add(184-223)bridge_member_del(225-259)bridge_config_set(261-275)bridge_config_get(277-289)bridge_to_api(291-298)bridge_get_next(301-314)bridge_cleanup_interface_macs(317-343)modules/l2/control/mac_table.c (9)
mac_table_fini(139-151)mac_entry_add(161-236)mac_entry_del(238-286)mac_entry_lookup(288-320)mac_table_flush(322-381)mac_entry_to_api(383-396)mac_entry_get_next(398-412)mac_aging_timer_start(483-503)mac_aging_timer_stop(505-512)modules/l2/control/bridge_iface.c (3)
bridge_iface_create(166-202)bridge_iface_destroy(204-222)bridge_get_iface(224-231)
modules/l2/datapath/l2_bridge.c (9)
modules/infra/control/gr_iface.h (1)
iface(16-22)modules/l2/control/bridge.c (1)
bridge_get(79-83)modules/l2/control/mac_table.c (2)
mac_entry_lookup(288-320)mac_entry_add(161-236)modules/l2/control/bridge_iface.c (1)
bridge_get_iface(224-231)modules/infra/control/iface.c (1)
iface_get_eth_addr(259-272)modules/infra/datapath/gr_mbuf.h (1)
gr_mbuf_is_traced(52-54)modules/infra/datapath/trace.c (1)
gr_mbuf_trace_add(592-619)modules/infra/control/gr_graph.h (1)
rte_node_enqueue_x1(22-25)modules/infra/datapath/eth_output.c (1)
eth_output_register_interface_type(27-34)
modules/l2/control/bridge.c (4)
api/gr_errno.h (1)
errno_set(9-12)main/gr_vec.h (1)
gr_vec_len(47-51)modules/infra/cli/iface.c (1)
iface_from_id(108-116)modules/l2/control/mac_table.c (1)
mac_table_flush(322-381)
🪛 Shellcheck (0.11.0)
smoke/l2_bridge_test.sh
[warning] 5-5: Quote this to prevent word splitting.
(SC2046)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
- GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
- GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
- GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
- GitHub Check: build-cross-aarch64
- GitHub Check: rpm
- GitHub Check: deb
🔇 Additional comments (19)
meson.build (1)
69-69: LGTM!Adding the timer library to support MAC aging functionality is appropriate for the L2 bridge implementation.
modules/meson.build (1)
8-8: LGTM!Standard subdir inclusion for the new L2 module.
modules/l2/cli/meson.build (1)
4-8: LGTM!Standard CLI source registration pattern. Based on learnings, the modular CLI build system is correctly used here.
modules/l2/meson.build (1)
4-7: LGTM!Standard submodule wiring for the L2 components.
modules/l2/control/meson.build (1)
4-10: LGTM!Standard source file registration for L2 control components.
modules/l2/datapath/meson.build (1)
4-8: LGTM!Standard source file registration for L2 datapath nodes.
modules/l2/api/meson.build (1)
4-5: LGTM!Standard API header publication pattern.
modules/infra/cli/iface.c (1)
256-273: LGTM!The conditional display logic correctly shows
vrf_idfor L3 mode anddomain_idfor L2_BRIDGE mode in the DOMAIN column. The switch statement handles all cases appropriately.modules/l2/datapath/l2_flood.c (1)
32-149: Flood datapath logic looks soundPer‑packet flooding, mbuf ownership (first dest vs clones), DROP path, TX stats, and trace filling are consistent and safe. No issues spotted.
modules/infra/api/gr_infra.h (1)
387-423: Enum→string helpers correctly extendedThe new
GR_IFACE_TYPE_BRIDGEandGR_IFACE_MODE_L2_BRIDGEcases ingr_iface_type_name()/gr_iface_mode_name()look correct and keep the default fallback as‑is.modules/l2/cli/mac.c (1)
97-201: MAC list/flush CLI and output look correctThe list/flush paths correctly honor the optional BRIDGE_ID/IFACE/dynamic_only filters, resolve iface names safely, and format the libscols table sensibly. No issues spotted.
modules/l2/cli/bridge.c (1)
20-170: Bridge CLI and iface‑type wiring look coherentBridge domain add/del/show/list, member management, and bridge‑iface creation are wired cleanly through the GR_L2/GR_INFRA APIs, and the iface‑type hooks (
bridge_show,bridge_list_info,register_iface_type) are consistent with the rest of the CLI framework. No functional problems noticed.Also applies to: 285-441
modules/l2/control/bridge.c (5)
184-223: bridge_member_add logic looks solidMembership checks (within the same bridge and across other bridges) and the updates to
iface->modeandiface->domain_idare consistent with a single‑bridge membership model. No further issues spotted here.
261-289: Config get/set are straightforward
bridge_config_set/bridge_config_getvalidate arguments, reusebridge_get, and operate on the storedgr_l2_bridge_configas expected. Behavior is clear and consistent with the API.
291-298: bridge_to_api: copy behavior is acceptableMapping internal
bridge_infotogr_l2_bridgeis direct and uses safe bounds for name and counters. Given the earlier name length checks, usingstrncpywithout explicit NUL termination on the destination is still safe here.
316-343: MAC cleanup helper looks correct and efficientIterating active bridges and delegating to
mac_table_flush(bridge_id, iface_id, false)matches the intended “remove all MACs for this iface on all bridges” behavior, and the aggregated logging is useful without being noisy.
345-365: Module wiring is consistent with the rest of the systemModule registration, dependency on
"iface", and init/fini hooks look aligned, aside from the earlierbridge_initerrno nit. The extramac_table_fini()inbridge_module_finimatches the ownership implied by this module.modules/l2/api/gr_l2.h (2)
24-141: Overall bridge/MAC API layout looks coherentThe bridge config, bridge descriptors, member and MAC request/response structs, and iface‑info base all line up cleanly with the control‑plane code and the described operations (add/del/list, member mgmt, MAC add/del/flush, config get/set). Field types and sizes look appropriate for the intended ranges.
18-21: This review comment is incorrect.The project uses
-std=gnu2x, which supports C23 fixed-width enums. The syntaxenum : uint8_tis valid C23, not C++ syntax. This pattern is already used in the same codebase (e.g.,modules/srv6/api/gr_srv6.h), and the code will compile successfully without modification.Likely an incorrect or invalid review comment.
| static struct api_out l2_bridge_add_api(const void *request, struct api_ctx *ctx __rte_unused) { | ||
| const struct gr_l2_bridge_add_req *req = request; | ||
| struct bridge_info *bridge; | ||
| struct gr_l2_bridge *resp; | ||
| uint16_t bridge_id; | ||
|
|
||
| bridge_id = bridge_add(req->name, &req->config); | ||
| if (bridge_id == 0) | ||
| return api_out(errno, 0, NULL); | ||
|
|
||
| bridge = bridge_get(bridge_id); | ||
| if (bridge == NULL) | ||
| return api_out(ENOENT, 0, NULL); | ||
|
|
||
| resp = malloc(sizeof(*resp)); | ||
| if (resp == NULL) | ||
| return api_out(ENOMEM, 0, NULL); | ||
|
|
||
| bridge_to_api(resp, bridge); | ||
| return api_out(0, sizeof(*resp), resp); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
l2_bridge_add_api may return bogus status because bridge_add doesn’t set errno
bridge_add() returns 0 on several failures (invalid name, duplicate name, alloc failure) but, per the bridge.c snippet, does not call errno_set on those paths. Here you translate bridge_id == 0 into api_out(errno, ...), which can end up sending status == 0 or some unrelated old errno even though the operation failed.
I’d recommend tightening this up by either:
- Making
bridge_add()consistently useerrno_set(...)on every failure path and documenting the mapping, then keeping this handler as-is, or - Mapping
bridge_id == 0to an explicit error code here (e.g.,EINVALor a more specific status oncebridge_adddistinguishes reasons), rather than blindly forwardingerrno.
Right now clients can’t reliably tell why a bridge add failed, and in some cases will see a “success” status with no payload.
🤖 Prompt for AI Agents
In modules/l2/control/api_handlers.c around lines 16-36, the handler currently
returns api_out(errno,...) when bridge_add() returns 0 but bridge_add() does not
set errno, so callers can receive a bogus status; change the handler to map a
zero return to an explicit error code instead of forwarding errno (for example
return api_out(EINVAL, 0, NULL) or a more specific code if you can detect
duplicate/invalid/name/alloc conditions), and add a brief TODO comment to switch
to propagated errno once bridge_add() is updated to set errno on all failure
paths.
| static int bridge_init(void) { | ||
| bridge_name_hash = rte_hash_create(&bridge_name_hash_params); | ||
| if (bridge_name_hash == NULL) { | ||
| LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(rte_errno)); | ||
| return -rte_errno; | ||
| } | ||
|
|
||
| // Initialize bridge table | ||
| memset(bridges, 0, sizeof(bridges)); | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bridge_init doesn’t set errno on failure
bridge_init returns -rte_errno but never sets errno, yet bridge_module_init later prints strerror(errno). This can yield a misleading message.
Consider using the common helper so errno is consistent:
- bridge_name_hash = rte_hash_create(&bridge_name_hash_params);
- if (bridge_name_hash == NULL) {
- LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(rte_errno));
- return -rte_errno;
- }
+ bridge_name_hash = rte_hash_create(&bridge_name_hash_params);
+ if (bridge_name_hash == NULL) {
+ int err = rte_errno;
+ LOG(ERR, "Failed to create bridge name hash table: %s", rte_strerror(err));
+ return errno_set(err);
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 39-50, bridge_init returns
-rte_errno on failure but does not set errno, causing later strerror(errno) to
be misleading; fix by making errno consistent before returning — e.g., convert
or assign the DPDK rte_errno into the global errno (or call the project’s common
helper that maps/sets errno from rte_errno), then log/return as before so
bridge_module_init’s strerror(errno) reports the correct error.
| struct bridge_info *bridge_get_by_name(const char *name) { | ||
| void *data; | ||
| int ret; | ||
|
|
||
| ret = rte_hash_lookup_data(bridge_name_hash, name, &data); | ||
| if (ret < 0) | ||
| return NULL; | ||
|
|
||
| uint16_t bridge_id = (uintptr_t)data; | ||
| return bridge_get(bridge_id); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent hash key usage for bridge names (potential leak and delete failure)
bridge_add inserts into bridge_name_hash using the raw name pointer as key, while bridge_del deletes using bridge->name. The byte contents of these 32‑byte keys can differ (e.g., different padding after the NUL), so rte_hash_del_key() may fail and leave stale entries. It also means you depend on callers passing a buffer of at least GR_L2_BRIDGE_NAME_SIZE bytes to bridge_add, which isn’t enforced by the API.
Recommend canonicalizing keys via bridge->name everywhere and zero‑padding for lookups:
- // Add to name hash table
- ret = rte_hash_add_key_data(bridge_name_hash, name, (void *)(uintptr_t)bridge_id);
+ // Add to name hash table using canonical, zero‑padded name
+ ret = rte_hash_add_key_data(bridge_name_hash,
+ bridge->name,
+ (void *)(uintptr_t)bridge_id);And in bridge_get_by_name:
- ret = rte_hash_lookup_data(bridge_name_hash, name, &data);
+ char key[GR_L2_BRIDGE_NAME_SIZE] = {0};
+ strncpy(key, name, sizeof(key) - 1);
+ ret = rte_hash_lookup_data(bridge_name_hash, key, &data);Deletion already uses bridge->name, so after this change add/lookup/del all share the same canonical key. This also avoids reading past the end of arbitrary C strings.
Also applies to: 133-139, 159-165
🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 85-95 (and also apply same pattern
at 133-139 and 159-165), the hash is being looked up/inserted with raw caller
pointers while deletion uses bridge->name, which can differ in padding and cause
failed deletes/stale entries; fix by canonicalizing the hash key to always use
the bridge->name storage and by performing lookups with a fixed-size,
zero-filled key buffer of GR_L2_BRIDGE_NAME_SIZE: when adding, insert using
bridge->name (not the caller pointer); when looking up (bridge_get_by_name)
allocate a local char key[GR_L2_BRIDGE_NAME_SIZE] = {0}, strncpy or memcpy up to
GR_L2_BRIDGE_NAME_SIZE-1 from the provided name, then call rte_hash_lookup_data
with that key; ensure deletions already use bridge->name so add/lookup/del use
the exact same canonical, zero-padded key to avoid mismatches and out‑of‑bounds
reads.
| int bridge_del(uint16_t bridge_id) { | ||
| struct bridge_info *bridge; | ||
| int ret; | ||
|
|
||
| bridge = bridge_get(bridge_id); | ||
| if (bridge == NULL) | ||
| return errno_set(ENOENT); | ||
|
|
||
| // Remove all members first | ||
| while (gr_vec_len(bridge->members) > 0) { | ||
| uint16_t iface_id = bridge->members[0]; | ||
| bridge_member_del(bridge_id, iface_id); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bridge_del can loop forever if iface_from_id() fails in bridge_member_del
bridge_del repeatedly calls bridge_member_del(bridge_id, bridge->members[0]) until gr_vec_len(bridge->members) == 0. In bridge_member_del, if iface_from_id(iface_id) returns NULL, the function returns errno_set(ENODEV) without modifying bridge->members. Since bridge_del ignores the return value, this results in an infinite loop on that first element.
You should ensure that membership is removed from the vector even when the iface object no longer exists. For example:
int bridge_member_del(uint16_t bridge_id, uint16_t iface_id) {
struct bridge_info *bridge;
struct iface *iface;
uint16_t *member;
size_t index = 0;
bridge = bridge_get(bridge_id);
if (bridge == NULL)
return errno_set(ENOENT);
- iface = iface_from_id(iface_id);
- if (iface == NULL)
- return errno_set(ENODEV);
+ iface = iface_from_id(iface_id);
@@
- gr_vec_foreach_ref (member, bridge->members) {
+ gr_vec_foreach_ref (member, bridge->members) {
if (*member == iface_id) {
gr_vec_del(bridge->members, index);
-
- // Reset interface mode to L3
- iface->mode = GR_IFACE_MODE_L3;
- iface->domain_id = 0;
+ // Reset interface mode only if iface still exists
+ if (iface != NULL) {
+ iface->mode = GR_IFACE_MODE_L3;
+ iface->domain_id = 0;
+ }
@@
- return errno_set(ENOENT);
+ return errno_set(ENOENT);
}Alternatively, bridge_del could directly drop bridge->members without going through bridge_member_del, but the key point is to avoid a loop where the member list never shrinks.
Also applies to: 225-259
🤖 Prompt for AI Agents
In modules/l2/control/bridge.c around lines 144-156 (and similarly 225-259),
bridge_del currently loops calling bridge_member_del(bridge_id,
bridge->members[0]) until the member vector is empty but ignores
bridge_member_del's return value; if iface_from_id() fails inside
bridge_member_del the member isn't removed and bridge_del can loop forever. Fix
by ensuring the member is removed from bridge->members even when
bridge_member_del fails: call bridge_member_del and if it returns non-zero,
explicitly remove the element at index 0 from the vector (or otherwise drop the
entire members vector) before continuing; apply the same change to the other
affected range so the loop cannot hang when iface objects are missing.
| static struct bridge_mac_table mac_tables[GR_MAX_BRIDGE_DOMAINS]; | ||
| static struct event *aging_event; | ||
| static struct event_base *ev_base; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Shared mac_tables state is unsynchronized; verify single-threaded usage
All the MAC table operations (mac_entry_add/del/lookup, mac_table_flush, and mac_aging_callback) mutate the same mac_tables[bridge_id] state (hash, entries list, count) with no visible synchronization.
If these functions can be called from multiple threads/lcores (e.g., datapath learning + control-plane CLI + aging timer), there’s a real risk of races and list/hash corruption.
If the design guarantees that:
- All callers run on a single thread, or
- Access is externally serialized,
it would be good to document that assumption clearly. Otherwise, consider adding per‑bridge locking or another concurrency mechanism around bridge_mac_table mutations so aging, flush, add, and delete can’t interleave unsafely.
Also applies to: 161-236, 238-286, 322-381, 414-481
---
🏁 Script executed:
```shell
#!/bin/bash
# Examine the full mac_table.c to understand function structure and synchronization
cat modules/l2/control/mac_table.c | head -100
Length of output: 2402
🏁 Script executed:
#!/bin/bash
# Check lines mentioned in review for mutation patterns
sed -n '50,52p; 161,236p; 238,286p; 322,381p; 414,481p' modules/l2/control/mac_table.c | grep -A2 -B2 'lock\|mutex\|rcu\|atomic\|spinlock' || echo "No sync primitives found in those ranges"Length of output: 40
🏁 Script executed:
#!/bin/bash
# Search for where mac_tables is accessed/modified
rg -n 'mac_tables\[' modules/l2/control/mac_table.c -A2 -B2Length of output: 1361
🏁 Script executed:
#!/bin/bash
# Check for calls to the functions mentioned
rg -n 'mac_entry_add|mac_entry_del|mac_table_flush|mac_aging_callback' modules/l2/ --type=cLength of output: 1266
🏁 Script executed:
#!/bin/bash
# Look for threading/locking patterns elsewhere in l2 module
find modules/l2 -name '*.c' -o -name '*.h' | xargs grep -l 'pthread\|rte_spinlock\|rte_rwlock\|rte_atomic' | head -5Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Check l2_bridge.c to see execution context of mac_entry_add calls
rg -n 'mac_entry_add' modules/l2/datapath/l2_bridge.c -B10 -A2 | head -40
Length of output: 626
🏁 Script executed:
#!/bin/bash
# Look for threading model or worker thread documentation
grep -r 'single.*thread\|serialize\|lcore' modules/l2/control/ modules/l2/datapath/ --include='*.c' --include='*.h' | grep -i 'comment\|//\|/\*' | head -10Length of output: 0
🏁 Script executed:
#!/bac
# Check how api_handlers are called (control plane threading)
rg -n 'api_handlers\|mac_entry_add' modules/l2/control/api_handlers.c -B5 | head -30Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Check header for any documented threading guarantee
cat modules/l2/control/gr_l2_control.h | grep -A15 'mac_entry_add\|mac_entry_del\|mac_table_flush'
Length of output: 820
🏁 Script executed:
#!/bin/bash
# Look for event_base initialization and callback context
rg -n 'event_base\|EV_PERSIST\|mac_aging_callback' modules/l2/control/mac_table.c -A3 -B3 | tail -40Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if there's a main event loop or worker model that might serialize
rg -n 'event_base_dispatch\|event_loop\|worker.*thread' modules/l2/ --type=c -B2 -A2Length of output: 0
🏁 Script executed:
#!/bin/bash
# Full inspection of destroy and aging callback to understand mutation scope
sed -n '108,160p; 415,480p' modules/l2/control/mac_table.cLength of output: 2604
Unsynchronized entries list and count confirmed; races between add/del/aging
The table->entries linked list and table->count field are unprotected. While the hash table uses lock-free concurrency, the list mutations in mac_entry_add (lines 161–236), mac_entry_del (238–286), mac_table_flush (322–381), and especially mac_aging_callback (415–480) can race when called from multiple contexts (datapath rx, control plane API, aging timer).
The aging callback specifically walks and removes entries without locking (lines 431–475), risking corruption if add/del execute concurrently. Add per-bridge synchronization (lock or RCU) around entries list and count mutations.
🤖 Prompt for AI Agents
In modules/l2/control/mac_table.c around lines 50-52, the mac_tables array
currently exposes unsynchronized mutations: the table->entries linked list and
table->count are accessed/modified without protection, causing races between
mac_entry_add (lines ~161–236), mac_entry_del (238–286), mac_table_flush
(322–381) and mac_aging_callback (415–480). Fix by adding a per-bridge
synchronization primitive (e.g., a pthread_mutex_t or rwlock in struct
bridge_mac_table) and initialize/destroy it when creating/tearing down tables;
then take the lock in write mode for all mutations (mac_entry_add,
mac_entry_del, mac_table_flush and any removal inside mac_aging_callback) and in
read mode (or drop to shared lock) for pure walkers; ensure the aging timer
callback acquires the same lock before traversing/removing entries and update
reference/count modifications under the lock to prevent list corruption and
stale counts.
| struct l2_bridge_trace { | ||
| uint16_t bridge_id; | ||
| uint16_t src_iface; | ||
| uint16_t dst_iface; | ||
| struct rte_ether_addr src_mac; | ||
| struct rte_ether_addr dst_mac; | ||
| uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trace action encoding for DROP is inconsistent with formatter
trace->action uses 0–4 with 4 documented as DROP, but l2_bridge_trace_format only defines 4 strings and indexes them with t->action % 4. As a result, DROP (4) is formatted as "learn", and the "learn" state (0) is effectively unused.
Suggest aligning the encoding with the formatter, e.g.:
-struct l2_bridge_trace {
+struct l2_bridge_trace {
@@
- uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop
+ uint8_t action; // 0=learn, 1=forward, 2=flood, 3=to_bridge_iface, 4=drop
@@
-static int l2_bridge_trace_format(char *buf, size_t len, const void *data, size_t data_len) {
- const char *actions[] = {"learn", "forward", "flood", "to_bridge"};
+static int l2_bridge_trace_format(char *buf, size_t len, const void *data, size_t data_len) {
+ const char *actions[] = {"learn", "forward", "flood", "to_bridge_iface", "drop"};
@@
- actions[t->action % 4],
+ actions[t->action],This way DROP is rendered correctly and you still have room for a future “learn” state if you decide to use action == 0.
Also applies to: 147-156, 165-183
| // Get bridge information | ||
| bridge = bridge_get(bridge_id); | ||
| if (bridge == NULL) { | ||
| edge = DROP; | ||
| goto next; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uninitialized eth used in trace path when bridge_get fails
If bridge_get(bridge_id) returns NULL, execution jumps to next and the trace block still dereferences eth, which has never been assigned in that path. That’s undefined behavior and can corrupt traces or crash under tracing.
Consider ensuring eth is always initialized before any goto next that can reach the trace block, e.g. by moving the header extraction earlier:
- mbuf = objs[i];
- iface = (struct iface *)mbuf_data(mbuf)->iface;
- bridge_id = iface->domain_id;
-
- // Get bridge information
- bridge = bridge_get(bridge_id);
+ mbuf = objs[i];
+ eth = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);
+ iface = (struct iface *)mbuf_data(mbuf)->iface;
+ bridge_id = iface->domain_id;
+
+ // Get bridge information
+ bridge = bridge_get(bridge_id);
@@
- eth = rte_pktmbuf_mtod(mbuf, struct rte_ether_hdr *);Alternatively, you could skip filling MAC fields in the trace when bridge == NULL, but the above keeps tracing consistent.
Also applies to: 135-157
🤖 Prompt for AI Agents
In modules/l2/datapath/l2_bridge.c around lines 53-59 (and similarly at
135-157), the variable `eth` can remain uninitialized when bridge_get(bridge_id)
returns NULL and execution jumps to the trace path, causing undefined behavior;
ensure `eth` is assigned before any goto that can reach the trace block (for
example, move the header extraction/eth = ... up before calling bridge_get or
otherwise initialize `eth` to a safe default), or guard the trace MAC field
population so it is skipped when bridge == NULL; update both occurrences so the
trace code never dereferences an uninitialized `eth`.
a2b6fcb to
a70c7c5
Compare
df3b54f to
8aa1e24
Compare
|
Good catch with the new trace copy helper... I wonder if a more global helper for creating a packet copy should be added. This will avoid missing copying those. |
|
@david-marchand yes, definitely ! |
ca03e32 to
576f560
Compare
576f560 to
04b82be
Compare
479653b to
f543d56
Compare
e1c00b1 to
b73345f
Compare
Bridge interfaces belong to both a VRF and a bridge domain, and so can't be part of a union. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Grout is meant to be a router. Yet, for some usecases (vxlan EVPN, SRv6 L2 encap, ... ), it is required to be able to switch packets in addition to route them. Add a 'l2' module to add a mac learning switch to grout. Expiration is based on libevent. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add a new type of interface to be able to route the packets received on the bridge. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add new events when a mac is added and removed to/from a bridge. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Automate the creation of a bridge interface when a bridge is created. Instead of generating an interface name based on the bridge id "br%u", use the bridge name directly. Update l2 bridge test to use the proper name. Add a helper to create/delete a bridge. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Add bridge interface type support to FRR, and populate the fields "bridge_ifindex" and "master_ifindex" to create the proper hierarchy in frr. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Register the proper events when a mac is added/migrated/removed to a bridge, and forward the info to zebra. Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
EVPN use cases require to have a bridge as it pushes ip/mac tuples.
For VXLAN and SRv6 based EVPN, FRR will have to implement
DPLANE_OP_MAC_INSTALLandDPLANE_OP_MAC_DELETEto populate 'external' Mac addresses.This will require an additional flag to the mac table to signify the provenance, in addition to
dynamicandstatic.TODO:
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.