-
Notifications
You must be signed in to change notification settings - Fork 414
Bugfix issue 946 #947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix issue 946 #947
Conversation
Summary of ChangesHello @uniqueni, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! 此拉取请求旨在修复一个错误(issue 946),通过改进 RDMA 上下文中最佳 GID 索引的选择机制来提高系统的鲁棒性。核心改动是引入了一种从系统文件读取网络设备信息的方法,以确保所选的 GID 索引与一个活跃且可用的网络接口相关联,从而避免潜在的连接问题。 Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
你好,这个 PR 通过读取 sysfs 来检查 GID 是否关联了网络设备,从而改进了 getBestGidIndex 的逻辑,优先选择有网络接口的 GID。这是一个合理的修复方向。我发现一个小问题,即在没有 GID 关联网络设备时的回退逻辑发生了变化,可能会从选择“第一个”符合条件的 GID 变为“最后一个”。我在代码中留下了具体的建议,希望能帮助你完善这个修复。
mooncake-transfer-engine/src/transport/rdma_transport/rdma_context.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in getBestGidIndex() where the function would select the first matching GID (IPv4-mapped + RoCE v2 or IB type) without verifying if it belongs to a local network interface. The fix ensures that GIDs associated with actual network devices are preferred.
Key changes:
- Added
readGidNdev()helper function to read network device information from sysfs - Modified
getBestGidIndex()to prioritize GIDs with associated network devices - Implemented fallback logic to retain original behavior when no network device is found
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
mooncake-transfer-engine/src/transport/rdma_transport/rdma_context.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, will merge after fixing building problems
thank you! It's fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uniqueni please fix format, thank you.
问题描述
当前的 getBestGidIndex() 函数会选择第一个符合条件的 GID(IPv4-mapped + RoCE v2 或 IB 类型)。但在有多个符合条件的 GID 时,可能会选到不属于本机网络接口的 GID。
实际场景:
GID[3]: ::ffff:26.208.8.250, RoCE v2 (不是本机 IP)
GID[7]: ::ffff:26.208.8.252, RoCE v2 (本机 eth1 的 IP)
当前逻辑会选择 GID[3] 并立即返回,但实际应该选择 GID[7]。
###解决方案
我考虑了两种修复方式:
方式一:从 sysfs 读取网络设备信息(已采用)
方式二:匹配本机 IP 地址
为什么选择方式一
请求 Review
请帮忙看看这个方案是否有问题,或者有更好的解决方式?