
Balance_gate & O1 recompute configuration #10883



Merged: 2 commits merged into PaddlePaddle:dsv3_dev on Jul 28, 2025


@liuruyan commented Jul 23, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Run the hooks on existing files individually
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

PR changes

Description

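  1. Adjust variable lifetimes in `mlp_bwd_dx`.
  2. Remove the unused `token_dispatcher` from the zip and unzip constructors.
  3. Upgrade `fake_gate` to support setting `fakse_gate_restrict_balance`, which makes the token assignment across all experts perfectly balanced.
  4. Make `fake_gate` compatible with post_norm recompute.
  5. Upgrade the `recompute_fwd_gate_up` flag so that, in pipeline-parallel (PP) scenarios, the number of O1 recompute (rc) layers can be chosen, adapting automatically to `first_k_dense_replace` and `num_hidden_layers`:
    • Extend the semantics of `recompute_fwd_gate_up` to accept both BOOL and INTEGER values.
    • BOOL: assumes the user does not know that the number of O1 recompute layers is configurable. `True` means the user wants everything recomputed; `False` disables recompute.
    • INTEGER: for users who know the option exists, this sets the number of O1 recompute layers directly. `0` disables recompute; a value larger than the segment falls back to recomputing everything.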
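Item 3 asks that every expert receive exactly the same number of tokens. One simple way to get that property is a round-robin assignment over all (token, slot) pairs. The sketch below is a hypothetical illustration, not the PR's implementation: the names `balanced_fake_gate` and `tokens_per_expert` and the round-robin scheme are assumptions, and it only produces expert ids (no gate weights). It also assumes `num_tokens * topk` is divisible by `num_experts`, since exact balance is impossible otherwise.

```python
from collections import Counter
from typing import List


def balanced_fake_gate(num_tokens: int, num_experts: int, topk: int) -> List[List[int]]:
    """Hypothetical sketch: pick topk expert ids per token so all experts get equal load."""
    if topk > num_experts:
        raise ValueError("topk must not exceed num_experts")
    if (num_tokens * topk) % num_experts != 0:
        raise ValueError("num_tokens * topk must be divisible by num_experts for exact balance")
    # A single global counter walks over every (token, slot) pair; taking it modulo
    # num_experts cycles through experts, so each expert is hit equally often.
    # The topk slots of one token use consecutive counter values, so they are distinct.
    return [[(t * topk + k) % num_experts for k in range(topk)] for t in range(num_tokens)]


def tokens_per_expert(assignment: List[List[int]], num_experts: int) -> List[int]:
    """Count how many (token, slot) assignments each expert received."""
    counts = Counter(e for experts in assignment for e in experts)
    return [counts[e] for e in range(num_experts)]


routing = balanced_fake_gate(num_tokens=8, num_experts=4, topk=2)
print(tokens_per_expert(routing, 4))  # [4, 4, 4, 4]
```

A deterministic pattern like this is useful for benchmarking, because it removes routing imbalance as a variable; it says nothing about what a trained gate would choose.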
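The BOOL/INTEGER semantics of `recompute_fwd_gate_up` in item 5 can be sketched as a small resolver. This is an illustration under stated assumptions, not the PR's code: `resolve_o1_recompute_layers`, `pp_degree`, and `pp_rank` are hypothetical names; layers are assumed to split evenly and contiguously across PP stages; the first `first_k_dense_replace` layers are assumed dense; the "segment" is taken to be the MoE layers on one PP stage; and recompute is assumed to apply to the first `count` of them.

```python
from typing import List, Union


def resolve_o1_recompute_layers(
    recompute_fwd_gate_up: Union[bool, int],
    num_hidden_layers: int,
    first_k_dense_replace: int,
    pp_degree: int,
    pp_rank: int,
) -> List[int]:
    """Hypothetical sketch: global indices of layers on this PP stage that use O1 recompute."""
    if num_hidden_layers % pp_degree != 0:
        raise ValueError("this sketch assumes layers split evenly across PP stages")
    layers_per_stage = num_hidden_layers // pp_degree
    start = pp_rank * layers_per_stage
    # Assumption: only layers at or after first_k_dense_replace are MoE candidates.
    moe_layers = [i for i in range(start, start + layers_per_stage) if i >= first_k_dense_replace]

    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(recompute_fwd_gate_up, bool):
        # True: recompute every candidate layer; False: disable recompute.
        count = len(moe_layers) if recompute_fwd_gate_up else 0
    elif isinstance(recompute_fwd_gate_up, int):
        if recompute_fwd_gate_up < 0:
            raise ValueError("recompute_fwd_gate_up must be >= 0")
        # 0 disables recompute; values beyond the segment fall back to "recompute all".
        count = min(recompute_fwd_gate_up, len(moe_layers))
    else:
        raise TypeError("recompute_fwd_gate_up must be a bool or an int")
    return moe_layers[:count]


# 8 layers, first 3 dense, 2 PP stages: stage 1 holds layers 4..7.
print(resolve_o1_recompute_layers(2, num_hidden_layers=8, first_k_dense_replace=3, pp_degree=2, pp_rank=1))  # [4, 5]
```

Checking `bool` first keeps `True` from being read as the integer `1`, which would otherwise silently recompute only one layer instead of all of them.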


paddle-bot commented Jul 23, 2025

Thanks for your contribution!

@liuruyan changed the title from balance_gate to Balance_gate & O1 recompute configuration on Jul 28, 2025
@phlrain merged commit 884ca21 into PaddlePaddle:dsv3_dev on Jul 28, 2025
2 of 5 checks passed

3 participants