@liuruyan liuruyan commented Jul 4, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

Models

Description

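Adds recompute for post_norm and the gate in ds_v3: post_norm is recomputed during the backward pass, which saves the GPU memory otherwise held by the post_norm output.
Building the recompute PyLayer requires some changes to the model structure. The general approach is to leave existing interfaces unchanged where possible; where a change is needed, prefer changing an interface's inputs over its outputs, and add new inputs as optional parameters so that other models and configurations are unaffected.
The specific changes are:

  1. Added the FusedNormGateFunc PyLayer and the using_norm_gate_recompute config option, which together implement post_norm + gate recompute.
  2. Updated MoeGate and DeepseekV2MoE with an optional branch that decides whether to run norm_gate_recompute.
  3. Changed the MoeGate interface: the inputs gain norm_weight and norm_eps, and the outputs gain norm_out.
  4. Changed the DeepseekV2MoE constructor to take norm_weight and norm_eps, and moved the gate-related logic from super().forward into the subclass forward to implement recompute, which avoids extensive changes to the parent class (MoELayer).
  5. Added using_norm_gate_recompute to the parent class (MoELayer) constructor, where it is used to skip the gate computation. The forward function gains an optional input that accepts the gate output structure, so the output interface does not change.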
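
For readers unfamiliar with the recompute pattern, the idea behind the changes above can be sketched in plain Python. This is not the FusedNormGateFunc code from this PR and does not use the Paddle API. The function names are hypothetical, the norm is assumed to be an RMSNorm, and the gate is reduced to a single linear projection with no scoring or top-k step. The point it illustrates is that the forward pass saves only its inputs, and the backward pass rebuilds norm_out before computing gradients.

```python
import math

def rms_norm(x, weight, eps):
    """RMSNorm over one token vector (assumed norm type)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)], inv_rms

def gate_logits(h, gate_weight):
    """Linear router: one logit per expert; gate_weight is [experts][hidden]."""
    return [sum(hv * wv for hv, wv in zip(h, row)) for row in gate_weight]

def norm_gate_forward(x, norm_weight, gate_weight, eps=1e-6):
    """Hypothetical forward: returns logits and norm_out, saves inputs only."""
    norm_out, _ = rms_norm(x, norm_weight, eps)
    logits = gate_logits(norm_out, gate_weight)
    saved = (x, norm_weight, gate_weight, eps)  # norm_out is deliberately not saved
    return logits, norm_out, saved

def norm_gate_backward(saved, grad_logits, grad_norm_out):
    """Hypothetical backward: recompute norm_out, then backpropagate."""
    x, norm_weight, gate_weight, eps = saved
    norm_out, inv_rms = rms_norm(x, norm_weight, eps)  # the recompute step
    n = len(x)
    # Gradient reaching norm_out: downstream consumers plus the gate projection.
    grad_h = [
        g + sum(gl * row[i] for gl, row in zip(grad_logits, gate_weight))
        for i, g in enumerate(grad_norm_out)
    ]
    # RMSNorm backward with respect to x and the norm weight.
    scaled = [g * w for g, w in zip(grad_h, norm_weight)]
    dot = sum(a * b for a, b in zip(scaled, x))
    grad_x = [inv_rms * a - (inv_rms ** 3) * b * dot / n for a, b in zip(scaled, x)]
    grad_norm_weight = [g * v * inv_rms for g, v in zip(grad_h, x)]
    # Gate weight gradient needs norm_out, which is why it is recomputed.
    grad_gate_weight = [[gl * hv for hv in norm_out] for gl in grad_logits]
    return grad_x, grad_norm_weight, grad_gate_weight
```

The trade-off is one extra norm computation in the backward pass in exchange for not holding norm_out in memory between forward and backward. Returning norm_out from the forward function mirrors item 3 above, where the gate output gains norm_out so that later layers can still consume it.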


paddle-bot bot commented Jul 4, 2025

Thanks for your contribution!

@liuruyan liuruyan changed the title from "add recompute for post_norm and moe_gate" to "Add recompute for post_norm and moe_gate" Jul 4, 2025
@liuruyan liuruyan closed this Aug 27, 2025