Add recompute for post_norm and moe_gate #10815
Closed
PR types
New features
PR changes
Models
Description
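Adds recompute for post_norm and the gate in ds_v3: post_norm is recomputed during the backward pass, which saves the memory that would otherwise hold the post_norm output.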
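The memory saving comes from what the backward pass keeps around. Below is a minimal sketch in plain Python (not Paddle) that assumes post_norm is an RMSNorm and the gate is a bias-free linear projection to router logits. `RecomputeNormGate` and `rms_norm` are hypothetical names for illustration, not the PR's actual PyLayer. Forward saves only the norm input `x` and discards the normalized output. Backward rebuilds that output from `x` before computing gradients, trading one extra norm computation for the activation memory.

```python
import math


def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm of one token vector x; returns (normalized output, rms)."""
    r = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / r for g, v in zip(gamma, x)], r


class RecomputeNormGate:
    """Hypothetical fused post_norm + gate with recompute.

    weight has shape [hidden_size][num_experts]. Forward keeps only the
    norm input; the norm output is rebuilt in backward.
    """

    def __init__(self, gamma, weight, eps=1e-6):
        self.gamma, self.weight, self.eps = gamma, weight, eps
        self.saved = None

    def forward(self, x):
        y, _ = rms_norm(x, self.gamma, self.eps)
        num_experts = len(self.weight[0])
        logits = [sum(y[i] * self.weight[i][j] for i in range(len(y)))
                  for j in range(num_experts)]
        # Save x only: y (the post_norm output) is dropped here.
        self.saved = (list(x),)
        return logits

    def backward(self, dlogits):
        (x,) = self.saved
        # Recompute the post_norm output instead of having stored it.
        y, r = rms_norm(x, self.gamma, self.eps)
        n, e = len(x), len(dlogits)
        # Gate (linear) gradients.
        d_weight = [[y[i] * dlogits[j] for j in range(e)] for i in range(n)]
        dy = [sum(self.weight[i][j] * dlogits[j] for j in range(e)) for i in range(n)]
        # RMSNorm gradients: y_i = gamma_i * x_i / r.
        d_gamma = [dy[i] * x[i] / r for i in range(n)]
        u = [dy[i] * self.gamma[i] for i in range(n)]
        dot = sum(u[i] * x[i] for i in range(n))
        dx = [u[i] / r - x[i] * dot / (n * r ** 3) for i in range(n)]
        return dx, d_gamma, d_weight
```

Only `x` survives between forward and backward. In a real framework the same split is what a custom autograd layer expresses: save the input in forward, recompute the cheap norm in backward.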
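Building the recompute PyLayer required adjusting the model structure. The guiding principle is to leave existing interfaces unchanged wherever possible. Where a change is unavoidable, change an interface's inputs rather than its outputs, and add new inputs as optional parameters so that other models and configurations keep running unaffected.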
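The "change inputs, not outputs, via optional parameters" rule can be illustrated with a small sketch. Everything in it is hypothetical (`MoEGate`, `forward`, the `norm_fn` keyword) and is not the PR's real code. The gate gains one optional argument that defaults to `None`. Existing callers, which pass already-normalized hidden states, get the old behavior and the same return type. A caller that wants the norm inside the gate's recompute region passes the raw hidden states plus the norm.

```python
from typing import Callable, List, Optional

Vector = List[float]


class MoEGate:
    """Hypothetical gate: a bias-free linear projection to router logits."""

    def __init__(self, weight: List[Vector]):
        self.weight = weight  # shape [hidden_size][num_experts]

    def forward(
        self,
        hidden: Vector,
        norm_fn: Optional[Callable[[Vector], Vector]] = None,
    ) -> Vector:
        # New *optional input*: when given, the norm runs inside the gate so
        # both can share one recompute region. None keeps the old behavior.
        if norm_fn is not None:
            hidden = norm_fn(hidden)
        num_experts = len(self.weight[0])
        # The output type and shape are identical for both call paths.
        return [sum(h * row[j] for h, row in zip(hidden, self.weight))
                for j in range(num_experts)]
```

Old call sites such as `gate.forward(norm(hidden))` keep working untouched. Only the layer that opts into recompute switches to `gate.forward(hidden, norm_fn=norm)`.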
The specific changes are as follows: