Support Megatron GRPO #6025
Conversation
/gemini review
Code Review
The pull request introduces comprehensive support for Megatron GRPO, a significant enhancement to the framework. It adds new command-line arguments, a dedicated MegatronGRPOTrainer, and extensive documentation updates in both Chinese and English. The integration with vLLM for efficient rollout generation and the implementation of dynamic sampling (DAPO) are key features. The changes are well structured, addressing distributed training and memory management through model/optimizer offloading, and the new documentation gives users what they need to understand and adopt this functionality.
/gemini review
Code Review
This pull request adds support for Megatron GRPO, including extensive documentation, example scripts, and the necessary code changes across arguments, trainers, and templates. The implementation is comprehensive, but I've identified a few issues: minor formatting inconsistencies in the Chinese documentation, a broken link in the English documentation, duplicate arguments in the example shell scripts that could cause issues, and, most importantly, a critical bug in the argument-parsing logic where distributed-dependent checks are performed before the distributed environment is initialized. I've provided suggestions to fix these issues.
```python
self.per_device_generation_batch_size = self.generation_batch_size // world_size
...
_check_not_supported()
_check_batch_params()
```
The _check_batch_params() function is called within __post_init__, which runs before the distributed environment is initialized. This will cause a crash because torch.distributed.get_world_size() is called before torch.distributed.init_process_group().
To fix this, the batch parameter validation should be deferred until after the distributed environment is set up. I recommend moving this check to the MegatronGRPOTrainer at a point where the distributed context is guaranteed to be available, such as at the beginning of the first training step.
For example, you could remove the call from here and perform the check inside MegatronGRPOTrainer.forward_step with a flag to ensure it only runs once.
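A minimal sketch of that deferred-check pattern, assuming a one-shot flag on the trainer (the attribute name `_batch_params_checked` and the validation body are illustrative; only `_check_batch_params()`, `forward_step`, and the `generation_batch_size // world_size` computation come from the diff above):

```python
import torch.distributed as dist


class MegatronGRPOTrainer:
    def __init__(self, args):
        self.args = args
        # Hypothetical one-shot flag so the deferred check runs exactly once.
        self._batch_params_checked = False

    def _check_batch_params(self):
        # Safe to call get_world_size() here: by the time the trainer executes
        # a step, init_process_group() has already been called.
        world_size = dist.get_world_size()
        if self.args.generation_batch_size % world_size != 0:
            raise ValueError(
                f"generation_batch_size ({self.args.generation_batch_size}) "
                f"must be divisible by world_size ({world_size})"
            )
        self.args.per_device_generation_batch_size = (
            self.args.generation_batch_size // world_size
        )

    def forward_step(self, data_iterator, model):
        # Validation deferred out of __post_init__: run it on the first step only.
        if not self._batch_params_checked:
            self._check_batch_params()
            self._batch_params_checked = True
        ...  # rest of the forward step
```

If single-process runs that never call init_process_group() are possible, the check would additionally need a dist.is_initialized() guard.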
/gemini review
Code Review
This pull request introduces support for Megatron GRPO, a significant new feature. The changes are comprehensive, covering documentation updates, new example scripts, and core implementation modifications to arguments and trainers. The implementation appears robust, with careful consideration for distributed training complexities and memory management, particularly with the vLLM integration. The code is well-structured and includes helpful checks for unsupported configurations. I have one suggestion regarding a documentation inconsistency.