
Conversation

@hjh0119 (Collaborator) commented Sep 30, 2025

No description provided.

@hjh0119 changed the title from "[WIP] Megatron GRPO" to "Support Megatron GRPO" on Nov 14, 2025
@hjh0119 marked this pull request as ready for review on Nov 14, 2025 at 08:18
@hjh0119 (Collaborator, Author) commented Nov 14, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

The pull request introduces comprehensive support for Megatron GRPO, a significant enhancement to the framework. This includes new command-line arguments, a dedicated MegatronGRPOTrainer, and extensive documentation updates in both Chinese and English. The integration with vLLM for efficient rollout generation and the implementation of dynamic sampling (DAPO) are key features. The changes are well-structured, addressing various aspects of distributed training and memory management through model/optimizer offloading. The new documentation provides valuable information for users to understand and utilize this new functionality.

@hjh0119 (Collaborator, Author) commented Nov 14, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for Megatron GRPO, including extensive documentation, example scripts, and necessary code changes across arguments, trainers, and templates. The implementation is comprehensive, but I've identified a few issues. There are some minor formatting inconsistencies in the Chinese documentation and a broken link in the English documentation. More importantly, there are duplicate arguments in the example shell scripts which could cause issues, and a critical bug in the argument parsing logic where distributed-dependent checks are performed before the distributed environment is initialized. I've provided suggestions to fix these issues.

```python
self.per_device_generation_batch_size = self.generation_batch_size // world_size

_check_not_supported()
_check_batch_params()
```
@gemini-code-assist (bot) left a review comment on this code

Severity: critical

The _check_batch_params() function is called within __post_init__, which runs before the distributed environment is initialized. This will cause a crash because torch.distributed.get_world_size() is called before torch.distributed.init_process_group().

To fix this, the batch parameter validation should be deferred until after the distributed environment is set up. I recommend moving this check to the MegatronGRPOTrainer at a point where the distributed context is guaranteed to be available, such as at the beginning of the first training step.

For example, you could remove the call from here and perform the check inside MegatronGRPOTrainer.forward_step with a flag to ensure it only runs once.
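Below is a minimal sketch of that suggestion, assuming a run-once guard inside the trainer; apart from MegatronGRPOTrainer and forward_step, the attribute and argument names are illustrative, and the divisibility check mirrors the per_device_generation_batch_size computation quoted above.

```python
import torch.distributed as dist


class MegatronGRPOTrainer:

    def __init__(self, args):
        self.args = args
        self._batch_params_checked = False  # run-once guard

    def _check_batch_params(self):
        # Safe to query here: by the time training steps run,
        # torch.distributed.init_process_group() has been called.
        world_size = dist.get_world_size()
        if self.args.generation_batch_size % world_size != 0:
            raise ValueError(
                f'generation_batch_size ({self.args.generation_batch_size}) '
                f'must be divisible by the world size ({world_size})')
        self.args.per_device_generation_batch_size = (
            self.args.generation_batch_size // world_size)

    def forward_step(self, data_iterator, model):
        # Validation is deferred out of __post_init__ and runs exactly
        # once, after the distributed environment is guaranteed to exist.
        if not self._batch_params_checked:
            self._check_batch_params()
            self._batch_params_checked = True
        # ... rest of the forward step ...
```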

@hjh0119 (Collaborator, Author) commented Nov 14, 2025

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for Megatron GRPO, a significant new feature. The changes are comprehensive, covering documentation updates, new example scripts, and core implementation modifications to arguments and trainers. The implementation appears robust, with careful consideration for distributed training complexities and memory management, particularly with the vLLM integration. The code is well-structured and includes helpful checks for unsupported configurations. I have one suggestion regarding a documentation inconsistency.

@hjh0119 merged commit 50968b5 into modelscope:main on Nov 14, 2025
1 of 2 checks passed
@hjh0119 deleted the mega-grpo branch on Nov 14, 2025 at 12:47