Support Megatron GRPO #6025
Conversation
/gemini review
Code Review
The pull request introduces comprehensive support for Megatron GRPO, a significant enhancement to the framework. It adds new command-line arguments, a dedicated MegatronGRPOTrainer, and extensive documentation updates in both Chinese and English. The integration with vLLM for efficient rollout generation and the implementation of dynamic sampling (DAPO) are key features. The changes are well structured, addressing distributed training and memory management through model/optimizer offloading, and the new documentation gives users what they need to understand and adopt this functionality.
/gemini review
Code Review
This pull request adds support for Megatron GRPO, including extensive documentation, example scripts, and the necessary code changes across arguments, trainers, and templates. The implementation is comprehensive, but I've identified a few issues: minor formatting inconsistencies in the Chinese documentation, a broken link in the English documentation, duplicate arguments in the example shell scripts that could cause issues, and, most importantly, a critical bug in the argument-parsing logic where distributed-dependent checks are performed before the distributed environment is initialized. I've provided suggestions to fix these issues.
```python
self.per_device_generation_batch_size = self.generation_batch_size // world_size
...
_check_not_supported()
_check_batch_params()
```
The _check_batch_params() function is called within __post_init__, which runs before the distributed environment is initialized. This will cause a crash because torch.distributed.get_world_size() is called before torch.distributed.init_process_group().
To fix this, the batch parameter validation should be deferred until after the distributed environment is set up. I recommend moving this check to the MegatronGRPOTrainer at a point where the distributed context is guaranteed to be available, such as at the beginning of the first training step.
For example, you could remove the call from here and perform the check inside MegatronGRPOTrainer.forward_step with a flag to ensure it only runs once.
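A minimal sketch of that deferred-check pattern, assuming a one-shot flag on the trainer (the attribute name `_batch_params_checked` and the validation body are illustrative; only `_check_batch_params()`, `forward_step`, and the `generation_batch_size // world_size` computation come from the diff above):

```python
import torch.distributed as dist


class MegatronGRPOTrainer:
    def __init__(self, args):
        self.args = args
        # Hypothetical one-shot flag so the deferred check runs exactly once.
        self._batch_params_checked = False

    def _check_batch_params(self):
        # Safe to call get_world_size() here: by the time the trainer executes
        # a step, init_process_group() has already been called.
        world_size = dist.get_world_size()
        if self.args.generation_batch_size % world_size != 0:
            raise ValueError(
                f"generation_batch_size ({self.args.generation_batch_size}) "
                f"must be divisible by world_size ({world_size})"
            )
        self.args.per_device_generation_batch_size = (
            self.args.generation_batch_size // world_size
        )

    def forward_step(self, data_iterator, model):
        # Validation deferred out of __post_init__: run it on the first step only.
        if not self._batch_params_checked:
            self._check_batch_params()
            self._batch_params_checked = True
        ...  # rest of the forward step
```

If single-process runs that never call init_process_group() are possible, the check would additionally need a dist.is_initialized() guard.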
/gemini review
Code Review
This pull request introduces support for Megatron GRPO, a significant new feature. The changes are comprehensive, covering documentation updates, new example scripts, and core implementation modifications to arguments and trainers. The implementation appears robust, with careful consideration for distributed training complexities and memory management, particularly with the vLLM integration. The code is well-structured and includes helpful checks for unsupported configurations. I have one suggestion regarding a documentation inconsistency.