v1.1.3

@LovelyBuggies LovelyBuggies released this 05 Oct 00:46
· 20 commits to main since this release
d775bec

This version should work with CoMLRL v1.1.3.

Changelog

  1. Remove the hard-coded code-level logging at each node. We don't expect users to inspect these details during training, and the logging came at the cost of huge VRAM usage.
  2. Change the default hyperparameter values to follow the Dr. GRPO style, set the default learning rate to 2e-5, and remove the bandit external mode, since it is equivalent to magrpo in the single-turn setting.
  3. Clean up the code formatting and group closely related params together.
  4. Add MBPP dataset.
Image: plain fails 2; expert fails 1; level feedback has not failed yet.
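
For readers unfamiliar with the "Dr. GRPO style" mentioned in item 2, here is a minimal sketch. It assumes the term refers to the common Dr. GRPO formulation, in which group-relative advantages are mean-centred but not divided by the group standard deviation. The function names and the `LEARNING_RATE` constant are hypothetical illustrations, not this repository's API or its actual defaults file.

```python
from statistics import mean, pstdev

# Default learning rate stated in the changelog; the constant name is hypothetical.
LEARNING_RATE = 2e-5


def grpo_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages: centre by the group mean, scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO-style advantages (assumed): centre by the group mean, no std scaling."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


if __name__ == "__main__":
    group_rewards = [1.0, 0.0, 0.0, 1.0]  # rewards for one group of sampled completions
    print(dr_grpo_advantages(group_rewards))
    print(grpo_advantages(group_rewards))
```

Without the standard-deviation scaling, the advantage magnitudes keep the original reward scale, so other defaults such as the learning rate may need to be chosen with that in mind.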