Commit cd060e2

change reward to return (#16)
* Remove handoff/early-termination/turn-weights; add discount=0.9 default; update configs and README
* Clean configs/scripts: remove handoff/turn weights/early termination mentions; add discount; minor prints; README cleanup
* Fix duplicate discount arg in MAGRPOConfig init
* yes
* clean
* Update mt_code_logger.py
* rm ours
* remove redundant
* Update mt_code_logger.py
* set reward shift default to be -4
* change to -2.1
* cross joint
* set joint mode to be cross
1 parent 9a8608d commit cd060e2
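The headline change replaces per-turn rewards with returns and adds a default `discount=0.9`. The commit page does not show how the return is computed, so the following is only a minimal sketch assuming the standard discounted return G_t = r_t + discount * G_{t+1} over per-turn rewards, with `discount` playing the role of the usual gamma. The function name `discounted_returns` is hypothetical and is not taken from CoMLRL or this repository.

```python
from typing import List


def discounted_returns(rewards: List[float], discount: float = 0.9) -> List[float]:
    """Hypothetical helper: G_t = r_t + discount * G_{t+1}, with G = 0 after the last turn.

    Assumes `rewards` holds one scalar reward per turn, in turn order.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn's return folds in all later (discounted) rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Two-turn episode, as with magrpo.num_turns=2
    print(discounted_returns([1.0, 2.0]))  # [2.8, 2.0]
```

With two turns, a first-turn reward of 1.0 followed by 2.0 gives a first-turn return of 1.0 + 0.9 × 2.0 = 2.8, so the outcome of the second turn still contributes to the learning signal for the first.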

21 files changed (+130, -4042 lines)

README.md

Lines changed: 6 additions & 12 deletions
@@ -52,7 +52,12 @@ python LLM_Collaboration_with_MARL/train_grpo.py \
 python LLM_Collaboration_with_MARL/train_magrpo.py \
 --config LLM_Collaboration_with_MARL/configs/mt_magrpo_che_config.yaml \
 --override dataset.train_split='test[16:]' dataset.eval_split='test[:16]' \
-magrpo.num_turns=2 magrpo.turn_gradient_weights=[1.5,0.5]
+magrpo.num_turns=2
+
+# Enable code-level training metrics (expensive; default is off)
+python LLM_Collaboration_with_MARL/train_magrpo.py \
+--config LLM_Collaboration_with_MARL/configs/magrpo_he_config.yaml \
+--override magrpo.log_code_levels=true
 ```
 ## Multi-Turn Settings
 
@@ -94,14 +99,3 @@ python LLM_Collaboration_with_MARL/train_magrpo.py \
 --config LLM_Collaboration_with_MARL/configs/mt_magrpo_che_config.yaml \
 --override external.mode='level_feedback' external.sandbox_slice=-2
 ```
-
-### Handoff Strategy
-
-In MAGRPO/GRPO multi-turn training, we hand off one prior completion per agent to keep compute bounded. The trainer selects this per the `handoff` mode: **default `random`**, or `best`. Selection happens in the CoMLRL trainer; external modes simply format the next-turn prompts using the provided completions. Configure via `magrpo.handoff` or `grpo.handoff` in your config or `--override`.
-
-
-```bash
-python LLM_Collaboration_with_MARL/train_magrpo.py \
---config LLM_Collaboration_with_MARL/configs/mt_magrpo_he_config.yaml \
---override external.mode='plain' magrpo.handoff='best'
-```
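The README commands pass dotted `--override` keys such as `magrpo.num_turns=2`, `magrpo.log_code_levels=true` and `external.sandbox_slice=-2`. The code that consumes them is not part of this diff, so the following is a hypothetical sketch, assuming each dotted key addresses a nested section of the YAML config and that the shell has already stripped the single quotes around values like `'test[16:]'`. The names `apply_overrides` and `parse_value` are illustrative only and do not come from this repository.

```python
import ast
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Convert an override value string into a Python value (hypothetical rules)."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        # YAML-style lowercase booleans, e.g. magrpo.log_code_levels=true
        return lowered == "true"
    try:
        # Numbers, lists and quoted strings, e.g. -2, 2, [1.5,0.5]
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Bare words such as level_feedback or test[16:] stay as plain strings
        return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'section.key=value' strings to a nested config dict in place (hypothetical)."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Create missing sections so an override can introduce a new key
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


if __name__ == "__main__":
    cfg = {"magrpo": {"num_turns": 1}}
    apply_overrides(cfg, ["magrpo.num_turns=2", "magrpo.log_code_levels=true",
                          "external.mode=level_feedback", "external.sandbox_slice=-2"])
    print(cfg)
    # {'magrpo': {'num_turns': 2, 'log_code_levels': True}, 'external': {'mode': 'level_feedback', 'sandbox_slice': -2}}
```

Trying `ast.literal_eval` first and falling back to the raw string keeps bare identifiers such as `level_feedback` usable without extra quoting; an actual implementation might instead delegate value parsing to its YAML loader.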
