Commit f9ec5ec

Update README.md
1 parent 7ad5985 commit f9ec5ec

1 file changed: +1 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ python LLM_Collaboration_with_MARL/train_magrpo.py \

`magrpo.joint_mode` determines how each agent's K generations are combined into joint actions at each turn. Two modes are supported: with 'align' (the default), each agent's k-th generation is paired with the other agents' k-th generations to form a joint action; with 'cross', all combinations of the agents' K generations are used to form joint actions (K^N joint actions for N agents).
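
As an illustration of the two modes described above, here is a minimal sketch in plain Python. The function name `form_joint_actions` and the list-of-lists input layout are assumptions made for this example; they are not taken from the repository's code.

```python
from itertools import product


def form_joint_actions(generations, mode="align"):
    """Combine per-agent generations into joint actions.

    `generations` is a list with one entry per agent (N entries), each a
    list of that agent's K generations. Hypothetical helper for illustration.
    """
    if mode == "align":
        # Pair the k-th generation of every agent: K joint actions.
        return list(zip(*generations))
    if mode == "cross":
        # Take every combination across agents: K^N joint actions.
        return list(product(*generations))
    raise ValueError(f"unknown joint mode: {mode!r}")


# N = 2 agents, K = 2 generations each.
gens = [["a0", "a1"], ["b0", "b1"]]
aligned = form_joint_actions(gens, "align")  # 2 joint actions
crossed = form_joint_actions(gens, "cross")  # 2^2 = 4 joint actions
```

With K = 2 and N = 2, 'align' yields the pairs (a0, b0) and (a1, b1), while 'cross' yields all four combinations.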

- Since the number of samples also grows exponentially with the number of turns, the aligned joint mode is **more flexible** (the number of samples need not be a perfect power) and hence faster to train in wall-clock time. However, the cross joint mode is more sample efficient (much lower VRAM compared to 'align' when num_generations=K^N); it also performs better because the value estimation is more accurate.
+ Since the number of samples also grows exponentially with the number of turns, the aligned joint mode is **more flexible** (the number of samples need not be a perfect power) and hence faster to train in wall-clock time. However, the cross joint mode is **more sample efficient** (much lower VRAM compared to 'align' when num_generations=K^N); it also performs better because the **value estimation is more accurate**.
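
The counting behind this trade-off can be sketched as follows. All function names here are hypothetical. The per-turn counts follow the README text, but the multi-turn formula assumes that every joint action at one turn branches into a full new set of joint actions at the next turn, which the README does not state explicitly.

```python
def joint_actions_per_turn(k, n_agents, mode="align"):
    """Number of joint actions formed from K generations per agent."""
    if mode == "align":
        return k
    if mode == "cross":
        return k ** n_agents
    raise ValueError(f"unknown joint mode: {mode!r}")


def generations_needed(target_joint, n_agents, mode="align"):
    """Generations each agent must produce to reach `target_joint` joint actions."""
    if mode == "align":
        return target_joint  # any target works, not only perfect powers
    if mode == "cross":
        root = round(target_joint ** (1.0 / n_agents))
        # Check neighbours of the rounded root to guard against float error.
        for k in (root - 1, root, root + 1):
            if k > 0 and k ** n_agents == target_joint:
                return k
        raise ValueError("target is not a perfect N-th power")
    raise ValueError(f"unknown joint mode: {mode!r}")


def samples_after_turns(k, n_agents, num_turns, mode="align"):
    # ASSUMPTION: each joint action branches into a fresh set of joint
    # actions at the next turn, so the count is (per-turn count) ** turns.
    return joint_actions_per_turn(k, n_agents, mode) ** num_turns


# 16 joint actions with N = 2 agents:
align_gens = generations_needed(16, 2, "align")  # 16 generations per agent
cross_gens = generations_needed(16, 2, "cross")  # 4 generations per agent
```

Under these assumptions, reaching the same number of joint actions takes far fewer generations per agent in 'cross' mode, whereas 'align' mode can hit any sample count, including ones that are not perfect powers.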

### Number of Turns
