[Docs] Add RL example for RDT, vLLM, FSDP2, and GRPO. #58314

crypdick · 2025-10-30T10:28:01Z

Description

Demonstrates using Ray Direct Transport (RDT) in RL post-training of an LLM on the GSM8K dataset.

Related issues

Companion PR to #57961

Signed-off-by: Ricardo Decal <rdecal@anyscale.com>

crypdick · 2025-10-30T20:14:05Z

doc/source/ray-core/api/rdt-rl-vllm-grpo-fsdp2/main.py

+
+
+@ray.remote(name=REGISTRY_NAME)
+class RayObjectRefRegistry:


This is my workaround for sending the RDT ObjectRef to the vLLM workers. collective_rpc() is unable to send the ObjectRef because of the way it serializes objects: https://gist.github.com/crypdick/8bd703085f5c8f8b2f4d2def58bac516

crypdick · 2025-10-30T20:19:07Z

doc/source/ray-core/api/rdt-rl-vllm-grpo-fsdp2/learner.py

+            self.optim.step()
+        else:
+            print(
+                "[WARNING] Skipping optimizer step due to zero gradients - all samples likely have same reward",


Without this check, the Generator collapses after the first training step and outputs nothing but end-of-sentence tokens: <s><s><s><s><s><s><s>. One simple way to fix this is to give smoother rewards: instead of giving every wrong answer a reward of 0, give a bit more reward the closer the answer is to the target.
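
To illustrate the comment above, here is a small sketch of why identical rewards give zero gradients and how a smoother reward avoids that. It assumes the common GRPO formulation in which each sample's advantage is its reward normalized by the group mean and standard deviation; the learner in this PR may compute advantages differently. The shaped_reward function and its constants are hypothetical and are not part of the PR.

```python
import math
from typing import List, Optional


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def binary_reward(pred: Optional[float], target: float) -> float:
    """1.0 for an exact match, 0.0 for everything else."""
    return 1.0 if pred is not None and pred == target else 0.0


def shaped_reward(pred: Optional[float], target: float) -> float:
    """Hypothetical smoother reward.

    Exact answers get 1.0, unparseable answers get 0.0, and wrong
    numeric answers get partial credit below 0.5 that shrinks as the
    relative error grows.
    """
    if pred is None:
        return 0.0
    if pred == target:
        return 1.0
    rel_err = abs(pred - target) / max(abs(target), 1.0)
    return 0.5 / (1.0 + rel_err)


# A group of sampled answers to one prompt, all wrong (target is 42).
preds = [10.0, 40.0, 100.0]
target = 42.0

# Binary rewards are all 0, so every advantage is 0 and there is
# nothing for the optimizer step to learn from.
flat = group_advantages([binary_reward(p, target) for p in preds])

# Shaped rewards differ across the group, so advantages are non-zero.
shaped = group_advantages([shaped_reward(p, target) for p in preds])
```

In this example the answer 40 ends up with the largest advantage in the group because it is closest to the target, which gives the policy a gradient signal even though no sample is exactly correct.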

github-actions · 2025-11-14T00:39:39Z

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

Qiaolin-Yu · 2025-11-14T23:51:27Z

doc/source/ray-core/api/rdt-rl-vllm-grpo-fsdp2/learner.py

+        for name, weight in state_dict.items():
+            # FIXME: Qiaolin, remove deepcopy once the bug with sending the same weights multiple times is fixed.
+            name_weight = (name, weight)
+            name_weight = copy.deepcopy(name_weight)


Do we still need to use deepcopy here?

@Qiaolin-Yu It was necessary when I opened the PR. I'm not sure whether it has been fixed since then.

[Docs] Add RL example for RDT, vLLM, FSDP2, and GRPO.

78d22f8

Signed-off-by: Ricardo Decal <rdecal@anyscale.com>

crypdick assigned stephanie-wang Oct 30, 2025

crypdick commented Oct 30, 2025

stephanie-wang assigned dayshah and Qiaolin-Yu Nov 13, 2025

github-actions bot added the "stale" label ("The issue is stale. It will be closed within 7 days unless there are further conversation") Nov 14, 2025

Qiaolin-Yu reviewed Nov 14, 2025

github-actions bot added the "unstale" label ("A PR that has been marked unstale. It will not get marked stale again if this label is on it.") and removed the "stale" label ("The issue is stale. It will be closed within 7 days unless there are further conversation") Nov 15, 2025

crypdick Nov 19, 2025

5 participants


