
Commit a044032

Update
[ghstack-poisoned]
2 parents 44cfcca + e4283d7 commit a044032

28 files changed (+5289 / -367 lines)

.github/unittest/llm/scripts_llm/environment.yml

Lines changed: 1 addition & 0 deletions
@@ -22,3 +22,4 @@ dependencies:
 - transformers
 - datasets
 - vllm
+- mcp

.github/unittest/llm/scripts_llm/install.sh

Lines changed: 14 additions & 0 deletions
@@ -61,3 +61,17 @@ python -m pip install -e . --no-build-isolation
 
 # smoke test
 python -c "import torchrl"
+
+# Install MCP dependencies for tool execution tests
+printf "* Installing MCP dependencies (uvx, Deno)\n"
+
+# Install uvx (universal package runner)
+pip install uvx
+
+# Install Deno (required by mcp-run-python)
+curl -fsSL https://deno.land/install.sh | sh
+export PATH="$HOME/.deno/bin:$PATH"
+
+# Verify installations
+uvx --version || echo "Warning: uvx not installed"
+deno --version || echo "Warning: Deno not installed"
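The script's last step only prints a warning when `uvx` or Deno is missing. If you want the same check from Python, for example inside a test fixture, the sketch below shows one way to do it. It is not part of this commit. It assumes the tools are invoked as `uvx` and `deno` and that Deno lives in `~/.deno/bin`, as the `export PATH` line suggests. It also probes the `mcp` package added to `environment.yml`. The function name `check_mcp_prerequisites` is hypothetical.

```python
import importlib.util
import os
import shutil
import subprocess

# Assumption: Deno's installer puts the binary in ~/.deno/bin, matching the
# `export PATH="$HOME/.deno/bin:$PATH"` line in install.sh.
EXTRA_BIN_DIRS = (os.path.expanduser("~/.deno/bin"),)


def check_mcp_prerequisites(executables=("uvx", "deno"), modules=("mcp",)):
    """Return {name: {"found": bool, "version": str or None}} without raising."""
    search_path = os.pathsep.join([*EXTRA_BIN_DIRS, os.environ.get("PATH", "")])
    report = {}
    for exe in executables:
        path = shutil.which(exe, path=search_path)
        version = None
        if path is not None:
            try:
                proc = subprocess.run(
                    [path, "--version"], capture_output=True, text=True, timeout=30
                )
                lines = proc.stdout.strip().splitlines()
                version = lines[0] if lines else None
            except (OSError, subprocess.TimeoutExpired):
                pass  # binary exists but could not be run: found, version unknown
        report[exe] = {"found": path is not None, "version": version}
    for mod in modules:
        # find_spec returns None when a top-level module is not installed.
        report[mod] = {"found": importlib.util.find_spec(mod) is not None, "version": None}
    return report


if __name__ == "__main__":
    for name, info in check_mcp_prerequisites().items():
        if info["found"]:
            print(f"{name}: {info['version'] or 'ok'}")
        else:
            print(f"Warning: {name} not installed")  # same tone as install.sh
```

Like the shell version, this sketch reports missing tools without failing, so a test suite can choose to skip MCP tests rather than abort.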

docs/source/reference/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ API Reference
 llms
 modules
 objectives
+services
 trainers
 utils
 config

docs/source/reference/llms.rst

Lines changed: 11 additions & 6 deletions
@@ -930,7 +930,9 @@ Tools are usually implemented as transforms, and appended to a base environment
 such as :class:`~torchrl.envs.llm.ChatEnv`.
 
 An example of a tool transform is the :class:`~torchrl.envs.llm.transforms.PythonInterpreter` transform, which is used
-to execute Python code in the context of the LLM.
+to execute Python code in the context of the LLM. The PythonInterpreter can optionally use a shared
+:class:`~torchrl.envs.llm.transforms.PythonExecutorService` for efficient resource usage across multiple environments.
+See :ref:`ref_services` for more details on the service registry system.
 
 >>> from torchrl.envs.llm.transforms import PythonInterpreter
 >>> from torchrl.envs.llm import ChatEnv
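The updated paragraph says `PythonInterpreter` can optionally use one shared `PythonExecutorService` instead of each environment running its own executor. The standard-library sketch below illustrates only that general shared-service pattern. `ServiceRegistry`, `SharedPythonExecutor`, and `ToyPythonTool` are hypothetical names. They do not mirror torchrl's classes or the service registry documented under `ref_services`.

```python
import subprocess
import sys
import threading


class SharedPythonExecutor:
    """Hypothetical service: runs snippets in a fresh Python subprocess."""

    def __init__(self, max_concurrent=2):
        # Caps concurrent snippets across all callers, not per environment.
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._count_lock = threading.Lock()
        self.calls = 0

    def execute(self, code, timeout=10.0):
        with self._slots:
            with self._count_lock:
                self.calls += 1
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        return {"stdout": proc.stdout, "stderr": proc.stderr, "returncode": proc.returncode}


class ServiceRegistry:
    """Hypothetical name -> service map that builds each service at most once."""

    def __init__(self):
        self._services = {}
        self._lock = threading.Lock()

    def get_or_create(self, name, factory):
        with self._lock:
            if name not in self._services:
                self._services[name] = factory()
            return self._services[name]


class ToyPythonTool:
    """Stand-in for a per-environment tool that delegates to the shared service."""

    def __init__(self, registry, service_name="python_executor"):
        self.executor = registry.get_or_create(service_name, SharedPythonExecutor)

    def __call__(self, code):
        return self.executor.execute(code)["stdout"]


registry = ServiceRegistry()
tools = [ToyPythonTool(registry) for _ in range(4)]  # e.g. one tool per environment
print(tools[0]("print(1 + 1)"), end="")  # prints 2
print(all(t.executor is tools[0].executor for t in tools))  # True: one executor
```

In this sketch, the registry creates the executor lazily and only once, and every tool receives the same object. As a result, the concurrency cap applies across all environments instead of separately to each one.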
@@ -1141,6 +1143,7 @@ By following these design principles, reward transforms can be effectively integ
 KLRewardTransform
 MCPToolTransform
 PolicyVersion
+PythonExecutorService
 PythonInterpreter
 RayDataLoadingPrimer
 RetrieveKL
@@ -1155,20 +1158,22 @@ Objectives
 
 LLM post-training requires specialized loss functions that are adapted to the unique characteristics of language models.
 
-GRPO
-~~~~
-
-The :class:`~torchrl.objectives.llm.GRPOLoss` class is a thin wrapper around the :class:`~torchrl.objectives.PPOLoss` class
-that codes the LLM-specific functionalities.
+GRPO, DAPO, CISPO
+^^^^^^^^^^^^^^^^^
 
 .. currentmodule:: torchrl.objectives.llm
 
 .. autosummary::
 :toctree: generated/
 :template: rl_template.rst
 
+LLMLossOutput
 GRPOLoss
 GRPOLossOutput
+CISPOLoss
+CISPOLossOutput
+DAPO
+DAPOLossOutput
 MCAdvantage
 
 SFT
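The subsection now covers three objectives. For background, the sketch below contrasts how GRPO, DAPO, and CISPO are usually described in their papers, reduced to per-token scalars in plain Python. It covers:

- group-normalized advantages;
- the clipped PPO surrogate, as used by GRPO (symmetric clip) and DAPO (asymmetric "clip-higher");
- CISPO's clipped importance weight under a stop-gradient;
- sequence-level versus token-level averaging.

Instead of relying on autograd, each function returns the derivative of its per-token objective with respect to log π. None of this is torchrl's implementation. The function names, the use of the population standard deviation, and the epsilon values are assumptions chosen for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_grad(ratio, adv, eps_low, eps_high):
    """d/d(log pi) of min(r*A, clip(r, 1-eps_low, 1+eps_high)*A), r = pi/pi_old.

    eps_low == eps_high gives the symmetric PPO/GRPO clip; DAPO's clip-higher
    uses eps_high > eps_low. Tokens on the clipped branch get zero gradient.
    """
    if (adv > 0 and ratio > 1 + eps_high) or (adv < 0 and ratio < 1 - eps_low):
        return 0.0
    return ratio * adv  # d(r*A)/d(log pi) = r*A because r = exp(log pi - log pi_old)


def cispo_grad(ratio, adv, eps_low, eps_high):
    """d/d(log pi) of sg(clip(r, 1-eps_low, 1+eps_high)) * A * log pi.

    The importance weight is clipped and treated as a constant, so every
    token keeps a bounded gradient instead of being dropped.
    """
    clipped = min(max(ratio, 1 - eps_low), 1 + eps_high)
    return clipped * adv


def sequence_level_mean(per_token):
    """Average within each sequence, then across sequences (GRPO-style)."""
    return statistics.fmean(statistics.fmean(seq) for seq in per_token)


def token_level_mean(per_token):
    """Average over all tokens at once (DAPO-style); long sequences weigh more."""
    flat = [x for seq in per_token for x in seq]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
    # Positive advantage with the ratio above 1.2:
    print(clipped_surrogate_grad(1.25, 1.0, 0.2, 0.2))   # 0.0  (symmetric clip drops it)
    print(clipped_surrogate_grad(1.25, 1.0, 0.2, 0.28))  # 1.25 (clip-higher keeps it)
    print(cispo_grad(1.25, 1.0, 0.2, 0.2))               # 1.2  (weight clipped, gradient kept)
    per_token = [[1.0, 1.0], [0.0] * 6]
    print(sequence_level_mean(per_token), token_level_mean(per_token))  # 0.5 0.25
```

The DAPO paper also describes dynamic sampling and overlong reward shaping, which are outside this sketch.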
