Contributor

@Darktex Darktex commented Oct 21, 2025

Summary

This PR reorganizes and expands our RFC suite to establish a solid foundation for OpenEnv Phase 1. We're introducing two critical new RFCs (001: Abstractions, 003: MCP Support), renumbering existing RFCs for logical flow, and harmonizing terminology and cross-references across all documents.

RFC Structure Changes

Renumbering

  • RFC 002 → RFC 004: Actions as Tool Calls (actions-as-tool-calls)
  • RFC 001 → RFC 002: OpenEnv Framework Spec (env-spec)

New RFCs

  • RFC 001: OpenEnv Basic Abstractions (NEW)
  • RFC 003: MCP (Model Context Protocol) Support (NEW)

Final Suite

  • RFC 000: Project Phases (existing)
  • RFC 001: OpenEnv Basic Abstractions (NEW)
  • RFC 002: OpenEnv Framework Spec (renamed from 001)
  • RFC 003: MCP Support (NEW)
  • RFC 004: Actions as Tool Calls (renamed from 002)

What's New

RFC 001: OpenEnv Basic Abstractions

Purpose: Defines the foundational contract between agents and environments.

Key contributions:

  • Core type definitions: Action, Observation, State base classes
  • Environment interface: reset(), step(), state() methods (Gym-compatible)
  • Agent interface: Thin wrapper pattern for model/policy + tokenizer
  • Task/Dataset abstraction: PyTorch IterableDataset-compatible task loading
  • Architectural diagrams: Clarifies what belongs in Environment vs Agent vs Outer System

Why this matters: Establishes the core abstractions that everything else builds on. Makes explicit the boundaries between Agent (thin: model + tokenizer + history) and Environment (everything external: tools, sandbox, state, rewards).
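
To make the contract concrete, here is a minimal sketch of what the base types and the reset()/step()/state() interface could look like. It is not taken from the RFC text: the field names (`done`, `reward`, `episode_id`, `step_count`) and the `Echo*` classes are hypothetical, and the sketch assumes `step()` returns a single observation object rather than a Gym-style tuple.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Action:
    """Base class for anything the agent sends to the environment."""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Observation:
    """Base class for anything the environment returns (fields are assumed)."""
    done: bool = False
    reward: Optional[float] = None


@dataclass
class State:
    """Environment-owned bookkeeping (hypothetical fields)."""
    episode_id: int = 0
    step_count: int = 0


class Environment(ABC):
    """The three methods named in RFC 001."""

    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @abstractmethod
    def state(self) -> State: ...


@dataclass
class EchoAction(Action):
    text: str = ""


@dataclass
class EchoObservation(Observation):
    text: str = ""


class EchoEnvironment(Environment):
    """Toy environment: echoes the action text and ends after max_steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self._state = State()
        self._max_steps = max_steps

    def reset(self) -> EchoObservation:
        # Start a new episode; external state lives here, not in the agent.
        self._state = State(episode_id=self._state.episode_id + 1)
        return EchoObservation(text="ready")

    def step(self, action: EchoAction) -> EchoObservation:
        self._state.step_count += 1
        done = self._state.step_count >= self._max_steps
        return EchoObservation(done=done, reward=1.0 if done else 0.0, text=action.text)

    def state(self) -> State:
        return self._state
```

An agent loop under these assumptions would call `env.reset()` once per task and then `env.step(...)` until the returned observation has `done` set.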

Inspiration: Draws from both traditional RL (Gym/Gymnasium) and modern agentic frameworks (Smolagents, AgentKit), but maintains a clear separation of concerns.

RFC 003: MCP (Model Context Protocol) Support

Purpose: Defines how OpenEnv integrates with MCP to expose external tools to agents.

Key contributions:

  • Dual paradigm support: Traditional tool calling (explicit ToolCallAction) AND CodeAct (pre-imported tools in Python namespace)
  • MCP client integration: Architecture for connecting to MCP servers without custom MCP implementations
  • Tool registry & namespace injection: Dynamic Python function generation from MCP tool definitions
  • Deployment patterns: Docker Compose orchestration for MCP servers alongside environments
  • Security-first CodeAct: Pre-import tools instead of allowing arbitrary imports; filter import statements from agent code
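
As a rough illustration of the dual paradigm, the sketch below routes both action styles through one tool registry. Only the name `ToolCallAction` comes from this PR; `CodeAction`, `ToolRegistry`, `step`, and the `result` convention are hypothetical, and plain Python callables stand in for tools that would really be generated from MCP tool definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ToolCallAction:
    """Traditional style: one explicit tool call is the whole action."""
    tool_name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CodeAction:
    """CodeAct style: the action is code that may call several tools."""
    code: str


class ToolRegistry:
    """Maps tool names to callables (stand-ins for MCP-backed functions)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._tools[name](**kwargs)

    def as_namespace(self) -> Dict[str, Any]:
        # Copy so that agent code cannot mutate the registry itself.
        return dict(self._tools)


def step(registry: ToolRegistry, action: Any) -> Any:
    """Dispatch on the action type; both paths reach the same tools."""
    if isinstance(action, ToolCallAction):
        return registry.call(action.tool_name, **action.arguments)
    if isinstance(action, CodeAction):
        namespace = registry.as_namespace()
        # NOTE: exec() is not a sandbox; a real environment would isolate this.
        exec(action.code, namespace)
        # Assumed convention: agent code leaves its answer in `result`.
        return namespace.get("result")
    raise TypeError(f"unsupported action type: {type(action).__name__}")
```

With a tool registered as `add`, `ToolCallAction("add", {"a": 2, "b": 3})` and `CodeAction("result = add(a=2, b=3)")` reach the same function, which is the property that lets one environment definition serve both agent patterns.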

Why this matters: MCP is becoming the standard for tool exposure in the AI ecosystem. This RFC ensures OpenEnv can leverage MCP servers while supporting both traditional and CodeAct agent patterns.

Key design decision: In CodeAct mode, tools are pre-imported into the execution namespace (no import statements from agent code). This is more secure, deterministic, and reliable than allowing arbitrary imports.
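
One way this decision could be realized is to rewrite the agent's code before execution so that import statements become no-ops, then run it in a namespace that already contains the tools. The sketch below uses the standard `ast` module; `run_codeact` and `_ImportStripper` are hypothetical names, and the RFC may filter imports differently (for example, by rejecting the code outright).

```python
import ast
from typing import Any, Callable, Dict


class _ImportStripper(ast.NodeTransformer):
    """Replace every `import` / `from ... import` statement with `pass`."""

    def visit_Import(self, node: ast.AST) -> ast.AST:
        # Using `pass` (not deletion) keeps blocks such as function bodies valid.
        return ast.copy_location(ast.Pass(), node)

    visit_ImportFrom = visit_Import


def run_codeact(code: str, tools: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute agent code with tools pre-imported and imports filtered out."""
    tree = _ImportStripper().visit(ast.parse(code))
    ast.fix_missing_locations(tree)
    namespace: Dict[str, Any] = dict(tools)  # tools are already "imported"
    # NOTE: this only filters import statements. It is not a sandbox:
    # builtins such as __import__ remain reachable unless also restricted.
    exec(compile(tree, "<agent-code>", "exec"), namespace)
    return namespace
```

Under these assumptions, agent code that begins with `import os` still runs, but the name `os` is never bound; only the pre-imported tools are available.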

Harmonization Work

Beyond the new content, we've harmonized all RFCs to eliminate inconsistencies:

Fixed Cross-References

  • RFC 001: Fixed reference to unified action interface (was RFC 003, now correctly RFC 004)
  • RFC 002: Updated to reference RFC 004 for actions() method
  • RFC 003: Completed missing content section, fixed self-reference bug (referenced itself instead of RFC 004)
  • RFC 004: Added relationship note explaining connection to RFC 003

Consistent Terminology

  • Primary method: actions() for action discovery (universal to all env types)
  • Backward-compatible alias: tools() (for RFC 003 compatibility)
  • All RFCs now use this consistently
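
A minimal sketch of how the alias could work, assuming action descriptors are plain dictionaries (the descriptor schema, `ActionDiscoveryMixin`, and `CalculatorEnv` are hypothetical; only the method names actions() and tools() come from the RFCs):

```python
from typing import Any, Dict, List


class ActionDiscoveryMixin:
    """actions() is the primary discovery method; tools() simply forwards to it."""

    def actions(self) -> List[Dict[str, Any]]:
        raise NotImplementedError

    def tools(self) -> List[Dict[str, Any]]:
        # Backward-compatible alias: same data, older name.
        return self.actions()


class CalculatorEnv(ActionDiscoveryMixin):
    def actions(self) -> List[Dict[str, Any]]:
        return [
            {
                "name": "add",
                "description": "Add two numbers.",
                "parameters": {"a": "number", "b": "number"},
            }
        ]
```

Because tools() delegates rather than duplicating logic, an environment only overrides actions() and both names stay in sync.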

Complete Reference Chains

  • All RFCs now reference RFC 000 for phasing context
  • Cross-references use correct RFC numbers throughout
  • References sections list RFCs in logical order (000 → 001 → 002 → 003 → 004)

Added Missing Content

  • RFC 001: Added missing TypedDict import
  • RFC 003: Completed missing ~40 lines explaining MCP adaptation for CodeAct (was jumping from line 73 to system prompts without content)
  • RFC 004: Added clarifying note about relationship with RFC 003

Unified Design Notes

All RFCs now consistently explain:

  • Separation of concerns (Agent vs Environment vs Dataset vs Eval)
  • State ownership (external state in Environment, internal state in Agent)
  • Compatibility with both RL frameworks and agentic frameworks
  • PyTorch DataLoader integration

Logical Flow of the Suite

Reading order for understanding OpenEnv:

  1. RFC 000: Understand the phasing plan and what we're building towards
  2. RFC 001: Learn the core abstractions (Environment, Agent, Task, etc.)
  3. RFC 002: See how environments are deployed and communicated with (Docker + HTTP)
  4. RFC 004: Understand the unified action interface inspired by MCP
  5. RFC 003: Learn how MCP tools integrate with the action system

@Darktex Darktex requested review from pankit-eng and zkwentz October 21, 2025 05:26
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 21, 2025
@init27
Contributor

init27 commented Oct 21, 2025

Thanks so much, this is awesome and really detailed. I have one big nit:

The numbers are triggering my OCD. Is it possible to renumber, please? :D


RFC 000: Understand the phasing plan and what we're building towards
RFC 001: Learn the core abstractions (Environment, Agent, Task, etc.)
RFC 002: See how environments are deployed and communicated with (Docker + HTTP)
RFC 004: Understand the unified action interface inspired by MCP
RFC 003: Learn how MCP tools integrate with the action system

Contributor

@pankit-eng pankit-eng left a comment

Left minor comments and open questions which can be taken up as decisions or open questions

Let's then look at the ingredients that need to belong to an abstraction and then we will introduce how we propose to group them.

1. **Tokenizer**. The model understands token IDs, not text. At some point you need to tokenize/detokenize. You _could_ have the inference server own this and simply communicate with text in/text out (e.g. like OpenAI API).
2. **Data/task loading**. Environments contain the necessary machinery to compute responses to the policy, but the policy makes its move when given a _task_ (e.g. a question). This comes from somewhere else: when training/testing, it comes from a dataset. When in production, it comes from a user while the model waits behind an endpoint.
Contributor

Specifically, during post-training, this data comes from the replay buffer, right?
If yes, we can be precise.

Contributor Author

No, this data actually comes from the dataloader!

4. **Evals**. They are similar to rewards in that they compute some score based on what the policy did, but they differ in two key ways:
a. They are **data-dependent**. Evals are always connected to their dataset, and they can assume a specific format for it.
b. They are **aggregated**. Unlike rewards where you get a score per-sample, here the score that matters is after aggregation.
5. **Tools**. External functions that the agent may or may not call while solving its task. They may be local or remote. These are often standardized using MCP. There are two schools of thought on whether a tool call should be a _whole_ action (traditional tool calling), or _part_ of an action (CodeAct paradigm). We will support both, *and* we will support converting from one to the other without requiring that users write their env twice.
Contributor

For local ones, like the environment's local file system or a local code executor, are we suggesting that even those are MCP? If not, then this section is primarily about external tools.

### Environments vs Agents
As mentioned before, an area of confusion is how to draw abstraction boundaries between Agents and Environments.

<claude: draw me an ASCII with two boxes, one being the Agent and the other being the Environment. One arrow goes from Agent to Environment and it's labeled Action, and the other goes from the Environment to the Agent and it's labeled Observation>
Contributor

remove this?

Contributor Author

LMAO busted!

Contributor

well... now you have to keep it

#### Proposed Abstractions
This is the contract that we are proposing. We feel it strikes a good balance between supporting single-turn environments for LLM post-training (such as the GSM8K) while also extending to the more complex agentic tasks, such as [Tau-Bench](https://arxiv.org/abs/2406.12045). We are aiming for flexibility, so we know we may not get this right the first time. We encourage strong feedback to this RFC so that we can improve on it!

These are the key abstractions that we expect. Note that in this project we only implement the "Environment" abstraction under the our meaning. You can map to other "agents" or "environment" abstractions by writing adapters to and from OpenEnvs.
Contributor

implement the "Environment" abstraction under the our meaning

implement the "Environment" abstraction under our meaning?

Contributor Author

Rephrased

class Task(Generic[T]):
    """Represents a single task instance.

    Tasks are provided to the environment at reset time. They contain
Contributor

IIUC, an agent loads a task from the dataset loader. This is a pull-based load. However, the agent has no reference to the dataset loader interface. I would have expected the agent to hold a reference and then translate the loaded task into an action.

Also, what happens when the agent makes multiple calls into the environment, each based on the observation from the previous call? In other words, at what point does the agent load the next task?

│ │ ) │ │
│ │ env.step(action) │ │
│ │ │ │
│ │ # CodeAct style (NEW) │ │
Contributor

QQs:

  1. Is it possible for the same client to use both CodeAct and the traditional ToolAction?
  2. The env server needs to know whether the client is using CodeAct, primarily so that it knows the executed code must persist its state in the interpreter. This applies to bash execution as well as Python.


#### 1. MCP Client Library

We need an MCP client that can run inside our Python execution environments. We have three options:
Contributor

We will need an MCP client regardless of whether there is a Python execution env. For traditional tool calling, the env acts as a proxy and hence forwards the call to the right MCP server.

Unrelated: have we covered the secrets aspect of MCP tool calling here? The container will need to mount secrets as ENV vars, or in whatever way the provider requires, for calling the MCP server.


### Integration with Environment Interface

#### Traditional Tool Calling (RFC 003 Style)
Contributor

More of a question: does the agent need to know which local tools are at its disposal, such as a special library or even a SQL server that it can run SQL commands against?


CMD ["python", "my_tools_server.py"]
```
## Open Questions
Contributor

Please feel free to add any open questions from comments in here or decision points.

…tions

- RFC 000: Project phases and approach
- RFC 001: Core abstractions (Environment, Agent, Task, Dataset)
- RFC 002: Environment specification (reset, step, state APIs)
- RFC 003: MCP integration for tools (remote + local, CodeAct support)
- RFC 004: Unified action interface for all environment types

This PR reorganizes and expands our RFC structure:
- Renamed old RFC 002 → RFC 004
- Renamed old RFC 001 → RFC 002
- Added new RFC 001 (abstractions) and RFC 003 (MCP)
- Updated RFC 000 with review feedback
- Harmonized language and cross-references across all RFCs
@Darktex Darktex merged commit b422145 into main Oct 21, 2025
1 check passed
@Darktex
Contributor Author

Darktex commented Oct 21, 2025

Oooof, I asked Claude to rebase this for me and instead it merged it :/ I will write another PR to address these comments lol

Darktex added a commit that referenced this pull request Oct 21, 2025
- Change 'customers' to 'community' (more open source appropriate)
- Clarify data/task loading during post-training vs testing vs production
- Expand tools definition to distinguish remote vs local tools
- Remove obsolete Claude comment placeholder
- Fix typo: 'under the our' → 'under our'
4 participants