Remake RFC directory, add new RFCs #57

Darktex · 2025-10-21T05:26:05Z

Summary

This PR reorganizes and expands our RFC suite to establish a solid foundation for OpenEnv Phase 1. We're introducing two critical new RFCs (001: Abstractions, 003: MCP Support), renumbering existing RFCs for logical flow, and harmonizing terminology and cross-references across all documents.

RFC Structure Changes

Renumbering

RFC 002 → RFC 004: Actions as Tool Calls (actions-as-tool-calls)
RFC 001 → RFC 002: OpenEnv Framework Spec (env-spec)

New RFCs

RFC 001: OpenEnv Basic Abstractions (NEW)
RFC 003: MCP (Model Context Protocol) Support (NEW)

Final Suite

RFC 000: Project Phases (existing)
RFC 001: OpenEnv Basic Abstractions (NEW)
RFC 002: OpenEnv Framework Spec (renamed from 001)
RFC 003: MCP Support (NEW)
RFC 004: Actions as Tool Calls (renamed from 002)

What's New

RFC 001: OpenEnv Basic Abstractions

Purpose: Defines the foundational contract between agents and environments.

Key contributions:

Core type definitions: Action, Observation, State base classes
Environment interface: reset(), step(), state() methods (Gym-compatible)
Agent interface: Thin wrapper pattern for model/policy + tokenizer
Task/Dataset abstraction: PyTorch IterableDataset-compatible task loading
Architectural diagrams: Clarifies what belongs in Environment vs Agent vs Outer System

Why this matters: Establishes the core abstractions that everything else builds on. Makes explicit the boundaries between Agent (thin: model + tokenizer + history) and Environment (everything external: tools, sandbox, state, rewards).
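As a rough illustration of the contract described above, here is a minimal sketch of the base types and the `reset()` / `step()` / `state()` interface. The field names (`done`, `reward`, `step_count`) and the `Echo*` classes are assumptions made up for this example; RFC 001 itself is the reference for the actual definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Action:
    """Base class for anything the agent sends to the environment."""


@dataclass
class Observation:
    """Base class for what the environment returns (fields are assumed)."""
    done: bool = False
    reward: float = 0.0


@dataclass
class State:
    """Base class for environment-owned (external) state (field is assumed)."""
    step_count: int = 0


class Environment(ABC):
    """Gym-style interface: reset, step, and a state accessor."""

    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @abstractmethod
    def state(self) -> State: ...


# Hypothetical toy environment, used only to exercise the interface.
@dataclass
class EchoAction(Action):
    message: str = ""


@dataclass
class EchoObservation(Observation):
    echoed: str = ""


class EchoEnvironment(Environment):
    def __init__(self) -> None:
        self._state = State()

    def reset(self) -> EchoObservation:
        # A fresh episode starts with fresh environment-owned state.
        self._state = State()
        return EchoObservation()

    def step(self, action: EchoAction) -> EchoObservation:
        self._state.step_count += 1
        return EchoObservation(
            echoed=action.message,
            reward=float(len(action.message)),
        )

    def state(self) -> State:
        return self._state
```

Everything the agent can affect lives behind `step()`, which is the point of keeping the Agent thin: it only needs to produce an `Action` and consume an `Observation`.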

Inspiration: Draws from both traditional RL (Gym/Gymnasium) and modern agentic frameworks (Smolagents, AgentKit), but maintains a clear separation of concerns.

RFC 003: MCP (Model Context Protocol) Support

Purpose: Defines how OpenEnv integrates with MCP to expose external tools to agents.

Key contributions:

Dual paradigm support: Traditional tool calling (explicit ToolCallAction) AND CodeAct (pre-imported tools in Python namespace)
MCP client integration: Architecture for connecting to MCP servers without custom MCP implementations
Tool registry & namespace injection: Dynamic Python function generation from MCP tool definitions
Deployment patterns: Docker Compose orchestration for MCP servers alongside environments
Security-first CodeAct: Pre-import tools instead of allowing arbitrary imports; filter import statements from agent code
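To make the "tool registry & namespace injection" item more concrete, here is a minimal sketch of generating Python callables from MCP-style tool definitions. It assumes each definition is a dict with `name`, `description`, and a JSON Schema under `inputSchema`. The `call_tool` callable is a hypothetical stand-in for whatever MCP client the environment ends up using, and `make_tool_function` / `build_namespace` are names invented for this example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: (tool_name, arguments) -> result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def make_tool_function(tool_def: Dict[str, Any], call_tool: ToolCaller) -> Callable[..., Any]:
    """Wrap one tool definition in a keyword-only Python function."""
    name = tool_def["name"]
    schema = tool_def.get("inputSchema", {})
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))

    def tool_fn(**kwargs: Any) -> Any:
        # Cheap argument checks before forwarding the call.
        unknown = set(kwargs) - allowed
        missing = required - set(kwargs)
        if unknown:
            raise TypeError(f"{name}() got unexpected arguments: {sorted(unknown)}")
        if missing:
            raise TypeError(f"{name}() missing required arguments: {sorted(missing)}")
        return call_tool(name, kwargs)

    tool_fn.__name__ = name
    tool_fn.__doc__ = tool_def.get("description", "")
    return tool_fn


def build_namespace(tool_defs: List[Dict[str, Any]], call_tool: ToolCaller) -> Dict[str, Callable[..., Any]]:
    """Map tool names to generated functions, ready to inject into a namespace."""
    return {d["name"]: make_tool_function(d, call_tool) for d in tool_defs}
```

The same generated functions could back both paradigms: traditional tool calling would look a function up by name, while CodeAct would place the whole dict in the execution namespace.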

Why this matters: MCP is becoming the standard for tool exposure in the AI ecosystem. This RFC ensures OpenEnv can leverage MCP servers while supporting both traditional and CodeAct agent patterns.

Key design decision: In CodeAct mode, tools are pre-imported into the execution namespace (no import statements from agent code). This is more secure, deterministic, and reliable than allowing arbitrary imports.
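A minimal sketch of that design decision, using only the standard library: import statements are stripped from the agent's code with `ast`, and the code then runs in a namespace that already contains the tools. `run_codeact` is a name invented for this example. Note that this sketch is not a sandbox on its own: it does not block `__import__` or other escape routes, so it assumes the surrounding container provides the actual isolation.

```python
import ast
from typing import Any, Callable, Dict


class _ImportStripper(ast.NodeTransformer):
    """Replace every import statement with `pass` so blocks never end up empty."""

    def visit_Import(self, node: ast.Import) -> ast.AST:
        return ast.copy_location(ast.Pass(), node)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> ast.AST:
        return ast.copy_location(ast.Pass(), node)


def run_codeact(code: str, tools: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute agent code with tools pre-imported; return the resulting namespace."""
    tree = _ImportStripper().visit(ast.parse(code))
    ast.fix_missing_locations(tree)
    namespace: Dict[str, Any] = dict(tools)  # tools are available by name
    exec(compile(tree, "<agent_code>", "exec"), namespace)
    return namespace
```

With a tool such as `add` injected, agent code like `import os` followed by `result = add(2, 3)` runs with the import silently dropped, and `result` can be read back from the returned namespace.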

Harmonization Work

Beyond the new content, we've harmonized all RFCs to eliminate inconsistencies:

Fixed Cross-References

RFC 001: Fixed reference to unified action interface (was RFC 003, now correctly RFC 004)
RFC 002: Updated to reference RFC 004 for actions() method
RFC 003: Completed missing content section, fixed self-reference bug (referenced itself instead of RFC 004)
RFC 004: Added relationship note explaining connection to RFC 003

Consistent Terminology

Primary method: actions() for action discovery (universal to all env types)
Backward-compatible alias: tools() (for RFC 003 compatibility)
All RFCs now use this consistently
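One way the alias could look in practice is sketched below. The class name and the shape of the action specs are assumptions for illustration only; the RFCs define the real interface.

```python
from typing import Dict, List


class ToolEnvironment:
    """Hypothetical environment exposing action discovery."""

    def __init__(self, action_specs: List[Dict[str, str]]) -> None:
        self._action_specs = list(action_specs)

    def actions(self) -> List[Dict[str, str]]:
        """Primary discovery method, meaningful for every environment type."""
        return list(self._action_specs)

    def tools(self) -> List[Dict[str, str]]:
        """Backward-compatible alias that simply delegates to actions()."""
        return self.actions()
```

Delegating rather than duplicating keeps the two names from drifting apart if the discovery logic changes.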

Complete Reference Chains

All RFCs now reference RFC 000 for phasing context
Cross-references use correct RFC numbers throughout
References sections list RFCs in logical order (000 → 001 → 002 → 003 → 004)

Added Missing Content

RFC 001: Added missing TypedDict import
RFC 003: Completed missing ~40 lines explaining MCP adaptation for CodeAct (was jumping from line 73 to system prompts without content)
RFC 004: Added clarifying note about relationship with RFC 003

Unified Design Notes

All RFCs now consistently explain:

Separation of concerns (Agent vs Environment vs Dataset vs Eval)
State ownership (external state in Environment, internal state in Agent)
Compatibility with both RL frameworks and agentic frameworks
PyTorch DataLoader integration
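The separation described in these notes can be sketched as a small outer loop: a dataset yields tasks, the environment receives each task at reset time and owns the reward logic, and the agent is reduced to a policy callable. All names here (`Task` fields, `TaskStream`, `QAEnvironment`, `run_episodes`) are assumptions for illustration. A real task stream would presumably subclass `torch.utils.data.IterableDataset`; PyTorch is left out to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Task:
    prompt: str
    answer: str


class TaskStream:
    """Plain iterable of tasks standing in for an IterableDataset."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = list(rows)

    def __iter__(self) -> Iterator[Task]:
        for row in self._rows:
            yield Task(prompt=row["prompt"], answer=row["answer"])


class QAEnvironment:
    """Owns external state: the current task and how it is scored."""

    def __init__(self) -> None:
        self._task: Optional[Task] = None

    def reset(self, task: Task) -> str:
        self._task = task
        return task.prompt  # first observation

    def step(self, action: str) -> float:
        assert self._task is not None, "call reset() before step()"
        return 1.0 if action.strip() == self._task.answer else 0.0


def run_episodes(tasks: Iterable[Task], env: QAEnvironment,
                 policy: Callable[[str], str]) -> List[float]:
    """Outer system: pulls tasks, resets the env, lets the policy act once."""
    rewards = []
    for task in tasks:
        observation = env.reset(task)
        rewards.append(env.step(policy(observation)))
    return rewards
```

In this arrangement the policy never touches the dataset, and the next task is only pulled once the current episode has finished.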

Logical Flow of the Suite

Reading order for understanding OpenEnv:

RFC 000: Understand the phasing plan and what we're building towards
RFC 001: Learn the core abstractions (Environment, Agent, Task, etc.)
RFC 002: See how environments are deployed and communicated with (Docker + HTTP)
RFC 004: Understand the unified action interface inspired by MCP
RFC 003: Learn how MCP tools integrate with the action system

init27 · 2025-10-21T05:30:42Z

Thanks so much, this is awesome and really detailed. I have one big nit:

The numbers are triggering my OCD. Is it possible to renumber, please? :D


RFC 000: Understand the phasing plan and what we're building towards
RFC 001: Learn the core abstractions (Environment, Agent, Task, etc.)
RFC 002: See how environments are deployed and communicated with (Docker + HTTP)
RFC 004: Understand the unified action interface inspired by MCP
RFC 003: Learn how MCP tools integrate with the action system

pankit-eng

Left minor comments and open questions which can be taken up as decisions or open questions

rfcs/001-abstractions.md

pankit-eng · 2025-10-21T07:03:08Z

rfcs/001-abstractions.md

+Let's then look at the ingredients that need to belong to an abstraction and then we will introduce how we propose to group them.
+
+1. **Tokenizer**. The model understands token IDs, not text. At some point you need to tokenize/detokenize. You _could_ have the inference server own this and simply communicate with text in/text out (e.g. like OpenAI API).
+2. **Data/task loading**. Environments contain the necessary machinery to compute responses to the policy, but the policy makes its move when given a _task_ (e.g. a question). This comes from somewhere else: when training/testing, it comes from a dataset. When in production, it comes from a user while the model waits behind an endpoint.


Specifically, during post training, this data comes from replay buffer, right?
If yes, we can be precise.

No, this data actually comes from the dataloader!

pankit-eng · 2025-10-21T07:05:39Z

rfcs/001-abstractions.md

+4. **Evals**. They are similar to rewards in that they compute some score based on what the policy did, but they differ in two key ways:
+    a. They are **data-dependent**. Evals are always connected to their dataset, and they can assume a specific format for it.
+    b. They are **aggregated**. Unlike rewards where you get a score per-sample, here the score that matters is after aggregation.
+5. **Tools**. External functions that the agent may or may not call while solving its task. They may be local or remote. These are often standardized using MCP. There are two schools of thought on whether a tool call should be a _whole_ action (traditional tool calling), or _part_ of an action (CodeAct paradigm). We will support both, *and* we will support converting from one to the other without requiring that users write their env twice.


For local tools, such as the environment's local file system or a local code executor, are we suggesting that even those are MCP? If not, then this section is primarily about external tools.

pankit-eng · 2025-10-21T07:07:28Z

rfcs/001-abstractions.md

+### Environments vs Agents
+As mentioned before, an area of confusion is how to draw abstraction boundaries between Agents and Environments.
+
+<claude: draw me an ASCII with two boxes, one being the Agent and the other being the Environment. One arrow goes from Agent to Environment and it's labeled Action, and the other goes from the Environment to the Agent and it's labeled Observation>


remove this?

LMAO busted!

well... now you have to keep it

pankit-eng · 2025-10-21T07:09:19Z

rfcs/001-abstractions.md

+#### Proposed Abstractions
+This is the contract that we are proposing. We feel it strikes a good balance between supporting single-turn environments for LLM post-training (such as the GSM8K) while also extending to the more complex agentic tasks, such as [Tau-Bench](https://arxiv.org/abs/2406.12045). We are aiming for flexibility, so we know we may not get this right the first time. We encourage strong feedback to this RFC so that we can improve on it!
+
+These are the key abstractions that we expect. Note that in this project we only implement the "Environment" abstraction under the our meaning. You can map to other "agents" or "environment" abstractions by writing adapters to and from OpenEnvs.


implement the "Environment" abstraction under the our meaning

implement the "Environment" abstraction under our meaning?

pankit-eng · 2025-10-21T07:17:36Z

rfcs/001-abstractions.md

+class Task(Generic[T]):
+    """Represents a single task instance.
+
+    Tasks are provided to the environment at reset time. They contain


IIUC, an agent loads a task from the dataset loader. This is a pull-based load. However, the agent has no reference to the dataset loader interface. I would have expected to see the agent holding a reference and then translating the loaded task into an action.

Also, what happens when the agent makes multiple calls into the environment based on observation on each call? Meaning - at what point does the agent load the next task?

pankit-eng · 2025-10-21T07:32:12Z

rfcs/003-mcp-support.md

+│  │  )                                                      │   │
+│  │  env.step(action)                                       │   │
+│  │                                                         │   │
+│  │  # CodeAct style (NEW)                                  │   │


QQs:

Is it possible for the same client to use both CodeAct and traditional ToolAction?

The env server needs to know whether the client is using CodeAct, primarily so it knows that the executed code must persist its state in the interpreter. This applies to bash execution as well as Python.

pankit-eng · 2025-10-21T07:35:35Z

rfcs/003-mcp-support.md

+
+#### 1. MCP Client Library
+
+We need an MCP client that can run inside our Python execution environments. We have three options:


We will need an MCP client regardless of whether there is a Python execution env. For traditional tool calling, the env acts as a proxy and hence will forward the call to the right MCP server.

Unrelated: have we covered the secrets aspect of MCP tool calling here? The container will need to mount secrets as ENV vars, or in whatever way the provider requires for calling the MCP server.

pankit-eng · 2025-10-21T07:39:06Z

rfcs/003-mcp-support.md

+
+### Integration with Environment Interface
+
+#### Traditional Tool Calling (RFC 003 Style)


More of a question: does the agent need to know which local tools are at its disposal? For example, a special library, or even a SQL server that the agent can run SQL commands against?

pankit-eng · 2025-10-21T07:42:05Z

rfcs/003-mcp-support.md

+
+CMD ["python", "my_tools_server.py"]
+```
+## Open Questions


Please feel free to add any open questions from comments in here or decision points.

…tions

- RFC 000: Project phases and approach
- RFC 001: Core abstractions (Environment, Agent, Task, Dataset)
- RFC 002: Environment specification (reset, step, state APIs)
- RFC 003: MCP integration for tools (remote + local, CodeAct support)
- RFC 004: Unified action interface for all environment types

This PR reorganizes and expands our RFC structure:

- Renamed old RFC 002 → RFC 004
- Renamed old RFC 001 → RFC 002
- Added new RFC 001 (abstractions) and RFC 003 (MCP)
- Updated RFC 000 with review feedback
- Harmonized language and cross-references across all RFCs

Darktex · 2025-10-21T17:29:16Z

Oooof, I asked Claude to rebase this for me and instead it merged it :/ I will write another PR to address these comments lol

- Change 'customers' to 'community' (more open source appropriate)
- Clarify data/task loading during post-training vs testing vs production
- Expand tools definition to distinguish remote vs local tools
- Remove obsolete Claude comment placeholder
- Fix typo: 'under the our' → 'under our'

Darktex requested review from pankit-eng and zkwentz October 21, 2025 05:26

meta-cla bot added the CLA Signed label Oct 21, 2025

pankit-eng approved these changes Oct 21, 2025


Darktex force-pushed the rfc-abstractions branch from 888a8de to b422145 Compare October 21, 2025 17:26

Darktex merged commit b422145 into main Oct 21, 2025
1 check passed



4 participants