
Conversation

saqadri (Collaborator) commented Aug 9, 2025

  • Use llm_selector to intelligently decide when to use a smart/fast/cheap model during the workflow. For example, the lead planner should always be smart (see the sketch below).
  • Start prototyping citations -- currently starting with referencing tool calls, but this might switch to a different approach.
  • Batch knowledge extraction -- bulk-extract knowledge at the end of every step instead of blocking on each task; in theory this should speed things up.
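
A rough sketch of how the smart/fast/cheap idea could surface as MCP model preferences (sketch only: ModelPreferences and ModelHint come from mcp.types, and the hinted model name and priority weights below are illustrative rather than this PR's actual defaults):

from mcp.types import ModelHint, ModelPreferences
from mcp_agent.workflows.llm.augmented_llm import RequestParams

# Lead planner should always be smart: weight intelligence over speed and cost.
planner_params = RequestParams(
    max_iterations=3,
    modelPreferences=ModelPreferences(
        hints=[ModelHint(name="claude-3-5-sonnet")],  # illustrative hint only
        intelligencePriority=0.9,
        speedPriority=0.2,
        costPriority=0.1,
    ),
)

# Worker tasks can lean cheap and fast instead.
executor_params = RequestParams(
    max_iterations=10,
    modelPreferences=ModelPreferences(
        intelligencePriority=0.3,
        speedPriority=0.7,
        costPriority=0.8,
    ),
)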

Summary by CodeRabbit

  • New Features

    • Added configurable options for efficiency, knowledge extraction mode (per-task or batch), adaptive effort scaling, and artifact persistence.
    • Introduced the ability to generate minimal agents for faster execution.
    • Task outputs can now be saved as workspace artifacts.
    • Knowledge items and task results now include citation/provenance details for improved traceability.
  • Improvements

    • Enhanced task planning and execution guidance with new rules and notes for more effective and authoritative results.
    • Dynamic adjustment of execution budgets based on objective complexity.
    • Improved handling of model preferences during planning, verification, and synthesis phases.
  • Bug Fixes

    • Improved exception safety and error handling in knowledge extraction and artifact persistence processes.


coderabbitai bot commented Aug 9, 2025

Walkthrough

This update introduces new configuration options and modes to the deep orchestrator workflow, including dynamic effort scaling, batch knowledge extraction, and lean agent design. It adds provenance/citation tracking for knowledge items, enables artifact persistence of task outputs, and allows for more granular control over LLM planning and execution behaviors. Several data models and classes are extended to support these new features.

Changes

Cohort / File(s) | Change Summary

  • Execution Configuration Enhancements (src/mcp_agent/workflows/deep_orchestrator/config.py)
    Added six new fields to ExecutionConfig for plan verification attempts, knowledge extraction mode, batch concurrency, lean agent design, dynamic effort scaling, and artifact persistence. No removals or changes to existing fields.
  • Knowledge Extraction Provenance (src/mcp_agent/workflows/deep_orchestrator/knowledge.py)
    Added logic to attach provenance/citation info from the LLM to each extracted knowledge item, using the last three tool calls if available. Handles exceptions gracefully. No changes to extraction flow or public interfaces.
  • Citation Field in Data Models (src/mcp_agent/workflows/deep_orchestrator/models.py)
    Added an optional citation field to both the KnowledgeItem and TaskResult dataclasses for provenance/citation metadata. No other changes to logic or structure.
  • Orchestrator Workflow and Dynamic Scaling (src/mcp_agent/workflows/deep_orchestrator/orchestrator.py)
    - Passes new config flags (knowledge_extraction_mode, lean_agent_design) to TaskExecutor.
    - Implements dynamic effort scaling via an "EffortAssessor" LLM agent, adjusting execution/context budgets.
    - Adds batch knowledge extraction after step execution (see the sketch after this table).
    - Makes plan verification attempts configurable.
    - Applies model preferences to LLM calls for planning, verification, and synthesis.
    - Improves error handling and logging.
  • Planner Prompt Updates (src/mcp_agent/workflows/deep_orchestrator/prompts.py)
    Added new rules and notes to the planner instruction prompt: scaling effort to query complexity, promoting asynchronous/loosely coupled sub-tasks, and favoring authoritative sources/tools. No changes to function or class declarations.
  • Task Executor Modes and Artifact Persistence (src/mcp_agent/workflows/deep_orchestrator/task_executor.py)
    - Constructor extended for knowledge_extraction_mode and lean_agent_design.
    - Task execution now supports model preference assignment and artifact persistence.
    - Knowledge extraction is conditional on mode.
    - Lean agent design mode skips LLM agent design, creating a minimal agent.
    - Minor code rearrangements and comments.
  • Tool Call Provenance in LLM (src/mcp_agent/workflows/llm/augmented_llm.py)
    - Adds private _last_tool_calls attribute to track tool call metadata.
    - post_tool_call now records tool call details.
    - New method get_and_clear_tool_provenance returns and clears provenance data.
    - Changes are exception-safe and do not alter existing tool logic.
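
As a rough illustration of the batch extraction mode summarized above (sketch only: the helper name batch_extract_knowledge is made up, and the real orchestrator loop and extract_knowledge signature may differ), bounded-concurrency extraction over a step's results could look like:

import asyncio

async def batch_extract_knowledge(extractor, step_results, max_concurrent=3):
    """Extract knowledge from every task result of a step, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)  # knowledge_batch_max_concurrent

    async def extract_one(result):
        async with semaphore:
            return await extractor.extract_knowledge(result)

    per_result = await asyncio.gather(
        *(extract_one(r) for r in step_results), return_exceptions=True
    )
    knowledge_items = []
    for items in per_result:
        if isinstance(items, Exception):
            continue  # tolerate individual extraction failures
        knowledge_items.extend(items)
    return knowledge_items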

Sequence Diagram(s)

Dynamic Effort Scaling and Batch Knowledge Extraction

sequenceDiagram
    participant User
    participant DeepOrchestrator
    participant EffortAssessor (LLM)
    participant TaskExecutor
    participant LLM
    participant Workspace

    User->>DeepOrchestrator: submit objective
    DeepOrchestrator->>EffortAssessor (LLM): assess objective complexity
    EffortAssessor (LLM)-->>DeepOrchestrator: recommended budgets
    DeepOrchestrator->>TaskExecutor: initialize with config (incl. knowledge_extraction_mode, lean_agent_design)
    loop Workflow Steps
        DeepOrchestrator->>TaskExecutor: execute step
        TaskExecutor->>LLM: execute tasks
        LLM-->>TaskExecutor: task results
        TaskExecutor->>Workspace: (optional) save artifacts
        DeepOrchestrator->>DeepOrchestrator: collect step results
        alt knowledge_extraction_mode == "batch"
            DeepOrchestrator->>LLM: batch extract knowledge from step results
            LLM-->>DeepOrchestrator: knowledge items with citation
            DeepOrchestrator->>Workspace: add knowledge to memory
        end
    end
    DeepOrchestrator-->>User: final synthesis/results

Tool Call Provenance and Knowledge Citation

sequenceDiagram
    participant TaskExecutor
    participant AugmentedLLM
    participant KnowledgeExtractor

    TaskExecutor->>AugmentedLLM: execute tool call
    AugmentedLLM->>AugmentedLLM: record tool call metadata
    AugmentedLLM-->>TaskExecutor: tool call result
    TaskExecutor->>KnowledgeExtractor: extract knowledge
    KnowledgeExtractor->>AugmentedLLM: get_and_clear_tool_provenance
    AugmentedLLM-->>KnowledgeExtractor: provenance data
    KnowledgeExtractor-->>TaskExecutor: knowledge items (with citation)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

In rabbit burrows deep and wide,
New configs and knowledge now reside.
With provenance tracked and effort scaled,
Batch wisdom gathered, artifacts detailed.
Lean agents hop with nimble might—
Oh, what a burrow of features tonight!
🐇✨



coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (1)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (1)

368-396: Lean agent design path is a solid optimization.

Good bypass of the designer round-trip with concise, focused instruction. Consider exposing the behaviors and tips via config to tune per-deployment without code changes.
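
For example, the prompt ingredients could move into config roughly like this (hypothetical field names, sketched only to illustrate the suggestion; not part of this PR):

from pydantic import BaseModel, Field

class LeanAgentConfig(BaseModel):
    """Hypothetical knobs for tuning the minimal-agent instruction per deployment."""

    behaviors: list[str] = Field(
        default_factory=lambda: [
            "stay focused on the single assigned task",
            "prefer tool calls over unstated assumptions",
        ]
    )
    tips: list[str] = Field(
        default_factory=lambda: [
            "cite the tools you used",
            "keep the output concise",
        ]
    )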

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e66bd0a and 64aea2d.

📒 Files selected for processing (7)
  • src/mcp_agent/workflows/deep_orchestrator/config.py (1 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/models.py (2 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/prompts.py (2 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/task_executor.py (5 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm.py (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/{workflows,tasks}/@(conversation_workflow,task_functions).py : Context consolidation must occur every N turns (default 3) to prevent lost-in-middle-turns, as configured.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/prompts.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/tasks/{task_functions,llm_evaluators,quality_control}.py : Implement mandatory quality pipeline with LLM-as-judge pattern, evaluating responses on seven research-based quality dimensions.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/knowledge.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/**/*.py : Use mcp-agent's Agent abstraction for ALL LLM interactions, including quality evaluation, to ensure consistent tool access, logging, and error handling.

Applied to files:

  • src/mcp_agent/workflows/llm/augmented_llm.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/utils/config.py : Configuration values such as quality_threshold, max_refinement_attempts, consolidation_interval, and evaluator_model_provider must be loaded from mcp_agent.config.yaml.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/config.py
🧬 Code Graph Analysis (2)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (2)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • get_and_clear_tool_provenance (537-541)
  • get (89-90)
  • get (112-113)
src/mcp_agent/workflows/deep_orchestrator/models.py (1)
  • KnowledgeItem (41-62)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (6)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • RequestParams (119-164)
  • generate_str (177-182)
  • generate_str (301-306)
src/mcp_agent/agents/agent.py (2)
  • attach_llm (156-192)
  • Agent (61-991)
src/mcp_agent/workflows/deep_orchestrator/models.py (1)
  • TaskStatus (16-23)
src/mcp_agent/workflows/deep_orchestrator/memory.py (1)
  • save_artifact (62-80)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1)
  • extract_knowledge (47-147)
src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)
  • build_agent_instruction (381-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks / test
🔇 Additional comments (13)
src/mcp_agent/workflows/llm/augmented_llm.py (1)

537-542: LGTM: simple, clear accessor

The get-and-clear approach matches the intended one-shot consumption.

src/mcp_agent/workflows/deep_orchestrator/models.py (1)

77-79: Ensure TaskResult.citation is propagated to downstream consumers

Confirm orchestrator/knowledge extraction uses TaskResult.citation (e.g., as a primary source of provenance for knowledge items).

If not yet wired, I can patch knowledge.py to prefer task_result.citation and only fall back to LLM provenance.

src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)

70-71: LGTM: planning guidance clarifies effort scaling and source quality

The added rules push towards scalable planning and authoritative sources; consistent with orchestration goals.

Also applies to: 79-80

src/mcp_agent/workflows/deep_orchestrator/task_executor.py (3)

291-297: Per-task knowledge extraction toggle looks good.

Conditional extraction aligns with the new batch mode handled in the orchestrator. No issues.


313-314: No functional change.

Comment-only change; nothing to address.


235-246: Verify CreateMessageRequestParams Supports modelPreferences & Refactor Repeated Logic

It looks like RequestParams (which subclasses an external CreateMessageRequestParams from mcp.types) may not officially expose a modelPreferences field, and we couldn’t locate its definition in this repo. Before swallowing any errors, please confirm that dynamic assignment of modelPreferences is safe. Also, the same augmentation logic is duplicated in two branches of task_executor.py.

• Locations needing attention:
– src/mcp_agent/workflows/deep_orchestrator/task_executor.py:235–246
– src/mcp_agent/workflows/deep_orchestrator/task_executor.py:249–260

• Please verify that CreateMessageRequestParams actually defines (or allows) modelPreferences. If it does, consider extracting and reusing this helper:

 class TaskExecutor:
+    def _prepare_request_params(self, base: RequestParams | None) -> RequestParams:
+        rp = base or RequestParams(max_iterations=10)
+        # Only set modelPreferences if it's a supported attribute and we have context prefs
+        exec_prefs = getattr(self.context, "executor_model_preferences", None)
+        if exec_prefs is not None and getattr(rp, "modelPreferences", None) is None:
+            setattr(rp, "modelPreferences", exec_prefs)
+        return rp

     async def _execute(self, agent, task_context, request_params):
-        if isinstance(agent, AugmentedLLM):
-            rp = request_params or RequestParams(max_iterations=10)
-            try:
-                if not getattr(rp, "modelPreferences", None):
-                    rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-            except Exception:
-                pass
+        if isinstance(agent, AugmentedLLM):
+            rp = self._prepare_request_params(request_params)
             output = await agent.generate_str(message=task_context, request_params=rp)
         else:
             async with agent:
-                rp = request_params or RequestParams(max_iterations=10)
-                try:
-                    if not getattr(rp, "modelPreferences", None):
-                        rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-                except Exception:
-                    pass
+                rp = self._prepare_request_params(request_params)
                 llm = await agent.attach_llm(self.llm_factory)
                 output = await llm.generate_str(message=task_context, request_params=rp)
src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7)

204-206: Propagating new TaskExecutor flags – LGTM.

Flags align with TaskExecutor’s constructor and tie back to config.


436-458: Batch knowledge extraction per step – LGTM.

Correctly filters only current-step results and uses configured concurrency. Exceptions scoped and logged.


511-519: Planner model preference application – LGTM.

Using RequestParams with planner model preferences is consistent with executor paths.


521-523: Configurable plan verification attempts – LGTM.

Respects config with sane minimum.


566-568: Passing RequestParams to planning LLM – LGTM.

Ensures intelligent model selection during plan generation.


736-744: Synthesis RequestParams with model preferences – LGTM.

Appropriate iteration budget and preference injection.


295-335: RequestParams supports modelPreferences and AugmentedLLM handles it safely
Confirmed:

  • RequestParams (src/mcp_agent/workflows/llm/augmented_llm.py:119) inherits from CreateMessageRequestParams, which defines modelPreferences.
  • AugmentedLLM (src/mcp_agent/workflows/llm/augmented_llm.py:587–602) checks for request_params.modelPreferences and applies its values when tracing.

No changes required.

Comment on lines +34 to +56
# Efficiency and robustness controls
max_plan_verification_attempts: int = 4
"""Maximum attempts to repair/verify a plan before proceeding"""

# Knowledge extraction strategy
knowledge_extraction_mode: str = "batch"
"""Either 'per_task' or 'batch' (default) to extract knowledge after a step"""

knowledge_batch_max_concurrent: int = 3
"""Max concurrent knowledge extraction tasks when in batch mode"""

# Token/cost optimization
lean_agent_design: bool = False
"""If true, skip designer LLM call and create minimal agents for tasks"""

# Adaptive effort scaling based on objective complexity
dynamic_effort_scaling: bool = False
"""If true, adjust execution/context budgets based on objective complexity"""

# Artifact persistence
save_task_outputs_to_artifacts: bool = True
"""If true, persist each successful task's output into the workspace artifacts"""


🛠️ Refactor suggestion

Type/validate new fields (Literal and Field constraints) to prevent invalid configs

  • Constrain knowledge_extraction_mode to "per_task" | "batch".
  • Add ge=1 constraints for counts.
-    max_plan_verification_attempts: int = 4
+    max_plan_verification_attempts: int = Field(default=4, ge=1)
@@
-    knowledge_extraction_mode: str = "batch"
+    knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
@@
-    knowledge_batch_max_concurrent: int = 3
+    knowledge_batch_max_concurrent: int = Field(default=3, ge=1)

Also add these imports at the top:

from typing import List, Optional, Literal
from pydantic import BaseModel, ConfigDict, Field
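
Put together, the constrained fields might read roughly as follows (a sketch assuming ExecutionConfig is a Pydantic BaseModel, as the suggested Field/ConfigDict imports imply):

from typing import Literal

from pydantic import BaseModel, Field

class ExecutionConfig(BaseModel):
    # ... existing fields unchanged ...

    max_plan_verification_attempts: int = Field(default=4, ge=1)
    """Maximum attempts to repair/verify a plan before proceeding"""

    knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
    """Either 'per_task' or 'batch' (default) to extract knowledge after a step"""

    knowledge_batch_max_concurrent: int = Field(default=3, ge=1)
    """Max concurrent knowledge extraction tasks when in batch mode"""

    lean_agent_design: bool = False
    """If true, skip designer LLM call and create minimal agents for tasks"""

    dynamic_effort_scaling: bool = False
    """If true, adjust execution/context budgets based on objective complexity"""

    save_task_outputs_to_artifacts: bool = True
    """If true, persist each successful task's output into the workspace artifacts"""
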
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/config.py around lines 34 to 56,
the new configuration fields lack type constraints and validation. Update
knowledge_extraction_mode to use Literal["per_task", "batch"] to restrict its
values. Add ge=1 constraints using Field for integer count fields like
max_plan_verification_attempts and knowledge_batch_max_concurrent to ensure they
are positive. Also, add the imports from typing (List, Optional, Literal) and
from pydantic (BaseModel, ConfigDict, Field) at the top of the file as
specified.

Comment on lines +106 to +115
                # Attach provenance/citation if available from the calling LLM
                citation = None
                try:
                    if hasattr(llm, "get_and_clear_tool_provenance"):
                        prov = llm.get_and_clear_tool_provenance()
                        if prov:
                            citation = {"tools": prov[-3:]}  # last few tool calls
                except Exception:
                    citation = None


🛠️ Refactor suggestion

⚠️ Potential issue

Fetch provenance once (outside the loop) and prefer TaskResult.citation; current code clears it per-item

  • Calling get_and_clear_tool_provenance inside the loop clears provenance after the first item; subsequent items get None.
  • Provenance should usually come from the task execution (TaskResult.citation), not the extraction LLM instance.

Within the loop, use a pre-fetched citation:

-                # Attach provenance/citation if available from the calling LLM
-                citation = None
-                try:
-                    if hasattr(llm, "get_and_clear_tool_provenance"):
-                        prov = llm.get_and_clear_tool_provenance()
-                        if prov:
-                            citation = {"tools": prov[-3:]}  # last few tool calls
-                except Exception:
-                    citation = None
+                # Use pre-fetched citation (if any)
+                citation = citation_info

Add this block once right after initializing knowledge_items (outside this hunk):

# Pre-fetch citation once: prefer TaskResult.citation, else LLM provenance
citation_info = None
try:
    if getattr(task_result, "citation", None):
        citation_info = task_result.citation
    elif hasattr(llm, "get_and_clear_tool_provenance"):
        prov = llm.get_and_clear_tool_provenance()
        if prov:
            citation_info = {"tools": prov[-3:]}  # last few tool calls
except Exception:
    citation_info = None

Optionally sanitize provenance before storing (mask secrets, cap sizes) similar to augmented_llm._sanitize_provenance_args.

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/knowledge.py around lines 106 to
115, the code calls llm.get_and_clear_tool_provenance inside a loop, which
clears provenance after the first iteration causing subsequent items to get
None. To fix this, move the provenance fetching outside the loop by pre-fetching
citation_info once after initializing knowledge_items, preferring
task_result.citation if available, otherwise falling back to
llm.get_and_clear_tool_provenance. Then, inside the loop, use this pre-fetched
citation_info instead of calling the method repeatedly. Optionally, sanitize the
provenance data before storing it to mask secrets or limit size.

Comment on lines +123 to 124
citation=citation,
)

🛠️ Refactor suggestion

Use the pre-fetched citation variable when constructing KnowledgeItem

Tie item to the same provenance snapshot.

-                        citation=citation,
+                        citation=citation_info,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                        citation=citation_info,
                    )
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/knowledge.py at lines 123 to 124,
the KnowledgeItem construction should use the pre-fetched citation variable
instead of creating or fetching a new one. Update the code to pass the existing
citation variable to ensure the item is tied to the same provenance snapshot.

Comment on lines +50 to 52
# Added citation/provenance
citation: Dict[str, Any] | None = None


🛠️ Refactor suggestion

Include citation in KnowledgeItem.to_dict to avoid dropping provenance

Without adding citation to to_dict, provenance is lost when serializing.

Apply this diff to to_dict:

     def to_dict(self) -> Dict[str, Any]:
         """Convert to dictionary representation."""
         return {
             "key": self.key,
             "value": self.value,
             "source": self.source,
             "timestamp": self.timestamp.isoformat(),
             "confidence": self.confidence,
             "category": self.category,
+            "citation": self.citation,
         }
📝 Committable suggestion


Suggested change
    # Added citation/provenance
    citation: Dict[str, Any] | None = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary representation."""
        return {
            "key": self.key,
            "value": self.value,
            "source": self.source,
            "timestamp": self.timestamp.isoformat(),
            "confidence": self.confidence,
            "category": self.category,
            "citation": self.citation,
        }
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/models.py around lines 50 to 52,
the citation attribute is defined but not included in the KnowledgeItem.to_dict
method, causing loss of provenance information during serialization. Update the
to_dict method to include the citation field in the returned dictionary so that
citation data is preserved when converting the object to a dictionary.

Comment on lines +295 to +335
        # Optional dynamic effort scaling inspired by Anthropic research heuristics
        if getattr(self.config.execution, "dynamic_effort_scaling", False):
            # Cheap LLM pass to assess complexity and suggest scaling
            assessor = Agent(
                name="EffortAssessor",
                instruction=(
                    "Assess objective complexity and recommend iteration/replan/context budgets."
                ),
                context=self.context,
            )
            llm = self.llm_factory(assessor)
            try:
                rec = await llm.generate_structured(  # type: ignore[arg-type]
                    message=(
                        f"<assess>Objective: {self.objective}\n"
                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
                    ),
                    response_model=dict,  # Loose schema to keep it cheap
                    request_params=RequestParams(max_iterations=1, temperature=0.1),
                )
                mi = int(
                    rec.get("max_iterations", self.config.execution.max_iterations)
                )
                mr = int(rec.get("max_replans", self.config.execution.max_replans))
                tcb = int(
                    rec.get(
                        "task_context_budget", self.config.context.task_context_budget
                    )
                )
                self.config.execution.max_iterations = max(
                    self.config.execution.max_iterations, mi
                )
                self.config.execution.max_replans = max(
                    self.config.execution.max_replans, mr
                )
                self.config.context.task_context_budget = max(
                    self.config.context.task_context_budget, tcb
                )
            except Exception:
                pass


🛠️ Refactor suggestion

⚠️ Potential issue

Dynamic effort scaling: avoid dict as response model and add bounds.

  • Passing response_model=dict to generate_structured may fail if the implementation expects a Pydantic model. Prefer using generate_str + json.loads, or define a minimal Pydantic model.
  • Consider clamping recommended values to sane upper bounds to avoid runaway budgets from a bad assessor output.
  • Optional: allow assessor_model_preferences from context to force cheap model.

Example fix using JSON:

@@
-            try:
-                rec = await llm.generate_structured(  # type: ignore[arg-type]
-                    message=(
-                        f"<assess>Objective: {self.objective}\n"
-                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
-                    ),
-                    response_model=dict,  # Loose schema to keep it cheap
-                    request_params=RequestParams(max_iterations=1, temperature=0.1),
-                )
+            try:
+                rp = RequestParams(max_iterations=1, temperature=0.1)
+                # Prefer cheap/fast model if available
+                try:
+                    setattr(rp, "modelPreferences", getattr(self.context, "assessor_model_preferences", None))
+                except Exception:
+                    pass
+                raw = await llm.generate_str(
+                    message=(
+                        f"<assess>Objective: {self.objective}\n"
+                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
+                    ),
+                    request_params=rp,
+                )
+                import json
+                rec = json.loads(raw) if raw else {}
@@
-                self.config.execution.max_iterations = max(
-                    self.config.execution.max_iterations, mi
-                )
-                self.config.execution.max_replans = max(
-                    self.config.execution.max_replans, mr
-                )
-                self.config.context.task_context_budget = max(
-                    self.config.context.task_context_budget, tcb
-                )
+                # Clamp updates to prevent extreme values (tunable caps)
+                cap_iter = getattr(self.config.execution, "max_iterations_cap", 50)
+                cap_replans = getattr(self.config.execution, "max_replans_cap", 10)
+                cap_ctx = getattr(self.config.context, "task_context_budget_cap", 20000)
+                self.config.execution.max_iterations = min(
+                    max(self.config.execution.max_iterations, mi), cap_iter
+                )
+                self.config.execution.max_replans = min(
+                    max(self.config.execution.max_replans, mr), cap_replans
+                )
+                self.config.context.task_context_budget = min(
+                    max(self.config.context.task_context_budget, tcb), cap_ctx
+                )

If you prefer structured parsing, define a tiny Pydantic model with optional ints and defaults.
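
A minimal model for that route might look like this (sketch only: the name EffortRecommendation is made up; the commented call mirrors the generate_structured usage shown above):

from pydantic import BaseModel

class EffortRecommendation(BaseModel):
    """Loose schema for the EffortAssessor output; every field is optional."""

    max_iterations: int | None = None
    max_replans: int | None = None
    task_context_budget: int | None = None

# rec = await llm.generate_structured(
#     message=assess_prompt,
#     response_model=EffortRecommendation,
#     request_params=RequestParams(max_iterations=1, temperature=0.1),
# )
# mi = rec.max_iterations or self.config.execution.max_iterations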

📝 Committable suggestion


Suggested change
        # Optional dynamic effort scaling inspired by Anthropic research heuristics
        if getattr(self.config.execution, "dynamic_effort_scaling", False):
            # Cheap LLM pass to assess complexity and suggest scaling
            assessor = Agent(
                name="EffortAssessor",
                instruction=(
                    "Assess objective complexity and recommend iteration/replan/context budgets."
                ),
                context=self.context,
            )
            llm = self.llm_factory(assessor)
            try:
                rp = RequestParams(max_iterations=1, temperature=0.1)
                # Prefer cheap/fast model if available
                try:
                    setattr(
                        rp,
                        "modelPreferences",
                        getattr(self.context, "assessor_model_preferences", None),
                    )
                except Exception:
                    pass
                raw = await llm.generate_str(
                    message=(
                        f"<assess>Objective: {self.objective}\n"
                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
                    ),
                    request_params=rp,
                )
                import json
                rec = json.loads(raw) if raw else {}
                mi = int(
                    rec.get("max_iterations", self.config.execution.max_iterations)
                )
                mr = int(rec.get("max_replans", self.config.execution.max_replans))
                tcb = int(
                    rec.get(
                        "task_context_budget",
                        self.config.context.task_context_budget,
                    )
                )
                # Clamp updates to prevent extreme values (tunable caps)
                cap_iter = getattr(self.config.execution, "max_iterations_cap", 50)
                cap_replans = getattr(self.config.execution, "max_replans_cap", 10)
                cap_ctx = getattr(self.config.context, "task_context_budget_cap", 20000)
                self.config.execution.max_iterations = min(
                    max(self.config.execution.max_iterations, mi), cap_iter
                )
                self.config.execution.max_replans = min(
                    max(self.config.execution.max_replans, mr), cap_replans
                )
                self.config.context.task_context_budget = min(
                    max(self.config.context.task_context_budget, tcb), cap_ctx
                )
            except Exception:
                pass
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/orchestrator.py around lines 295 to
335, the current use of response_model=dict in llm.generate_structured may cause
errors if the method expects a Pydantic model. Replace this by either using
generate_str combined with json.loads to parse the response or define a minimal
Pydantic model with optional integer fields and defaults for max_iterations,
max_replans, and task_context_budget. Additionally, after parsing the
recommended values, clamp them to predefined sane upper bounds before updating
the config to prevent excessively large budgets from faulty assessor outputs.
Optionally, incorporate assessor_model_preferences from the context to select a
cheaper model for the assessor.

Comment on lines +635 to 642
        rp = RequestParams(max_iterations=1)
        try:
            rp.modelPreferences = getattr(
                self.context, "verifier_model_preferences", None
            )
        except Exception:
            pass


⚠️ Potential issue

Verifier RequestParams prepared but not used.

rp is created (with model preferences) but not passed to generate_structured. This prevents the intended model selection from taking effect.

Apply:

-        result = await llm.generate_structured(
-            message=prompt, response_model=VerificationResult
-        )
+        result = await llm.generate_structured(
+            message=prompt, response_model=VerificationResult, request_params=rp
+        )

Also applies to: 653-655

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/orchestrator.py around lines 635 to
642 and similarly at lines 653 to 655, the RequestParams object 'rp' is created
and configured but not passed to the generate_structured function, so the model
preferences are not applied. To fix this, modify the calls to
generate_structured to include 'rp' as an argument, ensuring the model
preferences in 'rp' are used during generation.

Comment on lines +53 to 55
        knowledge_extraction_mode: str = "per_task",
        lean_agent_design: bool = False,
    ):

🛠️ Refactor suggestion

Document and validate new constructor flags.

Add both knowledge_extraction_mode and lean_agent_design to the Args docstring and validate knowledge_extraction_mode against allowed values to prevent typos.

Example:

 class TaskExecutor:
@@
     def __init__(
@@
-        knowledge_extraction_mode: str = "per_task",
+        knowledge_extraction_mode: str = "per_task",
         lean_agent_design: bool = False,
     ):
@@
         Args:
@@
             enable_parallel: Whether to enable parallel execution
+            knowledge_extraction_mode: Either "per_task" or "batch". Controls when knowledge extraction runs.
+            lean_agent_design: If True, skip full agent design and use a minimal specialized agent.

Optional runtime guard:

@@
-        self.knowledge_extraction_mode = knowledge_extraction_mode
+        if knowledge_extraction_mode not in ("per_task", "batch"):
+            raise ValueError("knowledge_extraction_mode must be 'per_task' or 'batch'")
+        self.knowledge_extraction_mode = knowledge_extraction_mode

Also applies to: 81-83

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/task_executor.py around lines 53 to
55 and 81 to 83, the new constructor flags knowledge_extraction_mode and
lean_agent_design are missing documentation and validation. Update the Args
section of the constructor's docstring to include descriptions for both flags.
Add a validation step for knowledge_extraction_mode to check if its value is
within the allowed set of options, raising an error if not, to prevent typos and
invalid inputs.

Comment on lines +267 to +283
            # Persist artifact if enabled
            try:
                if getattr(
                    getattr(self.context, "orchestrator_config", object()),
                    "execution",
                    object(),
                ):
                    cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
                    if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
                        artifact_name = f"task_{task.name}.txt"
                        self.memory.save_artifact(
                            artifact_name, output, to_filesystem=True
                        )
                        result.artifacts[artifact_name] = output
            except Exception:
                pass


🛠️ Refactor suggestion

⚠️ Potential issue

Artifact persistence gate always truthy; fix guard and default behavior.

The nested getattr check with object() sentinels makes the if-condition always truthy; then accessing self.context.orchestrator_config can raise, which is silently swallowed by the broad try/except. Also, defaulting to True when the flag is missing may unexpectedly write artifacts.

Apply safer gating and conservative default:

-            try:
-                if getattr(
-                    getattr(self.context, "orchestrator_config", object()),
-                    "execution",
-                    object(),
-                ):
-                    cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
-                    if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
-                        artifact_name = f"task_{task.name}.txt"
-                        self.memory.save_artifact(
-                            artifact_name, output, to_filesystem=True
-                        )
-                        result.artifacts[artifact_name] = output
-            except Exception:
-                pass
+            try:
+                cfg = getattr(self.context, "orchestrator_config", None)
+                exec_cfg = getattr(cfg, "execution", None) if cfg else None
+                if getattr(exec_cfg, "save_task_outputs_to_artifacts", False):
+                    artifact_name = f"task_{task.name}.txt"
+                    self.memory.save_artifact(artifact_name, output, to_filesystem=True)
+                    # Consider storing only a reference or truncated preview to reduce memory bloat
+                    result.artifacts[artifact_name] = f"[artifact]{artifact_name}"
+            except Exception as e:
+                logger.debug(f"Skipping artifact persistence: {e}")

Note: WorkspaceMemory.save_artifact only writes to disk if use_filesystem is enabled (see memory.py), so this remains aligned with config.

📝 Committable suggestion


Suggested change
            # Persist artifact if enabled
            try:
                cfg = getattr(self.context, "orchestrator_config", None)
                exec_cfg = getattr(cfg, "execution", None) if cfg else None
                if getattr(exec_cfg, "save_task_outputs_to_artifacts", False):
                    artifact_name = f"task_{task.name}.txt"
                    self.memory.save_artifact(artifact_name, output, to_filesystem=True)
                    # Consider storing only a reference or truncated preview to reduce memory bloat
                    result.artifacts[artifact_name] = f"[artifact]{artifact_name}"
            except Exception as e:
                logger.debug(f"Skipping artifact persistence: {e}")
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/task_executor.py lines 267 to 283,
the current nested getattr checks use object() as default, making the condition
always truthy and risking attribute errors that are silently caught. To fix
this, replace the nested getattr calls with a single safer check that verifies
if self.context has an orchestrator_config attribute and if that config has an
execution attribute. Then, explicitly check if the
save_task_outputs_to_artifacts flag exists and is True, defaulting to False if
missing to avoid unintended artifact saving. This ensures the guard condition is
accurate and prevents silent failures.

Comment on lines +255 to 257
        # Track last tool call metadata for provenance
        self._last_tool_calls: list[dict[str, Any]] = []


🛠️ Refactor suggestion

Bound the provenance buffer to avoid unbounded growth

Add a simple cap to keep in-memory provenance bounded.

         # Track last tool call metadata for provenance
-        self._last_tool_calls: list[dict[str, Any]] = []
+        self._last_tool_calls: list[dict[str, Any]] = []
+        self._tool_prov_max = 100  # cap in-memory provenance to avoid unbounded growth
📝 Committable suggestion


Suggested change
        # Track last tool call metadata for provenance
        self._last_tool_calls: list[dict[str, Any]] = []
        self._tool_prov_max = 100  # cap in-memory provenance to avoid unbounded growth
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/llm/augmented_llm.py around lines 255 to 257, the
_last_tool_calls list is currently unbounded, which can lead to excessive memory
usage. Implement a fixed-size buffer or cap the list length by limiting the
number of stored provenance entries. For example, after appending a new entry,
remove the oldest entries if the list exceeds a predefined maximum size to keep
memory usage bounded.

Comment on lines +439 to 451
        # Record minimal provenance for citations
        try:
            self._last_tool_calls.append(
                {
                    "tool": request.params.name,
                    "arguments": request.params.arguments,
                    "tool_call_id": tool_call_id,
                    "isError": result.isError,
                }
            )
        except Exception:
            pass
        return result

🛠️ Refactor suggestion

⚠️ Potential issue

Sanitize tool arguments, trim the buffer, and avoid silent failure

  • Mask secrets and cap argument size to reduce PII/secret leakage.
  • Trim the buffer to the configured max.
  • Log at debug instead of swallowing errors silently.
         # Record minimal provenance for citations
         try:
-            self._last_tool_calls.append(
+            args = request.params.arguments or {}
+            safe_args = self._sanitize_provenance_args(args)
+            self._last_tool_calls.append(
                 {
-                    "tool": request.params.name,
-                    "arguments": request.params.arguments,
+                    "tool": request.params.name,
+                    "arguments": safe_args,
                     "tool_call_id": tool_call_id,
                     "isError": result.isError,
                 }
             )
+            # Trim to max
+            if len(self._last_tool_calls) > getattr(self, "_tool_prov_max", 100):
+                del self._last_tool_calls[0 : len(self._last_tool_calls) - self._tool_prov_max]
         except Exception:
-            pass
+            if getattr(self, "logger", None):
+                self.logger.debug("Failed to record tool provenance", exc_info=True)

Add this helper method in the class (outside this hunk):

def _sanitize_provenance_args(self, args: dict[str, Any]) -> dict[str, Any]:
    sanitized: dict[str, Any] = {}
    for k, v in (args or {}).items():
        key = str(k).lower()
        if any(s in key for s in ("secret", "token", "password", "apikey", "api_key", "authorization", "auth")):
            sanitized[k] = "***"
            continue
        s = str(v)
        sanitized[k] = s if len(s) <= 500 else s[:500] + "..."
    return sanitized
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/llm/augmented_llm.py around lines 439 to 451, the
code appends tool call details without sanitizing arguments or handling errors
properly. Fix this by adding a helper method _sanitize_provenance_args to mask
secrets and trim argument strings to 500 characters. Then, replace
request.params.arguments with the sanitized version before appending. Also,
change the except block to log the exception at debug level instead of silently
passing.
