
Conversation

saqadri (Collaborator) commented Aug 9, 2025

  • Use llm_selector to intelligently decide when to use a smart/fast/cheap model during the workflow. For example, the lead planner should always be smart (see the sketch below).
  • Start prototyping citations -- currently starting with referencing tool calls, but this might switch to a different approach.
  • Batch knowledge extraction -- bulk-extract knowledge at the end of every step instead of blocking on each task; in theory this should speed things up.
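
A rough sketch of how the smart/fast/cheap idea could surface as MCP model preferences (sketch only: ModelPreferences and ModelHint come from mcp.types, and the hinted model name and priority weights below are illustrative rather than this PR's actual defaults):

from mcp.types import ModelHint, ModelPreferences
from mcp_agent.workflows.llm.augmented_llm import RequestParams

# Lead planner should always be smart: weight intelligence over speed and cost.
planner_params = RequestParams(
    max_iterations=3,
    modelPreferences=ModelPreferences(
        hints=[ModelHint(name="claude-3-5-sonnet")],  # illustrative hint only
        intelligencePriority=0.9,
        speedPriority=0.2,
        costPriority=0.1,
    ),
)

# Worker tasks can lean cheap and fast instead.
executor_params = RequestParams(
    max_iterations=10,
    modelPreferences=ModelPreferences(
        intelligencePriority=0.3,
        speedPriority=0.7,
        costPriority=0.8,
    ),
)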

Summary by CodeRabbit

  • New Features

    • Added configurable options for efficiency, knowledge extraction mode (per-task or batch), adaptive effort scaling, and artifact persistence.
    • Introduced the ability to generate minimal agents for faster execution.
    • Task outputs can now be saved as workspace artifacts.
    • Knowledge items and task results now include citation/provenance details for improved traceability.
  • Improvements

    • Enhanced task planning and execution guidance with new rules and notes for more effective and authoritative results.
    • Dynamic adjustment of execution budgets based on objective complexity.
    • Improved handling of model preferences during planning, verification, and synthesis phases.
  • Bug Fixes

    • Improved exception safety and error handling in knowledge extraction and artifact persistence processes.


coderabbitai bot commented Aug 9, 2025

Walkthrough

This update introduces new configuration options and modes to the deep orchestrator workflow, including dynamic effort scaling, batch knowledge extraction, and lean agent design. It adds provenance/citation tracking for knowledge items, enables artifact persistence of task outputs, and allows for more granular control over LLM planning and execution behaviors. Several data models and classes are extended to support these new features.

Changes

Cohort / File(s) | Change Summary

  • Execution Configuration Enhancements (src/mcp_agent/workflows/deep_orchestrator/config.py)
    Added six new fields to ExecutionConfig for plan verification attempts, knowledge extraction mode, batch concurrency, lean agent design, dynamic effort scaling, and artifact persistence. No removals or changes to existing fields.
  • Knowledge Extraction Provenance (src/mcp_agent/workflows/deep_orchestrator/knowledge.py)
    Added logic to attach provenance/citation info from the LLM to each extracted knowledge item, using the last three tool calls if available. Handles exceptions gracefully. No changes to extraction flow or public interfaces.
  • Citation Field in Data Models (src/mcp_agent/workflows/deep_orchestrator/models.py)
    Added an optional citation field to both the KnowledgeItem and TaskResult dataclasses for provenance/citation metadata. No other changes to logic or structure.
  • Orchestrator Workflow and Dynamic Scaling (src/mcp_agent/workflows/deep_orchestrator/orchestrator.py)
    - Passes new config flags (knowledge_extraction_mode, lean_agent_design) to TaskExecutor.
    - Implements dynamic effort scaling via an "EffortAssessor" LLM agent, adjusting execution/context budgets.
    - Adds batch knowledge extraction after step execution (see the sketch after this table).
    - Makes plan verification attempts configurable.
    - Applies model preferences to LLM calls for planning, verification, and synthesis.
    - Improves error handling and logging.
  • Planner Prompt Updates (src/mcp_agent/workflows/deep_orchestrator/prompts.py)
    Added new rules and notes to the planner instruction prompt: scaling effort to query complexity, promoting asynchronous/loosely coupled sub-tasks, and favoring authoritative sources/tools. No changes to function or class declarations.
  • Task Executor Modes and Artifact Persistence (src/mcp_agent/workflows/deep_orchestrator/task_executor.py)
    - Constructor extended for knowledge_extraction_mode and lean_agent_design.
    - Task execution now supports model preference assignment and artifact persistence.
    - Knowledge extraction is conditional on mode.
    - Lean agent design mode skips LLM agent design, creating a minimal agent.
    - Minor code rearrangements and comments.
  • Tool Call Provenance in LLM (src/mcp_agent/workflows/llm/augmented_llm.py)
    - Adds private _last_tool_calls attribute to track tool call metadata.
    - post_tool_call now records tool call details.
    - New method get_and_clear_tool_provenance returns and clears provenance data.
    - Changes are exception-safe and do not alter existing tool logic.
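
As a rough illustration of the batch extraction mode summarized above (sketch only: the helper name batch_extract_knowledge is made up, and the real orchestrator loop and extract_knowledge signature may differ), bounded-concurrency extraction over a step's results could look like:

import asyncio

async def batch_extract_knowledge(extractor, step_results, max_concurrent=3):
    """Extract knowledge from every task result of a step, a few at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)  # knowledge_batch_max_concurrent

    async def extract_one(result):
        async with semaphore:
            return await extractor.extract_knowledge(result)

    per_result = await asyncio.gather(
        *(extract_one(r) for r in step_results), return_exceptions=True
    )
    knowledge_items = []
    for items in per_result:
        if isinstance(items, Exception):
            continue  # tolerate individual extraction failures
        knowledge_items.extend(items)
    return knowledge_items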

Sequence Diagram(s)

Dynamic Effort Scaling and Batch Knowledge Extraction

sequenceDiagram
    participant User
    participant DeepOrchestrator
    participant EffortAssessor (LLM)
    participant TaskExecutor
    participant LLM
    participant Workspace

    User->>DeepOrchestrator: submit objective
    DeepOrchestrator->>EffortAssessor (LLM): assess objective complexity
    EffortAssessor (LLM)-->>DeepOrchestrator: recommended budgets
    DeepOrchestrator->>TaskExecutor: initialize with config (incl. knowledge_extraction_mode, lean_agent_design)
    loop Workflow Steps
        DeepOrchestrator->>TaskExecutor: execute step
        TaskExecutor->>LLM: execute tasks
        LLM-->>TaskExecutor: task results
        TaskExecutor->>Workspace: (optional) save artifacts
        DeepOrchestrator->>DeepOrchestrator: collect step results
        alt knowledge_extraction_mode == "batch"
            DeepOrchestrator->>LLM: batch extract knowledge from step results
            LLM-->>DeepOrchestrator: knowledge items with citation
            DeepOrchestrator->>Workspace: add knowledge to memory
        end
    end
    DeepOrchestrator-->>User: final synthesis/results

Tool Call Provenance and Knowledge Citation

sequenceDiagram
    participant TaskExecutor
    participant AugmentedLLM
    participant KnowledgeExtractor

    TaskExecutor->>AugmentedLLM: execute tool call
    AugmentedLLM->>AugmentedLLM: record tool call metadata
    AugmentedLLM-->>TaskExecutor: tool call result
    TaskExecutor->>KnowledgeExtractor: extract knowledge
    KnowledgeExtractor->>AugmentedLLM: get_and_clear_tool_provenance
    AugmentedLLM-->>KnowledgeExtractor: provenance data
    KnowledgeExtractor-->>TaskExecutor: knowledge items (with citation)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

In rabbit burrows deep and wide,
New configs and knowledge now reside.
With provenance tracked and effort scaled,
Batch wisdom gathered, artifacts detailed.
Lean agents hop with nimble might—
Oh, what a burrow of features tonight!
🐇✨



coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (1)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (1)

368-396: Lean agent design path is a solid optimization.

Good bypass of the designer round-trip with concise, focused instruction. Consider exposing the behaviors and tips via config to tune per-deployment without code changes.
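
For example, the prompt ingredients could move into config roughly like this (hypothetical field names, sketched only to illustrate the suggestion; not part of this PR):

from pydantic import BaseModel, Field

class LeanAgentConfig(BaseModel):
    """Hypothetical knobs for tuning the minimal-agent instruction per deployment."""

    behaviors: list[str] = Field(
        default_factory=lambda: [
            "stay focused on the single assigned task",
            "prefer tool calls over unstated assumptions",
        ]
    )
    tips: list[str] = Field(
        default_factory=lambda: [
            "cite the tools you used",
            "keep the output concise",
        ]
    )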

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e66bd0a and 64aea2d.

📒 Files selected for processing (7)
  • src/mcp_agent/workflows/deep_orchestrator/config.py (1 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/models.py (2 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/prompts.py (2 hunks)
  • src/mcp_agent/workflows/deep_orchestrator/task_executor.py (5 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm.py (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/{workflows,tasks}/@(conversation_workflow,task_functions).py : Context consolidation must occur every N turns (default 3) to prevent lost-in-middle-turns, as configured.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/prompts.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/tasks/{task_functions,llm_evaluators,quality_control}.py : Implement mandatory quality pipeline with LLM-as-judge pattern, evaluating responses on seven research-based quality dimensions.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/knowledge.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/**/*.py : Use mcp-agent's Agent abstraction for ALL LLM interactions, including quality evaluation, to ensure consistent tool access, logging, and error handling.

Applied to files:

  • src/mcp_agent/workflows/llm/augmented_llm.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/utils/config.py : Configuration values such as quality_threshold, max_refinement_attempts, consolidation_interval, and evaluator_model_provider must be loaded from mcp_agent.config.yaml.

Applied to files:

  • src/mcp_agent/workflows/deep_orchestrator/config.py
🧬 Code Graph Analysis (2)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (2)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • get_and_clear_tool_provenance (537-541)
  • get (89-90)
  • get (112-113)
src/mcp_agent/workflows/deep_orchestrator/models.py (1)
  • KnowledgeItem (41-62)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (6)
src/mcp_agent/workflows/llm/augmented_llm.py (3)
  • RequestParams (119-164)
  • generate_str (177-182)
  • generate_str (301-306)
src/mcp_agent/agents/agent.py (2)
  • attach_llm (156-192)
  • Agent (61-991)
src/mcp_agent/workflows/deep_orchestrator/models.py (1)
  • TaskStatus (16-23)
src/mcp_agent/workflows/deep_orchestrator/memory.py (1)
  • save_artifact (62-80)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1)
  • extract_knowledge (47-147)
src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)
  • build_agent_instruction (381-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks / test
🔇 Additional comments (13)
src/mcp_agent/workflows/llm/augmented_llm.py (1)

537-542: LGTM: simple, clear accessor

The get-and-clear approach matches the intended one-shot consumption.

src/mcp_agent/workflows/deep_orchestrator/models.py (1)

77-79: Ensure TaskResult.citation is propagated to downstream consumers

Confirm orchestrator/knowledge extraction uses TaskResult.citation (e.g., as a primary source of provenance for knowledge items).

If not yet wired, I can patch knowledge.py to prefer task_result.citation and only fall back to LLM provenance.

src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)

70-71: LGTM: planning guidance clarifies effort scaling and source quality

The added rules push towards scalable planning and authoritative sources; consistent with orchestration goals.

Also applies to: 79-80

src/mcp_agent/workflows/deep_orchestrator/task_executor.py (3)

291-297: Per-task knowledge extraction toggle looks good.

Conditional extraction aligns with the new batch mode handled in the orchestrator. No issues.


313-314: No functional change.

Comment-only change; nothing to address.


235-246: Verify CreateMessageRequestParams Supports modelPreferences & Refactor Repeated Logic

It looks like RequestParams (which subclasses an external CreateMessageRequestParams from mcp.types) may not officially expose a modelPreferences field, and we couldn’t locate its definition in this repo. Before swallowing any errors, please confirm that dynamic assignment of modelPreferences is safe. Also, the same augmentation logic is duplicated in two branches of task_executor.py.

• Locations needing attention:
– src/mcp_agent/workflows/deep_orchestrator/task_executor.py:235–246
– src/mcp_agent/workflows/deep_orchestrator/task_executor.py:249–260

• Please verify that CreateMessageRequestParams actually defines (or allows) modelPreferences. If it does, consider extracting and reusing this helper:

 class TaskExecutor:
+    def _prepare_request_params(self, base: RequestParams | None) -> RequestParams:
+        rp = base or RequestParams(max_iterations=10)
+        # Only set modelPreferences if it's a supported attribute and we have context prefs
+        exec_prefs = getattr(self.context, "executor_model_preferences", None)
+        if exec_prefs is not None and getattr(rp, "modelPreferences", None) is None:
+            setattr(rp, "modelPreferences", exec_prefs)
+        return rp

     async def _execute(self, agent, task_context, request_params):
-        if isinstance(agent, AugmentedLLM):
-            rp = request_params or RequestParams(max_iterations=10)
-            try:
-                if not getattr(rp, "modelPreferences", None):
-                    rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-            except Exception:
-                pass
+        if isinstance(agent, AugmentedLLM):
+            rp = self._prepare_request_params(request_params)
             output = await agent.generate_str(message=task_context, request_params=rp)
         else:
             async with agent:
-                rp = request_params or RequestParams(max_iterations=10)
-                try:
-                    if not getattr(rp, "modelPreferences", None):
-                        rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-                except Exception:
-                    pass
+                rp = self._prepare_request_params(request_params)
                 llm = await agent.attach_llm(self.llm_factory)
                 output = await llm.generate_str(message=task_context, request_params=rp)
src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7)

204-206: Propagating new TaskExecutor flags – LGTM.

Flags align with TaskExecutor’s constructor and tie back to config.


436-458: Batch knowledge extraction per step – LGTM.

Correctly filters only current-step results and uses configured concurrency. Exceptions scoped and logged.


511-519: Planner model preference application – LGTM.

Using RequestParams with planner model preferences is consistent with executor paths.


521-523: Configurable plan verification attempts – LGTM.

Respects config with sane minimum.


566-568: Passing RequestParams to planning LLM – LGTM.

Ensures intelligent model selection during plan generation.


736-744: Synthesis RequestParams with model preferences – LGTM.

Appropriate iteration budget and preference injection.


295-335: RequestParams supports modelPreferences and AugmentedLLM handles it safely
Confirmed:

  • RequestParams (src/mcp_agent/workflows/llm/augmented_llm.py:119) inherits from CreateMessageRequestParams, which defines modelPreferences.
  • AugmentedLLM (src/mcp_agent/workflows/llm/augmented_llm.py:587–602) checks for request_params.modelPreferences and applies its values when tracing.

No changes required.

Comment on lines +34 to +56
# Efficiency and robustness controls
max_plan_verification_attempts: int = 4
"""Maximum attempts to repair/verify a plan before proceeding"""

# Knowledge extraction strategy
knowledge_extraction_mode: str = "batch"
"""Either 'per_task' or 'batch' (default) to extract knowledge after a step"""

knowledge_batch_max_concurrent: int = 3
"""Max concurrent knowledge extraction tasks when in batch mode"""

# Token/cost optimization
lean_agent_design: bool = False
"""If true, skip designer LLM call and create minimal agents for tasks"""

# Adaptive effort scaling based on objective complexity
dynamic_effort_scaling: bool = False
"""If true, adjust execution/context budgets based on objective complexity"""

# Artifact persistence
save_task_outputs_to_artifacts: bool = True
"""If true, persist each successful task's output into the workspace artifacts"""


🛠️ Refactor suggestion

Type/validate new fields (Literal and Field constraints) to prevent invalid configs

  • Constrain knowledge_extraction_mode to "per_task" | "batch".
  • Add ge=1 constraints for counts.
-    max_plan_verification_attempts: int = 4
+    max_plan_verification_attempts: int = Field(default=4, ge=1)
@@
-    knowledge_extraction_mode: str = "batch"
+    knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
@@
-    knowledge_batch_max_concurrent: int = 3
+    knowledge_batch_max_concurrent: int = Field(default=3, ge=1)

Also add these imports at the top:

from typing import List, Optional, Literal
from pydantic import BaseModel, ConfigDict, Field
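
Put together, the constrained fields might read roughly as follows (a sketch assuming ExecutionConfig is a Pydantic BaseModel, as the suggested Field/ConfigDict imports imply):

from typing import Literal

from pydantic import BaseModel, Field

class ExecutionConfig(BaseModel):
    # ... existing fields unchanged ...

    max_plan_verification_attempts: int = Field(default=4, ge=1)
    """Maximum attempts to repair/verify a plan before proceeding"""

    knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
    """Either 'per_task' or 'batch' (default) to extract knowledge after a step"""

    knowledge_batch_max_concurrent: int = Field(default=3, ge=1)
    """Max concurrent knowledge extraction tasks when in batch mode"""

    lean_agent_design: bool = False
    """If true, skip designer LLM call and create minimal agents for tasks"""

    dynamic_effort_scaling: bool = False
    """If true, adjust execution/context budgets based on objective complexity"""

    save_task_outputs_to_artifacts: bool = True
    """If true, persist each successful task's output into the workspace artifacts"""
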
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/config.py around lines 34 to 56,
the new configuration fields lack type constraints and validation. Update
knowledge_extraction_mode to use Literal["per_task", "batch"] to restrict its
values. Add ge=1 constraints using Field for integer count fields like
max_plan_verification_attempts and knowledge_batch_max_concurrent to ensure they
are positive. Also, add the imports from typing (List, Optional, Literal) and
from pydantic (BaseModel, ConfigDict, Field) at the top of the file as
specified.

Comment on lines +106 to +115
                # Attach provenance/citation if available from the calling LLM
                citation = None
                try:
                    if hasattr(llm, "get_and_clear_tool_provenance"):
                        prov = llm.get_and_clear_tool_provenance()
                        if prov:
                            citation = {"tools": prov[-3:]}  # last few tool calls
                except Exception:
                    citation = None


🛠️ Refactor suggestion

⚠️ Potential issue

Fetch provenance once (outside the loop) and prefer TaskResult.citation; current code clears it per-item

  • Calling get_and_clear_tool_provenance inside the loop clears provenance after the first item; subsequent items get None.
  • Provenance should usually come from the task execution (TaskResult.citation), not the extraction LLM instance.

Within the loop, use a pre-fetched citation:

-                # Attach provenance/citation if available from the calling LLM
-                citation = None
-                try:
-                    if hasattr(llm, "get_and_clear_tool_provenance"):
-                        prov = llm.get_and_clear_tool_provenance()
-                        if prov:
-                            citation = {"tools": prov[-3:]}  # last few tool calls
-                except Exception:
-                    citation = None
+                # Use pre-fetched citation (if any)
+                citation = citation_info

Add this block once right after initializing knowledge_items (outside this hunk):

# Pre-fetch citation once: prefer TaskResult.citation, else LLM provenance
citation_info = None
try:
    if getattr(task_result, "citation", None):
        citation_info = task_result.citation
    elif hasattr(llm, "get_and_clear_tool_provenance"):
        prov = llm.get_and_clear_tool_provenance()
        if prov:
            citation_info = {"tools": prov[-3:]}  # last few tool calls
except Exception:
    citation_info = None

Optionally sanitize provenance before storing (mask secrets, cap sizes) similar to augmented_llm._sanitize_provenance_args.

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/knowledge.py around lines 106 to
115, the code calls llm.get_and_clear_tool_provenance inside a loop, which
clears provenance after the first iteration causing subsequent items to get
None. To fix this, move the provenance fetching outside the loop by pre-fetching
citation_info once after initializing knowledge_items, preferring
task_result.citation if available, otherwise falling back to
llm.get_and_clear_tool_provenance. Then, inside the loop, use this pre-fetched
citation_info instead of calling the method repeatedly. Optionally, sanitize the
provenance data before storing it to mask secrets or limit size.

Comment on lines +123 to 124
citation=citation,
)

🛠️ Refactor suggestion

Use the pre-fetched citation variable when constructing KnowledgeItem

Tie item to the same provenance snapshot.

-                        citation=citation,
+                        citation=citation_info,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                        citation=citation_info,
                    )
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/knowledge.py at lines 123 to 124,
the KnowledgeItem construction should use the pre-fetched citation variable
instead of creating or fetching a new one. Update the code to pass the existing
citation variable to ensure the item is tied to the same provenance snapshot.

Comment on lines +50 to 52
# Added citation/provenance
citation: Dict[str, Any] | None = None


🛠️ Refactor suggestion

Include citation in KnowledgeItem.to_dict to avoid dropping provenance

Without adding citation to to_dict, provenance is lost when serializing.

Apply this diff to to_dict:

     def to_dict(self) -> Dict[str, Any]:
         """Convert to dictionary representation."""
         return {
             "key": self.key,
             "value": self.value,
             "source": self.source,
             "timestamp": self.timestamp.isoformat(),
             "confidence": self.confidence,
             "category": self.category,
+            "citation": self.citation,
         }
📝 Committable suggestion


Suggested change
    # Added citation/provenance
    citation: Dict[str, Any] | None = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary representation."""
        return {
            "key": self.key,
            "value": self.value,
            "source": self.source,
            "timestamp": self.timestamp.isoformat(),
            "confidence": self.confidence,
            "category": self.category,
            "citation": self.citation,
        }
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/models.py around lines 50 to 52,
the citation attribute is defined but not included in the KnowledgeItem.to_dict
method, causing loss of provenance information during serialization. Update the
to_dict method to include the citation field in the returned dictionary so that
citation data is preserved when converting the object to a dictionary.

Comment on lines +295 to +335
        # Optional dynamic effort scaling inspired by Anthropic research heuristics
        if getattr(self.config.execution, "dynamic_effort_scaling", False):
            # Cheap LLM pass to assess complexity and suggest scaling
            assessor = Agent(
                name="EffortAssessor",
                instruction=(
                    "Assess objective complexity and recommend iteration/replan/context budgets."
                ),
                context=self.context,
            )
            llm = self.llm_factory(assessor)
            try:
                rec = await llm.generate_structured(  # type: ignore[arg-type]
                    message=(
                        f"<assess>Objective: {self.objective}\n"
                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
                    ),
                    response_model=dict,  # Loose schema to keep it cheap
                    request_params=RequestParams(max_iterations=1, temperature=0.1),
                )
                mi = int(
                    rec.get("max_iterations", self.config.execution.max_iterations)
                )
                mr = int(rec.get("max_replans", self.config.execution.max_replans))
                tcb = int(
                    rec.get(
                        "task_context_budget", self.config.context.task_context_budget
                    )
                )
                self.config.execution.max_iterations = max(
                    self.config.execution.max_iterations, mi
                )
                self.config.execution.max_replans = max(
                    self.config.execution.max_replans, mr
                )
                self.config.context.task_context_budget = max(
                    self.config.context.task_context_budget, tcb
                )
            except Exception:
                pass


🛠️ Refactor suggestion

⚠️ Potential issue

Dynamic effort scaling: avoid dict as response model and add bounds.

  • Passing response_model=dict to generate_structured may fail if the implementation expects a Pydantic model. Prefer using generate_str + json.loads, or define a minimal Pydantic model.
  • Consider clamping recommended values to sane upper bounds to avoid runaway budgets from a bad assessor output.
  • Optional: allow assessor_model_preferences from context to force cheap model.

Example fix using JSON:

@@
-            try:
-                rec = await llm.generate_structured(  # type: ignore[arg-type]
-                    message=(
-                        f"<assess>Objective: {self.objective}\n"
-                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
-                    ),
-                    response_model=dict,  # Loose schema to keep it cheap
-                    request_params=RequestParams(max_iterations=1, temperature=0.1),
-                )
+            try:
+                rp = RequestParams(max_iterations=1, temperature=0.1)
+                # Prefer cheap/fast model if available
+                try:
+                    setattr(rp, "modelPreferences", getattr(self.context, "assessor_model_preferences", None))
+                except Exception:
+                    pass
+                raw = await llm.generate_str(
+                    message=(
+                        f"<assess>Objective: {self.objective}\n"
+                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
+                    ),
+                    request_params=rp,
+                )
+                import json
+                rec = json.loads(raw) if raw else {}
@@
-                self.config.execution.max_iterations = max(
-                    self.config.execution.max_iterations, mi
-                )
-                self.config.execution.max_replans = max(
-                    self.config.execution.max_replans, mr
-                )
-                self.config.context.task_context_budget = max(
-                    self.config.context.task_context_budget, tcb
-                )
+                # Clamp updates to prevent extreme values (tunable caps)
+                cap_iter = getattr(self.config.execution, "max_iterations_cap", 50)
+                cap_replans = getattr(self.config.execution, "max_replans_cap", 10)
+                cap_ctx = getattr(self.config.context, "task_context_budget_cap", 20000)
+                self.config.execution.max_iterations = min(
+                    max(self.config.execution.max_iterations, mi), cap_iter
+                )
+                self.config.execution.max_replans = min(
+                    max(self.config.execution.max_replans, mr), cap_replans
+                )
+                self.config.context.task_context_budget = min(
+                    max(self.config.context.task_context_budget, tcb), cap_ctx
+                )

If you prefer structured parsing, define a tiny Pydantic model with optional ints and defaults.
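
A minimal model for that route might look like this (sketch only: the name EffortRecommendation is made up; the commented call mirrors the generate_structured usage shown above):

from pydantic import BaseModel

class EffortRecommendation(BaseModel):
    """Loose schema for the EffortAssessor output; every field is optional."""

    max_iterations: int | None = None
    max_replans: int | None = None
    task_context_budget: int | None = None

# rec = await llm.generate_structured(
#     message=assess_prompt,
#     response_model=EffortRecommendation,
#     request_params=RequestParams(max_iterations=1, temperature=0.1),
# )
# mi = rec.max_iterations or self.config.execution.max_iterations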

📝 Committable suggestion


Suggested change
        # Optional dynamic effort scaling inspired by Anthropic research heuristics
        if getattr(self.config.execution, "dynamic_effort_scaling", False):
            # Cheap LLM pass to assess complexity and suggest scaling
            assessor = Agent(
                name="EffortAssessor",
                instruction=(
                    "Assess objective complexity and recommend iteration/replan/context budgets."
                ),
                context=self.context,
            )
            llm = self.llm_factory(assessor)
            try:
                rp = RequestParams(max_iterations=1, temperature=0.1)
                # Prefer cheap/fast model if available
                try:
                    setattr(
                        rp,
                        "modelPreferences",
                        getattr(self.context, "assessor_model_preferences", None),
                    )
                except Exception:
                    pass
                raw = await llm.generate_str(
                    message=(
                        f"<assess>Objective: {self.objective}\n"
                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
                    ),
                    request_params=rp,
                )
                import json
                rec = json.loads(raw) if raw else {}
                mi = int(
                    rec.get("max_iterations", self.config.execution.max_iterations)
                )
                mr = int(rec.get("max_replans", self.config.execution.max_replans))
                tcb = int(
                    rec.get(
                        "task_context_budget",
                        self.config.context.task_context_budget,
                    )
                )
                # Clamp updates to prevent extreme values (tunable caps)
                cap_iter = getattr(self.config.execution, "max_iterations_cap", 50)
                cap_replans = getattr(self.config.execution, "max_replans_cap", 10)
                cap_ctx = getattr(self.config.context, "task_context_budget_cap", 20000)
                self.config.execution.max_iterations = min(
                    max(self.config.execution.max_iterations, mi), cap_iter
                )
                self.config.execution.max_replans = min(
                    max(self.config.execution.max_replans, mr), cap_replans
                )
                self.config.context.task_context_budget = min(
                    max(self.config.context.task_context_budget, tcb), cap_ctx
                )
            except Exception:
                pass
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/orchestrator.py around lines 295 to
335, the current use of response_model=dict in llm.generate_structured may cause
errors if the method expects a Pydantic model. Replace this by either using
generate_str combined with json.loads to parse the response or define a minimal
Pydantic model with optional integer fields and defaults for max_iterations,
max_replans, and task_context_budget. Additionally, after parsing the
recommended values, clamp them to predefined sane upper bounds before updating
the config to prevent excessively large budgets from faulty assessor outputs.
Optionally, incorporate assessor_model_preferences from the context to select a
cheaper model for the assessor.

Comment on lines +635 to 642
        rp = RequestParams(max_iterations=1)
        try:
            rp.modelPreferences = getattr(
                self.context, "verifier_model_preferences", None
            )
        except Exception:
            pass


⚠️ Potential issue

Verifier RequestParams prepared but not used.

rp is created (with model preferences) but not passed to generate_structured. This prevents the intended model selection from taking effect.

Apply:

-        result = await llm.generate_structured(
-            message=prompt, response_model=VerificationResult
-        )
+        result = await llm.generate_structured(
+            message=prompt, response_model=VerificationResult, request_params=rp
+        )

Also applies to: 653-655

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/orchestrator.py around lines 635 to
642 and similarly at lines 653 to 655, the RequestParams object 'rp' is created
and configured but not passed to the generate_structured function, so the model
preferences are not applied. To fix this, modify the calls to
generate_structured to include 'rp' as an argument, ensuring the model
preferences in 'rp' are used during generation.

Comment on lines +53 to 55
        knowledge_extraction_mode: str = "per_task",
        lean_agent_design: bool = False,
    ):

🛠️ Refactor suggestion

Document and validate new constructor flags.

Add both knowledge_extraction_mode and lean_agent_design to the Args docstring and validate knowledge_extraction_mode against allowed values to prevent typos.

Example:

 class TaskExecutor:
@@
     def __init__(
@@
-        knowledge_extraction_mode: str = "per_task",
+        knowledge_extraction_mode: str = "per_task",
         lean_agent_design: bool = False,
     ):
@@
         Args:
@@
             enable_parallel: Whether to enable parallel execution
+            knowledge_extraction_mode: Either "per_task" or "batch". Controls when knowledge extraction runs.
+            lean_agent_design: If True, skip full agent design and use a minimal specialized agent.

Optional runtime guard:

@@
-        self.knowledge_extraction_mode = knowledge_extraction_mode
+        if knowledge_extraction_mode not in ("per_task", "batch"):
+            raise ValueError("knowledge_extraction_mode must be 'per_task' or 'batch'")
+        self.knowledge_extraction_mode = knowledge_extraction_mode

Also applies to: 81-83

🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/task_executor.py around lines 53 to
55 and 81 to 83, the new constructor flags knowledge_extraction_mode and
lean_agent_design are missing documentation and validation. Update the Args
section of the constructor's docstring to include descriptions for both flags.
Add a validation step for knowledge_extraction_mode to check if its value is
within the allowed set of options, raising an error if not, to prevent typos and
invalid inputs.

Comment on lines +267 to +283
            # Persist artifact if enabled
            try:
                if getattr(
                    getattr(self.context, "orchestrator_config", object()),
                    "execution",
                    object(),
                ):
                    cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
                    if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
                        artifact_name = f"task_{task.name}.txt"
                        self.memory.save_artifact(
                            artifact_name, output, to_filesystem=True
                        )
                        result.artifacts[artifact_name] = output
            except Exception:
                pass


🛠️ Refactor suggestion

⚠️ Potential issue

Artifact persistence gate always truthy; fix guard and default behavior.

The nested getattr check with object() sentinels makes the if-condition always truthy; then accessing self.context.orchestrator_config can raise, which is silently swallowed by the broad try/except. Also, defaulting to True when the flag is missing may unexpectedly write artifacts.

Apply safer gating and conservative default:

-            try:
-                if getattr(
-                    getattr(self.context, "orchestrator_config", object()),
-                    "execution",
-                    object(),
-                ):
-                    cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
-                    if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
-                        artifact_name = f"task_{task.name}.txt"
-                        self.memory.save_artifact(
-                            artifact_name, output, to_filesystem=True
-                        )
-                        result.artifacts[artifact_name] = output
-            except Exception:
-                pass
+            try:
+                cfg = getattr(self.context, "orchestrator_config", None)
+                exec_cfg = getattr(cfg, "execution", None) if cfg else None
+                if getattr(exec_cfg, "save_task_outputs_to_artifacts", False):
+                    artifact_name = f"task_{task.name}.txt"
+                    self.memory.save_artifact(artifact_name, output, to_filesystem=True)
+                    # Consider storing only a reference or truncated preview to reduce memory bloat
+                    result.artifacts[artifact_name] = f"[artifact]{artifact_name}"
+            except Exception as e:
+                logger.debug(f"Skipping artifact persistence: {e}")

Note: WorkspaceMemory.save_artifact only writes to disk if use_filesystem is enabled (see memory.py), so this remains aligned with config.

📝 Committable suggestion


Suggested change
            # Persist artifact if enabled
            try:
                cfg = getattr(self.context, "orchestrator_config", None)
                exec_cfg = getattr(cfg, "execution", None) if cfg else None
                if getattr(exec_cfg, "save_task_outputs_to_artifacts", False):
                    artifact_name = f"task_{task.name}.txt"
                    self.memory.save_artifact(artifact_name, output, to_filesystem=True)
                    # Consider storing only a reference or truncated preview to reduce memory bloat
                    result.artifacts[artifact_name] = f"[artifact]{artifact_name}"
            except Exception as e:
                logger.debug(f"Skipping artifact persistence: {e}")
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/deep_orchestrator/task_executor.py lines 267 to 283,
the current nested getattr checks use object() as default, making the condition
always truthy and risking attribute errors that are silently caught. To fix
this, replace the nested getattr calls with a single safer check that verifies
if self.context has an orchestrator_config attribute and if that config has an
execution attribute. Then, explicitly check if the
save_task_outputs_to_artifacts flag exists and is True, defaulting to False if
missing to avoid unintended artifact saving. This ensures the guard condition is
accurate and prevents silent failures.

Comment on lines +255 to 257
        # Track last tool call metadata for provenance
        self._last_tool_calls: list[dict[str, Any]] = []


🛠️ Refactor suggestion

Bound the provenance buffer to avoid unbounded growth

Add a simple cap to keep in-memory provenance bounded.

         # Track last tool call metadata for provenance
-        self._last_tool_calls: list[dict[str, Any]] = []
+        self._last_tool_calls: list[dict[str, Any]] = []
+        self._tool_prov_max = 100  # cap in-memory provenance to avoid unbounded growth
📝 Committable suggestion


Suggested change
        # Track last tool call metadata for provenance
        self._last_tool_calls: list[dict[str, Any]] = []
        self._tool_prov_max = 100  # cap in-memory provenance to avoid unbounded growth
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/llm/augmented_llm.py around lines 255 to 257, the
_last_tool_calls list is currently unbounded, which can lead to excessive memory
usage. Implement a fixed-size buffer or cap the list length by limiting the
number of stored provenance entries. For example, after appending a new entry,
remove the oldest entries if the list exceeds a predefined maximum size to keep
memory usage bounded.

Comment on lines +439 to 451
        # Record minimal provenance for citations
        try:
            self._last_tool_calls.append(
                {
                    "tool": request.params.name,
                    "arguments": request.params.arguments,
                    "tool_call_id": tool_call_id,
                    "isError": result.isError,
                }
            )
        except Exception:
            pass
        return result

🛠️ Refactor suggestion

⚠️ Potential issue

Sanitize tool arguments, trim the buffer, and avoid silent failure

  • Mask secrets and cap argument size to reduce PII/secret leakage.
  • Trim the buffer to the configured max.
  • Log at debug instead of swallowing errors silently.
         # Record minimal provenance for citations
         try:
-            self._last_tool_calls.append(
+            args = request.params.arguments or {}
+            safe_args = self._sanitize_provenance_args(args)
+            self._last_tool_calls.append(
                 {
-                    "tool": request.params.name,
-                    "arguments": request.params.arguments,
+                    "tool": request.params.name,
+                    "arguments": safe_args,
                     "tool_call_id": tool_call_id,
                     "isError": result.isError,
                 }
             )
+            # Trim to max
+            if len(self._last_tool_calls) > getattr(self, "_tool_prov_max", 100):
+                del self._last_tool_calls[0 : len(self._last_tool_calls) - self._tool_prov_max]
         except Exception:
-            pass
+            if getattr(self, "logger", None):
+                self.logger.debug("Failed to record tool provenance", exc_info=True)

Add this helper method in the class (outside this hunk):

def _sanitize_provenance_args(self, args: dict[str, Any]) -> dict[str, Any]:
    sanitized: dict[str, Any] = {}
    for k, v in (args or {}).items():
        key = str(k).lower()
        if any(s in key for s in ("secret", "token", "password", "apikey", "api_key", "authorization", "auth")):
            sanitized[k] = "***"
            continue
        s = str(v)
        sanitized[k] = s if len(s) <= 500 else s[:500] + "..."
    return sanitized
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/llm/augmented_llm.py around lines 439 to 451, the
code appends tool call details without sanitizing arguments or handling errors
properly. Fix this by adding a helper method _sanitize_provenance_args to mask
secrets and trim argument strings to 500 characters. Then, replace
request.params.arguments with the sanitized version before appending. Also,
change the except block to log the exception at debug level instead of silently
passing.
