WIP: citations, intelligent model selection, and other optimizations for DeepOrchestrator #370
base: main
Conversation
…for deep orchestrator
Walkthrough

This update introduces new configuration options and modes to the deep orchestrator workflow, including dynamic effort scaling, batch knowledge extraction, and lean agent design. It adds provenance/citation tracking for knowledge items, enables artifact persistence of task outputs, and allows for more granular control over LLM planning and execution behaviors. Several data models and classes are extended to support these new features.
Sequence Diagram(s)

Dynamic Effort Scaling and Batch Knowledge Extraction

sequenceDiagram
participant User
participant DeepOrchestrator
participant EffortAssessor (LLM)
participant TaskExecutor
participant LLM
participant Workspace
User->>DeepOrchestrator: submit objective
DeepOrchestrator->>EffortAssessor (LLM): assess objective complexity
EffortAssessor (LLM)-->>DeepOrchestrator: recommended budgets
DeepOrchestrator->>TaskExecutor: initialize with config (incl. knowledge_extraction_mode, lean_agent_design)
loop Workflow Steps
DeepOrchestrator->>TaskExecutor: execute step
TaskExecutor->>LLM: execute tasks
LLM-->>TaskExecutor: task results
TaskExecutor->>Workspace: (optional) save artifacts
DeepOrchestrator->>DeepOrchestrator: collect step results
alt knowledge_extraction_mode == "batch"
DeepOrchestrator->>LLM: batch extract knowledge from step results
LLM-->>DeepOrchestrator: knowledge items with citation
DeepOrchestrator->>Workspace: add knowledge to memory
end
end
DeepOrchestrator-->>User: final synthesis/results
Tool Call Provenance and Knowledge Citation

sequenceDiagram
participant TaskExecutor
participant AugmentedLLM
participant KnowledgeExtractor
TaskExecutor->>AugmentedLLM: execute tool call
AugmentedLLM->>AugmentedLLM: record tool call metadata
AugmentedLLM-->>TaskExecutor: tool call result
TaskExecutor->>KnowledgeExtractor: extract knowledge
KnowledgeExtractor->>AugmentedLLM: get_and_clear_tool_provenance
AugmentedLLM-->>KnowledgeExtractor: provenance data
KnowledgeExtractor-->>TaskExecutor: knowledge items (with citation)
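For reference, get_and_clear_tool_provenance is the one-shot accessor this flow relies on. A minimal standalone sketch of the pattern (illustrative only, not the repo's exact implementation):

from typing import Any


class ProvenanceRecorder:
    """Illustrative sketch of the get-and-clear (one-shot) provenance pattern."""

    def __init__(self) -> None:
        self._last_tool_calls: list[dict[str, Any]] = []

    def record(self, entry: dict[str, Any]) -> None:
        self._last_tool_calls.append(entry)

    def get_and_clear_tool_provenance(self) -> list[dict[str, Any]]:
        # Hand back everything recorded so far and reset, so each consumer
        # sees a tool-call window scoped to its own task.
        calls, self._last_tool_calls = self._last_tool_calls, []
        return calls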
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 10
🧹 Nitpick comments (1)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (1)
368-396: Lean agent design path is a solid optimization. Good bypass of the designer round-trip with concise, focused instruction. Consider exposing the behaviors and tips via config to tune per-deployment without code changes, as sketched below.
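One possible shape for that config, sketched with hypothetical field names (lean_agent_behaviors and lean_agent_tips are not in the PR; the default strings are placeholders):

from pydantic import BaseModel, Field


class LeanAgentConfig(BaseModel):
    """Hypothetical config section for tuning the lean agent instruction."""

    lean_agent_behaviors: list[str] = Field(
        default_factory=lambda: ["answer directly", "prefer primary sources"]
    )
    lean_agent_tips: list[str] = Field(
        default_factory=lambda: ["keep instructions under ~10 lines"]
    )


def build_lean_instruction(cfg: LeanAgentConfig, task: str) -> str:
    """Compose the minimal agent instruction from config rather than hardcoded text."""
    lines = [f"Complete this task: {task}", *cfg.lean_agent_behaviors, *cfg.lean_agent_tips]
    return "\n".join(lines)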
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/mcp_agent/workflows/deep_orchestrator/config.py (1 hunks)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1 hunks)
src/mcp_agent/workflows/deep_orchestrator/models.py (2 hunks)
src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7 hunks)
src/mcp_agent/workflows/deep_orchestrator/prompts.py (2 hunks)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (5 hunks)
src/mcp_agent/workflows/llm/augmented_llm.py (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/{workflows,tasks}/@(conversation_workflow,task_functions).py : Context consolidation must occur every N turns (default 3) to prevent lost-in-middle-turns, as configured.
Applied to files:
src/mcp_agent/workflows/deep_orchestrator/prompts.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/tasks/{task_functions,llm_evaluators,quality_control}.py : Implement mandatory quality pipeline with LLM-as-judge pattern, evaluating responses on seven research-based quality dimensions.
Applied to files:
src/mcp_agent/workflows/deep_orchestrator/knowledge.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/**/*.py : Use mcp-agent's Agent abstraction for ALL LLM interactions, including quality evaluation, to ensure consistent tool access, logging, and error handling.
Applied to files:
src/mcp_agent/workflows/llm/augmented_llm.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/utils/config.py : Configuration values such as quality_threshold, max_refinement_attempts, consolidation_interval, and evaluator_model_provider must be loaded from mcp_agent.config.yaml.
Applied to files:
src/mcp_agent/workflows/deep_orchestrator/config.py
🧬 Code Graph Analysis (2)
src/mcp_agent/workflows/deep_orchestrator/knowledge.py (2)
  src/mcp_agent/workflows/llm/augmented_llm.py (3)
    get_and_clear_tool_provenance (537-541)
    get (89-90)
    get (112-113)
  src/mcp_agent/workflows/deep_orchestrator/models.py (1)
    KnowledgeItem (41-62)
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (6)
  src/mcp_agent/workflows/llm/augmented_llm.py (3)
    RequestParams (119-164)
    generate_str (177-182)
    generate_str (301-306)
  src/mcp_agent/agents/agent.py (2)
    attach_llm (156-192)
    Agent (61-991)
  src/mcp_agent/workflows/deep_orchestrator/models.py (1)
    TaskStatus (16-23)
  src/mcp_agent/workflows/deep_orchestrator/memory.py (1)
    save_artifact (62-80)
  src/mcp_agent/workflows/deep_orchestrator/knowledge.py (1)
    extract_knowledge (47-147)
  src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)
    build_agent_instruction (381-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks / test
🔇 Additional comments (13)
src/mcp_agent/workflows/llm/augmented_llm.py (1)
537-542: LGTM: simple, clear accessor. The get-and-clear approach matches the intended one-shot consumption.
src/mcp_agent/workflows/deep_orchestrator/models.py (1)
77-79: Ensure TaskResult.citation is propagated to downstream consumers. Confirm orchestrator/knowledge extraction uses TaskResult.citation (e.g., as a primary source of provenance for knowledge items).
If not yet wired, I can patch knowledge.py to prefer task_result.citation and only fall back to LLM provenance.
src/mcp_agent/workflows/deep_orchestrator/prompts.py (1)
70-71: LGTM: planning guidance clarifies effort scaling and source quality. The added rules push towards scalable planning and authoritative sources; consistent with orchestration goals. Also applies to: 79-80.
src/mcp_agent/workflows/deep_orchestrator/task_executor.py (3)
291-297: Per-task knowledge extraction toggle looks good. Conditional extraction aligns with the new batch mode handled in the orchestrator. No issues.

313-314: No functional change. Comment-only change; nothing to address.
235-246: Verify CreateMessageRequestParams supports modelPreferences & refactor repeated logic

It looks like RequestParams (which subclasses an external CreateMessageRequestParams from mcp.types) may not officially expose a modelPreferences field, and we couldn't locate its definition in this repo. Before swallowing any errors, please confirm that dynamic assignment of modelPreferences is safe. Also, the same augmentation logic is duplicated in two branches of task_executor.py.

Locations needing attention:
- src/mcp_agent/workflows/deep_orchestrator/task_executor.py:235-246
- src/mcp_agent/workflows/deep_orchestrator/task_executor.py:249-260

Please verify that CreateMessageRequestParams actually defines (or allows) modelPreferences. If it does, consider extracting and reusing this helper:

 class TaskExecutor:
+    def _prepare_request_params(self, base: RequestParams | None) -> RequestParams:
+        rp = base or RequestParams(max_iterations=10)
+        # Only set modelPreferences if it's a supported attribute and we have context prefs
+        exec_prefs = getattr(self.context, "executor_model_preferences", None)
+        if exec_prefs is not None and getattr(rp, "modelPreferences", None) is None:
+            setattr(rp, "modelPreferences", exec_prefs)
+        return rp

     async def _execute(self, agent, task_context, request_params):
-        if isinstance(agent, AugmentedLLM):
-            rp = request_params or RequestParams(max_iterations=10)
-            try:
-                if not getattr(rp, "modelPreferences", None):
-                    rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-            except Exception:
-                pass
+        if isinstance(agent, AugmentedLLM):
+            rp = self._prepare_request_params(request_params)
             output = await agent.generate_str(message=task_context, request_params=rp)
         else:
             async with agent:
-                rp = request_params or RequestParams(max_iterations=10)
-                try:
-                    if not getattr(rp, "modelPreferences", None):
-                        rp.modelPreferences = getattr(self.context, "executor_model_preferences", None)
-                except Exception:
-                    pass
+                rp = self._prepare_request_params(request_params)
                 llm = await agent.attach_llm(self.llm_factory)
                 output = await llm.generate_str(message=task_context, request_params=rp)

src/mcp_agent/workflows/deep_orchestrator/orchestrator.py (7)
204-206: Propagating new TaskExecutor flags – LGTM. Flags align with TaskExecutor's constructor and tie back to config.

436-458: Batch knowledge extraction per step – LGTM. Correctly filters only current-step results and uses configured concurrency. Exceptions scoped and logged.

511-519: Planner model preference application – LGTM. Using RequestParams with planner model preferences is consistent with executor paths.

521-523: Configurable plan verification attempts – LGTM. Respects config with a sane minimum.

566-568: Passing RequestParams to planning LLM – LGTM. Ensures intelligent model selection during plan generation.

736-744: Synthesis RequestParams with model preferences – LGTM. Appropriate iteration budget and preference injection.

295-335: RequestParams supports modelPreferences and AugmentedLLM handles it safely. Confirmed: RequestParams (src/mcp_agent/workflows/llm/augmented_llm.py:119) inherits from CreateMessageRequestParams, which defines modelPreferences. AugmentedLLM (src/mcp_agent/workflows/llm/augmented_llm.py:587-602) checks for request_params.modelPreferences and applies its values when tracing. No changes required.
# Efficiency and robustness controls
max_plan_verification_attempts: int = 4
"""Maximum attempts to repair/verify a plan before proceeding"""

# Knowledge extraction strategy
knowledge_extraction_mode: str = "batch"
"""Either 'per_task' or 'batch' (default) to extract knowledge after a step"""

knowledge_batch_max_concurrent: int = 3
"""Max concurrent knowledge extraction tasks when in batch mode"""

# Token/cost optimization
lean_agent_design: bool = False
"""If true, skip designer LLM call and create minimal agents for tasks"""

# Adaptive effort scaling based on objective complexity
dynamic_effort_scaling: bool = False
"""If true, adjust execution/context budgets based on objective complexity"""

# Artifact persistence
save_task_outputs_to_artifacts: bool = True
"""If true, persist each successful task's output into the workspace artifacts"""
🛠️ Refactor suggestion
Type/validate new fields (Literal and Field constraints) to prevent invalid configs
- Constrain knowledge_extraction_mode to "per_task" | "batch".
- Add ge=1 constraints for counts.
- max_plan_verification_attempts: int = 4
+ max_plan_verification_attempts: int = Field(default=4, ge=1)
@@
- knowledge_extraction_mode: str = "batch"
+ knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
@@
- knowledge_batch_max_concurrent: int = 3
+ knowledge_batch_max_concurrent: int = Field(default=3, ge=1)
Also add these imports at the top:
from typing import List, Optional, Literal
from pydantic import BaseModel, ConfigDict, Field
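Put together, a minimal sketch of the validated fields (assuming the config class is a Pydantic BaseModel, as the suggested imports imply; the class name here is illustrative):

from typing import Literal

from pydantic import BaseModel, Field


class ExecutionConfig(BaseModel):  # hypothetical name for illustration
    max_plan_verification_attempts: int = Field(default=4, ge=1)
    knowledge_extraction_mode: Literal["per_task", "batch"] = "batch"
    knowledge_batch_max_concurrent: int = Field(default=3, ge=1)


# Invalid values now fail fast at construction time:
# ExecutionConfig(knowledge_extraction_mode="per-step")  -> raises ValidationError
# ExecutionConfig(knowledge_batch_max_concurrent=0)      -> raises ValidationError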
# Attach provenance/citation if available from the calling LLM
citation = None
try:
    if hasattr(llm, "get_and_clear_tool_provenance"):
        prov = llm.get_and_clear_tool_provenance()
        if prov:
            citation = {"tools": prov[-3:]}  # last few tool calls
except Exception:
    citation = None
🛠️ Refactor suggestion
Fetch provenance once (outside the loop) and prefer TaskResult.citation; current code clears it per-item
- Calling get_and_clear_tool_provenance inside the loop clears provenance after the first item; subsequent items get None.
- Provenance should usually come from the task execution (TaskResult.citation), not the extraction LLM instance.
Within the loop, use a pre-fetched citation:
- # Attach provenance/citation if available from the calling LLM
- citation = None
- try:
- if hasattr(llm, "get_and_clear_tool_provenance"):
- prov = llm.get_and_clear_tool_provenance()
- if prov:
- citation = {"tools": prov[-3:]} # last few tool calls
- except Exception:
- citation = None
+ # Use pre-fetched citation (if any)
+ citation = citation_info
Add this block once right after initializing knowledge_items (outside this hunk):
# Pre-fetch citation once: prefer TaskResult.citation, else LLM provenance
citation_info = None
try:
    if getattr(task_result, "citation", None):
        citation_info = task_result.citation
    elif hasattr(llm, "get_and_clear_tool_provenance"):
        prov = llm.get_and_clear_tool_provenance()
        if prov:
            citation_info = {"tools": prov[-3:]}  # last few tool calls
except Exception:
    citation_info = None
Optionally sanitize provenance before storing (mask secrets, cap sizes) similar to augmented_llm._sanitize_provenance_args.
citation=citation,
)
🛠️ Refactor suggestion
Use the pre-fetched citation variable when constructing KnowledgeItem
Tie item to the same provenance snapshot.
- citation=citation,
+ citation=citation_info,
# Added citation/provenance
citation: Dict[str, Any] | None = None
🛠️ Refactor suggestion
Include citation in KnowledgeItem.to_dict to avoid dropping provenance
Without adding citation to to_dict, provenance is lost when serializing.
Apply this diff to to_dict:
 def to_dict(self) -> Dict[str, Any]:
     """Convert to dictionary representation."""
     return {
         "key": self.key,
         "value": self.value,
         "source": self.source,
         "timestamp": self.timestamp.isoformat(),
         "confidence": self.confidence,
         "category": self.category,
+        "citation": self.citation,
     }
# Optional dynamic effort scaling inspired by Anthropic research heuristics
if getattr(self.config.execution, "dynamic_effort_scaling", False):
    # Cheap LLM pass to assess complexity and suggest scaling
    assessor = Agent(
        name="EffortAssessor",
        instruction=(
            "Assess objective complexity and recommend iteration/replan/context budgets."
        ),
        context=self.context,
    )
    llm = self.llm_factory(assessor)
    try:
        rec = await llm.generate_structured(  # type: ignore[arg-type]
            message=(
                f"<assess>Objective: {self.objective}\n"
                "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
            ),
            response_model=dict,  # Loose schema to keep it cheap
            request_params=RequestParams(max_iterations=1, temperature=0.1),
        )
        mi = int(rec.get("max_iterations", self.config.execution.max_iterations))
        mr = int(rec.get("max_replans", self.config.execution.max_replans))
        tcb = int(
            rec.get("task_context_budget", self.config.context.task_context_budget)
        )
        self.config.execution.max_iterations = max(
            self.config.execution.max_iterations, mi
        )
        self.config.execution.max_replans = max(
            self.config.execution.max_replans, mr
        )
        self.config.context.task_context_budget = max(
            self.config.context.task_context_budget, tcb
        )
    except Exception:
        pass
🛠️ Refactor suggestion
Dynamic effort scaling: avoid dict as response model and add bounds.
- Passing response_model=dict to generate_structured may fail if the implementation expects a Pydantic model. Prefer using generate_str + json.loads, or define a minimal Pydantic model.
- Consider clamping recommended values to sane upper bounds to avoid runaway budgets from a bad assessor output.
- Optional: allow assessor_model_preferences from context to force a cheap model.

Example fix using JSON:
@@
-            try:
-                rec = await llm.generate_structured(  # type: ignore[arg-type]
-                    message=(
-                        f"<assess>Objective: {self.objective}\n"
-                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
-                    ),
-                    response_model=dict,  # Loose schema to keep it cheap
-                    request_params=RequestParams(max_iterations=1, temperature=0.1),
-                )
+            try:
+                rp = RequestParams(max_iterations=1, temperature=0.1)
+                # Prefer cheap/fast model if available
+                try:
+                    setattr(rp, "modelPreferences", getattr(self.context, "assessor_model_preferences", None))
+                except Exception:
+                    pass
+                raw = await llm.generate_str(
+                    message=(
+                        f"<assess>Objective: {self.objective}\n"
+                        "Return JSON with keys: max_iterations, max_replans, task_context_budget.</assess>"
+                    ),
+                    request_params=rp,
+                )
+                import json
+                rec = json.loads(raw) if raw else {}
@@
-                self.config.execution.max_iterations = max(
-                    self.config.execution.max_iterations, mi
-                )
-                self.config.execution.max_replans = max(
-                    self.config.execution.max_replans, mr
-                )
-                self.config.context.task_context_budget = max(
-                    self.config.context.task_context_budget, tcb
-                )
+                # Clamp updates to prevent extreme values (tunable caps)
+                cap_iter = getattr(self.config.execution, "max_iterations_cap", 50)
+                cap_replans = getattr(self.config.execution, "max_replans_cap", 10)
+                cap_ctx = getattr(self.config.context, "task_context_budget_cap", 20000)
+                self.config.execution.max_iterations = min(
+                    max(self.config.execution.max_iterations, mi), cap_iter
+                )
+                self.config.execution.max_replans = min(
+                    max(self.config.execution.max_replans, mr), cap_replans
+                )
+                self.config.context.task_context_budget = min(
+                    max(self.config.context.task_context_budget, tcb), cap_ctx
+                )
If you prefer structured parsing, define a tiny Pydantic model with optional ints and defaults.
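For instance, a minimal sketch of such a model (the name and defaults are illustrative; None means "keep the configured value"):

from pydantic import BaseModel


class EffortRecommendation(BaseModel):
    """Loose schema for the assessor's reply; all fields optional with safe defaults."""

    max_iterations: int | None = None
    max_replans: int | None = None
    task_context_budget: int | None = None


# Then, assuming generate_structured accepts a Pydantic response model:
# rec = await llm.generate_structured(
#     message=..., response_model=EffortRecommendation, request_params=rp
# )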
rp = RequestParams(max_iterations=1)
try:
    rp.modelPreferences = getattr(
        self.context, "verifier_model_preferences", None
    )
except Exception:
    pass
Verifier RequestParams prepared but not used.

rp is created (with model preferences) but not passed to generate_structured. This prevents the intended model selection from taking effect.

Apply:

-        result = await llm.generate_structured(
-            message=prompt, response_model=VerificationResult
-        )
+        result = await llm.generate_structured(
+            message=prompt, response_model=VerificationResult, request_params=rp
+        )

Also applies to: 653-655
knowledge_extraction_mode: str = "per_task",
lean_agent_design: bool = False,
):
🛠️ Refactor suggestion
Document and validate new constructor flags.

Add both knowledge_extraction_mode and lean_agent_design to the Args docstring and validate knowledge_extraction_mode against allowed values to prevent typos.

Example:

 class TaskExecutor:
 @@
     def __init__(
 @@
-        knowledge_extraction_mode: str = "per_task",
+        knowledge_extraction_mode: str = "per_task",
         lean_agent_design: bool = False,
     ):
 @@
         Args:
 @@
             enable_parallel: Whether to enable parallel execution
+            knowledge_extraction_mode: Either "per_task" or "batch". Controls when knowledge extraction runs.
+            lean_agent_design: If True, skip full agent design and use a minimal specialized agent.

Optional runtime guard:

-        self.knowledge_extraction_mode = knowledge_extraction_mode
+        if knowledge_extraction_mode not in ("per_task", "batch"):
+            raise ValueError("knowledge_extraction_mode must be 'per_task' or 'batch'")
+        self.knowledge_extraction_mode = knowledge_extraction_mode

Also applies to: 81-83
# Persist artifact if enabled
try:
    if getattr(
        getattr(self.context, "orchestrator_config", object()),
        "execution",
        object(),
    ):
        cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
        if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
            artifact_name = f"task_{task.name}.txt"
            self.memory.save_artifact(
                artifact_name, output, to_filesystem=True
            )
            result.artifacts[artifact_name] = output
except Exception:
    pass
🛠️ Refactor suggestion
Artifact persistence gate always truthy; fix guard and default behavior.

The nested getattr check with object() sentinels makes the if-condition always truthy; then accessing self.context.orchestrator_config can raise, which is silently swallowed by the broad try/except. Also, defaulting to True when the flag is missing may unexpectedly write artifacts.

Apply safer gating and a conservative default:
-        try:
-            if getattr(
-                getattr(self.context, "orchestrator_config", object()),
-                "execution",
-                object(),
-            ):
-                cfg = self.context.orchestrator_config  # type: ignore[attr-defined]
-                if getattr(cfg.execution, "save_task_outputs_to_artifacts", True):
-                    artifact_name = f"task_{task.name}.txt"
-                    self.memory.save_artifact(
-                        artifact_name, output, to_filesystem=True
-                    )
-                    result.artifacts[artifact_name] = output
-        except Exception:
-            pass
+        try:
+            cfg = getattr(self.context, "orchestrator_config", None)
+            exec_cfg = getattr(cfg, "execution", None) if cfg else None
+            if getattr(exec_cfg, "save_task_outputs_to_artifacts", False):
+                artifact_name = f"task_{task.name}.txt"
+                self.memory.save_artifact(artifact_name, output, to_filesystem=True)
+                # Consider storing only a reference or truncated preview to reduce memory bloat
+                result.artifacts[artifact_name] = f"[artifact]{artifact_name}"
+        except Exception as e:
+            logger.debug(f"Skipping artifact persistence: {e}")
Note: WorkspaceMemory.save_artifact only writes to disk if use_filesystem is enabled (see memory.py), so this remains aligned with config.
# Track last tool call metadata for provenance
self._last_tool_calls: list[dict[str, Any]] = []
🛠️ Refactor suggestion
Bound the provenance buffer to avoid unbounded growth

Add a simple cap to keep in-memory provenance bounded.

         # Track last tool call metadata for provenance
-        self._last_tool_calls: list[dict[str, Any]] = []
+        self._last_tool_calls: list[dict[str, Any]] = []
+        self._tool_prov_max = 100  # cap in-memory provenance to avoid unbounded growth
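An alternative sketch that achieves the same bound with less bookkeeping, assuming the cap can be fixed at construction time, is a collections.deque with maxlen (old entries are evicted automatically on append):

from collections import deque
from typing import Any

# Bounded provenance buffer: appending beyond maxlen drops the oldest entry
_last_tool_calls: deque[dict[str, Any]] = deque(maxlen=100)

_last_tool_calls.append({"tool": "fetch", "tool_call_id": "abc", "isError": False})

# Get-and-clear consumption stays a two-liner:
provenance = list(_last_tool_calls)
_last_tool_calls.clear()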
# Record minimal provenance for citations
try:
    self._last_tool_calls.append(
        {
            "tool": request.params.name,
            "arguments": request.params.arguments,
            "tool_call_id": tool_call_id,
            "isError": result.isError,
        }
    )
except Exception:
    pass
return result
🛠️ Refactor suggestion
Sanitize tool arguments, trim the buffer, and avoid silent failure
- Mask secrets and cap argument size to reduce PII/secret leakage.
- Trim the buffer to the configured max.
- Log at debug instead of swallowing errors silently.
         # Record minimal provenance for citations
         try:
-            self._last_tool_calls.append(
+            args = request.params.arguments or {}
+            safe_args = self._sanitize_provenance_args(args)
+            self._last_tool_calls.append(
                 {
-                    "tool": request.params.name,
-                    "arguments": request.params.arguments,
+                    "tool": request.params.name,
+                    "arguments": safe_args,
                     "tool_call_id": tool_call_id,
                     "isError": result.isError,
                 }
             )
+            # Trim to max
+            if len(self._last_tool_calls) > getattr(self, "_tool_prov_max", 100):
+                del self._last_tool_calls[0 : len(self._last_tool_calls) - self._tool_prov_max]
         except Exception:
-            pass
+            if getattr(self, "logger", None):
+                self.logger.debug("Failed to record tool provenance", exc_info=True)
Add this helper method in the class (outside this hunk):
def _sanitize_provenance_args(self, args: dict[str, Any]) -> dict[str, Any]:
    sanitized: dict[str, Any] = {}
    for k, v in (args or {}).items():
        key = str(k).lower()
        if any(s in key for s in ("secret", "token", "password", "apikey", "api_key", "authorization", "auth")):
            sanitized[k] = "***"
            continue
        s = str(v)
        sanitized[k] = s if len(s) <= 500 else s[:500] + "..."
    return sanitized
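As a quick sanity check of the intended behavior: given {"api_key": "sk-123", "query": <a 2,000-char string>}, the helper above would return {"api_key": "***", "query": <the first 500 chars followed by "...">}.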
llm_selector to intelligently decide when to use a smart/fast/cheap model during the workflow. For example, the lead planner should always be smart (see the sketch below).
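A minimal sketch of how such a selector might look; the role names, tiers, and mapping here are assumptions for illustration, not the PR's actual implementation:

from typing import Literal

Role = Literal["planner", "verifier", "executor", "knowledge_extractor"]

# Hypothetical tier mapping: planning stays on a smart model, bulk work goes cheap
_ROLE_TIERS: dict[Role, str] = {
    "planner": "smart",
    "verifier": "smart",
    "executor": "fast",
    "knowledge_extractor": "cheap",
}


def llm_selector(role: Role) -> str:
    """Pick a model tier for a workflow role; the lead planner is always smart."""
    return _ROLE_TIERS.get(role, "fast")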
Summary by CodeRabbit

New Features
Improvements
Bug Fixes