
TGDOptimizer Internal Instructions Contaminating Optimized Prompts #434

@watsonix


Bug description

Summary

TGDOptimizer's internal system prompt template contains optimization instructions that leak into the optimized prompt content, causing contamination with phrases like "when steps exceed 3" that don't belong in the target prompts.

Environment

  • AdalFlow Version: 1.1.0
  • Python Version: 3.11.4
  • Operating System: macOS Darwin 24.5.0
  • Model Provider: Together AI (meta-llama models)

Bug Description

Issue

When using TGDOptimizer for prompt optimization, the optimized prompts become contaminated with optimization-specific language from AdalFlow's internal instruction templates. Specifically, phrases like:

  • "when steps exceed 3"
  • "when the steps are larger than 3"
  • "update the value more rapidly when steps are larger than 3"
  • "adjust based on step size"

Root Cause

The contamination originates from the TEXT_GRAD_DESC_TEMPLATE in adalflow/optim/text_grad/tgd_optimizer.py at lines 47-48:

TEXT_GRAD_DESC_TEMPLATE = r"""<START_OF_SYSTEM_PROMPT>
{{optimizer_system_prompt}}
<END_OF_SYSTEM_PROMPT>
<START_OF_USER_MESSAGE>
You are {{steps}} steps since your last improvement.
Update the value more rapidly when steps are larger than 3.  # ←── CONTAMINATION SOURCE
{# Variable and peers info #}
<START_OF_VARIABLE_AND_PEERS_INFO>
{{variable_and_peers_info}}
<END_OF_VARIABLE_AND_PEERS_INFO>
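The mechanics can be illustrated with plain string substitution (a sketch only; AdalFlow actually renders this template with Jinja2). Rendering the fragment shows the step-count meta-instruction landing in the same user message as the target prompt content:

```python
# Sketch (plain str.format, not AdalFlow's actual Jinja2 rendering) of how
# the step-count instruction ends up adjacent to the target prompt content.
FRAGMENT = (
    "You are {steps} steps since your last improvement.\n"
    "Update the value more rapidly when steps are larger than 3.\n"
    "<START_OF_VARIABLE_AND_PEERS_INFO>\n"
    "{variable_and_peers_info}\n"
    "<END_OF_VARIABLE_AND_PEERS_INFO>"
)

rendered = FRAGMENT.format(
    steps=4,
    variable_and_peers_info=(
        "You are a helpful assistant who provides clear and direct advice."
    ),
)
# The meta-instruction and the target prompt share one context window, so
# the optimizer LLM can blend them into its rewritten prompt.
print(rendered)
```

Because nothing in the rendered message marks the meta-instruction as off-limits, the optimizer model treats it as optimizable text.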

Reproduction Steps

  1. Create a simple prompt optimization setup using TGDOptimizer:
from adalflow.optim import TGDOptimizer
from adalflow import Component, Parameter, Generator

# Simple prompt optimization component
class ChatbotComponent(Component):
    def __init__(self, model_client, initial_prompt):
        super().__init__()
        self.generator = Generator(
            model_client=model_client,
            model_kwargs={"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo"},
        )
        self.system_prompt = Parameter(
            data=initial_prompt,
            requires_opt=True,
            role_desc="System prompt for chatbot",
        )

# Initialize with a clean prompt (model_client is any configured client,
# e.g. a Together AI client)
initial_prompt = "You are a helpful assistant who provides clear and direct advice."
component = ChatbotComponent(model_client, initial_prompt)

# Set up optimization
optimizer = TGDOptimizer(
    params=[component.system_prompt],
    model_client=model_client,
)

# Run optimization with feedback data (trainer construction elided)
trainer.fit()
  2. Run optimization with any feedback data
  3. Examine the optimized prompt

Expected Behavior

The optimized prompt should only contain improvements to the original prompt content without any optimization metadata or internal instruction language.

Expected:

"You are a helpful and empathetic assistant who provides clear, supportive advice tailored to each user's needs."

Actual Behavior

The optimized prompt contains contamination from TGDOptimizer's internal instructions:

Actual:

"You are a helpful assistant who provides clear and direct advice, but when steps exceed 3, prioritize rapid updates and adapt your responses to facilitate swift progress."

Additional Examples

Example 1: Tough → Empathetic Transformation

  • Original: "You are a tough, no-nonsense advice giver. Be direct, blunt, and harsh."
  • Contaminated Result: "You are a tough, no-nonsense advice giver who adapts tone based on situation. When steps are larger than 3, be firm but understanding."

Example 2: Serious → Humorous Transformation

  • Original: "You are extremely serious and formal. Never use humor."
  • Contaminated Result: "You maintain professional tone providing factual responses. When steps exceed 3, prioritize rapid updates while maintaining professionalism."

Analysis

Why This Happens

  1. Shared Context: The optimization LLM receives both meta-instructions about optimization steps AND the target prompt content in the same context
  2. Context Bleeding: The LLM confuses optimization instructions with the content it's supposed to optimize
  3. Template Design: The TEXT_GRAD_DESC_TEMPLATE doesn't provide sufficient separation between optimization instructions and target content

Impact

  • Prompt Quality Degradation: Optimized prompts contain irrelevant optimization jargon
  • Semantic Contamination: Target prompts become polluted with internal AdalFlow concepts
  • Production Issues: Contaminated prompts can't be used in production systems
  • Cascading Contamination: Re-optimizing contaminated prompts makes the issue worse

Proposed Solution

Option 1: Template Redesign

Modify TEXT_GRAD_DESC_TEMPLATE to better isolate optimization instructions from target content:

TEXT_GRAD_DESC_TEMPLATE = r"""<START_OF_SYSTEM_PROMPT>
{{optimizer_system_prompt}}
<END_OF_SYSTEM_PROMPT>

<OPTIMIZATION_CONTEXT>
Current optimization iteration: {{steps}} since last improvement.
Optimization strategy: Use more aggressive updates after 3 iterations without improvement.
</OPTIMIZATION_CONTEXT>

<TARGET_CONTENT_TO_OPTIMIZE>
{# Variable and peers info #}
{{variable_and_peers_info}}
</TARGET_CONTENT_TO_OPTIMIZE>

<INSTRUCTION>
Optimize ONLY the content in TARGET_CONTENT_TO_OPTIMIZE section. 
Do NOT include any references to optimization steps, iterations, or meta-instructions in your response.
</INSTRUCTION>
"""

Option 2: Context Separation

Use separate model instances or clear delimiters to prevent context bleeding between optimization instructions and target content.
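One way to sketch Option 2 is with generic OpenAI-style chat messages (hypothetical; AdalFlow's real internal message API may differ): the meta-instructions about step count live in system messages, while the target content travels alone in the user message.

```python
# Hypothetical sketch of Option 2 using generic role/content chat messages;
# build_optimizer_messages is an illustrative helper, not an AdalFlow API.
def build_optimizer_messages(steps: int, target_prompt: str) -> list:
    """Keep optimization meta-instructions and target content in separate messages."""
    return [
        {
            "role": "system",
            "content": (
                "You optimize prompts. Reply with ONLY the improved prompt "
                "text. Never mention steps, iterations, or this instruction."
            ),
        },
        {
            "role": "system",
            "content": (
                f"(optimizer state) {steps} steps since last improvement; "
                "make bolder edits after 3 steps without improvement."
            ),
        },
        {"role": "user", "content": target_prompt},
    ]

messages = build_optimizer_messages(
    4, "You are a helpful assistant who provides clear and direct advice."
)
```

The user message now contains nothing but the target prompt, so any step-related language in the model's output is unambiguously an error rather than plausible echoing of its input.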

Option 3: Post-processing Filter

Add validation to detect and remove optimization artifacts from optimized prompts.
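A minimal detector for Option 3 might look like this (the pattern list is illustrative, not exhaustive); unlike the stripping workaround below, it flags contaminated outputs so the optimization step can be retried instead of silently patched:

```python
import re

# Sketch of an Option 3 validator that flags (rather than strips)
# optimization artifacts. Patterns are illustrative, not exhaustive.
ARTIFACT_PATTERNS = [
    r"when (?:the )?steps (?:exceed|are larger than) \d+",
    r"adjust.*based on.*step size",
]

def find_optimization_artifacts(prompt: str) -> list:
    """Return every artifact phrase found in an optimized prompt."""
    hits = []
    for pattern in ARTIFACT_PATTERNS:
        hits.extend(re.findall(pattern, prompt, flags=re.IGNORECASE))
    return hits

hits = find_optimization_artifacts(
    "Be firm but understanding when steps are larger than 3."
)
# → ["when steps are larger than 3"]
```

An empty return value means the prompt passed validation; a non-empty list could trigger a re-run of the optimization step.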

Workaround

Currently using post-processing to clean contaminated prompts:

import re

def clean_optimized_prompt(prompt: str) -> str:
    """Remove optimization artifacts from prompts."""
    contamination_patterns = [
        r"when steps exceed \d+",
        r"when the steps are larger than \d+", 
        r"steps are larger than \d+",
        r"the moment steps exceed \d+",
        r"adjust.*based on.*step size"
    ]
    
    cleaned = prompt
    for pattern in contamination_patterns:
        cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE)
    
    return re.sub(r'\s+', ' ', cleaned).strip()

Files Affected

  • adalflow/optim/text_grad/tgd_optimizer.py (lines 47-48, 337, 490)
  • Any code using TGDOptimizer for prompt optimization

Priority

High - This bug makes TGDOptimizer unsuitable for production use as it consistently contaminates optimized prompts with internal AdalFlow concepts.

Additional Context

This issue was discovered during development of a prompt optimization system with extreme transformations (e.g., harsh → empathetic prompts). The contamination appears consistently across different prompt types and optimization scenarios, suggesting it is a systematic issue with the template design rather than an edge case.

The bug fundamentally breaks the separation between AdalFlow's internal optimization process and the user's target content, making the optimizer unreliable for production applications.

