Copilot AI commented Jul 24, 2025

Problem

Users who ran SafetyEvaluation with the direct_attack evaluator twice, using the same randomization_seed, got different query sets. They expected all 200 queries to match, but only 100 did, so the output was non-deterministic despite identical seeds.

Root Cause

The issue was in AdversarialTemplateHandler._get_content_harm_template_collections(), which processed templates in a non-deterministic order:

# Before (problematic)
for key, value in plist.items():  # Iterates in insertion order, which can differ between service calls
    if value["category"] == template_category:
        # Process template...

The failure chain was:

  1. Templates retrieved from the service could be stored in a different dictionary order on each call
  2. A different template processing order led to different parameter zipping in AdversarialSimulator
  3. As a result, the same randomization_seed produced different query sets across runs
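
The mechanism can be reproduced without the SDK. The following is a minimal sketch, not the SDK's code: the template names, the parameter values, and the `pair_templates_with_parameters` helper are all hypothetical stand-ins for the zipping step. It shows that a seeded shuffle yields the same parameter order in both runs, yet the resulting pairs differ because the dictionaries were populated in a different order.

```python
import random


def pair_templates_with_parameters(templates, parameters, seed):
    """Hypothetical stand-in for the zipping step.

    Shuffles the parameters with a seeded RNG, then pairs them with the
    template keys in dictionary iteration (insertion) order.
    """
    rng = random.Random(seed)
    shuffled = list(parameters)
    rng.shuffle(shuffled)
    return list(zip(templates.keys(), shuffled))


# Same templates, inserted in a different order, as could happen when the
# service returns them in a different order on a second call.
first_call = {"template_a": {}, "template_b": {}, "template_c": {}}
second_call = {"template_c": {}, "template_a": {}, "template_b": {}}

parameters = ["p1", "p2", "p3"]

run1 = pair_templates_with_parameters(first_call, parameters, seed=1)
run2 = pair_templates_with_parameters(second_call, parameters, seed=1)

# The seeded shuffle is identical in both runs...
print([p for _, p in run1] == [p for _, p in run2])  # True
# ...but the template/parameter pairs are not.
print(run1 == run2)  # False
```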

Solution

Changed template processing to use sorted keys for deterministic ordering:

# After (fixed)
# Sort keys to ensure consistent ordering across different calls
# This ensures that templates are processed in the same order regardless of
# how they were retrieved from the service or stored in the dictionary
for key in sorted(plist.keys()):
    value = plist[key]
    if value["category"] == template_category:
        # Process template...
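
As a self-contained illustration of the same idea, the sketch below wraps the sorted-key loop in a hypothetical `select_templates` helper (not part of the SDK) and checks that two dictionaries with the same entries in different insertion order give the same result. It assumes the keys are mutually comparable, for example all strings.

```python
def select_templates(plist, template_category):
    """Return (key, value) pairs for one category, in sorted-key order.

    Hypothetical helper for illustration; the handler's real loop does more
    work per template.
    """
    return [
        (key, plist[key])
        for key in sorted(plist)  # equivalent to sorted(plist.keys())
        if plist[key]["category"] == template_category
    ]


first_call = {
    "b_template": {"category": "content_harm"},
    "a_template": {"category": "content_harm"},
    "c_template": {"category": "other"},
}
# Same entries, reversed insertion order.
second_call = dict(reversed(list(first_call.items())))

selected_first = select_templates(first_call, "content_harm")
selected_second = select_templates(second_call, "content_harm")

print([key for key, _ in selected_first])  # ['a_template', 'b_template']
print(selected_first == selected_second)   # True
```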

Impact

  • Fixed randomization inconsistency: The same randomization_seed now produces identical query sets
  • No breaking changes: Existing functionality is preserved
  • Minimal code change: Only the template iteration loop was modified
  • Deterministic behavior: DirectAttackSimulator results are now reproducible

Testing

Added two test files:

  • test_template_handler_determinism.py: Validates template ordering consistency
  • test_direct_attack_determinism.py: Tests DirectAttackSimulator deterministic behavior

All tests confirm the fix resolves the issue:

  • Template ordering is deterministic regardless of service response order
  • Parameter zipping produces consistent results
  • Same randomization_seed produces identical query sets (100% match rate)
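
A determinism check of this kind could look like the sketch below. It is not the content of the test files named above: `ordered_template_keys` is a hypothetical stand-in for the handler's loop, and the template names are invented. The test builds the same dictionary in every possible insertion order and asserts that the processing order never changes.

```python
import itertools


def ordered_template_keys(plist, template_category):
    """Hypothetical stand-in for the handler's loop: keys of the requested
    category, visited in sorted order."""
    return [k for k in sorted(plist) if plist[k]["category"] == template_category]


def test_template_order_is_independent_of_insertion_order():
    items = [
        ("template_b", {"category": "content_harm"}),
        ("template_a", {"category": "content_harm"}),
        ("template_c", {"category": "other"}),
    ]
    # Build the dictionary in every insertion order and collect the distinct
    # processing orders that result.
    results = {
        tuple(ordered_template_keys(dict(perm), "content_harm"))
        for perm in itertools.permutations(items)
    }
    # Exactly one processing order should ever be observed.
    assert results == {("template_a", "template_b")}


test_template_order_is_independent_of_insertion_order()
```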

Example Usage

# Two runs with the same seed now return the same query set:
outputs1 = await safety_evaluation(
    evaluators=[_SafetyEvaluator.DIRECT_ATTACK],
    target=target_function,
    randomization_seed=1
)

outputs2 = await safety_evaluation(
    evaluators=[_SafetyEvaluator.DIRECT_ATTACK],
    target=target_function,
    randomization_seed=1  # Same seed
)

# Before the fix: only ~50% of queries matched
# After the fix: 100% of queries match ✅
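
To quantify the match rate between two runs, something like the following sketch could be used. It assumes the queries have already been extracted from each run's output into plain lists of strings (the structure of the outputs is not described here), and that matching is positional; `match_rate` is a hypothetical helper, not an SDK function.

```python
def match_rate(queries_a, queries_b):
    """Fraction of positions at which two query lists hold the same query.

    Assumes positional comparison; a length mismatch counts as non-matching
    for the extra positions.
    """
    if not queries_a and not queries_b:
        return 1.0
    matches = sum(a == b for a, b in zip(queries_a, queries_b))
    return matches / max(len(queries_a), len(queries_b))


# Invented example data standing in for two extracted query sets.
run_a = ["q1", "q2", "q3", "q4"]
run_b = ["q1", "q2", "x", "y"]

print(match_rate(run_a, run_b))  # 0.5
print(match_rate(run_a, run_a))  # 1.0
```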

Closes the issue in which DirectAttackSimulator produced different results for identical randomization_seed values.


Copilot AI and others added 2 commits July 24, 2025 16:13
…g deterministic template ordering

Co-authored-by: slister1001 <103153180+slister1001@users.noreply.github.com>
Copilot AI changed the title from "[WIP] We are using the same randomization_seed to get the single-turn query set by invoking safety evaluation interface with direct attack evaluator in evaluation SDK twice—once for the treatment group and once for the control group. We expect the returned ..." to "Fix DirectAttackSimulator randomization_seed inconsistency by ensuring deterministic template ordering" Jul 24, 2025
Copilot finished work on behalf of slister1001 July 24, 2025 16:20
Copilot AI requested a review from slister1001 July 24, 2025 16:20