Agent testing automation πŸ€– by simulating users πŸ‘₯ and agents 🀝 with a judge βš–οΈ (langwatch-scenario)

kimtth/agent-auto-eval-azure-aoai-sk

πŸ€– Agent Auto-Evaluation with Azure OpenAI and Semantic Kernel

This project demonstrates automated agent evaluation using the LangWatch Scenario framework with Azure OpenAI services through Semantic Kernel. It implements multi-agent scenarios for testing conversational AI capabilities.

πŸ“ Overview

  • Multi-Agent Architecture: Specialized agents for weather, travel planning, and coordination
  • Azure OpenAI Integration: Uses Semantic Kernel with Azure OpenAI services
  • Automated Testing: Pytest-based test suite for agent interactions
  • Scenario Evaluation: Uses LangWatch Scenario for comprehensive agent testing

βš™οΈ Configuration

  1. Copy the sample environment file:

     cp .env.sample .env

  2. Configure your Azure OpenAI settings in .env:

     AZURE_OPENAI_ENDPOINT=https://<your-endpoint>.openai.azure.com
     AZURE_OPENAI_API_KEY=<your-api-key>
     AZURE_OPENAI_DEPLOYMENT_NAME=<your-deployment-name>
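As a minimal sketch of how these settings might be consumed at startup, the hypothetical helper below reads the three variables from the environment and fails fast if any are missing (the function name and error handling are illustrative, not the project's actual code):

```python
import os

# The three settings expected in .env (see the Configuration steps above).
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

def load_azure_openai_settings() -> dict:
    """Collect Azure OpenAI settings from the environment, failing fast if any are missing."""
    settings = {name: os.getenv(name) for name in REQUIRED_VARS}
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    return settings
```

Failing at startup keeps a missing key from surfacing later as an opaque authentication error mid-test.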

πŸ—οΈ Architecture

πŸ§‘β€πŸ’» Agent Types

  • 🌦️ WeatherAgent: Provides weather information and forecasts
  • 🧳 TravelPlannerAgent: Assists with trip planning and recommendations
  • 🀝 CoordinatorAgent: Manages multi-agent conversations and routing
  • πŸ—οΈ BaseAgent: Common functionality for all specialized agents

πŸ—οΈ Key Components

  • 🧠 Semantic Kernel Integration: Uses Azure OpenAI through Semantic Kernel
  • πŸ“ LangWatch Scenario: Evaluation framework for agent interactions
  • πŸ—„οΈ Caching System: Reduces API calls during testing
  • πŸ§ͺ Pytest Framework: Structured testing with async support
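The caching idea can be illustrated with a minimal in-process memoization sketch; the project's actual caching mechanism may differ (for example, it could persist responses to disk between runs):

```python
import functools

@functools.lru_cache(maxsize=128)
def cached_completion(prompt: str) -> str:
    # In the real suite this call would go to Azure OpenAI through Semantic
    # Kernel; caching means a repeated identical prompt costs one API call.
    return f"response-to:{prompt}"
```

Because scenario tests often replay identical prompts, even a small cache like this noticeably reduces API traffic during test runs.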

πŸš€ Usage

πŸ§ͺ Running Tests

Execute all agent tests:

pytest -m agent_test -v

OR

python sk_agent_scenario.py

Run specific tests:

# Simple agent interaction
pytest -k "test_simple_agent_interaction" -v

# Dynamic agent selection
pytest -k "test_dynamic_agent_selection" -v

# Multi-agent simulation
pytest -k "test_multi_agent_simulation" -v

🧩 Test Scenarios

  1. Simple Agent Interaction: A basic weather query about Paris
  2. Dynamic Agent Selection: Business trip planning with multiple considerations
  3. Multi-Agent Simulation: Comprehensive travel planning with agent collaboration

🧩 Console output

Scenario output
