Conversation

@strickvl (Contributor) commented Dec 4, 2025

Summary

This PR adds a new project demonstrating how to train an email search agent using OpenPipe ART (Agentic Reinforcement Training) with ZenML for production ML pipelines.

  • Train an RL agent using GRPO (Group Relative Policy Optimization) with RULER scoring
  • Track artifacts including scenarios, model checkpoints, and training metrics
  • Orchestrate on Kubernetes with GPU step operators for training
  • Evaluate models with automated correctness judging
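GRPO scores each rollout relative to the other rollouts of the same scenario in its group: the judge's rewards are normalized by the group mean and standard deviation to produce advantages. As an illustration (the reward values below are made up, not from RULER), a minimal sketch of that normalization:

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# In the project, RULER supplies the per-trajectory rewards; the numbers
# here are invented for the example.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against the group's mean and std deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rollouts scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Four rollouts of one scenario, judged on a [0, 1] scale:
advantages = group_relative_advantages([0.9, 0.4, 0.6, 0.1])
```

Because the normalization is within-group, the advantages always sum to zero: the best rollout of a group is reinforced even when every rollout scored poorly in absolute terms.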

The agent learns to search through emails and answer questions using LangGraph's ReAct pattern, starting from a Qwen 2.5 7B base model.
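The ReAct loop alternates reasoning with tool calls until the agent can answer. A hand-rolled stand-in for that loop (the project uses LangGraph's prebuilt ReAct agent; the toy corpus, tool, and fixed keyword below are illustrative assumptions — in training, the model itself chooses the tool arguments):

```python
# Hand-rolled sketch of the ReAct tool loop, not the LangGraph implementation.

EMAILS = {  # hypothetical toy corpus standing in for the Enron database
    1: "Subject: Q3 budget. The Q3 budget review moved to Friday.",
    2: "Subject: Lunch. Tacos at noon?",
}

def search_emails(query: str) -> list[int]:
    """Tool: return ids of emails containing the query (full-text stand-in)."""
    return [i for i, body in EMAILS.items() if query.lower() in body.lower()]

def react_agent(question: str, keyword: str, max_steps: int = 3) -> str:
    """Alternate acting (tool call) and observing (results) until answered."""
    for _ in range(max_steps):
        hits = search_emails(keyword)   # Act: call the search tool
        if hits:                        # Observe: tool returned results
            return EMAILS[hits[0]]      # Answer grounded in a retrieved email
    return "no answer found"

answer = react_agent("When is the budget review?", keyword="budget")
```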

Architecture

data_preparation_pipeline (cached, no GPU)
  download_enron_data → create_database → load_scenarios

training_pipeline (GPU required)
  setup_art_model → train_agent (LangGraph rollouts + RULER + GRPO)

evaluation_pipeline (GPU required)  
  load_trained_model → run_inference → compute_metrics
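The step chaining in the first pipeline can be sketched with plain-Python stand-ins for ZenML's `@step`/`@pipeline` decorators (step names mirror the DAG above; the bodies are placeholder data, not the project's implementation):

```python
# Plain-Python stand-ins: in ZenML these come from `from zenml import step, pipeline`.
def step(fn):
    return fn

def pipeline(fn):
    return fn

@step
def download_enron_data() -> list[str]:
    return ["email one about budgets", "email two about lunch"]  # placeholder

@step
def create_database(emails: list[str]) -> dict:
    # The real step builds a SQLite database with an FTS5 index.
    return {i: e for i, e in enumerate(emails)}

@step
def load_scenarios(db: dict) -> list[dict]:
    # The real step loads question/answer scenarios for training.
    return [{"question": "placeholder", "db_size": len(db)}]

@pipeline
def data_preparation_pipeline() -> list[dict]:
    emails = download_enron_data()
    db = create_database(emails)
    return load_scenarios(db)

scenarios = data_preparation_pipeline()
```

With the real decorators, each step's outputs are versioned as artifacts, which is what makes this pipeline cacheable across runs.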

Test plan

  • Verify data preparation pipeline runs locally
  • Test training pipeline with small config on GPU
  • Validate evaluation metrics computation
  • Test Kubernetes config on cloud cluster

This project demonstrates training an email search agent using OpenPipe ART
(Agentic Reinforcement Training) with ZenML for production ML pipelines.

Features:
- GRPO training with RULER scoring for relative trajectory evaluation
- LangGraph ReAct agent with email search tools
- Three ZenML pipelines: data preparation, training, and evaluation
- Kubernetes configs with GPU node affinity for production training
- Enron email dataset with FTS5 full-text search
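The FTS5 index can be sketched with the standard-library `sqlite3` module (the table and column names below are assumptions for illustration, not the project's schema):

```python
# Minimal sketch of an FTS5 full-text index over emails.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE emails USING fts5(subject, body)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?)",
    [
        ("Q3 budget", "The Q3 budget review moved to Friday."),
        ("Lunch plans", "Tacos at noon?"),
    ],
)
# MATCH runs a full-text query against the index; ORDER BY rank sorts by relevance.
rows = conn.execute(
    "SELECT subject FROM emails WHERE emails MATCH ? ORDER BY rank",
    ("budget",),
).fetchall()
```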
@strickvl added the enhancement, internal, and x-squad labels Dec 4, 2025
- Add inference pipeline with DeploymentSettings for HTTP serving
- Add single_inference step for real-time query processing
- Add deployment.yaml config for HTTP service configuration
- Update run.py with --pipeline deploy command
- Update README with deployment documentation and examples

The inference pipeline can be deployed as an HTTP service using
ZenML Pipeline Deployments, enabling real-time email search queries
via a POST /invoke endpoint.
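A client call against the deployed service might look like the sketch below. The `/invoke` path comes from this PR; the base URL and payload keys are assumptions about what the single_inference step expects, not the project's documented contract:

```python
# Hypothetical client for the deployed inference pipeline.
import json
import urllib.request

def build_invoke_request(base_url: str, question: str) -> urllib.request.Request:
    """Construct (but do not send) a POST /invoke request with a JSON body."""
    payload = json.dumps({"parameters": {"query": question}}).encode()
    return urllib.request.Request(
        f"{base_url}/invoke",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_invoke_request(
    "http://localhost:8000", "Who scheduled the budget review?"
)
# urllib.request.urlopen(req) would send it once the service is running.
```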