[feat] Add `DaytonaRunner` for code `evaluators` #3258

junaway · 2025-12-20T00:10:55Z

No description provided.

vercel · 2025-12-20T00:11:00Z

The latest updates on your projects.

Project	Deployment	Review	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jan 7, 2026 9:34am

Copilot

Pull request overview

This PR implements and tests Daytona-based code evaluation functionality, transitioning from the legacy local sandbox to a new SDK-based approach. It includes improvements to code editor indentation handling for Python/code blocks and adds example evaluators for testing various dependencies and API endpoints.

Key Changes

Replaced the legacy `custom_code_run` with a new `sdk_custom_code_run` that uses the SDK's workflow-based evaluator system
Enhanced the code editor to preserve exact indentation for Python/code (no transformations) while keeping space-to-tab conversion for JSON/YAML
Added example evaluators for testing the OpenAI SDK, NumPy, and Agenta API endpoints in Daytona environments
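The editor change above (exact indentation for Python/code, space-to-tab conversion for JSON/YAML) can be sketched in Python for illustration. The real logic is TypeScript in `pasteUtils.ts` and `IndentationPlugin.tsx` and is not reproduced here; the function names, the language keys, and the one-tab-per-two-spaces rule are assumptions. Only the 2-space and 4-space widths come from the per-file summary of this PR.

```python
# Hypothetical named constants for the indentation widths described in this PR:
# 2 spaces for JSON/YAML, 4 spaces for Python/code.
JSON_YAML_INDENT = "  "
CODE_INDENT = "    "
STRUCTURED_LANGUAGES = {"json", "yaml"}


def indent_unit(language: str) -> str:
    """Whitespace inserted when Tab is pressed (assumed behavior)."""
    return JSON_YAML_INDENT if language.lower() in STRUCTURED_LANGUAGES else CODE_INDENT


def normalize_pasted_text(text: str, language: str) -> str:
    """Return pasted code unchanged; for JSON/YAML, convert each leading
    two-space indentation level into a tab (assumed conversion rule)."""
    if language.lower() not in STRUCTURED_LANGUAGES:
        return text  # Python/code: preserve indentation exactly
    converted = []
    for line in text.split("\n"):
        body = line.lstrip(" ")
        levels, leftover = divmod(len(line) - len(body), len(JSON_YAML_INDENT))
        converted.append("\t" * levels + " " * leftover + body)
    return "\n".join(converted)
```

Leaving code untouched matters most for Python, where indentation is syntax: re-indenting a pasted block could change which statements belong to a function body.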

Reviewed changes

Copilot reviewed 20 out of 25 changed files in this pull request and generated 10 comments.


File	Description
`api/oss/src/services/evaluators_service.py`	Implements a new SDK-based custom code runner function that delegates to the workflow system
`api/oss/src/resources/evaluators/evaluators.py`	Updates the default code template with a deprecation note for `app_params`
`sdk/agenta/sdk/workflows/runners/daytona.py`	Adds environment variables (`OPENAI_API_KEY`, `AGENTA_HOST`, `AGENTA_CREDENTIALS`) to the sandbox
`sdk/agenta/sdk/workflows/runners/local.py`	Exposes built-in Python types (`dict`, `list`, `str`, etc.) to the restricted execution environment
`sdk/agenta/sdk/decorators/running.py`	Adds a fallback to `request.credentials` in the credential resolution chain
`web/oss/src/components/Editor/plugins/code/utils/pasteUtils.ts`	Preserves exact indentation for Python/code, converts spaces to tabs for JSON/YAML
`web/oss/src/components/Editor/plugins/code/plugins/IndentationPlugin.tsx`	Uses 4 spaces for Python/code tab insertion, 2 spaces for JSON/YAML
`web/oss/src/components/Editor/plugins/code/plugins/AutoFormatAndValidateOnPastePlugin.tsx`	Skips indentation transformation for Python/code, maintains it for JSON/YAML
`examples/python/evaluators/openai/*.py`	Adds OpenAI SDK evaluators for testing API availability and exact match comparisons
`examples/python/evaluators/numpy/*.py`	Adds NumPy evaluators for testing library availability and character counting
`examples/python/evaluators/basic/*.py`	Adds basic evaluators using Python stdlib for string matching, length checks, JSON validation
`examples/python/evaluators/ag/*.py`	Adds Agenta API endpoint evaluators for health, secrets, and config endpoints
`examples/python/evaluators/*.md`	Provides comprehensive documentation (README, QUICKSTART, SUMMARY) for evaluators
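The `local.py` change (exposing built-ins such as `dict`, `list`, and `str` to a restricted environment) and the basic evaluators can be illustrated together. This is a minimal sketch, not the code in `local.py`: it assumes the runner `exec`s evaluator source with a restricted `__builtins__` dict, and that evaluators define `evaluate(app_params, inputs, output, correct_answer)` as in Agenta's legacy custom code template. `SAFE_BUILTINS`, `run_evaluator`, and `EXACT_MATCH_SOURCE` are hypothetical names.

```python
import builtins
from typing import Any, Dict

# Hypothetical allow-list; the PR only says dict, list, str, etc. are exposed,
# so the exact set in local.py may differ.
_ALLOWED_NAMES = (
    "dict", "list", "str", "int", "float", "bool", "tuple", "set",
    "len", "min", "max", "abs", "round", "sum", "sorted", "isinstance",
    "Exception", "ValueError", "TypeError",
)
SAFE_BUILTINS: Dict[str, Any] = {name: getattr(builtins, name) for name in _ALLOWED_NAMES}


def run_evaluator(source: str, **kwargs: Any) -> float:
    """Exec evaluator source with restricted built-ins, then call its `evaluate`.

    `__import__` is not in SAFE_BUILTINS, so `import` statements raise ImportError.
    """
    namespace: Dict[str, Any] = {"__builtins__": SAFE_BUILTINS}
    exec(source, namespace)
    return float(namespace["evaluate"](**kwargs))


# Example evaluator in the assumed legacy signature; app_params is the
# parameter the updated template marks as deprecated.
EXACT_MATCH_SOURCE = '''
def evaluate(app_params: dict, inputs: dict, output: str, correct_answer: str) -> float:
    return 1.0 if str(output).strip() == str(correct_answer).strip() else 0.0
'''

score = run_evaluator(
    EXACT_MATCH_SOURCE,
    app_params={},
    inputs={"country": "France"},
    output="Paris ",
    correct_answer="Paris",
)
print(score)  # 1.0
```

Restricting `__builtins__` blocks casual imports (so the stdlib-based examples such as JSON validation would not run under this sketch), but it is not a security boundary on its own: Python object introspection can still reach other objects, which is one reason to run untrusted evaluator code in an isolated sandbox.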


examples/python/evaluators/numpy/dependency_check.py

examples/python/evaluators/ag/secrets_check.py

examples/python/evaluators/ag/configs_check.py

sdk/agenta/sdk/workflows/runners/daytona.py

web/oss/src/components/Editor/plugins/code/utils/pasteUtils.ts

examples/python/evaluators/openai/exact_match.py

examples/python/evaluators/ag/health_check.py

examples/python/evaluators/openai/dependency_check.py

examples/python/evaluators/numpy/dependency_check.py


Copilot

Pull request overview

Copilot reviewed 32 out of 37 changed files in this pull request and generated 8 comments.


sdk/agenta/sdk/workflows/handlers.py

sdk/agenta/sdk/workflows/runners/daytona.py

api/oss/src/services/evaluators_service.py

examples/python/evaluators/openai/exact_match.py

sdk/agenta/sdk/workflows/runners/local.py

sdk/agenta/sdk/workflows/runners/daytona.py

examples/python/evaluators/openai/dependency_check.py

Add standard provider keys from vault as env vars
Add templates
Fix credentials (and thus secrets and traces) in evaluator playground
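A minimal sketch of turning provider keys stored in the vault into sandbox environment variables. The `{provider: key}` input shape, the `PROVIDER_ENV_VARS` table, and `build_sandbox_env` are hypothetical. `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are the variables the official OpenAI and Anthropic Python SDKs read by default, and `AGENTA_HOST` / `AGENTA_CREDENTIALS` are the names listed for `daytona.py` in the per-file summary.

```python
from typing import Dict, Mapping, Optional

# Hypothetical provider -> env var table; extend with other providers as needed.
PROVIDER_ENV_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def build_sandbox_env(
    provider_keys: Mapping[str, str],
    agenta_host: Optional[str] = None,
    agenta_credentials: Optional[str] = None,
) -> Dict[str, str]:
    """Build an explicit env dict for the sandbox instead of copying os.environ."""
    env: Dict[str, str] = {}
    for provider, key in provider_keys.items():
        var = PROVIDER_ENV_VARS.get(provider.lower())
        if var and key:  # skip unknown providers and empty keys
            env[var] = key
    if agenta_host:
        env["AGENTA_HOST"] = agenta_host
    if agenta_credentials:
        env["AGENTA_CREDENTIALS"] = agenta_credentials
    return env
```

Building the environment explicitly, rather than inheriting the host's, keeps unrelated host secrets out of evaluator code.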

Copilot

Pull request overview

Copilot reviewed 55 out of 63 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 56 out of 64 changed files in this pull request and generated 5 comments.


sdk/agenta/sdk/middleware/vault.py

sdk/agenta/sdk/middlewares/running/vault.py

sdk/agenta/sdk/contexts/running.py

sdk/agenta/sdk/types.py

sdk/agenta/sdk/middlewares/running/vault.py

Copilot

Pull request overview

Copilot reviewed 53 out of 61 changed files in this pull request and generated no new comments.


Copilot

Pull request overview

Copilot reviewed 55 out of 63 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (6)

sdk/agenta/sdk/types.py:1
The `import re` statement appears after class definitions (line 498 shows a class ending). Move this import to the top of the file with the other imports to follow Python conventions and improve code organization.
web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/index.tsx:1
The use of the `any` type defeats TypeScript's type safety. Consider defining a more specific type, or use `unknown` if the structure is truly dynamic and narrow it with type guards where needed.
sdk/agenta/sdk/workflows/runners/local.py:1
Using `dict()` instead of `{}` to create an empty dictionary is less idiomatic and slightly less efficient. Use `{}` instead, for consistency with Python conventions.
web/oss/src/components/Editor/plugins/code/plugins/IndentationPlugin.tsx:1
The hardcoded space strings for indentation could be defined as named constants (e.g., `JSON_YAML_INDENT = "  "` and `CODE_INDENT = "    "`) to improve maintainability and make the indentation standards more explicit.
sdk/agenta/sdk/middlewares/running/vault.py:1
The comment `# pylint: disable=bare-except` is misleading, since the code catches `Exception` rather than using a bare `except:` clause. Remove this comment, as it is no longer accurate.
sdk/agenta/sdk/middleware/vault.py:1
The comment `# pylint: disable=bare-except` is misleading, since the code catches `Exception` rather than using a bare `except:` clause. Remove this comment, as it is no longer accurate.
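The Python-side review points (imports at module top, `{}` over `dict()`, and `except Exception` not needing a `bare-except` suppression) can be shown in one small sketch. The helper name `fetch_secrets_or_empty` and its fall-back-to-empty behavior are assumptions, not the actual code in the vault middlewares.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def fetch_secrets_or_empty(fetch: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return fetched secrets, or an empty dict if fetching fails."""
    try:
        return fetch()
    except Exception:  # a typed except, so no bare-except pylint disable is needed
        logger.warning("Failed to fetch secrets; continuing without them", exc_info=True)
        return {}  # literal {} rather than dict(), per the review comment
```

`except Exception` still lets `KeyboardInterrupt` and `SystemExit` propagate, which a bare `except:` would swallow.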


jp-agenta added 9 commits December 19, 2025 11:29

adding evaluators (WIP)

f75cb59

adding evaluators (WIP)

c2c553a

fixing evaluators

5a8dcd0

Merge branch 'release/v0.69.5' into chore/check-daytona-code-evaluator

91e69e8

testing numpy/openai/agenta

a602930

fix typos in init

59f4797

confirm works with localhost if public host

6c297c9

fix playground

a717366

fix presets

b3d90f2

Copilot AI review requested due to automatic review settings December 20, 2025 00:10

vercel bot deployed to Preview December 20, 2025 00:11

Copilot started reviewing on behalf of junaway December 20, 2025 00:11

jp-agenta added 2 commits December 20, 2025 01:12

remove bloat

5bdc802

remove bloat

5304e0f

vercel bot deployed to Preview December 20, 2025 00:13

fix daytona imports

00958cc

vercel bot deployed to Preview December 20, 2025 00:15

remove openai key from daytona

4071a3d

vercel bot deployed to Preview December 20, 2025 00:16

Copilot AI reviewed Dec 20, 2025

jp-agenta added 2 commits December 23, 2025 12:31

WIP add runtimes

7d3ac94

Merge branch 'fix/remove-autoevals-and-rag-evaluators' into chore/check-daytona-code-evaluator

84bbdaa

vercel bot deployed to Preview December 23, 2025 11:32

Copilot AI review requested due to automatic review settings December 23, 2025 11:39

Merge branch 'main' into chore/check-daytona-code-evaluator

a4ffa8c

Copilot started reviewing on behalf of junaway December 23, 2025 11:39

vercel bot deployed to Preview December 23, 2025 11:40

Copilot AI reviewed Dec 23, 2025

WIP

93d7bb5

Add standard provider keys from vault as env vars
Add templates
Fix credentials (and thus secrets and traces) in evaluator playground

Propagate custom code evaluator exception

840fcc1

Copilot AI review requested due to automatic review settings January 6, 2026 13:49

Copilot started reviewing on behalf of junaway January 6, 2026 13:49

vercel bot deployed to Preview January 6, 2026 13:50

Copilot AI reviewed Jan 6, 2026

Error copy fix

ce3ba20

vercel bot deployed to Preview January 6, 2026 13:58

added docs for env vars

93eee32

Copilot AI review requested due to automatic review settings January 6, 2026 14:11

Copilot started reviewing on behalf of junaway January 6, 2026 14:11

vercel bot deployed to Preview January 6, 2026 14:12

Copilot AI reviewed Jan 6, 2026

Merge branch 'main' into chore/check-daytona-code-evaluator

33ebca9

vercel bot deployed to Preview January 6, 2026 14:27

Merge branch 'release/v0.75.0' into chore/check-daytona-code-evaluator

9e7960f

Copilot AI review requested due to automatic review settings January 7, 2026 09:05

junaway changed the base branch from main to release/v0.75.0 January 7, 2026 09:05

Copilot started reviewing on behalf of junaway January 7, 2026 09:06

vercel bot deployed to Preview January 7, 2026 09:06

Copilot AI reviewed Jan 7, 2026

Fix poetry.lock

d9feb15

vercel bot deployed to Preview January 7, 2026 09:16

fix var name in web

eef9477

Copilot AI review requested due to automatic review settings January 7, 2026 09:31

Copilot started reviewing on behalf of junaway January 7, 2026 09:31

vercel bot deployed to Preview January 7, 2026 09:32

fix fallback in js/ts evaluators

d8364c7

vercel bot deployed to Preview January 7, 2026 09:34

Copilot AI reviewed Jan 7, 2026

junaway merged commit 6497bfe into release/v0.75.0 Jan 7, 2026
5 checks passed


5 participants
