[feat] Add DaytonaRunner for code evaluators
#3258
Pull request overview
This PR implements and tests Daytona-based code evaluation functionality, transitioning from the legacy local sandbox to a new SDK-based approach. It includes improvements to code editor indentation handling for Python/code blocks and adds example evaluators for testing various dependencies and API endpoints.
Key Changes
- Replaced legacy `custom_code_run` with new `sdk_custom_code_run`, which uses the SDK's workflow-based evaluator system
- Enhanced code editor to preserve exact indentation for Python/code (no transformations) while maintaining space-to-tab conversion for JSON/YAML
- Added example evaluators for testing OpenAI, NumPy, and Agenta API endpoints in Daytona environments
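The indentation rule described above can be sketched in a few lines. This is only an illustration in Python (the PR's editor code is TypeScript), and the function name `normalize_pasted_indentation`, the language set, and the assumed width of 2 spaces per level are hypothetical rather than taken from the PR.

```python
import re

# Hypothetical language group; the PR's editor plugins may classify languages differently.
TAB_CONVERTED_LANGUAGES = {"json", "yaml"}
SPACES_PER_LEVEL = 2  # assumed indentation width for JSON/YAML


def normalize_pasted_indentation(text: str, language: str) -> str:
    """Return pasted text with leading whitespace handled per language."""
    if language not in TAB_CONVERTED_LANGUAGES:
        # Python and other code: indentation is significant, so leave it untouched.
        return text

    def convert(match: re.Match) -> str:
        # Turn each full indentation level into a tab and keep any leftover spaces.
        levels, remainder = divmod(len(match.group(0)), SPACES_PER_LEVEL)
        return "\t" * levels + " " * remainder

    # Only rewrite runs of spaces at the start of each line.
    return re.sub(r"^ +", convert, text, flags=re.MULTILINE)
```

Under these assumptions, a Python snippet passes through byte for byte, while a JSON snippet indented with two spaces comes back with one tab per level.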
Reviewed changes
Copilot reviewed 20 out of 25 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `api/oss/src/services/evaluators_service.py` | Implements new SDK-based custom code runner function that delegates to workflow system |
| `api/oss/src/resources/evaluators/evaluators.py` | Updates default code template with deprecation note for `app_params` |
| `sdk/agenta/sdk/workflows/runners/daytona.py` | Adds environment variables (`OPENAI_API_KEY`, `AGENTA_HOST`, `AGENTA_CREDENTIALS`) to sandbox |
| `sdk/agenta/sdk/workflows/runners/local.py` | Exposes built-in Python types (`dict`, `list`, `str`, etc.) to restricted environment |
| `sdk/agenta/sdk/decorators/running.py` | Adds fallback to `request.credentials` in credential resolution chain |
| `web/oss/src/components/Editor/plugins/code/utils/pasteUtils.ts` | Preserves exact indentation for Python/code, converts spaces to tabs for JSON/YAML |
| `web/oss/src/components/Editor/plugins/code/plugins/IndentationPlugin.tsx` | Uses 4 spaces for Python/code tab insertion, 2 spaces for JSON/YAML |
| `web/oss/src/components/Editor/plugins/code/plugins/AutoFormatAndValidateOnPastePlugin.tsx` | Skips indentation transformation for Python/code, maintains it for JSON/YAML |
| `examples/python/evaluators/openai/*.py` | Adds OpenAI SDK evaluators for testing API availability and exact match comparisons |
| `examples/python/evaluators/numpy/*.py` | Adds NumPy evaluators for testing library availability and character counting |
| `examples/python/evaluators/basic/*.py` | Adds basic evaluators using Python stdlib for string matching, length checks, JSON validation |
| `examples/python/evaluators/ag/*.py` | Adds Agenta API endpoint evaluators for health, secrets, and config endpoints |
| `examples/python/evaluators/*.md` | Provides comprehensive documentation (README, QUICKSTART, SUMMARY) for evaluators |
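To make the `local.py` and `examples/python/evaluators/basic` rows more concrete, here is a minimal sketch of running an exact-match evaluator in a namespace that exposes only an allow-list of built-ins. Everything named here (`SAFE_BUILTINS`, `run_evaluator`, the `evaluate(app_params, inputs, output, correct_answer)` signature) is an assumption for illustration and is not taken from the PR's code. Plain `exec` with a trimmed `__builtins__` is also not a real security sandbox; it only shows why evaluator code needs types such as `dict`, `list`, and `str` to be exposed explicitly.

```python
from typing import Any, Dict

# Assumed allow-list of built-ins visible to evaluator code (illustrative subset).
SAFE_BUILTINS: Dict[str, Any] = {
    "dict": dict, "list": list, "str": str, "int": int, "float": float,
    "bool": bool, "len": len, "isinstance": isinstance,
}

# Hypothetical evaluator source; the signature is an assumption for this sketch.
EVALUATOR_SOURCE = '''
def evaluate(app_params, inputs, output, correct_answer):
    # app_params is accepted but unused in this sketch.
    if not isinstance(output, str):
        return 0.0
    return 1.0 if output.strip() == str(correct_answer).strip() else 0.0
'''


def run_evaluator(source: str, inputs: dict, output: Any, correct_answer: Any) -> float:
    """Execute evaluator source with only the allow-listed built-ins available."""
    namespace: Dict[str, Any] = {"__builtins__": SAFE_BUILTINS}
    exec(source, namespace)  # defines evaluate() inside the restricted namespace
    result = namespace["evaluate"]({}, inputs, output, correct_answer)
    return float(result)
```

Without `str` and `isinstance` in the allow-list, the evaluator above would fail with a `NameError` at call time, which is the kind of failure that exposing built-in types avoids.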
…ck-daytona-code-evaluator
Pull request overview
Copilot reviewed 32 out of 37 changed files in this pull request and generated 8 comments.
Pull request overview
Copilot reviewed 55 out of 63 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 56 out of 64 changed files in this pull request and generated 5 comments.
Pull request overview
Copilot reviewed 53 out of 61 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 55 out of 63 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (6)
sdk/agenta/sdk/types.py:1
- The `import re` statement appears after class definitions (line 498 shows a class ending). Move this import to the top of the file with other imports to follow Python conventions and improve code organization.

web/oss/src/components/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/index.tsx:1
- The use of `any` type defeats TypeScript's type safety. Consider defining a more specific type or using `unknown` if the structure is truly dynamic, then narrow it with type guards where needed.

sdk/agenta/sdk/workflows/runners/local.py:1
- Using `dict()` instead of `{}` for creating an empty dictionary is less idiomatic and slightly less efficient. Use `{}` instead for consistency with Python conventions.

web/oss/src/components/Editor/plugins/code/plugins/IndentationPlugin.tsx:1
- The hardcoded space strings for indentation could be defined as named constants (e.g., `JSON_YAML_INDENT = "  "`, `CODE_INDENT = "    "`) to improve maintainability and make the indentation standards more explicit.

sdk/agenta/sdk/middlewares/running/vault.py:1
- The comment `# pylint: disable=bare-except` is misleading since the code actually catches `Exception` rather than using a bare except clause. Remove this comment as it's no longer accurate.

sdk/agenta/sdk/middleware/vault.py:1
- The comment `# pylint: disable=bare-except` is misleading since the code actually catches `Exception` rather than using a bare except clause. Remove this comment as it's no longer accurate.
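The `bare-except` remark can be illustrated with a short sketch. The function `load_secret` and its `fetch` callback are hypothetical and not from the PR; the point is only that `except Exception:` is not a bare `except:`, so a `disable=bare-except` pragma has nothing to suppress there (pylint reports broad `Exception` handlers under a separate check).

```python
from typing import Any, Callable, Optional


def load_secret(fetch: Callable[[], Any]) -> Optional[Any]:
    """Call a hypothetical fetch() and fall back to None on ordinary errors."""
    try:
        return fetch()
    except Exception:
        # Catches ordinary errors only; KeyboardInterrupt and SystemExit derive
        # from BaseException and still propagate, unlike with a bare `except:`.
        return None
```

If the broad handler is intentional, the pragma to consider would be the one for pylint's broad-exception check rather than `bare-except`.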
No description provided.