Feat/tp add trading example #99
Open: timpara wants to merge 10 commits into codelion:main from uniperJim:feat/tp_add_trading_example
+679 −2
Commits (10)
aa567a7  changes to make it work with Azure hosted openAI models (uniperJim)
85d2979  Add optimal execution trading example (timpara)
8f8a65a  Add optimal execution trading example (timpara)
21f937a  Update README.md (timpara)
515d864  Merge branch 'codelion:main' into feat/tp_add_trading_example (timpara)
e98d371  refactor: rename optimia_execution to evolution_orchestrator (timpara)
4f92f04  fix: adapt OpenAI client for AI Foundry compatibility (timpara)
16f6c62  fix: adapt OpenAI client for AI Foundry compatibility (timpara)
2b02ea7  fix: adapt OpenAI client for AI Foundry compatibility (timpara)
831df42  fix: adapt OpenAI client for AI Foundry compatibility (timpara)
@@ -0,0 +1,112 @@
# OpenEvolve Default Configuration
# This file contains all available configuration options with sensible defaults
# You can use this as a template for your own configuration

# General settings
max_iterations: 1000        # Maximum number of evolution iterations
checkpoint_interval: 50     # Save checkpoints every N iterations
log_level: "INFO"           # Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
log_dir: null               # Custom directory for logs (default: output_dir/logs)
random_seed: null           # Random seed for reproducibility (null = random)

# Evolution settings
diff_based_evolution: true  # Use diff-based evolution (true) or full rewrites (false)
allow_full_rewrites: false  # Allow occasional full rewrites even in diff-based mode
max_code_length: 10000      # Maximum allowed code length in characters

# LLM configuration
llm:
  models:
    - name: "o1"
      weight: 1.0

  evaluator_models:
    - name: "o1"
      weight: 1.0
  # Azure endpoint, including the deployment path (no query string)
  api_base: "https://<YOUR_BASE>.openai.azure.com/openai/deployments/<YOUR_DEPLOYMENT>"  # e.g. <YOUR_DEPLOYMENT> = o1

  # Tell the SDK which API flavour and version to use
  # api_type: "azure"
  # api_version: "2025-01-01-preview"
  api_key: YOUR_API_KEY     # Replace with your key, or supply it via an environment variable
  temperature: 0.7
  top_p: 0.95
  max_tokens: 4096
  timeout: 60
  retries: 3
  retry_delay: 5

# Prompt configuration
prompt:
  template_dir: null        # Custom directory for prompt templates
  system_message: "You are an expert coder helping to improve programs through evolution."
  evaluator_system_message: "You are an expert code reviewer."

  # Number of examples to include in the prompt
  num_top_programs: 3       # Number of top-performing programs to include
  num_diverse_programs: 2   # Number of diverse programs to include

  # Template stochasticity
  use_template_stochasticity: true  # Use random variations in templates for diversity
  template_variations:      # Different phrasings for parts of the template
    improvement_suggestion:
      - "Here's how we could improve this code:"
      - "I suggest the following improvements:"
      - "We can enhance this code by:"

  # Note: meta-prompting features are not yet implemented
# Database configuration
database:
  # General settings
  db_path: null             # Path to persist database (null = in-memory only)
  in_memory: true           # Keep database in memory for faster access

  # Evolutionary parameters
  population_size: 1000     # Maximum number of programs to keep in memory
  archive_size: 100         # Size of elite archive
  num_islands: 5            # Number of islands for island model (separate populations)

  # Island-based evolution parameters
  # Islands provide diversity by maintaining separate populations that evolve independently.
  # Migration periodically shares the best solutions between adjacent islands.
  migration_interval: 50    # Migrate between islands every N generations
  migration_rate: 0.1       # Fraction of top programs to migrate (0.1 = 10%)

  # Selection parameters
  elite_selection_ratio: 0.1  # Ratio of elite programs to select
  exploration_ratio: 0.2      # Ratio of exploration vs exploitation
  exploitation_ratio: 0.7     # Ratio of exploitation vs random selection
  # Note: diversity_metric is fixed to "edit_distance" (feature_based not implemented)

  # Feature map dimensions for MAP-Elites
  feature_dimensions:       # Dimensions for MAP-Elites feature map
    - "score"               # Performance score
    - "complexity"          # Code complexity (length)
  feature_bins: 10          # Number of bins per dimension

# Evaluator configuration
evaluator:
  # General settings
  timeout: 300              # Maximum evaluation time in seconds
  max_retries: 3            # Maximum number of retries for evaluation

  # Note: resource limits (memory_limit_mb, cpu_limit) are not yet implemented

  # Evaluation strategies
  cascade_evaluation: true  # Use cascade evaluation to filter bad solutions early
  cascade_thresholds:       # Thresholds for advancing to next evaluation stage
    - 0.5                   # First stage threshold
    - 0.75                  # Second stage threshold
    - 0.9                   # Third stage threshold

  # Parallel evaluation
  parallel_evaluations: 4   # Number of parallel evaluations
  # Note: distributed evaluation is not yet implemented

  # LLM-based feedback (experimental)
  use_llm_feedback: false   # Use LLM to evaluate code quality
  llm_feedback_weight: 0.1  # Weight for LLM feedback in final score
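
For reference, a hedged sketch of how Azure settings like the ones above map onto the `openai` Python SDK (this is not code from this PR; the endpoint, key, and deployment name are placeholders):

```python
from openai import AzureOpenAI

# Assumes the same placeholders as the YAML above.
client = AzureOpenAI(
    azure_endpoint="https://<YOUR_BASE>.openai.azure.com",  # resource root
    api_key="YOUR_API_KEY",
    api_version="2025-01-01-preview",
)

# With Azure, `model` is the *deployment* name, not the model family name.
response = client.chat.completions.create(
    model="<YOUR_DEPLOYMENT>",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```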
@@ -0,0 +1,245 @@
README
======

Optimal-Execution Toy Benchmark for OpenEvolve
---------------------------------------------

This repository contains a **minimal yet complete** benchmark that lets an evolutionary-search engine learn how to execute a fixed quantity of shares in an order-book with market impact.
It mirrors the structure of the earlier “function-minimisation” example but replaces the mathematical objective with a *trading* objective:

*Minimise implementation-shortfall / slippage when buying or selling a random volume during a short horizon.*

The benchmark is intentionally lightweight – short Python, no external dependencies – yet it shows every building block you would find in a realistic execution engine:

1. synthetic order-book generation
2. execution-schedule parameterisation
3. a search / learning loop confined to an `EVOLVE-BLOCK`
4. an **independent evaluator** that scores candidates on unseen market scenarios.

-------------------------------------------------------------------------------

Repository Layout
-----------------

```
.
├── initial_program.py   # candidate – contains the EVOLVE-BLOCK
├── evaluator.py         # ground-truth evaluator
└── README.md            # ← you are here
```

Why two files?
• `initial_program.py` is what the evolutionary framework mutates.
• `evaluator.py` is trusted, *never* mutated and imports nothing except the
  candidate’s public `run_search()` function.

-------------------------------------------------------------------------------

Quick-start
-----------

```
python initial_program.py
# Runs the candidate’s own training loop (random-search on α)

python evaluator.py initial_program.py
# Scores the candidate on fresh market scenarios
```

Typical console output:

```
Best alpha: 1.482 | Estimated average slippage: 0.00834
{'value_score': 0.213, 'speed_score': 0.667,
 'reliability': 1.0, 'overall_score': 0.269}
```

-------------------------------------------------------------------------------

1. Mechanics – Inside the Candidate (`initial_program.py`)
----------------------------------------------------------

The file is split into two parts:

### 1.1 EVOLVE-BLOCK (mutable)

```python
# EVOLVE-BLOCK-START … EVOLVE-BLOCK-END
```

Only the code between those delimiters will be altered by OpenEvolve.
Everything else is *frozen*; it plays the role of a “library.”

Current strategy:

1. **Parameter** – a single scalar `alpha (α)`
   • α < 0 → front-loads the schedule
   • α = 0 → uniform (TWAP)
   • α > 0 → back-loads the schedule

2. **Search** – naïve random search over α
   (`search_algorithm()` evaluates ~250 random α’s and keeps the best; a sketch follows this list.)

3. **Fitness** – measured by `evaluate_alpha()` which, in turn, calls the
   **fixed** simulator (`simulate_execution`) for many random scenarios and
   averages per-share slippage.
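
A minimal sketch of the kind of loop this describes (illustrative only: the search interval and the `evaluate_alpha` stand-in below are assumptions, not the actual code in `initial_program.py`):

```python
import random

def evaluate_alpha(alpha: float) -> float:
    """Stand-in fitness for illustration; the real candidate averages
    per-share slippage from the fixed simulate_execution() here."""
    return 0.005 + 0.01 * abs(alpha - 1.5)   # toy surrogate cost surface

def search_algorithm(num_samples: int = 250) -> tuple[float, float]:
    best_alpha, best_cost = 0.0, evaluate_alpha(0.0)
    for _ in range(num_samples):
        alpha = random.uniform(-3.0, 3.0)    # assumed search interval
        cost = evaluate_alpha(alpha)
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha, best_cost
```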

Return signature required by the evaluator:

```python
def run_search() -> tuple[float, float]:
    return best_alpha, estimated_cost
```

The first element (α) is mandatory; anything after that is ignored by the
evaluator but can be useful for debugging.

### 1.2 Fixed “library” code (non-mutable)

* `create_schedule(volume, horizon, alpha)`
  Weights each slice by `(t+1)^α`, then normalises so the slices sum to the
  target volume.
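
A hedged sketch of what this helper plausibly looks like (the real implementation lives in the frozen part of `initial_program.py`):

```python
import numpy as np

def create_schedule(volume: float, horizon: int, alpha: float) -> np.ndarray:
    """Split `volume` across `horizon` child orders weighted by (t+1)**alpha."""
    weights = (np.arange(horizon) + 1.0) ** alpha
    weights /= weights.sum()          # normalise so slice sizes sum to `volume`
    return volume * weights
```

For example, `create_schedule(1000, 10, 0.0)` yields ten equal 100-share slices (TWAP), while a positive α shifts volume towards the later slices.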

* `simulate_execution(...)`
  Ultra-simplified micro-structure:

  • The mid-price `P_t` follows a Gaussian random walk
  • The current spread is constant (`±spread/2`)
  • Market impact grows linearly with child-order size relative to
    book depth: `impact = (size / depth) * spread/2`

  Execution price for each slice:

  ```
  BUY : P_t + spread/2 + impact
  SELL: P_t - spread/2 - impact
  ```

  Slippage is summed over the horizon and returned *per share*.
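
Putting those rules together, a hedged sketch of the simulator (the defaults for horizon, depth, spread, volatility, and starting price are assumptions; it reuses the `create_schedule` sketch above, and the authoritative version is the frozen code plus `evaluator.py`):

```python
import numpy as np

def simulate_execution(volume, side, alpha, horizon=10, depth=5_000.0,
                       spread=0.02, sigma=0.01, p0=100.0, seed=None):
    """Per-share slippage of a (t+1)**alpha schedule vs. the arrival price."""
    rng = np.random.default_rng(seed)
    sizes = create_schedule(volume, horizon, alpha)          # sketch above
    mid = p0 + np.cumsum(rng.normal(0.0, sigma, horizon))    # Gaussian random walk
    impact = (sizes / depth) * (spread / 2)                  # linear impact
    sign = 1.0 if side == "buy" else -1.0
    exec_price = mid + sign * (spread / 2 + impact)          # cross half the spread
    slippage = sign * (exec_price - p0) * sizes              # signed cost per slice
    return float(slippage.sum() / volume)                    # per-share slippage
```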

-------------------------------------------------------------------------------

2. Mechanics – The Evaluator (`evaluator.py`)
---------------------------------------------

The evaluator is the **oracle**; it owns the test scenarios and the scoring
function. A successful candidate must *generalise*: the random numbers in
the evaluator are independent from those inside the candidate.

### 2.1 Process flow

For each of `NUM_TRIALS = 10`:

1. Draw a *fresh* `(volume, side)` pair:
   `volume ∈ [100, 1000]`, `side ∈ {buy, sell}`

2. Call `run_search()` **once** (time-limited to 8 s)

3. Extract α and compute:

   ```
   cost_candidate = simulate_execution(vol, side, α)
   cost_baseline  = simulate_execution(vol, side, 0.0)   # uniform TWAP
   improvement    = (cost_baseline - cost_candidate)
                    / max(cost_baseline, 1e-9)
   ```

4. Store runtime and improvement.
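
In code, the loop looks roughly like this (a sketch with names chosen for illustration; `evaluator.py` is authoritative, and the real version also enforces the 8-second timeout):

```python
import random
import time

def run_trials(run_search, simulate_execution, num_trials=10):
    """Collect per-trial improvement vs. TWAP, runtime, and success count."""
    improvements, runtimes, successes = [], [], 0
    for _ in range(num_trials):
        vol = random.uniform(100, 1000)
        side = random.choice(["buy", "sell"])
        t0 = time.time()
        try:
            alpha = float(run_search()[0])                      # first item only
            cost_candidate = simulate_execution(vol, side, alpha)
            cost_baseline = simulate_execution(vol, side, 0.0)  # uniform TWAP
            improvements.append((cost_baseline - cost_candidate)
                                / max(cost_baseline, 1e-9))
            successes += 1
        except Exception:
            improvements.append(0.0)                            # failed trial scores 0
        runtimes.append(time.time() - t0)
    return improvements, runtimes, successes
```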

### 2.2 Scores

After the 10 trials:

```
value_score       = mean(max(0, improvement))        ∈ [0, 1]
speed_score       = min(10, 1/mean(runtime)) / 10    ∈ [0, 1]
reliability_score = success / 10                     ∈ [0, 1]

overall_score = 0.8·value + 0.1·speed + 0.1·reliability
```

Intuition:

* **Value** (quality of execution) dominates.
* **Speed** rewards fast optimisation but is capped.
* **Reliability** ensures the candidate rarely crashes or times out.
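
The same aggregation written out, as a sketch mirroring the formulas above:

```python
import statistics

def aggregate_scores(improvements, runtimes, successes, num_trials=10):
    value_score = statistics.mean(max(0.0, i) for i in improvements)
    speed_score = min(10.0, 1.0 / statistics.mean(runtimes)) / 10.0
    reliability = successes / num_trials
    overall = 0.8 * value_score + 0.1 * speed_score + 0.1 * reliability
    return {"value_score": value_score, "speed_score": speed_score,
            "reliability": reliability, "overall_score": overall}
```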

### 2.3 Stage-based evaluation (optional)

* `evaluate_stage1()` – smoke-test; passes if `overall_score > 0.05`
* `evaluate_stage2()` – identical to `evaluate()`

These mirror the two-stage funnel from the previous demo.

-------------------------------------------------------------------------------

3. Extending the Benchmark
--------------------------

The framework is deliberately tiny so you can experiment.

Ideas:

1. **Richer parameterisation**
   • Add `beta` for a *U-shape* schedule (a sketch follows this list)
   • Add a *child-order participation cap* (%ADV)

2. **Better search / learning**
   • Replace random search with gradient-free CMA-ES, Bayesian optimisation or
     even RL inside the EVOLVE-BLOCK.

3. **Enhanced market model**
   • Stochastic spread
   • Non-linear impact (`impact ∝ volume^γ`)
   • Resilience (price reverts after child order)

4. **Multi-objective scoring**
   Mix risk metrics (variance of slippage) into the evaluator.

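As an example of idea 1, a hypothetical two-parameter schedule (the name and functional form below are made up for illustration):

```python
import numpy as np

def create_schedule_ushape(volume, horizon, alpha, beta):
    """Back weight (t+1)**alpha plus front weight (horizon-t)**beta;
    with alpha, beta > 1 this yields a U-shaped profile."""
    t = np.arange(horizon, dtype=float)
    weights = (t + 1.0) ** alpha + (horizon - t) ** beta
    weights /= weights.sum()
    return volume * weights
```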

When you add knobs, remember:

* All **simulation logic for evaluation must live in `evaluator.py`**.
  Candidates cannot peek at or tamper with it.
* The evaluator must still be able to extract the *decision variables* from
  the tuple returned by `run_search()`.

-------------------------------------------------------------------------------

4. Known Limitations
--------------------

1. **Impact model is linear & memory-less**
   Good for demonstration; unrealistic for real-world HFT.

2. **No order-book micro-structure**
   We do not simulate queue positions, cancellations, hidden liquidity, etc.

3. **Single parameter α**
   Optimal execution in reality depends on volatility, spread forecasts,
   order-book imbalance and so forth. Here we sidestep all that for clarity.

4. **Random search baseline**
   Evolutionary engines will easily outperform it; that is the point – we
   want a hill to climb.

-------------------------------------------------------------------------------

5. FAQ
------

Q: **How do I run the example?**
A: `python openevolve-run.py examples/optimal_execution/initial_program.py examples/optimal_execution/evaluator.py --iterations 20 --config config.yaml`

Q: **Why does the evaluator re-implement `simulate_execution`?**
A: To guarantee the candidate cannot cheat by hard-coding answers from its own
   RNG realisations.

Q: **What happens if my `run_search()` returns something weird?**
A: The evaluator casts the *first* item to `float`. Non-numeric or `NaN`
   values yield zero score.
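
A sketch of the defensive extraction this answer describes (illustrative; the function name is made up):

```python
import math

def extract_alpha(result):
    """Return the candidate's alpha, or None (-> zero score) if unusable."""
    try:
        first = result[0] if isinstance(result, (tuple, list)) else result
        alpha = float(first)
    except (TypeError, ValueError, IndexError):
        return None
    return None if math.isnan(alpha) else alpha
```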

Q: **Is it okay to import heavy libraries (pandas, torch) inside the EVOLVE-BLOCK?**
A: Technically yes, but remember the 8-second time-out, and that the judge’s machine
   may not have a GPU or large RAM.