README.md: 19 additions & 2 deletions
@@ -84,7 +84,12 @@ This project contains two main components:
 |`data_loader.py`| JSONL data loader with proper Tinker renderers, loss masking, validation, and deduplication. |
 |`data_selector.py`| Utilities for mining hard examples based on evaluation failures. |
 |`hyperparam_utils.py`| Tinker's recommended LR formula and warmup/cosine scheduler. |
-|`simple_eval.py`| Minimal working evaluator for demo (replace with Inspect AI for production). |
+|`simple_eval.py`| Minimal working evaluator for demo (fallback if Inspect AI unavailable). |
+|`inspect_eval.py`| Inspect AI task integration with Tinker sampling adapter. |
+|`logger.py`| Structured JSON logging for metrics and events. |
+|`checkpoint_manager.py`| Checkpoint save/resume for interrupted runs. |
+|`error_handling.py`| Retry logic with exponential backoff and rate limiting. |
+|`mock_tinker.py`| Mock Tinker client for offline demos and CI. |
 |`requirements.txt`| Dependencies required to run the script. |
 |`tests/`| Unit and integration tests for all components. |
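The `error_handling.py` row above describes retry logic with exponential backoff. A minimal sketch of that pattern follows; the function name and parameters are illustrative and assume nothing about the module's real API:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying failures with capped exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            # Delays grow 1s, 2s, 4s, ... up to max_delay, with a small random
            # jitter so concurrent clients don't retry in lockstep.
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay * 0.1))
```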
@@ -148,6 +153,14 @@ See [DEMO.md](DEMO.md) for a detailed walkthrough of what happens during the demo.
 
 The script will fine‑tune the specified base model using LoRA, run evaluations, and iteratively improve the model until it meets your quality targets or reaches a maximum number of rounds. If EvalOps integration is enabled, each evaluation round will be automatically submitted to your EvalOps workspace for tracking and analysis.
+Continues from the latest checkpoint in the `runs/` directory.
 
 ## EvalOps Integration
 
 This project includes built-in integration with [EvalOps](https://evalops.dev) to automatically track evaluation results across training rounds. The `evalops_client.py` module provides a lightweight Python SDK that:
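The fine‑tune → evaluate → improve loop described in the hunk above might be structured like the following sketch; `train_round`, `evaluate`, and the thresholds are illustrative stand-ins, not the script's actual interface:

```python
from typing import Callable

def improvement_loop(
    train_round: Callable[[str], str],   # one LoRA fine-tuning pass
    evaluate: Callable[[str], float],    # returns a quality score in [0, 1]
    base_model: str,
    quality_target: float = 0.90,
    max_rounds: int = 5,
) -> str:
    """Alternate training and evaluation until the quality target is met
    or max_rounds is exhausted; return the last model identifier."""
    model = base_model
    for _ in range(max_rounds):
        model = train_round(model)
        if evaluate(model) >= quality_target:
            break  # quality target met: stop early
    return model
```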
@@ -216,7 +229,11 @@ The test suite includes:
 - Uses `save_weights_for_sampler()` for evaluation (not `save_state()`, which includes optimizer state)
 - Supports Tinker's recommended LR formula: `LR = 5e-5 × 10 × (2000/H_m)^P_m` with model-specific exponents
 - Includes warmup + cosine decay scheduler for stable training
-- Gracefully falls back when tinker-cookbook unavailable (for testing/development)
+- Inspect AI integration with fallback to simple evaluator
+- Structured JSON logging to `runs/<timestamp>/metrics.jsonl`
+- Checkpoint resume with `--resume` flag
+- Mock mode for offline demos and CI (set `TINKER_MOCK=1`)
+- Gracefully falls back when tinker-cookbook unavailable
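As a worked example of the LR formula in the list above, a one-liner assuming `H_m` is the model's hidden size and `P_m` its model-specific exponent (both readings are assumptions; `hyperparam_utils.py` holds the actual definitions):

```python
def recommended_lr(hidden_size: int, exponent: float) -> float:
    """LR = 5e-5 × 10 × (2000 / H_m)^P_m, per the README's formula."""
    return 5e-5 * 10 * (2000 / hidden_size) ** exponent

# Illustrative values only: recommended_lr(4096, 0.5) ≈ 3.5e-4.
```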