Commit af0a03a

Refactor schemas: split Domain/Experiment into separate files, rename I/O functions, use dataclass objects for hierarchical relationships, and improve documentation
1 parent 2d38305 commit af0a03a

12 files changed, +340 -175 lines changed

src/schemas/PIPELINE_SCHEMAS.md

Lines changed: 8 additions & 8 deletions
@@ -25,7 +25,7 @@ Each stage follows a consistent pattern:

 **Important:** All stage implementations must follow this pattern to ensure the pipeline is clean, consistent, and maintainable. This enables interoperability between different implementations, resumability of failed runs, and clear traceability through the pipeline.

-**Note:** The dataclasses, save functions (`save_<stage>_output(data, metadata, output_path)`), and load functions (`load_<stage>_output(file_path) -> <OutputDataclass>`) for each stage will be provided and must be used. Do not implement custom serialization or data structures - use the standardized schemas to ensure consistency across the pipeline. Dataclasses provide type safety, validation, and clear structure. JSON is the serialization format.
+**Note:** The dataclasses, save functions (`save_<stage>(data, metadata, output_path)`), and load functions (`load_<stage>(file_path) -> <OutputDataclass>`) for each stage will be provided and must be used. Do not implement custom serialization or data structures - use the standardized schemas to ensure consistency across the pipeline. Dataclasses provide type safety, validation, and clear structure. JSON is the serialization format.

 **Iteration Note:** Some stages operate on subsets (one area, capability, or task at a time) and require an outer orchestrator/loop script to iterate over all items:
 - **Stage 2 (Capability Generation)**: Operates on one area at a time - orchestrator loops over all areas from Stage 1
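The outer loop that the Iteration Note asks for could look roughly like the sketch below. It is not the project's orchestrator: `capabilities_path`, `run_stage_2`, and the `generate_for_area` callback are hypothetical names introduced here, the path layout follows the `capabilities.json` File Path documented in this same file, and skipping outputs that already exist is only one assumed way to get the resumability mentioned above.

```python
from pathlib import Path
from typing import Callable, Dict, List


def capabilities_path(output_dir: Path, experiment_id: str, cap_tag: str, area_id: str) -> Path:
    """Build <output_dir>/<experiment_id>/capabilities/<cap_tag>/<area_id>/capabilities.json."""
    return output_dir / experiment_id / "capabilities" / cap_tag / area_id / "capabilities.json"


def run_stage_2(
    area_ids: List[str],
    output_dir: Path,
    experiment_id: str,
    cap_tag: str,
    generate_for_area: Callable[[str, Path], None],
) -> Dict[str, Path]:
    """Call the single-area stage once per area and record each output path."""
    outputs: Dict[str, Path] = {}
    for area_id in area_ids:
        target = capabilities_path(output_dir, experiment_id, cap_tag, area_id)
        if not target.exists():
            # Hypothetical callback: a real one would generate the capabilities
            # for this area and finish by calling save_capabilities(...).
            generate_for_area(area_id, target)
        # Areas whose output already exists are skipped (assumed resume behaviour).
        outputs[area_id] = target
    return outputs
```

The per-area stage stays unaware of the loop; only the orchestrator knows how many areas Stage 1 produced.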
@@ -342,7 +342,7 @@ This stage creates two files:
 #### Output 1: `experiment.json`

 **Stage Output:** Experiment dataclass + PipelineMetadata
-**Save Function:** `save_experiment_output(experiment: Experiment, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_experiment(experiment: Experiment, metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/experiment.json`

@@ -373,7 +373,7 @@ This stage creates two files:
 #### Output 2: `domain.json`

 **Stage Output:** Domain dataclass object + PipelineMetadata
-**Save Function:** `save_domain_output(domain: Domain, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_domain(domain: Domain, metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/domain/domain.json`

@@ -411,7 +411,7 @@ This stage creates two files:
 ### Output: `areas.json`

 **Stage Output:** List[Area] dataclasses + PipelineMetadata
-**Save Function:** `save_areas_output(areas: List[Area], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_areas(areas: List[Area], metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/areas/<tag>/areas.json`
 ```json
@@ -456,7 +456,7 @@ This stage creates two files:
 ### Output: `capabilities.json` (one per area)

 **Stage Output:** List[Capability] dataclasses + PipelineMetadata
-**Save Function:** `save_capabilities_output(capabilities: List[Capability], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_capabilities(capabilities: List[Capability], metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/capabilities/<cap_tag>/<area_id>/capabilities.json`

@@ -504,7 +504,7 @@ This stage creates two files:
 ### Output: `tasks.json` (one per capability)

 **Stage Output:** List[Task] dataclasses + PipelineMetadata
-**Save Function:** `save_tasks_output(tasks: List[Task], metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_tasks(tasks: List[Task], metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/tasks/<task_tag>/<area_id>/<capability_id>/tasks.json`

@@ -563,7 +563,7 @@ This stage creates two files:
 ### Output: `solution.json` (one per task)

 **Stage Output:** TaskSolution dataclass + PipelineMetadata
-**Save Function:** `save_solution_output(task_solution: TaskSolution, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_solution(task_solution: TaskSolution, metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/solutions/<solution_tag>/<area_id>/<capability_id>/<task_id>/solution.json`

@@ -634,7 +634,7 @@ This stage creates two files:
 ### Output: `validation.json` (one per task)

 **Stage Output:** ValidationResult dataclass + PipelineMetadata
-**Save Function:** `save_validation_output(validation_result: ValidationResult, metadata: PipelineMetadata, output_path: Path)`
+**Save Function:** `save_validation(validation_result: ValidationResult, metadata: PipelineMetadata, output_path: Path)`

 **File Path:** `<output_dir>/<experiment_id>/validation/<validation_tag>/<area_id>/<capability_id>/<task_id>/validation.json`

src/schemas/README.md

Lines changed: 9 additions & 8 deletions
@@ -6,7 +6,8 @@ This directory contains standardized schemas for all ACE pipeline stages, ensuri

 - **[`PIPELINE_SCHEMAS.md`](PIPELINE_SCHEMAS.md)** - Complete documentation of input/output formats for each stage
 - **Python Dataclasses** - Type-safe data structures for each stage:
-  - [`experiment_schemas.py`](experiment_schemas.py) - Experiment and Domain (Stage 0)
+  - [`experiment_schemas.py`](experiment_schemas.py) - Experiment (Stage 0)
+  - [`domain_schemas.py`](domain_schemas.py) - Domain (Stage 0)
   - [`metadata_schemas.py`](metadata_schemas.py) - Common metadata (PipelineMetadata)
   - [`area_schemas.py`](area_schemas.py) - Area generation (Stage 1)
   - [`capability_schemas.py`](capability_schemas.py) - Capability generation (Stage 2)
@@ -22,8 +23,8 @@ This directory contains standardized schemas for all ACE pipeline stages, ensuri

 ```python
 from src.schemas import (
-    Experiment,
     Domain,
+    Experiment,
     PipelineMetadata,
     Area,
     Capability,
@@ -33,12 +34,12 @@ from src.schemas import (
 )

 # Create area
+domain = Domain(name="Personal Finance", domain_id="domain_000")
 area = Area(
     name="Cash Flow & Budget Management",
     area_id="area_000",
     description="Design and monitor budgets...",
-    domain="personal finance",
-    domain_id="domain_000",
+    domain=domain,
     # generation_metadata is optional
 )

@@ -54,8 +55,8 @@ area = Area.from_dict(data)
 ```python
 from pathlib import Path
 from src.schemas import (
-    save_areas_output,
-    load_areas_output,
+    save_areas,
+    load_areas,
     PipelineMetadata,
     Area,
 )
@@ -68,10 +69,10 @@ metadata = PipelineMetadata(
     timestamp="2025-11-06T12:00:00Z",
     output_stage_tag="_20251009_122040"
 )
-save_areas_output(areas, metadata, Path("output/areas.json"))
+save_areas(areas, metadata, Path("output/areas.json"))

 # Load areas
-areas, metadata = load_areas_output(Path("output/areas.json"))
+areas, metadata = load_areas(Path("output/areas.json"))
 ```

 ## Pipeline Stages

src/schemas/__init__.py

Lines changed: 30 additions & 29 deletions
@@ -6,22 +6,23 @@

 from src.schemas.area_schemas import Area
 from src.schemas.capability_schemas import Capability
-from src.schemas.experiment_schemas import Domain, Experiment
+from src.schemas.domain_schemas import Domain
+from src.schemas.experiment_schemas import Experiment
 from src.schemas.io_utils import (
-    load_areas_output,
-    load_capabilities_output,
-    load_domain_output,
-    load_experiment_output,
-    load_solution_output,
-    load_tasks_output,
-    load_validation_output,
-    save_areas_output,
-    save_capabilities_output,
-    save_domain_output,
-    save_experiment_output,
-    save_solution_output,
-    save_tasks_output,
-    save_validation_output,
+    load_areas,
+    load_capabilities,
+    load_domain,
+    load_experiment,
+    load_solution,
+    load_tasks,
+    load_validation,
+    save_areas,
+    save_capabilities,
+    save_domain,
+    save_experiment,
+    save_solution,
+    save_tasks,
+    save_validation,
 )
 from src.schemas.metadata_schemas import PipelineMetadata
 from src.schemas.solution_schemas import TaskSolution
@@ -46,19 +47,19 @@
     # Validation schemas
     "ValidationResult",
     # I/O functions - Save
-    "save_experiment_output",
-    "save_domain_output",
-    "save_areas_output",
-    "save_capabilities_output",
-    "save_tasks_output",
-    "save_solution_output",
-    "save_validation_output",
+    "save_experiment",
+    "save_domain",
+    "save_areas",
+    "save_capabilities",
+    "save_tasks",
+    "save_solution",
+    "save_validation",
     # I/O functions - Load
-    "load_experiment_output",
-    "load_domain_output",
-    "load_areas_output",
-    "load_capabilities_output",
-    "load_tasks_output",
-    "load_solution_output",
-    "load_validation_output",
+    "load_experiment",
+    "load_domain",
+    "load_areas",
+    "load_capabilities",
+    "load_tasks",
+    "load_solution",
+    "load_validation",
 ]

src/schemas/area_schemas.py

Lines changed: 19 additions & 7 deletions
@@ -1,8 +1,14 @@
-"""Schemas for area generation stage."""
+"""Schemas for area generation stage (Stage 1).
+
+Defines Area dataclass representing a domain area. Areas are high-level categories
+within a domain (e.g., "Budgeting" within "Personal Finance").
+"""

 from dataclasses import dataclass, field
 from typing import Dict, Optional

+from src.schemas.domain_schemas import Domain
+

 @dataclass
 class Area:
@@ -11,18 +17,18 @@ class Area:
     name: str
     area_id: str
     description: Optional[str] = None
-    domain: str = ""
-    domain_id: str = ""
+    domain: Optional[Domain] = None
     generation_metadata: Optional[Dict] = field(default_factory=dict)

     def to_dict(self):
         """Convert to dictionary."""
         result = {
             "name": self.name,
             "area_id": self.area_id,
-            "domain": self.domain,
-            "domain_id": self.domain_id,
         }
+        if self.domain is not None:
+            result["domain"] = self.domain.name
+            result["domain_id"] = self.domain.domain_id
         if self.description is not None:
             result["description"] = self.description
         if self.generation_metadata:
@@ -32,11 +38,17 @@ def to_dict(self):
     @classmethod
     def from_dict(cls, data: dict):
         """Create from dictionary."""
+        domain = None
+        if "domain" in data and "domain_id" in data:
+            domain = Domain(
+                name=data["domain"],
+                domain_id=data["domain_id"],
+                description=None,
+            )
         return cls(
             name=data["name"],
             area_id=data["area_id"],
             description=data.get("description"),
-            domain=data.get("domain", ""),
-            domain_id=data.get("domain_id", ""),
+            domain=domain,
             generation_metadata=data.get("generation_metadata", {}),
         )
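One consequence of the new `to_dict`/`from_dict` pair is easy to miss when reading the hunks: only the domain's `name` and `domain_id` are flattened into the area record, so a `Domain.description` does not survive a round trip. The sketch below illustrates this with trimmed stand-ins for `Domain` and `Area` (no `generation_metadata`), re-declared here only so the snippet runs without the `src.schemas` package; they are not the project's classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:  # stand-in for src.schemas.domain_schemas.Domain
    name: str
    domain_id: str
    description: Optional[str] = None


@dataclass
class Area:  # trimmed stand-in for src.schemas.area_schemas.Area
    name: str
    area_id: str
    description: Optional[str] = None
    domain: Optional[Domain] = None

    def to_dict(self) -> dict:
        result = {"name": self.name, "area_id": self.area_id}
        if self.domain is not None:
            # Only the domain's name and id are written into the area record.
            result["domain"] = self.domain.name
            result["domain_id"] = self.domain.domain_id
        if self.description is not None:
            result["description"] = self.description
        return result

    @classmethod
    def from_dict(cls, data: dict) -> "Area":
        domain = None
        if "domain" in data and "domain_id" in data:
            # The domain is rebuilt from the two flat keys; description stays None.
            domain = Domain(name=data["domain"], domain_id=data["domain_id"])
        return cls(
            name=data["name"],
            area_id=data["area_id"],
            description=data.get("description"),
            domain=domain,
        )


domain = Domain("Personal Finance", "domain_000", description="Money management")
area = Area("Budgeting", "area_000", domain=domain)

flat = area.to_dict()            # flat keys: name, area_id, domain, domain_id
restored = Area.from_dict(flat)  # restored.domain.description is None
```

If the full domain record is needed after loading an area, it presumably has to come from `domain.json` rather than from the area file.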

src/schemas/capability_schemas.py

Lines changed: 31 additions & 13 deletions
@@ -1,8 +1,15 @@
-"""Schemas for capability generation stage."""
+"""Schemas for capability generation stage (Stage 2).
+
+Defines Capability dataclass representing a capability within an area. Capabilities
+are specific skills or abilities (e.g., "Budget Creation" within "Budgeting" area).
+"""

 from dataclasses import dataclass, field
 from typing import Dict, Optional

+from src.schemas.area_schemas import Area
+from src.schemas.domain_schemas import Domain
+

 @dataclass
 class Capability:
@@ -11,22 +18,21 @@ class Capability:
     name: str
     capability_id: str
     description: Optional[str] = None
-    area: str = ""
-    area_id: str = ""
-    domain: str = ""
-    domain_id: str = ""
+    area: Optional[Area] = None
     generation_metadata: Optional[Dict] = field(default_factory=dict)

     def to_dict(self):
         """Convert to dictionary."""
         result = {
             "name": self.name,
             "capability_id": self.capability_id,
-            "area": self.area,
-            "area_id": self.area_id,
-            "domain": self.domain,
-            "domain_id": self.domain_id,
         }
+        if self.area is not None:
+            result["area"] = self.area.name
+            result["area_id"] = self.area.area_id
+            if self.area.domain is not None:
+                result["domain"] = self.area.domain.name
+                result["domain_id"] = self.area.domain.domain_id
         if self.description is not None:
             result["description"] = self.description
         if self.generation_metadata:
@@ -36,13 +42,25 @@ def to_dict(self):
     @classmethod
     def from_dict(cls, data: dict):
         """Create from dictionary."""
+        area = None
+        if "area" in data and "area_id" in data:
+            domain = None
+            if "domain" in data and "domain_id" in data:
+                domain = Domain(
+                    name=data["domain"],
+                    domain_id=data["domain_id"],
+                    description=None,
+                )
+            area = Area(
+                name=data["area"],
+                area_id=data["area_id"],
+                description=None,
+                domain=domain,
+            )
         return cls(
             name=data["name"],
             capability_id=data["capability_id"],
             description=data.get("description"),
-            area=data.get("area", ""),
-            area_id=data.get("area_id", ""),
-            domain=data.get("domain", ""),
-            domain_id=data.get("domain_id", ""),
+            area=area,
             generation_metadata=data.get("generation_metadata", {}),
         )
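To make the new shape concrete, the sketch below walks a `Capability` through the flattened dictionary form and back, and then feeds `from_dict` a record shaped the way the pre-refactor `to_dict` wrote it when the string fields were left at their `""` defaults. The three classes are trimmed stand-ins re-declared so the snippet runs on its own, and the `cap_000`/`cap_001` ids are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:  # stand-in for src.schemas.domain_schemas.Domain
    name: str
    domain_id: str
    description: Optional[str] = None


@dataclass
class Area:  # trimmed stand-in for src.schemas.area_schemas.Area
    name: str
    area_id: str
    description: Optional[str] = None
    domain: Optional[Domain] = None


@dataclass
class Capability:  # trimmed stand-in for src.schemas.capability_schemas.Capability
    name: str
    capability_id: str
    description: Optional[str] = None
    area: Optional[Area] = None

    def to_dict(self) -> dict:
        result = {"name": self.name, "capability_id": self.capability_id}
        if self.area is not None:
            result["area"] = self.area.name
            result["area_id"] = self.area.area_id
            if self.area.domain is not None:
                result["domain"] = self.area.domain.name
                result["domain_id"] = self.area.domain.domain_id
        if self.description is not None:
            result["description"] = self.description
        return result

    @classmethod
    def from_dict(cls, data: dict) -> "Capability":
        area = None
        if "area" in data and "area_id" in data:
            domain = None
            if "domain" in data and "domain_id" in data:
                domain = Domain(name=data["domain"], domain_id=data["domain_id"])
            area = Area(name=data["area"], area_id=data["area_id"], domain=domain)
        return cls(
            name=data["name"],
            capability_id=data["capability_id"],
            description=data.get("description"),
            area=area,
        )


# New-style usage: the hierarchy is reachable through attribute access.
cap = Capability(
    name="Budget Creation",
    capability_id="cap_000",
    area=Area("Budgeting", "area_000", domain=Domain("Personal Finance", "domain_000")),
)
flat = cap.to_dict()
restored = Capability.from_dict(flat)

# Old-style record with the former "" defaults: the keys are present, so
# from_dict builds an Area and Domain with empty names instead of None.
legacy = {
    "name": "X",
    "capability_id": "cap_001",
    "area": "",
    "area_id": "",
    "domain": "",
    "domain_id": "",
}
legacy_cap = Capability.from_dict(legacy)
```

Because `from_dict` tests for key presence rather than for non-empty values, callers that treat `area is None` as "no parent" may want to check for empty names as well when reading files produced before this commit.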

src/schemas/domain_schemas.py

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+"""Schemas for domain (Stage 0).
+
+Defines Domain dataclass representing the domain being evaluated in the experiment.
+"""
+
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class Domain:
+    """Represents a domain."""
+
+    name: str
+    domain_id: str
+    description: Optional[str] = None
+
+    def to_dict(self):
+        """Convert to dictionary."""
+        result = {
+            "name": self.name,
+            "domain_id": self.domain_id,
+        }
+        if self.description is not None:
+            result["description"] = self.description
+        return result
+
+    @classmethod
+    def from_dict(cls, data: dict):
+        """Create from dictionary."""
+        return cls(
+            name=data["name"],
+            domain_id=data["domain_id"],
+            description=data.get("description"),
+        )
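As a quick check of the new class, the sketch below round-trips a `Domain` through JSON text using only `to_dict` and `from_dict`. The class body is copied from the hunk above so the snippet runs on its own. This is not a description of what `save_domain`/`load_domain` do internally, since `io_utils.py` is not shown in this excerpt.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Domain:
    """Copy of the Domain dataclass added in this commit."""

    name: str
    domain_id: str
    description: Optional[str] = None

    def to_dict(self):
        result = {"name": self.name, "domain_id": self.domain_id}
        if self.description is not None:
            # The key is omitted entirely when no description is set.
            result["description"] = self.description
        return result

    @classmethod
    def from_dict(cls, data: dict):
        return cls(
            name=data["name"],
            domain_id=data["domain_id"],
            description=data.get("description"),
        )


domain = Domain(name="Personal Finance", domain_id="domain_000")
payload = json.dumps(domain.to_dict())            # JSON text without a description key
restored = Domain.from_dict(json.loads(payload))  # equal to the original object

described = Domain("Personal Finance", "domain_000", description="Budgeting and saving")
described_restored = Domain.from_dict(json.loads(json.dumps(described.to_dict())))
```

Unlike the flattened form used inside `Area` and `Capability`, this representation keeps `description`, so a standalone `Domain` round-trips without loss.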
