Commit bd9fc33

Sub topic (#43)
* added refactored task generation; updated prompts to ask for JSON outputs and updated the corresponding output parser.
* fixed retry, JSON processing, and max tokens.
* switching to two-phase task generation.
* switching to two-phase task generation, part 2.
* updated agentic config and README.
* simplified task generation.
* simplified task generation.
* fixed mypy errors.
* ruff fix.
* updated saved file name for solutions.
* added extra details to agent solution messages.
* fixed prompts.
* fixed output dir name to include area name.
* fixed task solver output dir name.
* upgraded JSON handling and model call.
* updated README to include latest agentic changes.
* added task diversity study scripts.
* updated prompts and combination logic.
* added resume and retry.
1 parent eea799f commit bd9fc33

11 files changed: +1428 -0 lines changed
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Configuration for Diverse Task Generator
+
+# Model settings
+model:
+  name: gpt-4o  # OpenAI model to use
+  temperature: 1.0  # Temperature for all steps
+  max_tokens: 8192  # Max tokens for all steps
+  max_retries: 3  # Number of retry attempts for API calls
+  retry_delay: 2.0  # Initial delay between retries in seconds (exponential backoff)
+
+# Task generation settings
+generation:
+  tasks_per_blueprint: 3  # Number of tasks to generate per blueprint
+  min_subtopics: 3  # Suggested minimum number of sub-topics
+  max_subtopics: 8  # Suggested maximum number of sub-topics
+
+# Output settings
+output:
+  base_dir: diverse_task_outputs
+  save_intermediate_steps: true  # Save each step's output
+  pretty_print_json: true  # Indent JSON files
+
+# Input settings
+input:
+  capability_json_path: capability.json  # Default capability JSON file path
+
+# Verification criteria
+verification:
+  pass_threshold: 0.8  # Minimum pass rate to consider successful
+  strict_mode: false  # If true, all alignment criteria must pass
+
+# Example capability for quick testing
+example_capability:
+  name: "compound_interest_calculations"
+  description: "The ability to calculate compound interest for various scenarios, including different compounding frequencies (annually, semi-annually, quarterly, monthly), different time periods, and understanding how changes in principal, rate, or time affect the final amount."
+  domain: "personal_finance"
+  area: "investing_and_savings"
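
The `max_retries` and `retry_delay` settings above describe retries with exponential backoff, but the retry code itself is not part of the files shown here. The following is a minimal sketch of how such settings could be applied, not the commit's implementation. The helper name `call_with_retries`, the doubling factor, the reading of `max_retries` as "retries after the first attempt", and the injectable `sleep` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,       # mirrors model.max_retries in the config
    retry_delay: float = 2.0,   # mirrors model.retry_delay (initial delay, seconds)
    sleep: Callable[[float], None] = time.sleep,  # hypothetical hook, eases testing
) -> T:
    """Call fn, retrying on any exception with a delay that doubles each time.

    Assumption: max_retries counts retries after the first attempt, so fn is
    called at most max_retries + 1 times.
    """
    delay = retry_delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                # Out of retries: surface the last error to the caller.
                raise
            sleep(delay)
            delay *= 2  # assumed backoff factor; the config only says "exponential"
    # Only reached if max_retries is negative, in which case fn is never called.
    raise RuntimeError("max_retries must be non-negative")
```

With the defaults in the config, a call that keeps failing would wait 2, 4, and 8 seconds between attempts before the final error is raised.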
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+"""Constants for diverse task generation."""
+
+BLOOMS_TAXONOMY = {
+    "Remember": {
+        "description": "Recall or recognize facts, terms, and basic concepts. Example verbs: define, list, identify."
+    },
+    "Understand": {
+        "description": "Explain ideas or concepts and interpret information in one's own words. Example verbs: summarize, describe, classify."
+    },
+    "Apply": {
+        "description": "Use knowledge or methods in new but familiar situations. Example verbs: calculate, demonstrate, use, implement."
+    },
+    "Analyze": {
+        "description": "Break information into parts and examine relationships or patterns. Example verbs: differentiate, compare, examine, infer."
+    },
+    "Evaluate": {
+        "description": "Make judgments based on criteria and standards. Example verbs: justify, critique, assess, argue."
+    },
+    "Create": {
+        "description": "Combine elements to form a new pattern, structure, or product. Example verbs: design, compose, formulate, generate."
+    },
+}
+
+DIFFICULTY_LEVELS = {
+    "easy": {
+        "description": "Involves direct recall, recognition, or simple application of knowledge and procedures."
+    },
+    "medium": {
+        "description": "Requires connecting multiple ideas, performing multi-step reasoning, or applying knowledge in new but familiar contexts."
+    },
+    "hard": {
+        "description": "Involves complex reasoning, integration of several sub-topics, or solving non-trivial problems that demand deeper conceptual understanding."
+    },
+}
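
The two dictionaries above supply the reasoning and difficulty axes of the (content, difficulty, reasoning) combinations mentioned in the `Combination` dataclass of this commit. The combination logic itself is not shown in these files, so the following is only a sketch of one plausible starting point: enumerating every candidate triple before any filtering. The function `enumerate_combinations` is hypothetical, and the key lists are copied from the constants so the block runs on its own.

```python
from itertools import product
from typing import List, Tuple

# Keys copied from DIFFICULTY_LEVELS and BLOOMS_TAXONOMY in the diff above.
DIFFICULTY_NAMES = ["easy", "medium", "hard"]
BLOOMS_NAMES = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def enumerate_combinations(subtopics: List[str]) -> List[Tuple[str, str, str]]:
    """Return every (subtopic, difficulty, reasoning) triple.

    Hypothetical helper: the real pipeline may instead ask the model to pick
    only the combinations that make sense for a capability.
    """
    return list(product(subtopics, DIFFICULTY_NAMES, BLOOMS_NAMES))


if __name__ == "__main__":
    combos = enumerate_combinations(["compounding frequency", "time horizon"])
    # 2 sub-topics x 3 difficulty levels x 6 Bloom levels
    print(len(combos))
```

Because the full product grows quickly (with 8 sub-topics it reaches 144 triples), some selection step is likely needed before blueprints are written for each combination.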
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+"""Dataclasses for the diverse task generation pipeline."""
+
+from dataclasses import dataclass, field
+from typing import Dict, List, Optional
+
+
+@dataclass
+class Capability:
+    """Represents a capability to be tested."""
+
+    name: str
+    description: str
+    domain: str
+    area: Optional[str] = None
+    example_tasks: List[Dict] = field(default_factory=list)
+
+
+@dataclass
+class SubTopic:
+    """Represents a sub-topic within a capability."""
+
+    name: str
+    description: Optional[str] = None
+
+
+@dataclass
+class Combination:
+    """Represents a valid (content, difficulty, reasoning) combination."""
+
+    content: str
+    difficulty: str
+    reasoning: str
+    rationale: Optional[str] = None
+
+
+@dataclass
+class Blueprint:
+    """Represents a task blueprint for a specific combination."""
+
+    combination_id: int
+    subtopic: str
+    difficulty: str
+    reasoning: str
+    blueprint: str
+    key_characteristics: List[str] = field(default_factory=list)
+    example_question_outline: Optional[str] = None
+    rationale: Optional[str] = None
+
+
+@dataclass
+class Task:
+    """Represents a generated multiple-choice task."""
+
+    task_id: str
+    blueprint_id: int
+    subtopic: str
+    difficulty: str
+    reasoning: str
+    question: str
+    choices: Dict[str, str]
+    correct_answer: str
+    explanation: Optional[str] = None
+    alignment_notes: Optional[str] = None
+
+
+@dataclass
+class VerificationResult:
+    """Represents the verification result for a task."""
+
+    task_id: str
+    subtopic_aligned: bool
+    difficulty_aligned: bool
+    reasoning_aligned: bool
+    choices_appropriate: bool
+    overall_aligned: bool
+    feedback: str
+    suggested_improvements: Optional[str] = None
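
The `verification` section of the config defines `pass_threshold` and `strict_mode`, and `VerificationResult` carries the per-task flags they would apply to. How the pipeline combines them is not shown in these files. The sketch below is one possible reading, under two stated assumptions: the pass rate is the fraction of tasks that pass, and strict mode requires all four individual criteria rather than only `overall_aligned`. The helpers `task_passes`, `pass_rate`, and `meets_threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VerificationResult:
    """Same fields as in the diff, repeated so this sketch runs on its own."""

    task_id: str
    subtopic_aligned: bool
    difficulty_aligned: bool
    reasoning_aligned: bool
    choices_appropriate: bool
    overall_aligned: bool
    feedback: str
    suggested_improvements: Optional[str] = None


def task_passes(result: VerificationResult, strict_mode: bool = False) -> bool:
    """Assumed rule: strict mode needs every criterion, otherwise overall_aligned."""
    if strict_mode:
        return all(
            [
                result.subtopic_aligned,
                result.difficulty_aligned,
                result.reasoning_aligned,
                result.choices_appropriate,
            ]
        )
    return result.overall_aligned


def pass_rate(results: List[VerificationResult], strict_mode: bool = False) -> float:
    """Fraction of tasks that pass; an empty batch is treated as 0.0."""
    if not results:
        return 0.0
    return sum(task_passes(r, strict_mode) for r in results) / len(results)


def meets_threshold(
    results: List[VerificationResult],
    pass_threshold: float = 0.8,  # mirrors verification.pass_threshold
    strict_mode: bool = False,    # mirrors verification.strict_mode
) -> bool:
    """True when the batch pass rate reaches the configured threshold."""
    return pass_rate(results, strict_mode) >= pass_threshold
```

Under this reading, a batch where four of five tasks are marked `overall_aligned` just meets the default threshold of 0.8, while the same batch could fail in strict mode if any of those four misses an individual criterion.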
