
Feature/reward task #53


Open · wants to merge 14 commits into main

Conversation

@finitearth (Owner) commented Jul 18, 2025

  • Implements new tasks: RewardTask (accepts a reward function that maps a prediction to a score) and JudgeTask (uses an LLM to score responses; optionally also accepts ground-truth labels, allowing for "fuzzy matches").
  • Core functionality of the classification task has been moved to the base task to prevent code duplication across other tasks.
  • CAPO now accepts the input parameter "check_fs_accuracy" (default True): for reward tasks the accuracy cannot be evaluated, so the prediction of the downstream_llm is used as the few-shot target instead.
  • CAPO also accepts "create_fs_reasoning" (default True): if set to False, only the input-output pairs from df_few_shots are used.
  • Introduces a tag-extraction function to centralize repeated code for extractions like "<final_answer>5</final_answer>" (see the sketch after this list).
  • Boosted test coverage.
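
A minimal sketch of what the centralized tag extraction could look like; the function name extract_from_tag and its exact signature are illustrative assumptions, not necessarily the API added in promptolution/utils/formatting.py:

```python
import re
from typing import Optional

def extract_from_tag(text: str, tag: str = "final_answer") -> Optional[str]:
    """Return the content of the first <tag>...</tag> block in text, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

# Pulls "5" out of a response like the one mentioned above.
print(extract_from_tag("The result is <final_answer>5</final_answer>."))  # -> 5
```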


Coverage

Tests: 84 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 0.960s ⏱️

@Copilot (Copilot AI, Contributor) left a comment


Pull Request Overview

This PR implements new task types for reward-based and LLM-as-judge evaluation, refactors the task architecture to reduce code duplication, and introduces several utility functions to improve functionality and test coverage.

  • Implements RewardTask (accepts a reward function for scoring predictions) and JudgeTask (uses an LLM to score responses, with optional ground truth); see the usage sketch after this list
  • Refactors core evaluation functionality from ClassificationTask into BaseTask to enable code reuse across different task types
  • Adds a utility function for tag extraction and improves CAPO to handle scenarios where accuracy cannot be evaluated
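
A minimal usage sketch of the two new task types. The class names and module paths come from this PR's file list, but the constructors are not shown in the diff, so the argument names below (reward_function, judge_llm, etc.) are assumptions for illustration:

```python
import pandas as pd

# Module paths per this PR's file list; argument names below are assumed.
from promptolution.tasks.reward_tasks import RewardTask
from promptolution.tasks.judge_tasks import JudgeTask

df = pd.DataFrame({"input": ["Write a haiku about spring."]})

# RewardTask: a reward function maps a prediction to a numeric score.
def brevity_reward(prediction: str) -> float:
    return 1.0 / (1.0 + len(prediction.split()))  # toy reward favoring short outputs

reward_task = RewardTask(df, reward_function=brevity_reward)

# JudgeTask: an LLM scores each response; ground-truth labels are optional and
# enable "fuzzy matches" instead of exact comparison.
judge_llm = ...  # placeholder for whatever LLM wrapper promptolution expects
judge_task = JudgeTask(df, judge_llm=judge_llm)
```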

Reviewed Changes

Copilot reviewed 38 out of 39 changed files in this pull request and generated 5 comments.

File | Description
promptolution/tasks/base_task.py | Major refactor moving evaluation logic out of ClassificationTask to enable inheritance by new task types
promptolution/tasks/reward_tasks.py | New RewardTask implementation for scoring predictions with custom reward functions
promptolution/tasks/judge_tasks.py | New JudgeTask implementation for LLM-based evaluation with optional ground truth
promptolution/utils/formatting.py | New utility module for tag-extraction functionality
promptolution/optimizers/capo.py | Adds the check_fs_accuracy parameter to handle reward tasks without ground truth (see the sketch after this table)
tests/ | Comprehensive test coverage for new functionality and updated existing tests
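
A sketch of how the new flags might be wired up when optimizing a reward task. check_fs_accuracy and create_fs_reasoning are the parameters introduced in this PR; the remaining constructor arguments and their names are assumptions for illustration:

```python
from promptolution.optimizers.capo import CAPO  # import path per this PR's file list

# For a RewardTask there are no ground-truth labels, so few-shot accuracy cannot be
# checked; the downstream LLM's own prediction then serves as the few-shot target.
downstream_llm = ...  # placeholder for the downstream LLM wrapper (assumed argument name)
optimizer = CAPO(
    task=reward_task,               # e.g. a RewardTask instance (assumed wiring)
    downstream_llm=downstream_llm,
    check_fs_accuracy=False,        # new in this PR, default True; disabled here for a reward task
    create_fs_reasoning=True,       # new in this PR, default True; False uses raw input-output pairs from df_few_shots
)
```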

@finitearth marked this pull request as ready for review July 21, 2025 14:18
@finitearth requested a review from mo374z as a code owner July 21, 2025 14:18
@finitearth (Owner, Author) commented:

Tests are red right now; the fix is in the next PR.

@finitearth requested a review from timo282 July 22, 2025 13:58