Conversation

@yelkurdi
Contributor

@yelkurdi yelkurdi commented Aug 28, 2025

Implements think budget-forcing techniques.
Generates a response with budget forcing using the completions APIs. This relies on multi-step raw autocompletion and assumes the model's output is structured in the following form:
<think> ... </think> summary answer
The budget-forcing method is proposed in the paper https://arxiv.org/abs/2501.19393. This implementation follows the key outlines of the paper while ensuring stable, fail-safe operation. Generation is performed in multiple steps: the model is called repeatedly until the requirements are met; in other words, the response is assembled conditionally.
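
A minimal sketch of the multi-step loop described above (the complete helper, its return shape, and all names here are illustrative assumptions, not this PR's actual API):

# Sketch of think budget forcing via multi-step raw completion,
# after https://arxiv.org/abs/2501.19393. `complete` is a hypothetical
# helper returning (text, num_generated_tokens).

THINK_END = "</think>"
WAIT = "Wait"  # continuation cue that nudges the model to keep thinking

def budget_forced_generate(complete, prompt, think_budget, min_step_len=16, answer_tokens=512):
    thought, rem_toks = "", think_budget
    while rem_toks > min_step_len:
        text, n_toks = complete(prompt + thought, max_tokens=rem_toks)
        rem_toks -= n_toks
        if THINK_END in text:
            # The model tried to stop thinking; keep only the thought part.
            thought += text.split(THINK_END)[0]
            if rem_toks > min_step_len:
                thought += WAIT  # under budget: suppress </think>, force more thinking
                continue
            break
        thought += text  # budget exhausted mid-thought; fall through
    # Fail-safe: close the think block ourselves and generate the summary answer.
    answer, _ = complete(prompt + thought + THINK_END, max_tokens=answer_tokens)
    return thought + THINK_END + answer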

Unit tests are provided; run them with:
uv run --with mellea test/stdlib_basics/test_think_budget_forcing.py

@nrfulton nrfulton self-requested a review August 29, 2025 11:40
@nrfulton
Contributor

Thanks for the contribution!

The pre-commit checks found several type errors and a typo; could you please fix these prior to a code review?

You can run these checks locally by installing the pre-commit hooks. Assuming you have already created a venv and installed Mellea in editable mode (uv pip install -e .), you can then install the pre-commit hooks by running the following commands in the root of your mellea checkout:

uv pip install -e . --group dev && pre-commit install

Once installed, you can still commit over errors using the -n (no-verify) flag; e.g., git commit -a -m 'this may not pass pre-commit checks' -n. However, prior to opening the PR for review, ensure that the latest commit does pass the pre-commit checks.

@yelkurdi
Contributor Author

> Thanks for the contribution!
>
> The pre-commit checks found several type errors and a typo; could you please fix these prior to a code review?
>
> You can run these checks locally by installing the pre-commit hooks. Assuming you have already created a venv and installed Mellea in editable mode (uv pip install -e .), you can then install the pre-commit hooks by running the following commands in the root of your mellea checkout:
>
> uv pip install -e . --group dev && pre-commit install
>
> Once installed, you can still commit over errors using the -n (no-verify) flag; e.g., git commit -a -m 'this may not pass pre-commit checks' -n. However, prior to opening the PR for review, ensure that the latest commit does pass the pre-commit checks.

I have fixed the type errors related to the modified code; however, I still get pre-commit (MyPy) errors from code unrelated to this PR:

(mellea_tbf)yelkurdi@login3 mellea_tbf {think_bf} $ pre-commit run --all-files
Ruff formatter...........................................................Passed
Ruff linter..............................................................Passed
MyPy.....................................................................Failed
- hook id: mypy
- exit code: 1

mellea/stdlib/docs/richdocument.py:8: error: Cannot find implementation or library stub for module named "docling.datamodel.base_models"  [import-not-found]
mellea/stdlib/docs/richdocument.py:9: error: Cannot find implementation or library stub for module named "docling.datamodel.pipeline_options"  [import-not-found]
mellea/stdlib/docs/richdocument.py:10: error: Cannot find implementation or library stub for module named "docling.document_converter"  [import-not-found]
mellea/stdlib/docs/richdocument.py:11: error: Cannot find implementation or library stub for module named "docling_core.types.doc.document"  [import-not-found]
mellea/stdlib/docs/richdocument.py:12: error: Cannot find implementation or library stub for module named "docling_core.types.io"  [import-not-found]
mellea/backends/watsonx.py:9: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai"  [import-not-found]
mellea/backends/watsonx.py:10: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai.foundation_models"  [import-not-found]
mellea/backends/watsonx.py:11: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai.foundation_models.schema"  [import-not-found]
mellea/backends/watsonx.py:11: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
mellea/backends/huggingface.py:366: error: Module "outlines.processors" has no attribute "RegexLogitsProcessor"  [attr-defined]
mellea/backends/huggingface.py:468: error: Module "outlines.processors" has no attribute "RegexLogitsProcessor"  [attr-defined]
Found 10 errors in 3 files (checked 34 source files)

uv-lock..................................................................Passed
codespell................................................................Passed
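
(A common way to silence such unrelated import errors from optional dependencies is a per-module mypy override; sketched here assuming mypy reads its configuration from pyproject.toml, which may not match this repo's actual setup:)

[[tool.mypy.overrides]]
module = ["docling.*", "docling_core.*", "ibm_watsonx_ai.*"]
ignore_missing_imports = true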

@ramon-astudillo

@nrfulton let us know if there are any further changes needed. It would be good to know if we are missing something fundamental. This will inform the other PRs. Thanks!

@yelkurdi
Contributor Author

yelkurdi commented Sep 3, 2025

> @nrfulton let us know if there are any further changes needed. It would be good to know if we are missing something fundamental. This will inform the other PRs. Thanks!

@ramon-astudillo @nrfulton After some recent updates to main, it seems that the automatic checks fail for my branch. I modeled my tests on test_openai_vllm, which requires the user to start the model server manually (using the serve.sh script). I'm looking into adapting the code to the existing testing approach.

@mergify

mergify bot commented Sep 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Yousef El-Kurdi and others added 9 commits September 5, 2025 03:22
Signed-off-by: Mateus Devino <mdevino@ibm.com>
…rompt_modules (generative-computing#105)

* Implements "prompt_modules" and complete refactor of the "decompose" feature

* typo: missing period

* minor fix: changed the "NotRequired" import

* fix: minor fixes

* moves prompt_modules to utils

* moves decompose modules to appropriate path

* refactor: moves prompt_modules to cli scope

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>

* adds README.md to write later

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>

---------

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>
Co-authored-by: Tulio Coppola <tuliocoppola@ibm.com>
Co-authored-by: Nathan Fulton <nathan@ibm.com>
@yelkurdi
Contributor Author

yelkurdi commented Sep 7, 2025

@nrfulton
Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:

cd test/stdlib_basics/test_think_budget_forcing
./install.sh
./run_test.sh

@yelkurdi yelkurdi changed the title Think budget-forcing feat: Adds think budget-forcing Sep 7, 2025
@nrfulton
Contributor

nrfulton commented Sep 8, 2025

Thanks! Taking a look.

corrected default argument
@ramon-astudillo

Hi @nrfulton, any further concerns? If not, I suggest merging.

@nrfulton
Contributor

> @nrfulton Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:
>
> cd test/stdlib_basics/test_think_budget_forcing
> ./install.sh
> ./run_test.sh

A couple of questions:

  1. Is there a way for this to implement the SamplingStrategy interface? It's less important that the implementation is in stdlib per se and more important that it implements the SamplingStrategy interface, so that it can be used wherever that interface is expected.
  2. What is it about these tests that requires standing up a custom vLLM server? Shouldn't Ollama with Granite 4.0 Tiny suffice to test this functionality?

if rem_toks <= min_step_len:  # minimum step length reached
    break

model_options["max_tokens"] = rem_toks
@yelkurdi
Contributor Author

model_options

model_options["max_tokens"] = rem_toks
# TODO workaround to obtain generated token counts
# The token count should be relayed by openai's CompletionUsage
model_options["logprobs"] = 1 # To get number of generated tokens
@yelkurdi
Contributor Author

model_options
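
(For context, a sketch of this workaround from the client side, assuming the openai Python client against an OpenAI-compatible completions endpoint; the model id and prompt are placeholders:)

from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible completions server
resp = client.completions.create(
    model="my-model",  # placeholder model id
    prompt="<think>",  # placeholder for the prompt assembled so far
    max_tokens=128,
    logprobs=1,  # the workaround: request per-token logprobs
)
choice = resp.choices[0]
# The legacy completions response carries one logprobs entry per generated
# token, so the list length recovers the generated-token count.
num_generated = len(choice.logprobs.tokens)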

@yelkurdi
Contributor Author

> @nrfulton Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:
>
> cd test/stdlib_basics/test_think_budget_forcing
> ./install.sh
> ./run_test.sh
>
> A couple of questions:
>
>   1. Is there a way for this to implement the SamplingStrategy interface? It's less important that the implementation is in stdlib per se and more important that it implements the SamplingStrategy interface, so that it can be used wherever that interface is expected.
>   2. What is it about these tests that requires standing up a custom vLLM server? Shouldn't Ollama with Granite 4.0 Tiny suffice to test this functionality?
  1. This is a lower-level algorithm that requires direct access to model_options, as in https://github.com/generative-computing/mellea/pull/107/files#r2363700427 and https://github.com/generative-computing/mellea/pull/107/files#r2363701247.
    To implement it as a strategy, we would need a mechanism for the strategy class to obtain the model_options object from the context and modify it on the fly, in addition to overloading the generate function within the strategy class, which I think I can do (see the sketch after this list).

  2. It does not need it in general; that was for local testing. I created the PR at an early stage and copied one of your test cases that uses vLLM. It is not possible to start an Ollama server on the computing cluster where I typically work. I can take another look and see if I can execute it from my laptop.
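
A rough sketch of what (1) could look like; the class shape, method signature, and result fields below are assumptions for illustration, not mellea's actual SamplingStrategy API:

class ThinkBudgetForcingStrategy:
    """Hypothetical strategy that owns and mutates model_options on the fly."""

    def __init__(self, think_budget=2048, min_step_len=16):
        self.think_budget = think_budget
        self.min_step_len = min_step_len

    def sample(self, backend, prompt, model_options=None):
        opts = dict(model_options or {})  # private copy, safe to mutate per step
        thought, rem_toks = "", self.think_budget
        while rem_toks > self.min_step_len:
            opts["max_tokens"] = rem_toks  # shrink the budget each step
            opts["logprobs"] = 1           # workaround to count generated tokens
            result = backend.generate_from_raw(prompt + thought, model_options=opts)
            rem_toks -= len(result.logprobs.tokens)  # assumed result shape
            thought += result.text
            if "</think>" in result.text:
                break
        return thought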

@nrfulton
Contributor

Blocked on #160 after a sync conversation between @yelkurdi and myself.

@yelkurdi yelkurdi marked this pull request as draft October 3, 2025 21:59
@yelkurdi
Contributor Author

After recent discussions, this PR requires promoting _generate_from_raw to public (#208).

@yelkurdi yelkurdi marked this pull request as ready for review November 6, 2025 20:51
@yelkurdi
Contributor Author

yelkurdi commented Nov 6, 2025

@nrfulton @ramon-astudillo This PR is ready for review; it incorporates all the recent updates involving the generate_from_raw generation function, as per PR #219.
cc: @avinash2692 @jakelorocco
