Conversation

@yelkurdi
Contributor

@yelkurdi yelkurdi commented Aug 28, 2025

Implements think budget-forcing techniques.
Generates a response with budget forcing using the completions APIs. This relies on multi-step raw autocompletion and assumes the model's output is structured in the following form:
<think> ... </think> summary answer
The budget-forcing method is proposed in the paper https://arxiv.org/abs/2501.19393. This implementation follows the key outlines of the paper while ensuring stable, fail-safe operation. Generation is performed in multiple steps: the model is called repeatedly until the requirements are met; in other words, the response is assembled conditionally.
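
A minimal sketch of the multi-step loop described above (the complete helper, its return shape, and all names here are illustrative assumptions, not this PR's actual API):

# Sketch of think budget forcing via multi-step raw completion,
# after https://arxiv.org/abs/2501.19393. `complete` is a hypothetical
# helper returning (text, num_generated_tokens).

THINK_END = "</think>"
WAIT = "Wait"  # continuation cue that nudges the model to keep thinking

def budget_forced_generate(complete, prompt, think_budget, min_step_len=16, answer_tokens=512):
    thought, rem_toks = "", think_budget
    while rem_toks > min_step_len:
        text, n_toks = complete(prompt + thought, max_tokens=rem_toks)
        rem_toks -= n_toks
        if THINK_END in text:
            # The model tried to stop thinking; keep only the thought part.
            thought += text.split(THINK_END)[0]
            if rem_toks > min_step_len:
                thought += WAIT  # under budget: suppress </think>, force more thinking
                continue
            break
        thought += text  # budget exhausted mid-thought; fall through
    # Fail-safe: close the think block ourselves and generate the summary answer.
    answer, _ = complete(prompt + thought + THINK_END, max_tokens=answer_tokens)
    return thought + THINK_END + answer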

Unit tests are provided; run them with:
uv run --with mellea test/stdlib_basics/test_think_budget_forcing.py

@nrfulton nrfulton self-requested a review August 29, 2025 11:40
@nrfulton
Contributor

Thanks for the contribution!

The pre-commit checks found several type errors and a typo; could you please fix these prior to a code review?

You can run these checks locally by installing the pre-commit hooks. Assuming you have already created a venv and installed Mellea in editable mode (uv pip install -e .), you can then install the pre-commit hooks by running the following commands in the root of your mellea checkout:

uv pip install -e . --group dev && pre-commit install

Once installed, you can still commit over errors using the -n (no-verify) flag; e.g., git commit -a -m 'this may not pass pre-commit checks' -n. However, prior to opening the PR for review, ensure that the latest commit does pass the pre-commit checks.

@yelkurdi
Contributor Author

> Thanks for the contribution!
>
> The pre-commit checks found several type errors and a typo; could you please fix these prior to a code review?
>
> You can run these checks locally by installing the pre-commit hooks. Assuming you have already created a venv and installed Mellea in editable mode (uv pip install -e .), you can then install the pre-commit hooks by running the following commands in the root of your mellea checkout:
>
> uv pip install -e . --group dev && pre-commit install
>
> Once installed, you can still commit over errors using the -n (no-verify) flag; e.g., git commit -a -m 'this may not pass pre-commit checks' -n. However, prior to opening the PR for review, ensure that the latest commit does pass the pre-commit checks.

I have fixed the type errors related to the modified code; however, I still get pre-commit (MyPy) errors from code unrelated to this PR:

(mellea_tbf)yelkurdi@login3 mellea_tbf {think_bf} $ pre-commit run --all-files
Ruff formatter...........................................................Passed
Ruff linter..............................................................Passed
MyPy.....................................................................Failed
- hook id: mypy
- exit code: 1

mellea/stdlib/docs/richdocument.py:8: error: Cannot find implementation or library stub for module named "docling.datamodel.base_models"  [import-not-found]
mellea/stdlib/docs/richdocument.py:9: error: Cannot find implementation or library stub for module named "docling.datamodel.pipeline_options"  [import-not-found]
mellea/stdlib/docs/richdocument.py:10: error: Cannot find implementation or library stub for module named "docling.document_converter"  [import-not-found]
mellea/stdlib/docs/richdocument.py:11: error: Cannot find implementation or library stub for module named "docling_core.types.doc.document"  [import-not-found]
mellea/stdlib/docs/richdocument.py:12: error: Cannot find implementation or library stub for module named "docling_core.types.io"  [import-not-found]
mellea/backends/watsonx.py:9: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai"  [import-not-found]
mellea/backends/watsonx.py:10: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai.foundation_models"  [import-not-found]
mellea/backends/watsonx.py:11: error: Cannot find implementation or library stub for module named "ibm_watsonx_ai.foundation_models.schema"  [import-not-found]
mellea/backends/watsonx.py:11: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
mellea/backends/huggingface.py:366: error: Module "outlines.processors" has no attribute "RegexLogitsProcessor"  [attr-defined]
mellea/backends/huggingface.py:468: error: Module "outlines.processors" has no attribute "RegexLogitsProcessor"  [attr-defined]
Found 10 errors in 3 files (checked 34 source files)

uv-lock..................................................................Passed
codespell................................................................Passed
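
(A common way to silence such unrelated import errors from optional dependencies is a per-module mypy override; sketched here assuming mypy reads its configuration from pyproject.toml, which may not match this repo's actual setup:)

[[tool.mypy.overrides]]
module = ["docling.*", "docling_core.*", "ibm_watsonx_ai.*"]
ignore_missing_imports = true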

@ramon-astudillo

@nrfulton let us know if there are any further changes needed. It would be good to know if we are missing something fundamental. This will inform the other PRs. Thanks!

@yelkurdi
Contributor Author

yelkurdi commented Sep 3, 2025

> @nrfulton let us know if there are any further changes needed. It would be good to know if we are missing something fundamental. This will inform the other PRs. Thanks!

@ramon-astudillo @nrfulton After some recent updates to main, it seems that the automatic checks fail for my branch. I modeled my tests on test_openai_vllm, which requires the user to start the model server manually (using the serve.sh script). I'm looking into adapting the code to the existing testing approach.

@mergify

mergify bot commented Sep 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

Yousef El-Kurdi and others added 9 commits September 5, 2025 03:22
Signed-off-by: Mateus Devino <mdevino@ibm.com>
…rompt_modules (generative-computing#105)

* Implements "prompt_modules" and complete refactor of the "decompose" feature

* typo: missing period

* minor fix: changed the "NotRequired" import

* fix: minor fixes

* moves prompt_modules to utils

* moves decompose modules to appropriate path

* refactor: moves prompt_modules to cli scope

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>

* adds README.md to write later

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>

---------

Signed-off-by: Tulio Coppola <tulio.cppl@icloud.com>
Co-authored-by: Tulio Coppola <tuliocoppola@ibm.com>
Co-authored-by: Nathan Fulton <nathan@ibm.com>
@yelkurdi
Contributor Author

yelkurdi commented Sep 7, 2025

@nrfulton
Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:

cd test/stdlib_basics/test_think_budget_forcing
./install.sh
./run_test.sh

@yelkurdi yelkurdi changed the title Think budget-forcing feat: Adds think budget-forcing Sep 7, 2025
@nrfulton
Contributor

nrfulton commented Sep 8, 2025

Thanks! Taking a look.

corrected default argument
@ramon-astudillo

Hi @nrfulton, any further concerns? If not, I suggest merging.

@nrfulton
Contributor

> @nrfulton Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:
>
> cd test/stdlib_basics/test_think_budget_forcing
> ./install.sh
> ./run_test.sh

A couple of questions:

  1. Is there a way for this to implement the SamplingStrategy interface? It's less important that the implementation is in stdlib per se and more important that it implements the SamplingStrategy interface, so that it can be used wherever that interface is expected.
  2. What is it about these tests that requires standing up a custom vLLM server? Shouldn't Ollama with Granite 4.0 Tiny suffice to test this functionality?

if rem_toks <= min_step_len:  # minimum step length reached
    break

model_options["max_tokens"] = rem_toks
@yelkurdi
Contributor Author

model_options

model_options["max_tokens"] = rem_toks
# TODO workaround to obtain generated token counts
# The token count should be relayed by openai's CompletionUsage
model_options["logprobs"] = 1 # To get number of generated tokens
@yelkurdi
Contributor Author

model_options
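
(For context, a sketch of this workaround from the client side, assuming the openai Python client against an OpenAI-compatible completions endpoint; the model id and prompt are placeholders:)

from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible completions server
resp = client.completions.create(
    model="my-model",  # placeholder model id
    prompt="<think>",  # placeholder for the prompt assembled so far
    max_tokens=128,
    logprobs=1,  # the workaround: request per-token logprobs
)
choice = resp.choices[0]
# The legacy completions response carries one logprobs entry per generated
# token, so the list length recovers the generated-token count.
num_generated = len(choice.logprobs.tokens)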

@yelkurdi
Contributor Author

> @nrfulton Updated the PR as per our discussion: relocated the think budget-forcing function from the backend into sampling_algos. To run the tests:
>
> cd test/stdlib_basics/test_think_budget_forcing
> ./install.sh
> ./run_test.sh
>
> A couple of questions:
>
>   1. Is there a way for this to implement the SamplingStrategy interface? It's less important that the implementation is in stdlib per se and more important that it implements the SamplingStrategy interface, so that it can be used wherever that interface is expected.
>   2. What is it about these tests that requires standing up a custom vLLM server? Shouldn't Ollama with Granite 4.0 Tiny suffice to test this functionality?
  1. This is a lower-level algorithm that requires direct access to model_options, as in https://github.com/generative-computing/mellea/pull/107/files#r2363700427 and https://github.com/generative-computing/mellea/pull/107/files#r2363701247.
    To implement it as a strategy, we would need a mechanism for the strategy class to obtain the model_options object from the context and modify it on the fly, in addition to overloading the generate function within the strategy class, which I think I can do (see the sketch after this list).

  2. It does not need it in general; that was for local testing. I created the PR at an early stage and copied one of your test cases that uses vLLM. It is not possible to start an Ollama server on the computing cluster where I typically work. I can take another look and see if I can execute it from my laptop.
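
A rough sketch of what (1) could look like; the class shape, method signature, and result fields below are assumptions for illustration, not mellea's actual SamplingStrategy API:

class ThinkBudgetForcingStrategy:
    """Hypothetical strategy that owns and mutates model_options on the fly."""

    def __init__(self, think_budget=2048, min_step_len=16):
        self.think_budget = think_budget
        self.min_step_len = min_step_len

    def sample(self, backend, prompt, model_options=None):
        opts = dict(model_options or {})  # private copy, safe to mutate per step
        thought, rem_toks = "", self.think_budget
        while rem_toks > self.min_step_len:
            opts["max_tokens"] = rem_toks  # shrink the budget each step
            opts["logprobs"] = 1           # workaround to count generated tokens
            result = backend.generate_from_raw(prompt + thought, model_options=opts)
            rem_toks -= len(result.logprobs.tokens)  # assumed result shape
            thought += result.text
            if "</think>" in result.text:
                break
        return thought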

@nrfulton
Contributor

Blocked on #160 after a sync conversation between @yelkurdi and myself.

@yelkurdi yelkurdi marked this pull request as draft October 3, 2025 21:59
@yelkurdi
Contributor Author

After recent discussions, this PR requires promoting _generate_from_raw to public (#208).

@yelkurdi yelkurdi marked this pull request as ready for review November 6, 2025 20:51
@yelkurdi
Contributor Author

yelkurdi commented Nov 6, 2025

@nrfulton @ramon-astudillo This PR is ready for review; it incorporates all the recent updates involving the generate_from_raw generation function, as per PR #219.
cc: @avinash2692 @jakelorocco
