feat(condenser): Token-aware condensation in LLMSummarizingCondenser #1380
Conversation
Looks like there are a few issues preventing this PR from being merged! If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
@OpenHands please fix the failing actions on PR #1380 at branch
I'm on it! csmith49 can track my progress at all-hands.dev
The test file was importing from 'resolve_model_configs' (plural) but the actual file is 'resolve_model_config.py' (singular). Also updated the test functions to match the actual function signature, which takes only model_ids and uses the global MODELS dictionary.

Co-authored-by: openhands <openhands@all-hands.dev>
I have successfully fixed the failing GitHub Actions on PR #1380.

Issues Fixed: the test file imported from 'resolve_model_configs' (plural) instead of 'resolve_model_config.py' (singular), and the test functions did not match the actual function signature.

Verification Results: the failing "Agent Server" GitHub Action should now pass. The changes are minimal and focused only on fixing the test issues without affecting the core token-aware condensation functionality.
[Automatic Post]: I have assigned @simonrosenberg as a reviewer based on git blame information. Thanks in advance for the help!
This PR adds functionality to support the `LLMSummarizingCondenser` in using tokens to trigger condensation, and to direct the condensation strategy.

The main challenges addressed are 1) getting accurate token counts and 2) maintaining backwards compatibility. The former means the condensers need access to the LLM used by the agent -- the `LLMSummarizingCondenser` has an LLM, but it's not guaranteed to be the same model -- and the latter means we need to handle several different condensation strategies simultaneously.

That last point required a bit of a rework of the internal logic. Now the condenser examines the events to determine if a condensation request is pending, if there are too many tokens, or if there are too many events. Any one of those is a reason to condense, and based on which holds we slightly modify the events we forget. If several reasons hold at once, we pick the one that causes the most aggressive condensation, as sketched below.
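As a rough illustration of that selection logic, here is a minimal, self-contained sketch. The `Event` stand-in, the `forgotten_prefix_length` helper, its parameters, and the halving policy for explicit requests are all assumptions for illustration, not the PR's actual names or arithmetic:

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for an SDK event; only a token estimate and a request flag matter here."""
    tokens: int
    is_condensation_request: bool = False


def forgotten_prefix_length(
    events: list[Event],
    max_tokens: int | None = None,
    max_events: int | None = None,
    keep_first: int = 1,
) -> int:
    """Compute how many events (after the kept head) each trigger would forget,
    then take the most aggressive proposal when several triggers hold at once."""
    proposals = [0]
    n = len(events)
    # Reason 1: an explicit condensation request is pending.
    if any(e.is_condensation_request for e in events):
        proposals.append(max(0, n - keep_first) // 2)  # assumed policy: drop half the history
    # Reason 2: the history exceeds the token budget.
    if max_tokens is not None:
        total, cut = sum(e.tokens for e in events), 0
        while total > max_tokens and cut < n - keep_first:
            total -= events[keep_first + cut].tokens  # drop events just after the kept head
            cut += 1
        proposals.append(cut)
    # Reason 3: the history exceeds the event budget.
    if max_events is not None and n > max_events:
        proposals.append(n - max_events)
    return max(proposals)
```

Taking the maximum over the per-trigger proposals matches the PR's note that, when several reasons hold at once, the most aggressive condensation wins.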
One large benefit of this change is that it enables us to set condensation limits dynamically based on the model used by the agent -- just set `max_tokens` equal to a fraction of the context window of the chosen model. I don't yet know what that fraction should be, so none of that logic is implemented in this PR (see the sketch after this paragraph).

This PR is partially based on #912 and addresses many of the same problems.
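As a sketch of that dynamic-limit idea (the helper name and the 0.75 fraction are placeholders, not values from this PR):

```python
def max_tokens_for(context_window: int, fraction: float = 0.75) -> int:
    """Derive a condensation threshold from a model's context window.

    The fraction is deliberately a placeholder; the PR leaves the right
    value undetermined.
    """
    return int(context_window * fraction)


# e.g. a 200k-token context window would trigger condensation past 150k tokens
assert max_tokens_for(200_000) == 150_000
```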
Changes
- Updated the `Condenser.condense(...)` interface to ensure the condenser has access to the same LLM used by the agent (needed for accurate token counts); see the interface sketch after this list.
- Added a `utils.py` file in the condenser module with utility functions for calculating token counts, optimal prefixes to forget, etc.
- Added the `LLMSummarizingCondenser.max_tokens` parameter for setting token limits.
- Reworked the `LLMSummarizingCondenser` to handle multiple condensation reasons simultaneously.
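To make the interface change concrete, here is a minimal sketch of what the reworked `condense(...)` signature might look like; the parameter names and the `LLM` stand-in are assumptions based on the description above, not the SDK's actual definitions:

```python
from abc import ABC, abstractmethod
from typing import Any


class LLM:
    """Stand-in for the SDK's LLM class; only token counting matters here."""

    def count_tokens(self, events: list[Any]) -> int:
        # Crude placeholder: a real implementation defers to the model's tokenizer.
        return sum(len(str(e)) // 4 for e in events)


class Condenser(ABC):
    """Sketch: condense() now also receives the agent's LLM, so token counts
    reflect the model actually in use rather than the condenser's own LLM."""

    @abstractmethod
    def condense(self, events: list[Any], llm: LLM) -> list[Any]:
        """Return a (possibly condensed) view of the event history."""
```

Passing the agent's LLM explicitly is what keeps the counts accurate: a summarizing condenser can still use its own model for summaries while measuring token pressure against the agent's model.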
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images

| Variant | Base Image |
| --- | --- |
| java | eclipse-temurin:17-jdk |
| python | nikolaik/python-nodejs:python3.12-nodejs22 |
| golang | golang:1.21-bookworm |

Pull (multi-arch manifest)
```
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b999f86-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The variant tag (e.g. b999f86-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. b999f86-python-amd64) are also available if needed