LCORE-1137: store info about consumed tokens into token usage history #958
Conversation
Walkthrough
Endpoint handlers now forward token usage history plus explicit model and provider identifiers into quota consumption. The quota utility signature was extended to accept these new parameters.
Changes
Sequence Diagram(s)
(omitted — changes are limited to parameter propagation and an additional optional call; control flow not substantially changed)
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @src/utils/quota.py:
- Around lines 17-25: The consume_tokens function currently has 7 parameters and
the docstring lacks documentation for token_usage_history; minimally, add a
param entry for token_usage_history to the consume_tokens docstring describing
its type and purpose (Optional[TokenUsageHistory], used to record or check past
token consumption), and leave a TODO note to refactor the signature later; for a
longer-term fix to satisfy pylint R0913/R0917, refactor to accept a dataclass
(e.g., TokenConsumptionContext with user_id, input_tokens, output_tokens,
model_id, provider_id, and optional token_usage_history) and update call sites
in query.py and streaming_query_v2.py accordingly.
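For illustration, a minimal sketch of the dataclass refactor suggested in the prompt above. TokenConsumptionContext is hypothetical and does not exist in the repository; the UsageHistory protocol is a stand-in for the project's TokenUsageHistory class so the snippet runs standalone:

```python
# Hypothetical sketch only: TokenConsumptionContext does not exist in the
# repository. Field names come from the review prompt above; UsageHistory
# stands in for src/quota/token_usage_history.TokenUsageHistory.
from dataclasses import dataclass
from typing import Optional, Protocol


class UsageHistory(Protocol):
    """Structural stand-in for the project's TokenUsageHistory class."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


@dataclass
class TokenConsumptionContext:
    """Bundles consume_tokens arguments to satisfy pylint R0913/R0917."""

    user_id: str
    input_tokens: int
    output_tokens: int
    model_id: str
    provider_id: str
    token_usage_history: Optional[UsageHistory] = None
```

Call sites in query.py and streaming_query_v2.py would then construct one context object instead of passing seven arguments, which also satisfies the pylint limits flagged below.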
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py
src/app/endpoints/streaming_query_v2.py
src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
src/utils/quota.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle APIConnectionError from Llama Stack in integration code
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (1)
src/configuration.py (2)
configuration (94-105), token_usage_history (310-334)
src/app/endpoints/query.py (1)
src/configuration.py (2)
configuration (94-105), token_usage_history (310-334)
src/utils/quota.py (3)
src/quota/quota_exceed_error.py (1)
QuotaExceedError (6-53)
src/configuration.py (1)
token_usage_history (310-334)
src/quota/token_usage_history.py (1)
TokenUsageHistory (34-201)
🪛 GitHub Actions: Python linter
src/utils/quota.py
[error] 17-17: Pylint: Too many arguments (7/5) (R0913) in function call. Step: uv run pylint src tests
[error] 17-17: Pylint: Too many positional arguments (7/5) (R0917) in function call. Step: uv run pylint src tests
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: library mode / azure
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / vertexai
🔇 Additional comments (3)
src/utils/quota.py (1)
39-47: LGTM! The token usage history recording logic is correctly implemented. The conditional check ensures recording only occurs when token_usage_history is provided, and all required parameters are properly passed to the consume_tokens method.
src/app/endpoints/streaming_query_v2.py (1)
276-284: LGTM! The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the context object.
src/app/endpoints/query.py (1)
388-396: LGTM! The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the local model_id and provider_id variables.
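To make the approved pattern concrete, here is a hedged, self-contained sketch of the conditional recording described above. The UsageHistory protocol is a stand-in for the project's TokenUsageHistory, the argument order of its consume_tokens method is an assumption, and the real utility takes additional leading parameters (e.g., quota limiters) that are elided here:

```python
# Self-contained sketch of the review's approved pattern: record into the
# usage history only when one is configured, then consume quota. Not the
# project's actual code; argument order below is an assumption.
from typing import Optional, Protocol


class UsageHistory(Protocol):
    """Structural stand-in for src/quota/token_usage_history.TokenUsageHistory."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


def consume_tokens(
    user_id: str,
    input_tokens: int,
    output_tokens: int,
    model_id: str,
    provider_id: str,
    token_usage_history: Optional[UsageHistory] = None,
) -> None:
    """Record token usage history (if configured), then consume quota."""
    if token_usage_history is not None:
        # Recording is conditional: skipped entirely when no history
        # backend is configured, matching the check praised above.
        token_usage_history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
    # ... quota limiter consumption loop (elided) ...
```

A call site would then pass configuration.token_usage_history for the optional parameter, as both endpoint comments above describe.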
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @tests/integration/endpoints/test_query_v2_integration.py:
- Around lines 1150-1157: Add an assertion that verifies the token_usage_history
argument is passed to the quota consumer: after capturing consume_args from
mock_consume.call_args, assert that consume_args.kwargs["token_usage_history"]
equals the expected token_usage_history (or, if token_usage_history is passed
positionally, assert the correct args[index] matches the expected
token_usage_history). Use the same expected token usage object used in the test
setup to compare; reference mock_consume / consume_tokens via consume_args to
locate where to add the assertion.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/utils/quota.py
tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/integration/endpoints/test_query_v2_integration.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use pytest-mock with AsyncMock objects for mocking in tests
Files:
tests/integration/endpoints/test_query_v2_integration.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / vertexai
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / azure
```python
mock_consume.assert_called_once()
consume_args = mock_consume.call_args
user_id, _, _, _ = test_auth
assert consume_args.args[1] == user_id
assert consume_args.args[2] == user_id
assert consume_args.kwargs["model_id"] == "test-model"
assert consume_args.kwargs["provider_id"] == "test-provider"
assert consume_args.kwargs["input_tokens"] == 100
assert consume_args.kwargs["output_tokens"] == 50
```
Add assertion to verify token_usage_history parameter is passed to consume_tokens.
The PR objective is to "store info about consumed tokens into token usage history," and the AI summary confirms that token_usage_history is now passed to the quota consumer. However, this test doesn't verify that the token_usage_history parameter is actually being passed to consume_tokens.
The test verifies model_id, provider_id, input_tokens, and output_tokens, but the core feature—passing token usage history—is not validated.
🔎 Suggested assertion to add:

```diff
 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
 assert consume_args.args[2] == user_id
+assert consume_args.kwargs.get("token_usage_history") is not None
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50
```

Alternatively, if token_usage_history is in positional args, verify it at the appropriate index (likely args[0] or args[1]):
```diff
 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
+# Verify token_usage_history is passed (adjust index based on actual signature)
+assert consume_args.args[0] is not None or consume_args.args[1] is not None
 assert consume_args.args[2] == user_id
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50
```

🤖 Prompt for AI Agents
In @tests/integration/endpoints/test_query_v2_integration.py around lines
1150-1157, add an assertion that verifies the token_usage_history argument is
passed to the quota consumer: after capturing consume_args from
mock_consume.call_args, assert that consume_args.kwargs["token_usage_history"]
equals the expected token_usage_history (or, if token_usage_history is passed
positionally, assert the correct args[index] matches the expected
token_usage_history). Use the same expected token usage object used in the test
setup to compare; reference mock_consume / consume_tokens via consume_args to
locate where to add the assertion.
Force-pushed from aa14608 to 12d6728
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/utils/quota.py (1)
17-26: Consider the dataclass refactoring from the previous review. The pylint disable comment suppresses R0913/R0917 warnings but doesn't address the root cause. While the minimal fix (adding a docstring entry for token_usage_history) has been applied, the refactoring to a TokenConsumptionContext dataclass from the previous review remains a recommended improvement for maintainability.
🧹 Nitpick comments (1)
src/utils/quota.py (1)
35-36: Enhance parameter descriptions for clarity. The descriptions "Model identification" and "Provider identification" are terse. Consider adding examples or more context, e.g., "Identifier of the model being used (e.g., 'gpt-4')" and "Identifier of the provider supplying the model (e.g., 'openai')".
As per coding guidelines, docstrings should provide clear, descriptive explanations following Google Python conventions.
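A possible rendering of that suggestion, shown on a hypothetical stub so the Google-style Args entries appear in context; the wording is illustrative, not the project's actual docstring:

```python
def describe_model_stub(model_id: str, provider_id: str) -> None:
    """Stub showing the suggested, more descriptive Args wording.

    Args:
        model_id: Identifier of the model being used (e.g., "gpt-4").
        provider_id: Identifier of the provider supplying the model
            (e.g., "openai").
    """
```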
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/utils/quota.py
tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/endpoints/test_query_v2_integration.py
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/utils/quota.py
🧬 Code graph analysis (1)
src/utils/quota.py (5)
src/models/responses.py (2)
QuotaExceededResponse (1671-1780), model (1732-1745)
src/quota/quota_exceed_error.py (1)
QuotaExceedError (6-53)
src/quota/quota_limiter.py (1)
QuotaLimiter (49-187)
src/configuration.py (1)
token_usage_history (310-334)
src/quota/token_usage_history.py (1)
TokenUsageHistory (34-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / azure
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: library mode / vertexai
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / vertexai
🔇 Additional comments (2)
src/utils/quota.py (2)
3-3: LGTM! The new imports correctly support the token usage history feature. The Optional type annotation and TokenUsageHistory import follow the coding guidelines for absolute imports and modern type hints.
Also applies to: 12-12
41-49: Correct exception handling in token usage history recording. TokenUsageHistory.consume_tokens() can raise database exceptions (e.g., psycopg2.OperationalError, sqlite3.Error) during cursor execution, which will prevent the subsequent quota limiter consumption loop from executing. Consider whether database failures should fail-fast or be logged and allowed to proceed with quota consumption. Note that the current behavior is symmetric with quota limiters, which also lack exception handling.
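If the log-and-proceed option were chosen, a minimal sketch of a best-effort wrapper might look like the following. The helper name is hypothetical, the _History protocol stands in for TokenUsageHistory, and only sqlite3.Error is caught so the snippet runs without psycopg2 installed:

```python
# Hypothetical helper illustrating the log-and-proceed option; not the
# project's current behavior, which lets database errors propagate.
import logging
import sqlite3
from typing import Protocol

logger = logging.getLogger(__name__)


class _History(Protocol):
    """Stand-in for TokenUsageHistory; only the method used here."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


def record_history_safely(
    history: _History,
    user_id: str,
    provider_id: str,
    model_id: str,
    input_tokens: int,
    output_tokens: int,
) -> None:
    """Record token usage, logging rather than raising on database errors."""
    try:
        history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
    except sqlite3.Error:  # psycopg2.OperationalError would be handled alike
        # Best-effort: a history write failure should not block the quota
        # limiter consumption loop that follows in the caller.
        logger.exception("Failed to record token usage history")
```

The fail-fast alternative is simply the current behavior: let the exception propagate so quota consumption never runs after a failed history write.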
Description
LCORE-1137: store info about consumed tokens into token usage history
Type of change
Tools used to create PR
Related Tickets & Documents
Summary by CodeRabbit
Improvements
Tests