LCORE-1137: store info about consumed tokens into token usage history #958
Conversation
Walkthrough
Endpoint handlers now forward token usage history plus explicit model and provider identifiers into quota consumption. The quota utility signature was extended to accept these new parameters.
Changes
Sequence Diagram(s)
(omitted — changes are limited to parameter propagation and an additional optional call; control flow not substantially changed)
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @src/utils/quota.py:
- Around lines 17-25: The consume_tokens function currently has 7 parameters and
the docstring lacks documentation for token_usage_history; minimally, add a
param entry for token_usage_history to the consume_tokens docstring describing
its type and purpose (Optional[TokenUsageHistory], used to record or check past
token consumption), and leave a TODO note to refactor the signature later; for a
longer-term fix to satisfy pylint R0913/R0917, refactor to accept a dataclass
(e.g., TokenConsumptionContext with user_id, input_tokens, output_tokens,
model_id, provider_id, and optional token_usage_history) and update call sites
in query.py and streaming_query_v2.py accordingly.
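For illustration, a minimal sketch of the dataclass refactor suggested in the prompt above. TokenConsumptionContext is hypothetical and does not exist in the repository; the UsageHistory protocol is a stand-in for the project's TokenUsageHistory class so the snippet runs standalone:

```python
# Hypothetical sketch only: TokenConsumptionContext does not exist in the
# repository. Field names come from the review prompt above; UsageHistory
# stands in for src/quota/token_usage_history.TokenUsageHistory.
from dataclasses import dataclass
from typing import Optional, Protocol


class UsageHistory(Protocol):
    """Structural stand-in for the project's TokenUsageHistory class."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


@dataclass
class TokenConsumptionContext:
    """Bundles consume_tokens arguments to satisfy pylint R0913/R0917."""

    user_id: str
    input_tokens: int
    output_tokens: int
    model_id: str
    provider_id: str
    token_usage_history: Optional[UsageHistory] = None
```

Call sites in query.py and streaming_query_v2.py would then construct one context object instead of passing seven arguments, which also satisfies the pylint limits flagged below.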
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/app/endpoints/query.py
src/app/endpoints/streaming_query_v2.py
src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
src/utils/quota.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI HTTPException with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle APIConnectionError from Llama Stack in integration code
Files:
src/app/endpoints/streaming_query_v2.py
src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (1)
src/configuration.py (2)
configuration (94-105), token_usage_history (310-334)
src/app/endpoints/query.py (1)
src/configuration.py (2)
configuration (94-105), token_usage_history (310-334)
src/utils/quota.py (3)
src/quota/quota_exceed_error.py (1)
QuotaExceedError (6-53)
src/configuration.py (1)
token_usage_history (310-334)
src/quota/token_usage_history.py (1)
TokenUsageHistory (34-201)
🪛 GitHub Actions: Python linter
src/utils/quota.py
[error] 17-17: Pylint: Too many arguments (7/5) (R0913) in function call. Step: uv run pylint src tests
[error] 17-17: Pylint: Too many positional arguments (7/5) (R0917) in function call. Step: uv run pylint src tests
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: library mode / azure
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / vertexai
🔇 Additional comments (3)
src/utils/quota.py (1)
39-47: LGTM! The token usage history recording logic is correctly implemented. The conditional check ensures recording only occurs when token_usage_history is provided, and all required parameters are properly passed to the consume_tokens method.
src/app/endpoints/streaming_query_v2.py (1)
276-284: LGTM! The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the context object.
src/app/endpoints/query.py (1)
388-396: LGTM! The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the local model_id and provider_id variables.
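To make the approved pattern concrete, here is a hedged, self-contained sketch of the conditional recording described above. The UsageHistory protocol is a stand-in for the project's TokenUsageHistory, the argument order of its consume_tokens method is an assumption, and the real utility takes additional leading parameters (e.g., quota limiters) that are elided here:

```python
# Self-contained sketch of the review's approved pattern: record into the
# usage history only when one is configured, then consume quota. Not the
# project's actual code; argument order below is an assumption.
from typing import Optional, Protocol


class UsageHistory(Protocol):
    """Structural stand-in for src/quota/token_usage_history.TokenUsageHistory."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


def consume_tokens(
    user_id: str,
    input_tokens: int,
    output_tokens: int,
    model_id: str,
    provider_id: str,
    token_usage_history: Optional[UsageHistory] = None,
) -> None:
    """Record token usage history (if configured), then consume quota."""
    if token_usage_history is not None:
        # Recording is conditional: skipped entirely when no history
        # backend is configured, matching the check praised above.
        token_usage_history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
    # ... quota limiter consumption loop (elided) ...
```

A call site would then pass configuration.token_usage_history for the optional parameter, as both endpoint comments above describe.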
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @tests/integration/endpoints/test_query_v2_integration.py:
- Around lines 1150-1157: Add an assertion that verifies the token_usage_history
argument is passed to the quota consumer: after capturing consume_args from
mock_consume.call_args, assert that consume_args.kwargs["token_usage_history"]
equals the expected token_usage_history (or, if token_usage_history is passed
positionally, assert the correct args[index] matches the expected
token_usage_history). Use the same expected token usage object used in the test
setup to compare; reference mock_consume / consume_tokens via consume_args to
locate where to add the assertion.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/utils/quota.py
tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/integration/endpoints/test_query_v2_integration.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use pytest-mock with AsyncMock objects for mocking in tests
Files:
tests/integration/endpoints/test_query_v2_integration.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / vertexai
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / vertexai
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / azure
```python
mock_consume.assert_called_once()
consume_args = mock_consume.call_args
user_id, _, _, _ = test_auth
assert consume_args.args[1] == user_id
assert consume_args.args[2] == user_id
assert consume_args.kwargs["model_id"] == "test-model"
assert consume_args.kwargs["provider_id"] == "test-provider"
assert consume_args.kwargs["input_tokens"] == 100
assert consume_args.kwargs["output_tokens"] == 50
```
Add assertion to verify token_usage_history parameter is passed to consume_tokens.
The PR objective is to "store info about consumed tokens into token usage history," and the AI summary confirms that token_usage_history is now passed to the quota consumer. However, this test doesn't verify that the token_usage_history parameter is actually being passed to consume_tokens.
The test verifies model_id, provider_id, input_tokens, and output_tokens, but the core feature—passing token usage history—is not validated.
🔎 Suggested assertion to add:

```diff
 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
 assert consume_args.args[2] == user_id
+assert consume_args.kwargs.get("token_usage_history") is not None
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50
```

Alternatively, if token_usage_history is in positional args, verify it at the appropriate index (likely args[0] or args[1]):
```diff
 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
+# Verify token_usage_history is passed (adjust index based on actual signature)
+assert consume_args.args[0] is not None or consume_args.args[1] is not None
 assert consume_args.args[2] == user_id
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50
```

🤖 Prompt for AI Agents
In @tests/integration/endpoints/test_query_v2_integration.py around lines
1150-1157, add an assertion that verifies the token_usage_history argument is
passed to the quota consumer: after capturing consume_args from
mock_consume.call_args, assert that consume_args.kwargs["token_usage_history"]
equals the expected token_usage_history (or, if token_usage_history is passed
positionally, assert the correct args[index] matches the expected
token_usage_history). Use the same expected token usage object used in the test
setup to compare; reference mock_consume / consume_tokens via consume_args to
locate where to add the assertion.
Force-pushed from aa14608 to 12d6728
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/utils/quota.py (1)
17-26: Consider the dataclass refactoring from the previous review. The pylint disable comment suppresses R0913/R0917 warnings but doesn't address the root cause. While the minimal fix (adding a docstring entry for token_usage_history) has been applied, the refactoring to a TokenConsumptionContext dataclass from the previous review remains a recommended improvement for maintainability.
🧹 Nitpick comments (1)
src/utils/quota.py (1)
35-36: Enhance parameter descriptions for clarity. The descriptions "Model identification" and "Provider identification" are terse. Consider adding examples or more context, e.g., "Identifier of the model being used (e.g., 'gpt-4')" and "Identifier of the provider supplying the model (e.g., 'openai')".
As per coding guidelines, docstrings should provide clear, descriptive explanations following Google Python conventions.
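A possible rendering of that suggestion, shown on a hypothetical stub so the Google-style Args entries appear in context; the wording is illustrative, not the project's actual docstring:

```python
def describe_model_stub(model_id: str, provider_id: str) -> None:
    """Stub showing the suggested, more descriptive Args wording.

    Args:
        model_id: Identifier of the model being used (e.g., "gpt-4").
        provider_id: Identifier of the provider supplying the model
            (e.g., "openai").
    """
```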
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/utils/quota.py
tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/endpoints/test_query_v2_integration.py
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/utils/quota.py
🧬 Code graph analysis (1)
src/utils/quota.py (5)
src/models/responses.py (2)
QuotaExceededResponse (1671-1780), model (1732-1745)
src/quota/quota_exceed_error.py (1)
QuotaExceedError (6-53)
src/quota/quota_limiter.py (1)
QuotaLimiter (49-187)
src/configuration.py (1)
token_usage_history (310-334)
src/quota/token_usage_history.py (1)
TokenUsageHistory (34-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / azure
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: library mode / vertexai
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / vertexai
🔇 Additional comments (2)
src/utils/quota.py (2)
3-3: LGTM! The new imports correctly support the token usage history feature. The Optional type annotation and TokenUsageHistory import follow the coding guidelines for absolute imports and modern type hints.
Also applies to: 12-12
41-49: Correct exception handling in token usage history recording. TokenUsageHistory.consume_tokens() can raise database exceptions (e.g., psycopg2.OperationalError, sqlite3.Error) during cursor execution, which will prevent the subsequent quota limiter consumption loop from executing. Consider whether database failures should fail-fast or be logged and allowed to proceed with quota consumption. Note that the current behavior is symmetric with quota limiters, which also lack exception handling.
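If the log-and-proceed option were chosen, a minimal sketch of a best-effort wrapper might look like the following. The helper name is hypothetical, the _History protocol stands in for TokenUsageHistory, and only sqlite3.Error is caught so the snippet runs without psycopg2 installed:

```python
# Hypothetical helper illustrating the log-and-proceed option; not the
# project's current behavior, which lets database errors propagate.
import logging
import sqlite3
from typing import Protocol

logger = logging.getLogger(__name__)


class _History(Protocol):
    """Stand-in for TokenUsageHistory; only the method used here."""

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None: ...


def record_history_safely(
    history: _History,
    user_id: str,
    provider_id: str,
    model_id: str,
    input_tokens: int,
    output_tokens: int,
) -> None:
    """Record token usage, logging rather than raising on database errors."""
    try:
        history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
    except sqlite3.Error:  # psycopg2.OperationalError would be handled alike
        # Best-effort: a history write failure should not block the quota
        # limiter consumption loop that follows in the caller.
        logger.exception("Failed to record token usage history")
```

The fail-fast alternative is simply the current behavior: let the exception propagate so quota consumption never runs after a failed history write.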
Description
LCORE-1137: store info about consumed tokens into token usage history
Type of change
Tools used to create PR
Related Tickets & Documents
Summary by CodeRabbit
Improvements
Tests