Conversation

@tisnik tisnik commented Jan 5, 2026

Description

LCORE-1137: store info about consumed tokens into token usage history

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

  • Assisted-by: N/A
  • Generated by: N/A

Related Tickets & Documents

  • Related Issue #LCORE-1137

Summary by CodeRabbit

  • Improvements

    • Enhanced token quota tracking to record richer usage history and pass explicit model and provider context during consumption for more accurate accounting and reporting.
  • Tests

    • Updated integration tests to validate that the new token usage history and model/provider parameters are passed during quota consumption.


coderabbitai bot commented Jan 5, 2026

Walkthrough

Endpoint handlers now forward token usage history plus explicit model and provider identifiers into quota consumption. The quota utility signature was extended to accept token_usage_history, model_id, and provider_id, and it records usage history when provided.

Changes

  • Endpoint Call Sites — src/app/endpoints/query.py, src/app/endpoints/streaming_query_v2.py: Calls to consume_tokens updated to pass token_usage_history, model_id, and provider_id.
  • Quota Utility Implementation — src/utils/quota.py: consume_tokens signature extended to include token_usage_history: Optional[TokenUsageHistory], model_id: str, and provider_id: str. Adds imports and optional recording via token_usage_history.consume_tokens(...) before applying quota limiters; docstring updated.
  • Tests — tests/integration/endpoints/test_query_v2_integration.py: Assertions updated to match shifted argument positions and to verify model_id="test-model" and provider_id="test-provider" are passed to consume_tokens.
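The extended consume_tokens contract described above can be sketched as a minimal in-memory model. This is illustrative only: the real TokenUsageHistory persists records to a database, and the exact parameter order, argument names, and limiter loop here are assumptions, not the project's actual code.

```python
from typing import Optional


class TokenUsageHistory:
    """In-memory stand-in for the project's TokenUsageHistory,
    which persists usage records to a database. Illustrative only."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def consume_tokens(
        self,
        user_id: str,
        provider_id: str,
        model_id: str,
        input_tokens: int,
        output_tokens: int,
    ) -> None:
        """Record one token usage event."""
        self.records.append(
            {
                "user_id": user_id,
                "provider_id": provider_id,
                "model_id": model_id,
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
            }
        )


def consume_tokens(
    quota_limiters: list,
    user_id: str,
    input_tokens: int,
    output_tokens: int,
    token_usage_history: Optional[TokenUsageHistory] = None,
    model_id: str = "",
    provider_id: str = "",
) -> None:
    """Record usage history when configured, then apply quota limiters."""
    if token_usage_history is not None:
        token_usage_history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
    for limiter in quota_limiters:
        limiter.consume_tokens(user_id, input_tokens + output_tokens)


history = TokenUsageHistory()
consume_tokens(
    [],
    "user-1",
    input_tokens=100,
    output_tokens=50,
    token_usage_history=history,
    model_id="test-model",
    provider_id="test-provider",
)
```

After the call, history.records holds a single entry carrying the model and provider identifiers alongside the token counts, which is the "richer usage history" the summary refers to.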

Sequence Diagram(s)

(omitted — changes are limited to parameter propagation and an additional optional call; control flow not substantially changed)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • manstis
  • umago

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title directly describes the main change: extending the consume_tokens function to record token usage information into the token usage history by passing token_usage_history, model_id, and provider_id parameters across multiple endpoint files.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, which meets the required threshold of 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @src/utils/quota.py:
- Around line 17-25: The consume_tokens function currently has 7 parameters and
the docstring lacks documentation for token_usage_history; minimally, add a
param entry for token_usage_history to the consume_tokens docstring describing
its type and purpose (Optional[TokenUsageHistory], used to record or check past
token consumption), and leave a TODO note to refactor the signature later; for a
longer-term fix to satisfy pylint R0913/R0917, refactor to accept a dataclass
(e.g., TokenConsumptionContext with user_id, input_tokens, output_tokens,
model_id, provider_id, and optional token_usage_history) and update call sites
in query.py and streaming_query_v2.py accordingly.
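The longer-term refactoring suggested above could look roughly like this sketch. The dataclass fields follow the comment; the consume_tokens body, the limiter interface, and the use of Any in place of the real TokenUsageHistory type are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TokenConsumptionContext:
    """Bundles the per-request token accounting parameters so that
    consume_tokens stays under pylint's R0913/R0917 argument limits."""

    user_id: str
    input_tokens: int
    output_tokens: int
    model_id: str
    provider_id: str
    token_usage_history: Optional[Any] = None  # really Optional[TokenUsageHistory]


def consume_tokens(quota_limiters: list, ctx: TokenConsumptionContext) -> None:
    """Record usage history when configured, then apply each quota limiter."""
    if ctx.token_usage_history is not None:
        ctx.token_usage_history.consume_tokens(
            ctx.user_id,
            ctx.provider_id,
            ctx.model_id,
            ctx.input_tokens,
            ctx.output_tokens,
        )
    for limiter in quota_limiters:
        limiter.consume_tokens(ctx.user_id, ctx.input_tokens + ctx.output_tokens)
```

Call sites in query.py and streaming_query_v2.py would then build one context object instead of passing six separate arguments, which keeps the signature stable if more accounting fields are added later.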
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 049a7ed and ab47686.

📒 Files selected for processing (3)
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (3)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
  • src/utils/quota.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
🧬 Code graph analysis (3)
src/app/endpoints/streaming_query_v2.py (1)
src/configuration.py (2)
  • configuration (94-105)
  • token_usage_history (310-334)
src/app/endpoints/query.py (1)
src/configuration.py (2)
  • configuration (94-105)
  • token_usage_history (310-334)
src/utils/quota.py (3)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-53)
src/configuration.py (1)
  • token_usage_history (310-334)
src/quota/token_usage_history.py (1)
  • TokenUsageHistory (34-201)
🪛 GitHub Actions: Python linter
src/utils/quota.py

[error] 17-17: Pylint: Too many arguments (7/5) (R0913) in function call. Step: uv run pylint src tests


[error] 17-17: Pylint: Too many positional arguments (7/5) (R0917) in function call. Step: uv run pylint src tests

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: E2E: server mode / vertexai
  • GitHub Check: E2E: library mode / azure
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / vertexai
🔇 Additional comments (3)
src/utils/quota.py (1)

39-47: LGTM!

The token usage history recording logic is correctly implemented. The conditional check ensures recording only occurs when token_usage_history is provided, and all required parameters are properly passed to the consume_tokens method.

src/app/endpoints/streaming_query_v2.py (1)

276-284: LGTM!

The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the context object.

src/app/endpoints/query.py (1)

388-396: LGTM!

The consume_tokens call has been correctly updated to pass the new required parameters (token_usage_history, model_id, and provider_id). The values are appropriately sourced from configuration.token_usage_history and the local model_id and provider_id variables.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @tests/integration/endpoints/test_query_v2_integration.py:
- Around line 1150-1157: Add an assertion that verifies the token_usage_history
argument is passed to the quota consumer: after capturing consume_args from
mock_consume.call_args, assert that consume_args.kwargs["token_usage_history"]
equals the expected token_usage_history (or, if token_usage_history is passed
positionally, assert the correct args[index] matches the expected
token_usage_history). Use the same expected token usage object used in the test
setup to compare; reference mock_consume / consume_tokens via consume_args to
locate where to add the assertion.
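With unittest.mock (or pytest-mock, per the project's testing guidelines), the requested assertion can be sketched as follows. mock_consume and expected_history are illustrative names standing in for the test suite's actual mock and fixture, and the call shape is an assumption based on the review discussion.

```python
from unittest.mock import Mock


def test_consume_tokens_receives_history():
    """Sketch: verify token_usage_history reaches the mocked consumer."""
    mock_consume = Mock()
    expected_history = object()  # stand-in for configuration.token_usage_history

    # The endpoint under test would make a call shaped like this:
    mock_consume(
        None,  # placeholder for the quota limiters argument
        "user-1",
        input_tokens=100,
        output_tokens=50,
        token_usage_history=expected_history,
        model_id="test-model",
        provider_id="test-provider",
    )

    # The assertions the review asks for: inspect call_args.kwargs.
    consume_args = mock_consume.call_args
    assert consume_args.kwargs["token_usage_history"] is expected_history
    assert consume_args.kwargs["model_id"] == "test-model"
    assert consume_args.kwargs["provider_id"] == "test-provider"
```

If token_usage_history is passed positionally in the real call sites, the same check would instead index into consume_args.args at the position matching the actual signature.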
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab47686 and aa14608.

📒 Files selected for processing (2)
  • src/utils/quota.py
  • tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/quota.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/integration/endpoints/test_query_v2_integration.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/integration/endpoints/test_query_v2_integration.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / vertexai
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / vertexai
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / azure

Comment on lines 1150 to 1157
mock_consume.assert_called_once()
consume_args = mock_consume.call_args
user_id, _, _, _ = test_auth
assert consume_args.args[1] == user_id
assert consume_args.args[2] == user_id
assert consume_args.kwargs["model_id"] == "test-model"
assert consume_args.kwargs["provider_id"] == "test-provider"
assert consume_args.kwargs["input_tokens"] == 100
assert consume_args.kwargs["output_tokens"] == 50

⚠️ Potential issue | 🟠 Major

Add assertion to verify token_usage_history parameter is passed to consume_tokens.

The PR objective is to "store info about consumed tokens into token usage history," and the AI summary confirms that token_usage_history is now passed to the quota consumer. However, this test doesn't verify that the token_usage_history parameter is actually being passed to consume_tokens.

The test verifies model_id, provider_id, input_tokens, and output_tokens, but the core feature—passing token usage history—is not validated.

🔎 Suggested assertion to add
 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
 assert consume_args.args[2] == user_id
+assert consume_args.kwargs.get("token_usage_history") is not None
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50

Alternatively, if token_usage_history is in positional args, verify it at the appropriate index (likely args[0] or args[1]):

 mock_consume.assert_called_once()
 consume_args = mock_consume.call_args
 user_id, _, _, _ = test_auth
+# Verify token_usage_history is passed (adjust index based on actual signature)
+assert consume_args.args[0] is not None or consume_args.args[1] is not None
 assert consume_args.args[2] == user_id
 assert consume_args.kwargs["model_id"] == "test-model"
 assert consume_args.kwargs["provider_id"] == "test-provider"
 assert consume_args.kwargs["input_tokens"] == 100
 assert consume_args.kwargs["output_tokens"] == 50

@tisnik tisnik force-pushed the lcore-1137-store-info-into-token-usage-history branch from aa14608 to 12d6728 on January 5, 2026 at 10:02

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/utils/quota.py (1)

17-26: Consider the dataclass refactoring from the previous review.

The pylint disable comment suppresses R0913/R0917 warnings but doesn't address the root cause. While the minimal fix (adding docstring for token_usage_history) has been applied, the refactoring to a TokenConsumptionContext dataclass from the previous review remains a recommended improvement for maintainability.

🧹 Nitpick comments (1)
src/utils/quota.py (1)

35-36: Enhance parameter descriptions for clarity.

The descriptions "Model identification" and "Provider identification" are terse. Consider adding examples or more context, e.g., "Identifier of the model being used (e.g., 'gpt-4')" and "Identifier of the provider supplying the model (e.g., 'openai')".

As per coding guidelines, docstrings should provide clear, descriptive explanations following Google Python conventions.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa14608 and 12d6728.

📒 Files selected for processing (2)
  • src/utils/quota.py
  • tests/integration/endpoints/test_query_v2_integration.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/integration/endpoints/test_query_v2_integration.py
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/utils/quota.py
🧬 Code graph analysis (1)
src/utils/quota.py (5)
src/models/responses.py (2)
  • QuotaExceededResponse (1671-1780)
  • model (1732-1745)
src/quota/quota_exceed_error.py (1)
  • QuotaExceedError (6-53)
src/quota/quota_limiter.py (1)
  • QuotaLimiter (49-187)
src/configuration.py (1)
  • token_usage_history (310-334)
src/quota/token_usage_history.py (1)
  • TokenUsageHistory (34-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / azure
  • GitHub Check: E2E: server mode / azure
  • GitHub Check: E2E: library mode / vertexai
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / vertexai
🔇 Additional comments (2)
src/utils/quota.py (2)

3-3: LGTM!

The new imports correctly support the token usage history feature. The Optional type annotation and TokenUsageHistory import follow the coding guidelines for absolute imports and modern type hints.

Also applies to: 12-12


41-49: Correct exception handling in token usage history recording.

TokenUsageHistory.consume_tokens() can raise database exceptions (e.g., psycopg2.OperationalError, sqlite3.Error) during cursor execution, which will prevent the subsequent quota limiter consumption loop from executing. Consider whether database failures should fail-fast or be logged and allowed to proceed with quota consumption. Note that the current behavior is symmetric with quota limiters, which also lack exception handling.
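If the log-and-proceed option were chosen, the recording step could be wrapped as below. record_usage_safely is a hypothetical helper name, and the broad exception catch (covering e.g. sqlite3.Error or psycopg2.OperationalError) is an assumption about how lenient the handler should be.

```python
import logging

logger = logging.getLogger(__name__)


def record_usage_safely(
    token_usage_history,
    user_id: str,
    provider_id: str,
    model_id: str,
    input_tokens: int,
    output_tokens: int,
) -> bool:
    """Try to record token usage history; on failure, log the error and
    return False so quota limiter consumption can still proceed."""
    if token_usage_history is None:
        return False
    try:
        token_usage_history.consume_tokens(
            user_id, provider_id, model_id, input_tokens, output_tokens
        )
        return True
    except Exception as exc:  # e.g. sqlite3.Error, psycopg2.OperationalError
        logger.error("Failed to record token usage history: %s", exc)
        return False
```

The trade-off: with this wrapper a database outage degrades to missing history rows instead of failed requests, at the cost of silently incomplete accounting; the current fail-fast behavior is at least symmetric with the quota limiters.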

@tisnik tisnik merged commit 657dcde into lightspeed-core:main Jan 5, 2026
19 of 27 checks passed
