# Add anthropic_cache_messages model setting and automatically strip cache points over the limit (#3442)
**Open** · Wh1isper wants to merge 15 commits into `pydantic:main` from `Wh1isper:feat-anthropic-cache-all` · +786 −29
## Commits (15)

- `b4862a0` feat: add cache all and limit cache point in AnthropicModel (Wh1isper)
- `9bf3f6e` fix ci issues (Wh1isper)
- `0f0dd76` use BetaTextBlockParam and add nocover (Wh1isper)
- `8bf1d94` use anthropic_cache_messages (Wh1isper)
- `240f71c` fix ci (Wh1isper)
- `ae63b13` update docstring for _limit_cache_points (Wh1isper)
- `6cceb59` fix doc example issue (Wh1isper)
- `264ad1e` Merge branch 'main' into feat-anthropic-cache-all (Wh1isper)
- `0aa82ad` fix doc check (Wh1isper)
- `779bd40` update docs and add real case (Wh1isper)
- `bf0dc84` test via real api key (Wh1isper)
- `7f317f0` fix docs ruff issues (Wh1isper)
- `a8f8eaf` Merge branch 'main' into feat-anthropic-cache-all (DouweM)
- `63500d7` Update docs/models/anthropic.md (Wh1isper)
- `03dfa19` use run_async in docs (Wh1isper)
## Changes to `docs/models/anthropic.md`

@@ -80,25 +80,45 @@ agent = Agent(model)

## Prompt Caching

Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides four ways to use prompt caching:

1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses a 5m TTL by default) or specify `'5m'` / `'1h'` directly
3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses a 5m TTL by default) or specify `'5m'` / `'1h'` directly
4. **Cache Last Message (Convenience)**: Set [`AnthropicModelSettings.anthropic_cache_messages`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_messages] to `True` to automatically cache the last user message

You can combine multiple strategies for maximum savings:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint, RunContext
from pydantic_ai.models.anthropic import AnthropicModelSettings

# Example 1: Use anthropic_cache_messages for automatic last-message caching
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='You are a helpful assistant.',
    model_settings=AnthropicModelSettings(
        anthropic_cache_messages=True,  # Automatically caches the last message
    ),
)

async def main():
    # The last message is automatically cached - no need for a manual CachePoint
    result1 = await agent.run('What is the capital of France?')

    # Subsequent calls with a similar conversation benefit from the cache
    result2 = await agent.run('What is the capital of Germany?')
    print(f'Cache write: {result1.usage().cache_write_tokens}')
    print(f'Cache read: {result2.usage().cache_read_tokens}')

# Example 2: Combine with other cache settings for comprehensive caching
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        # Use True for the default 5m TTL, or specify '5m' / '1h' directly
        anthropic_cache_instructions=True,  # Cache system instructions
        anthropic_cache_tool_definitions='1h',  # Cache tool definitions with 1h TTL
        anthropic_cache_messages=True,  # Also cache the last message
    ),
)
```

@@ -107,22 +127,25 @@ def search_docs(ctx: RunContext, query: str) -> str:

```python {test="skip"}
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'

async def main():  # noqa: F811
    # All three cache points are used: instructions, tools, and last message
    result = await agent.run('Search for Python best practices')
    print(result.output)

# Example 3: Fine-grained control with manual CachePoint markers
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Instructions...',
)

async def main():  # noqa: F811
    # Manually control cache points for specific content blocks
    result = await agent.run([
        'Long context from documentation...',
        CachePoint(),  # Cache everything up to this point
        'First question'
    ])
    print(result.output)
```

Access cache usage statistics via `result.usage()`:

@@ -139,9 +162,98 @@ agent = Agent(

```python {test="skip"}
    ),
)

async def main():  # noqa: F811
    result = await agent.run('Your question')
    usage = result.usage()
    print(f'Cache write tokens: {usage.cache_write_tokens}')
    print(f'Cache read tokens: {usage.cache_read_tokens}')
```

### Cache Point Limits

Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit so your requests always comply without errors.

#### How Cache Points Are Allocated

Cache points can be placed in three locations:

1. **System Prompt**: Via the `anthropic_cache_instructions` setting (adds a cache point to the last system prompt block)
2. **Tool Definitions**: Via the `anthropic_cache_tool_definitions` setting (adds a cache point to the last tool definition)
3. **Messages**: Via `CachePoint` markers or the `anthropic_cache_messages` setting (adds cache points to message content)

Each setting uses **at most 1 cache point**, but you can combine them:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

# Example: Using all 3 cache point sources
agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
        anthropic_cache_messages=True,  # 1 cache point
    ),
)

@agent.tool_plain
def my_tool() -> str:
    return 'result'

async def main():  # noqa: F811
    # This uses 3 cache points (instructions + tools + last message),
    # so you can add 1 more CachePoint marker before hitting the limit
    result = await agent.run([
        'Context', CachePoint(),  # 4th cache point - OK
        'Question'
    ])
    print(result.output)
    usage = result.usage()
    print(f'Cache write tokens: {usage.cache_write_tokens}')
    print(f'Cache read tokens: {usage.cache_read_tokens}')
```

#### Automatic Cache Point Limiting

When the cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes the excess cache points from **older message content**, keeping the most recent ones:

```python {test="skip"}
from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'anthropic:claude-sonnet-4-5',
    system_prompt='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
    ),
)

@agent.tool_plain
def search() -> str:
    return 'data'

async def main():  # noqa: F811
    # Already using 2 cache points (instructions + tools),
    # so only 2 more CachePoint markers fit within the 4-point limit
    result = await agent.run([
        'Context 1', CachePoint(),  # Oldest - will be removed
        'Context 2', CachePoint(),  # Will be kept (3rd point)
        'Context 3', CachePoint(),  # Will be kept (4th point)
        'Question'
    ])
    # Final cache points: instructions + tools + Context 2 + Context 3 = 4
    print(result.output)
    usage = result.usage()
    print(f'Cache write tokens: {usage.cache_write_tokens}')
    print(f'Cache read tokens: {usage.cache_read_tokens}')
```

**Key Points**:

- System and tool cache points are **always preserved**
- The cache point created by `anthropic_cache_messages` is **always preserved** (it is the newest message cache point)
- Additional `CachePoint` markers in messages are removed from oldest to newest when the limit is exceeded
- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
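The stripping behavior described above can be sketched roughly as follows. This is an illustrative reimplementation under stated assumptions, not pydantic-ai's actual `_limit_cache_points` code: the function name, signature, and the `reserved` parameter are hypothetical, and only the keep-the-newest-markers policy is taken from the docs.

```python
# Hypothetical sketch of the cache-point limiting policy (not pydantic-ai's
# real implementation): given the positions of CachePoint markers in message
# content (oldest first) and the number of cache points already reserved by
# settings (instructions/tools/last-message), keep only the newest markers
# that fit within Anthropic's 4-cache-point limit.
MAX_CACHE_POINTS = 4


def limit_cache_points(marker_indices: list[int], reserved: int) -> list[int]:
    """Return the marker indices that survive the 4-cache-point limit."""
    budget = MAX_CACHE_POINTS - reserved
    if budget <= 0:
        # Settings alone consume the whole budget; drop every marker.
        return []
    # Drop from the oldest end, keeping the most recent markers.
    return marker_indices[-budget:]
```

For the example above (instructions + tools reserved, three markers), `limit_cache_points([0, 2, 5], reserved=2)` returns `[2, 5]`: the oldest marker is dropped and the two newest survive, matching the documented behavior.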