Commit d2684f4

Wh1isper and DouweM authored
Add anthropic_cache_messages model setting and automatically strip cache points over the limit (#3442)
Co-authored-by: Douwe Maan <douwe@pydantic.dev>
1 parent 50489c5 commit d2684f4

File tree

4 files changed: +786 −29 lines changed

docs/models/anthropic.md

Lines changed: 157 additions & 27 deletions

````diff
@@ -80,25 +80,53 @@ agent = Agent(model)
 
 ## Prompt Caching
 
-Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides three ways to use prompt caching:
+Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) to reduce costs by caching parts of your prompts. Pydantic AI provides four ways to use prompt caching:
 
 1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
 2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
 3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
+4. **Cache All Messages**: Set [`AnthropicModelSettings.anthropic_cache_messages`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_messages] to `True` to automatically cache all messages
 
-You can combine all three strategies for maximum savings:
+### Example 1: Automatic Message Caching
+
+Use `anthropic_cache_messages` to automatically cache all messages up to and including the newest user message:
 
 ```python {test="skip"}
-from pydantic_ai import Agent, CachePoint, RunContext
+from pydantic_ai import Agent
+from pydantic_ai.models.anthropic import AnthropicModelSettings
+
+agent = Agent(
+    'anthropic:claude-sonnet-4-5',
+    system_prompt='You are a helpful assistant.',
+    model_settings=AnthropicModelSettings(
+        anthropic_cache_messages=True,  # Automatically caches the last message
+    ),
+)
+
+# The last message is automatically cached - no need for manual CachePoint
+result1 = agent.run_sync('What is the capital of France?')
+
+# Subsequent calls with similar conversation benefit from cache
+result2 = agent.run_sync('What is the capital of Germany?')
+print(f'Cache write: {result1.usage().cache_write_tokens}')
+print(f'Cache read: {result2.usage().cache_read_tokens}')
+```
+
+### Example 2: Comprehensive Caching Strategy
+
+Combine multiple cache settings for maximum savings:
+
+```python {test="skip"}
+from pydantic_ai import Agent, RunContext
 from pydantic_ai.models.anthropic import AnthropicModelSettings
 
 agent = Agent(
     'anthropic:claude-sonnet-4-5',
     system_prompt='Detailed instructions...',
     model_settings=AnthropicModelSettings(
-        # Use True for default 5m TTL, or specify '5m' / '1h' directly
-        anthropic_cache_instructions=True,
-        anthropic_cache_tool_definitions='1h',  # Longer cache for tool definitions
+        anthropic_cache_instructions=True,  # Cache system instructions
+        anthropic_cache_tool_definitions='1h',  # Cache tool definitions with 1h TTL
+        anthropic_cache_messages=True,  # Also cache the last message
     ),
 )
 
````
````diff
@@ -107,24 +135,34 @@ def search_docs(ctx: RunContext, query: str) -> str:
     """Search documentation."""
     return f'Results for {query}'
 
-async def main():
-    # First call - writes to cache
-    result1 = await agent.run([
-        'Long context from documentation...',
-        CachePoint(),
-        'First question'
-    ])
-
-    # Subsequent calls - read from cache (90% cost reduction)
-    result2 = await agent.run([
-        'Long context from documentation...',  # Same content
-        CachePoint(),
-        'Second question'
-    ])
-    print(f'First: {result1.output}')
-    print(f'Second: {result2.output}')
+
+result = agent.run_sync('Search for Python best practices')
+print(result.output)
 ```
 
+### Example 3: Fine-Grained Control with CachePoint
+
+Use manual `CachePoint` markers to control cache locations precisely:
+
+```python {test="skip"}
+from pydantic_ai import Agent, CachePoint
+
+agent = Agent(
+    'anthropic:claude-sonnet-4-5',
+    system_prompt='Instructions...',
+)
+
+# Manually control cache points for specific content blocks
+result = agent.run_sync([
+    'Long context from documentation...',
+    CachePoint(),  # Cache everything up to this point
+    'First question'
+])
+print(result.output)
+```
+
+### Accessing Cache Usage Statistics
+
 Access cache usage statistics via `result.usage()`:
 
 ```python {test="skip"}
````
````diff
@@ -139,9 +177,101 @@ agent = Agent(
     ),
 )
 
-async def main():
-    result = await agent.run('Your question')
-    usage = result.usage()
-    print(f'Cache write tokens: {usage.cache_write_tokens}')
-    print(f'Cache read tokens: {usage.cache_read_tokens}')
+result = agent.run_sync('Your question')
+usage = result.usage()
+print(f'Cache write tokens: {usage.cache_write_tokens}')
+print(f'Cache read tokens: {usage.cache_read_tokens}')
 ```
+
+### Cache Point Limits
+
+Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors.
+
+#### How Cache Points Are Allocated
+
+Cache points can be placed in three locations:
+
+1. **System Prompt**: Via `anthropic_cache_instructions` setting (adds cache point to last system prompt block)
+2. **Tool Definitions**: Via `anthropic_cache_tool_definitions` setting (adds cache point to last tool definition)
+3. **Messages**: Via `CachePoint` markers or `anthropic_cache_messages` setting (adds cache points to message content)
+
+Each setting uses **at most 1 cache point**, but you can combine them.
+
+#### Example: Using All 3 Cache Point Sources
+
+Define an agent with all cache settings enabled:
+
+```python {test="skip"}
+from pydantic_ai import Agent, CachePoint
+from pydantic_ai.models.anthropic import AnthropicModelSettings
+
+agent = Agent(
+    'anthropic:claude-sonnet-4-5',
+    system_prompt='Detailed instructions...',
+    model_settings=AnthropicModelSettings(
+        anthropic_cache_instructions=True,  # 1 cache point
+        anthropic_cache_tool_definitions=True,  # 1 cache point
+        anthropic_cache_messages=True,  # 1 cache point
+    ),
+)
+
+@agent.tool_plain
+def my_tool() -> str:
+    return 'result'
+
+
+# This uses 3 cache points (instructions + tools + last message)
+# You can add 1 more CachePoint marker before hitting the limit
+result = agent.run_sync([
+    'Context', CachePoint(),  # 4th cache point - OK
+    'Question'
+])
+print(result.output)
+usage = result.usage()
+print(f'Cache write tokens: {usage.cache_write_tokens}')
+print(f'Cache read tokens: {usage.cache_read_tokens}')
+```
+
+#### Automatic Cache Point Limiting
+
+When cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes excess cache points from **older message content** (keeping the most recent ones).
+
+Define an agent with 2 cache points from settings:
+
+```python {test="skip"}
+from pydantic_ai import Agent, CachePoint
+from pydantic_ai.models.anthropic import AnthropicModelSettings
+
+agent = Agent(
+    'anthropic:claude-sonnet-4-5',
+    system_prompt='Instructions...',
+    model_settings=AnthropicModelSettings(
+        anthropic_cache_instructions=True,  # 1 cache point
+        anthropic_cache_tool_definitions=True,  # 1 cache point
+    ),
+)
+
+@agent.tool_plain
+def search() -> str:
+    return 'data'
+
+# Already using 2 cache points (instructions + tools)
+# Can add 2 more CachePoint markers (4 total limit)
+result = agent.run_sync([
+    'Context 1', CachePoint(),  # Oldest - will be removed
+    'Context 2', CachePoint(),  # Will be kept (3rd point)
+    'Context 3', CachePoint(),  # Will be kept (4th point)
+    'Question'
+])
+# Final cache points: instructions + tools + Context 2 + Context 3 = 4
+print(result.output)
+usage = result.usage()
+print(f'Cache write tokens: {usage.cache_write_tokens}')
+print(f'Cache read tokens: {usage.cache_read_tokens}')
+```
+
+**Key Points**:
+
+- System and tool cache points are **always preserved**
+- The cache point created by `anthropic_cache_messages` is **always preserved** (as it's the newest message cache point)
+- Additional `CachePoint` markers in messages are removed from oldest to newest when the limit is exceeded
+- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
````
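The allocation described above is simple arithmetic. As a sketch (a hypothetical helper, not part of the Pydantic AI API), the remaining budget for manual `CachePoint` markers given the three boolean settings looks like this:

```python
MAX_CACHE_POINTS = 4  # Anthropic's per-request limit


def cache_point_budget(
    cache_instructions: bool,
    cache_tool_definitions: bool,
    cache_messages: bool,
) -> int:
    """Return how many manual CachePoint markers will survive trimming."""
    # Each enabled setting consumes at most one cache point
    used = sum([cache_instructions, cache_tool_definitions, cache_messages])
    return MAX_CACHE_POINTS - used


# All three settings enabled leave room for exactly one manual CachePoint
print(cache_point_budget(True, True, True))  # 1
```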

pydantic_ai_slim/pydantic_ai/models/anthropic.py

Lines changed: 103 additions & 2 deletions

```diff
@@ -183,6 +183,19 @@ class AnthropicModelSettings(ModelSettings, total=False):
     See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
     """
 
+    anthropic_cache_messages: bool | Literal['5m', '1h']
+    """Convenience setting to enable caching for the last user message.
+
+    When enabled, this automatically adds a cache point to the last content block
+    in the final user message, which is useful for caching conversation history
+    or context in multi-turn conversations.
+    If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
+
+    Note: Uses 1 of Anthropic's 4 available cache points per request. Any additional CachePoint
+    markers in messages will be automatically limited to respect the 4-cache-point maximum.
+    See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
+    """
+
 
 @dataclass(init=False)
 class AnthropicModel(Model):
```
```diff
@@ -347,7 +360,7 @@ async def _messages_create(
         tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)
 
         system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
         try:
             extra_headers = self._map_extra_headers(beta_features, model_settings)
 
```
```diff
@@ -392,7 +405,7 @@ async def _messages_count_tokens(
         tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)
 
         system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
         try:
             extra_headers = self._map_extra_headers(beta_features, model_settings)
 
```
```diff
@@ -803,6 +816,25 @@ async def _map_message(  # noqa: C901
             system_prompt_parts.insert(0, instructions)
         system_prompt = '\n\n'.join(system_prompt_parts)
 
+        # Add cache_control to the last message content if anthropic_cache_messages is enabled
+        if anthropic_messages and (cache_messages := model_settings.get('anthropic_cache_messages')):
+            ttl: Literal['5m', '1h'] = '5m' if cache_messages is True else cache_messages
+            m = anthropic_messages[-1]
+            content = m['content']
+            if isinstance(content, str):
+                # Convert string content to list format with cache_control
+                m['content'] = [  # pragma: no cover
+                    BetaTextBlockParam(
+                        text=content,
+                        type='text',
+                        cache_control=BetaCacheControlEphemeralParam(type='ephemeral', ttl=ttl),
+                    )
+                ]
+            else:
+                # Add cache_control to the last content block
+                content = cast(list[BetaContentBlockParam], content)
+                self._add_cache_control_to_last_param(content, ttl)
+
         # If anthropic_cache_instructions is enabled, return system prompt as a list with cache_control
         if system_prompt and (cache_instructions := model_settings.get('anthropic_cache_instructions')):
             # If True, use '5m'; otherwise use the specified ttl value
```
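Reduced to plain dicts, the branch added in this hunk behaves as follows (a sketch under the assumption that content blocks are plain dicts rather than the Anthropic SDK's typed params; `attach_cache_control` is a hypothetical name):

```python
from typing import Any


def attach_cache_control(message: dict[str, Any], ttl: str = '5m') -> None:
    """Add an ephemeral cache_control to the last content block of a message."""
    content = message['content']
    if isinstance(content, str):
        # String content becomes a single text block carrying the cache point
        message['content'] = [
            {'type': 'text', 'text': content, 'cache_control': {'type': 'ephemeral', 'ttl': ttl}}
        ]
    else:
        # List content: mark the last block
        content[-1]['cache_control'] = {'type': 'ephemeral', 'ttl': ttl}


msg = {'role': 'user', 'content': 'Hello'}
attach_cache_control(msg)
print(msg['content'][0]['cache_control'])  # {'type': 'ephemeral', 'ttl': '5m'}
```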
```diff
@@ -818,6 +850,75 @@
 
         return system_prompt, anthropic_messages
 
+    @staticmethod
+    def _limit_cache_points(
+        system_prompt: str | list[BetaTextBlockParam],
+        anthropic_messages: list[BetaMessageParam],
+        tools: list[BetaToolUnionParam],
+    ) -> None:
+        """Limit the number of cache points in the request to Anthropic's maximum.
+
+        Anthropic enforces a maximum of 4 cache points per request. This method ensures
+        compliance by counting existing cache points and removing excess ones from messages.
+
+        Strategy:
+        1. Count cache points in system_prompt (can be multiple if list of blocks)
+        2. Count cache points in tools (can be in any position, not just last)
+        3. Raise UserError if system + tools already exceed MAX_CACHE_POINTS
+        4. Calculate remaining budget for message cache points
+        5. Traverse messages from newest to oldest, keeping the most recent cache points
+           within the remaining budget
+        6. Remove excess cache points from older messages to stay within limit
+
+        Cache point priority (always preserved):
+        - System prompt cache points
+        - Tool definition cache points
+        - Message cache points (newest first, oldest removed if needed)
+
+        Raises:
+            UserError: If system_prompt and tools combined already exceed MAX_CACHE_POINTS (4).
+                This indicates a configuration error that cannot be auto-fixed.
+        """
+        MAX_CACHE_POINTS = 4
+
+        # Count existing cache points in system prompt
+        used_cache_points = (
+            sum(1 for block in system_prompt if 'cache_control' in cast(dict[str, Any], block))
+            if isinstance(system_prompt, list)
+            else 0
+        )
+
+        # Count existing cache points in tools (any tool may have cache_control)
+        # Note: cache_control can be in the middle of tools list if builtin tools are added after
+        for tool in tools:
+            if 'cache_control' in tool:
+                used_cache_points += 1
+
+        # Calculate remaining cache points budget for messages
+        remaining_budget = MAX_CACHE_POINTS - used_cache_points
+        if remaining_budget < 0:  # pragma: no cover
+            raise UserError(
+                f'Too many cache points for Anthropic request. '
+                f'System prompt and tool definitions already use {used_cache_points} cache points, '
+                f'which exceeds the maximum of {MAX_CACHE_POINTS}.'
+            )
+        # Remove excess cache points from messages (newest to oldest)
+        for message in reversed(anthropic_messages):
+            content = message['content']
+            if isinstance(content, str):  # pragma: no cover
+                continue
+
+            # Process content blocks in reverse order (newest first)
+            for block in reversed(cast(list[BetaContentBlockParam], content)):
+                block_dict = cast(dict[str, Any], block)
+
+                if 'cache_control' in block_dict:
+                    if remaining_budget > 0:
+                        remaining_budget -= 1
+                    else:
+                        # Exceeded limit, remove this cache point
+                        del block_dict['cache_control']
+
     @staticmethod
     def _add_cache_control_to_last_param(params: list[BetaContentBlockParam], ttl: Literal['5m', '1h'] = '5m') -> None:
         """Add cache control to the last content block param.
```
