docs/user-guide/concepts/model-providers/amazon-bedrock.md

#### System Prompt Caching

System prompt caching allows you to reuse a cached system prompt across multiple requests. Strands supports two approaches for system prompt caching:

**Provider-Agnostic Approach (Recommended)**

Use `SystemContentBlock` arrays to define cache points that work across all model providers:

```python
from strands import Agent
from strands.types.content import SystemContentBlock

# Define system content with cache points
system_content = [
    SystemContentBlock(
        text="You are a helpful assistant that provides concise answers. "
             "This is a long system prompt with detailed instructions... "
             + "..." * 1600  # pad the prompt so it reaches at least 1,024 tokens
    ),
    SystemContentBlock(cachePoint={"type": "default"}),
]

# Create an agent with SystemContentBlock array
agent = Agent(system_prompt=system_content)

# First request will cache the system prompt
response1 = agent("Tell me about Python")
print(f"Cache write tokens: {response1.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response1.metrics.accumulated_usage.get('cacheReadInputTokens')}")

# Second request will reuse the cached system prompt
response2 = agent("Tell me about JavaScript")
print(f"Cache write tokens: {response2.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response2.metrics.accumulated_usage.get('cacheReadInputTokens')}")
```
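On the second request you should typically see `cacheReadInputTokens` greater than zero and `cacheWriteInputTokens` at or near zero, indicating the system prompt was served from the cache rather than re-processed. Cache entries expire after a short idle period, so requests made much later may write the cache again.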

**Legacy Bedrock-Specific Approach**

For backwards compatibility, you can still use the Bedrock-specific `cache_prompt` configuration:

```python
from strands import Agent
from strands.models import BedrockModel

# Using legacy system prompt caching with BedrockModel
bedrock_model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-20250514-v1:0",
    cache_prompt="default"  # This approach is deprecated
)

# Create an agent with the model
agent = Agent(
    model=bedrock_model,
    system_prompt="You are a helpful assistant that provides concise answers. "
                  "This is a long system prompt with detailed instructions... "
    # Add enough text to reach the minimum token requirement for your model
)

# The system prompt is cached on the first request and reused on subsequent ones
response = agent("Tell me about Python")
```

> **Note**: The `cache_prompt` configuration is deprecated in favor of the provider-agnostic `SystemContentBlock` approach. The new approach enables caching across all model providers through a unified interface.

#### Tool Caching

Tool caching allows you to reuse a cached tool definition across multiple requests:
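The snippet below is a minimal sketch of how this might look. It assumes the Bedrock-specific `cache_tools` configuration mirrors the `cache_prompt` option above, and it uses a hypothetical `word_count` tool purely for illustration; adapt it to the tool definitions in your own agent.

```python
from strands import Agent, tool
from strands.models import BedrockModel

@tool
def word_count(text: str) -> int:
    """Count the words in a block of text."""
    return len(text.split())

# Assumption: cache_tools enables tool-definition caching, analogous to cache_prompt.
# Note that the combined tool definitions must still meet the model's minimum
# cacheable token count before a cache entry is actually written.
bedrock_model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-20250514-v1:0",
    cache_tools="default"
)

agent = Agent(model=bedrock_model, tools=[word_count])

# The first request writes the tool definitions to the cache;
# later requests read them back instead of re-sending them
response = agent("How many words are in this sentence?")
print(f"Cache write tokens: {response.metrics.accumulated_usage.get('cacheWriteInputTokens')}")
print(f"Cache read tokens: {response.metrics.accumulated_usage.get('cacheReadInputTokens')}")
```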