
Conversation

@enyst enyst commented Dec 6, 2025

Summary

  • Exclude mini variants from prompt_cache_retention support in model features
  • Piggyback existing tests to validate mini variants are not covered

Context
Evaluation surfaced failures related to passing prompt_cache_retention to mini variants (e.g. gpt-5-mini / gpt-5.1-codex-mini) causing litellm BadRequest errors. The intended behavior is to avoid sending prompt_cache_retention for these mini models.

Changes

  • openhands-sdk/openhands/sdk/llm/utils/model_features.py
    • supports_prompt_cache_retention now requires model to match GPT-5/GPT-4.1 patterns AND not contain "mini".
  • tests/sdk/llm/test_responses_parsing_and_kwargs.py
    • Updated test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt to assert no prompt_cache_retention for mini variants.
  • tests/sdk/llm/test_model_features.py
    • Updated test_prompt_cache_retention_support expectations for mini variants to False.
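The gating change described above can be sketched as a simple predicate. This is an illustrative stand-in, not the SDK's actual API; the function name and the bare substring checks are assumptions based on the description in this PR:

```python
def supports_prompt_cache_retention(model: str) -> bool:
    """Illustrative feature gate: GPT-5 / GPT-4.1 families, excluding mini variants."""
    # Drop a provider prefix such as "openai/" before matching.
    name = model.split("/")[-1].lower()
    matches_family = name.startswith("gpt-5") or name.startswith("gpt-4.1")
    return matches_family and "mini" not in name

print(supports_prompt_cache_retention("openai/gpt-5"))       # True
print(supports_prompt_cache_retention("openai/gpt-5-mini"))  # False
```

Note that this initial version excludes every model whose name contains "mini"; a later commit in this PR carves out gpt-5.1-codex-mini as an explicit exception.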

Validation

  • Ran pre-commit on changed files: all hooks passed.
  • Ran targeted tests for the modified areas: passing.
    • tests/sdk/llm/test_responses_parsing_and_kwargs.py::test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt
    • tests/sdk/llm/test_model_features.py::test_prompt_cache_retention_support

Notes

  • Full test suite has an unrelated import error in tests/github_workflows/test_resolve_model_config.py due to missing helper module; unrelated to this change.

Co-authored-by: openhands <openhands@all-hands.dev>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image                                  Docs / Tags
java     amd64, arm64   eclipse-temurin:17-jdk                      Link
python   amd64, arm64   nikolaik/python-nodejs:python3.12-nodejs22  Link
golang   amd64, arm64   golang:1.21-bookworm                        Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bfaf5c5-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bfaf5c5-python \
  ghcr.io/openhands/agent-server:bfaf5c5-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bfaf5c5-golang-amd64
ghcr.io/openhands/agent-server:bfaf5c5-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bfaf5c5-golang-arm64
ghcr.io/openhands/agent-server:bfaf5c5-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bfaf5c5-java-amd64
ghcr.io/openhands/agent-server:bfaf5c5-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bfaf5c5-java-arm64
ghcr.io/openhands/agent-server:bfaf5c5-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bfaf5c5-python-amd64
ghcr.io/openhands/agent-server:bfaf5c5-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:bfaf5c5-python-arm64
ghcr.io/openhands/agent-server:bfaf5c5-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:bfaf5c5-golang
ghcr.io/openhands/agent-server:bfaf5c5-java
ghcr.io/openhands/agent-server:bfaf5c5-python

About Multi-Architecture Support

  • Each variant tag (e.g., bfaf5c5-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., bfaf5c5-python-amd64) are also available if needed

…djust tests

- Update model_features.get_features to skip mini variants
- Update tests to piggyback existing coverage and validate mini excluded

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot commented Dec 6, 2025

Coverage

Coverage Report

File                                                     Stmts   Miss  Cover  Missing
openhands-sdk/openhands/sdk/llm/utils/model_features.py     46      2    95%  32, 184
TOTAL                                                    12610   5623    55%

enyst and others added 2 commits December 6, 2025 15:04
…atterns + mini exclusions

- Patterns: ['gpt-5', 'gpt-4.1'] with inline doc reference of actual listed models
- Exclude all '*mini' in feature gate (covers gpt-5-mini, gpt-5.1-mini, codex-mini)
- Extend tests to include explicit gpt-5.1-mini exclusion

Co-authored-by: openhands <openhands@all-hands.dev>
… docs; keep other minis excluded

- Update feature gate to carve out 'gpt-5.1-codex-mini'
- Update tests to expect retention for 5.1-codex-mini

Co-authored-by: openhands <openhands@all-hands.dev>
enyst and others added 3 commits December 6, 2025 16:13
…ix E501

- Provide find_models_by_id for tests expecting resolve_model_configs
- Wrap long error message to satisfy Ruff E501

Co-authored-by: openhands <openhands@all-hands.dev>
- Test failure was local-only; CI doesn’t run tests/github_workflows in tests.yml
- run-eval workflow uses resolve_model_config.py (singular) directly

Co-authored-by: openhands <openhands@all-hands.dev>
enyst commented Dec 6, 2025

PASS (200) for all documented positives:
openai/gpt-5.1
openai/gpt-5.1-codex
openai/gpt-5.1-codex-mini
openai/gpt-5.1-chat-latest
openai/gpt-5
openai/gpt-5-codex
openai/gpt-4.1

Negative controls:
openai/gpt-5-mini: UNEXPECTED PASS (200). The request did not error on the parameter, but the response shows prompt_cache_retention: null and status=incomplete with incomplete_details.reason = max_output_tokens; the model produced no output and retention was not applied.
openai/gpt-5.1-mini: 400 model_not_found (as before)

- Define llm_51_codex_mini before use

Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 6, 2025
@enyst enyst marked this pull request as ready for review December 6, 2025 16:30
@enyst enyst requested a review from xingyaoww December 6, 2025 16:30
…t_cache_retention

Co-authored-by: openhands <openhands@all-hands.dev>
enyst commented Dec 6, 2025

@xingyaoww Re: the failure in the agent behavior PR, looks like the issue is that some mini models don't support extended cache, while one does (gpt-5.1-codex-mini).

I verified all the models listed above that should support it, and tried a few that don't; the unsupported mini model used in the integration tests is now excluded as well.

Behavior tests

@enyst enyst changed the title Exclude '*mini' models from prompt_cache_retention support and adjust tests Exclude '*mini' models from prompt_cache_retention Dec 6, 2025
@enyst enyst marked this pull request as draft December 9, 2025 06:05
…che_retention; rename *_PATTERNS -> *_MODELS

- Introduce apply_ordered_model_rules to handle lists like ["gpt-5", "!gpt-5-mini", "gpt-5.1-codex-mini"] (last wins)
- Rewrite PROMPT_CACHE_RETENTION to PROMPT_CACHE_RETENTION_MODELS using ordered rules
- Rename all *_PATTERNS constants to *_MODELS across utils and exceptions for clarity
- Keep substring semantics and backward compatibility for other feature lists
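A minimal sketch of the last-wins ordered-rule matcher this commit describes. Plain entries allow a model, "!"-prefixed entries deny it, matching uses substring semantics, and the last matching rule wins. The exact function signature and rule list in the SDK may differ; the RULES list here is illustrative:

```python
def apply_ordered_model_rules(model: str, rules: list[str]) -> bool:
    """Last-wins rule matcher: plain entries allow, '!'-prefixed entries deny."""
    name = model.split("/")[-1].lower()
    allowed = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if pattern in name:  # substring semantics, matching the feature lists
            allowed = not negated
    return allowed

# Illustrative rule list: allow GPT-5/GPT-4.1, deny minis, re-allow one mini.
RULES = ["gpt-5", "gpt-4.1", "!mini", "gpt-5.1-codex-mini"]
print(apply_ordered_model_rules("openai/gpt-5.1-codex-mini", RULES))  # True
print(apply_ordered_model_rules("openai/gpt-5-mini", RULES))          # False
```

Because the last matching rule decides, "gpt-5.1-codex-mini" appearing after "!mini" restores support for that one model while every other mini stays excluded.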

Co-authored-by: openhands <openhands@all-hands.dev>
…RETENTION_MODELS for simplicity

- Keep explicit allow for gpt-5.1-codex-mini

Co-authored-by: openhands <openhands@all-hands.dev>
… models)

Use proper newlines in commit messages

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst marked this pull request as ready for review December 10, 2025 01:24
@enyst enyst requested a review from xingyaoww December 10, 2025 01:25
enyst commented Dec 17, 2025

@OpenHands Add the new GPT-5.2 to this PR. Verify that it is on OpenAI list with extended prompt cache support, and add it to the comment and to the relevant tests.

openhands-ai bot commented Dec 17, 2025

I'm on it! enyst can track my progress at all-hands.dev

- Verified against OpenAI prompt caching docs: gpt-5.2 is listed under models with extended (24h) retention
- No code changes needed beyond docs comment since our ordered rules already match gpt-5.* (excluding mini)
- Extend tests to assert retention for gpt-5.2, gpt-5.2-chat-latest, and gpt-5.2-pro; keep mini exclusions intact

Co-authored-by: openhands <openhands@all-hands.dev>
openhands-ai bot commented Dec 17, 2025

Summary of changes for adding GPT-5.2

What I did

  • Verified support on OpenAI docs

    • Confirmed that GPT-5.2 is listed under models with extended prompt cache retention (24h) in the OpenAI Prompt Caching guide.
  • Code/tests updated

    • Model features:
      • No logic changes required. Our ordered rules already allow gpt-5.* (excluding mini by default, with explicit allow for gpt-5.1-codex-mini). I updated the inline comment to explicitly include gpt-5.2 in the documented list.
    • Tests:
      • tests/sdk/llm/test_model_features.py
        • Added asserts for extended retention support:
          • "gpt-5.2" -> True
          • "openai/gpt-5.2-chat-latest" -> True
          • "openai/gpt-5.2-pro" -> True
        • Left mini exclusions intact (still False), preserving the special allow for "gpt-5.1-codex-mini".
      • tests/sdk/llm/test_responses_parsing_and_kwargs.py
        • Extended test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt:
          • Asserts prompt_cache_retention="24h" is included for both Chat and Responses paths for:
            • "openai/gpt-5.2"
            • "openai/gpt-5.2-chat-latest"
          • Keeps mini variants excluded.
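The expected truth table for these test updates can be checked with a small table-driven sketch. The supports_retention helper and its rule list here are stand-ins for the SDK's real feature gate, not its actual API:

```python
# Stand-in for the SDK's feature gate (the real logic lives in
# openhands/sdk/llm/utils/model_features.py; this rule list is illustrative).
RULES = ["gpt-5", "gpt-4.1", "!mini", "gpt-5.1-codex-mini"]

def supports_retention(model: str) -> bool:
    name = model.split("/")[-1].lower()
    allowed = False
    for rule in RULES:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if pattern in name:
            allowed = not negated
    return allowed

CASES = [
    ("openai/gpt-5.2", True),
    ("openai/gpt-5.2-chat-latest", True),
    ("openai/gpt-5.2-pro", True),
    ("openai/gpt-5-mini", False),
    ("openai/gpt-5.1-codex-mini", True),
]
for model, expected in CASES:
    assert supports_retention(model) is expected
print("all retention expectations hold")
```

The gpt-5.2 variants pass without any rule change because "gpt-5" already matches them and none contain "mini", which is why the PR only needed test and comment updates.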

Validation

  • Ran pre-commit on changed files: passed
  • Ran targeted tests: passed
    • tests/sdk/llm/test_model_features.py::test_prompt_cache_retention_support
    • tests/sdk/llm/test_responses_parsing_and_kwargs.py::test_chat_and_responses_options_prompt_cache_retention_gpt_5_plus_and_non_gpt

Commit

  • Branch: fix/exclude-mini-from-prompt-cache-retention

  • Message:
    LLM: add GPT-5.2 to extended prompt_cache_retention support and tests

    • Verified against OpenAI prompt caching docs: gpt-5.2 is listed under models with extended (24h) retention
    • No code changes needed beyond docs comment since our ordered rules already match gpt-5.* (excluding mini)
    • Extend tests to assert retention for gpt-5.2, gpt-5.2-chat-latest, and gpt-5.2-pro; keep mini exclusions intact

    Co-authored-by: openhands <openhands@all-hands.dev>

Checklist

  • Verified GPT-5.2 extended retention in OpenAI docs
  • Updated tests to cover GPT-5.2 and variants
  • Ensured “mini” exclusions remain, except for gpt-5.1-codex-mini
  • Ran linters/type checks and tests
  • Committed and pushed to the same PR branch

Note about the “comment”

  • Per your request to “add it to the comment,” I didn’t post to the PR directly (following the project’s guidance to avoid PR comments in this flow). If you want, I can provide a short updated “PASS” block you can paste into the PR discussion to include GPT-5.2, GPT-5.2-chat-latest, and GPT-5.2-pro as positives.


@enyst enyst merged commit 7db5ce2 into main Dec 17, 2025
21 checks passed
@enyst enyst deleted the fix/exclude-mini-from-prompt-cache-retention branch December 17, 2025 18:05