
fix(openai-responses): surface cache read tokens in metadata chunk #2555

Open
gautham18113 wants to merge 1 commit into strands-agents:main from gautham18113:fix/openai-responses-cache-token-tracking

@gautham18113

Summary

Fixes #2407.

OpenAIResponsesModel._format_chunk built the metadata usage dict with only the three scalar token counts (inputTokens, outputTokens, totalTokens). The input_tokens_details.cached_tokens field returned by the Responses API on cache hits was silently dropped, so cacheReadInputTokens was never populated and cache reads were invisible to telemetry and cost tooling.

The fix mirrors what OpenAIModel.format_chunk already does for the Chat Completions path (added in #2115 / #2116): read input_tokens_details.cached_tokens and set cacheReadInputTokens when the value is present and non-zero.

Root cause (before):

case "metadata":
    return {
        "metadata": {
            "usage": {
                "inputTokens": getattr(event["data"], "input_tokens", 0),
                "outputTokens": getattr(event["data"], "output_tokens", 0),
                "totalTokens": getattr(event["data"], "total_tokens", 0),
                # input_tokens_details.cached_tokens dropped here
            },
            ...
        }
    }

After:

case "metadata":
    usage_data: Usage = {
        "inputTokens": getattr(event["data"], "input_tokens", 0),
        "outputTokens": getattr(event["data"], "output_tokens", 0),
        "totalTokens": getattr(event["data"], "total_tokens", 0),
    }
    if token_details := getattr(event["data"], "input_tokens_details", None):
        if cached := getattr(token_details, "cached_tokens", None):
            usage_data["cacheReadInputTokens"] = cached
    return {"metadata": {"usage": usage_data, "metrics": {"latencyMs": 0}}}

Note: cacheWriteInputTokens is not set because the OpenAI Responses API does not return cache-write counts. This is consistent with the Chat Completions fix in #2115.
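
For readers who want to try the conditional in isolation, here is a minimal standalone sketch of the same extraction logic. `build_usage` is a hypothetical helper name used only for illustration (in this PR the logic lives inline in `_format_chunk`), and the `SimpleNamespace` objects are assumed stand-ins for the usage object, not the real SDK types.

```python
from types import SimpleNamespace
from typing import Any


def build_usage(data: Any) -> dict[str, int]:
    """Hypothetical helper mirroring the 'After' branch above."""
    usage = {
        "inputTokens": getattr(data, "input_tokens", 0),
        "outputTokens": getattr(data, "output_tokens", 0),
        "totalTokens": getattr(data, "total_tokens", 0),
    }
    # Both walrus checks are truthiness checks: None and 0 skip the key.
    if details := getattr(data, "input_tokens_details", None):
        if cached := getattr(details, "cached_tokens", None):
            usage["cacheReadInputTokens"] = cached
    return usage


# Stand-ins for the usage object (assumed shape, illustrative numbers).
cache_hit = SimpleNamespace(
    input_tokens=1200,
    output_tokens=50,
    total_tokens=1250,
    input_tokens_details=SimpleNamespace(cached_tokens=1024),
)
cache_miss = SimpleNamespace(
    input_tokens=1200,
    output_tokens=50,
    total_tokens=1250,
    input_tokens_details=None,
)

hit_usage = build_usage(cache_hit)    # includes cacheReadInputTokens
miss_usage = build_usage(cache_miss)  # only the three scalar counts
```

Because both checks rely on truthiness, a reported `cached_tokens` of 0 leaves the key unset, the same as a missing `input_tokens_details`.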

Changes

  • src/strands/models/openai_responses.py: add Usage import; build usage_data dict first then conditionally populate cacheReadInputTokens
  • tests/strands/models/test_openai_responses.py: add two _format_chunk parametrize cases (with and without cache tokens); add test_stream_cache_tokens_propagated and test_stream_no_cache_tokens_when_absent end-to-end streaming tests; add explicit input_tokens_details=None to existing usage mocks so they are not ambiguous

Test plan

  • pytest tests/strands/models/test_openai_responses.py — 93 passed, 0 failed
  • New test test_stream_cache_tokens_propagated verifies cacheReadInputTokens is set when input_tokens_details.cached_tokens is present on the completed response
  • New test test_stream_no_cache_tokens_when_absent verifies cacheReadInputTokens is absent when input_tokens_details is None
  • Existing tests updated to set input_tokens_details=None explicitly on usage mocks to prevent false positives
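
The last bullet can be illustrated with a short sketch, assuming the usage mocks are `unittest.mock.MagicMock` instances (the PR text does not say which mock type the tests use). `read_cached_tokens` is a hypothetical helper that repeats the lookup chain from the patched branch; it is not part of the codebase.

```python
from unittest.mock import MagicMock


def read_cached_tokens(usage):
    """Hypothetical helper: same lookup chain as the patched branch."""
    if details := getattr(usage, "input_tokens_details", None):
        if cached := getattr(details, "cached_tokens", None):
            return cached
    return None


# A bare MagicMock creates input_tokens_details (and cached_tokens) on
# first access, so the lookup returns another mock rather than None.
ambiguous = MagicMock(input_tokens=10, output_tokens=5, total_tokens=15)
leaked = read_cached_tokens(ambiguous)

# Pinning the attribute to None makes the "no cache data" case explicit.
explicit = MagicMock(
    input_tokens=10,
    output_tokens=5,
    total_tokens=15,
    input_tokens_details=None,
)
absent = read_cached_tokens(explicit)
```

Under that assumption, an unpinned mock would cause `cacheReadInputTokens` to be populated with a mock object, which is why the existing usage mocks now set `input_tokens_details=None`.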

OpenAIResponsesModel._format_chunk dropped input_tokens_details when
building the metadata usage dict, so cacheReadInputTokens was never set
and cache hits were invisible to telemetry and cost tooling.

Mirror the fix already present in OpenAIModel.format_chunk (added in
strands-agents#2115 / strands-agents#2116): read input_tokens_details.cached_tokens and set
cacheReadInputTokens when the field is present.

Fixes strands-agents#2407

Development

Successfully merging this pull request may close these issues.

OpenAIResponsesModel drops prompt cache tokens (cached_tokens not plumbed to cacheReadInputTokens)