fix(openai-responses): surface cache read tokens in metadata chunk by gautham18113 · Pull Request #2555 · strands-agents/harness-sdk

gautham18113 · 2026-06-02T06:07:58Z

Summary

OpenAIResponsesModel._format_chunk built the metadata usage dict with only the three scalar token counts (inputTokens, outputTokens, totalTokens). The input_tokens_details.cached_tokens field returned by the Responses API on cache hits was silently dropped, so cacheReadInputTokens was never populated and cache reads were invisible to telemetry and cost tooling.

The fix mirrors what OpenAIModel.format_chunk already does for the Chat Completions path (added in #2115 / #2116): read input_tokens_details.cached_tokens and set cacheReadInputTokens when it is present.

Root cause (before):

case "metadata":
    return {
        "metadata": {
            "usage": {
                "inputTokens": getattr(event["data"], "input_tokens", 0),
                "outputTokens": getattr(event["data"], "output_tokens", 0),
                "totalTokens": getattr(event["data"], "total_tokens", 0),
                # input_tokens_details.cached_tokens dropped here
            },
            ...
        }
    }

After:

case "metadata":
    usage_data: Usage = {
        "inputTokens": getattr(event["data"], "input_tokens", 0),
        "outputTokens": getattr(event["data"], "output_tokens", 0),
        "totalTokens": getattr(event["data"], "total_tokens", 0),
    }
    if token_details := getattr(event["data"], "input_tokens_details", None):
        if cached := getattr(token_details, "cached_tokens", None):
            usage_data["cacheReadInputTokens"] = cached
    return {"metadata": {"usage": usage_data, "metrics": {"latencyMs": 0}}}

Note: cacheWriteInputTokens is not set — the OpenAI Responses API does not return cache-write counts, consistent with the Chat Completions fix in #2115.
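
For reviewers who want to poke at the behaviour outside the model class, the "After" branch can be approximated as a standalone function. This is a minimal sketch, not the code in the PR: `build_usage` is a hypothetical helper name, and `SimpleNamespace` objects stand in for the usage object the Responses API returns.

```python
from types import SimpleNamespace
from typing import Any


def build_usage(data: Any) -> dict[str, int]:
    """Hypothetical helper that mirrors the "After" metadata branch."""
    usage = {
        "inputTokens": getattr(data, "input_tokens", 0),
        "outputTokens": getattr(data, "output_tokens", 0),
        "totalTokens": getattr(data, "total_tokens", 0),
    }
    # Both lookups tolerate a missing attribute or None; the walrus checks are
    # truthiness checks, so a cached count of 0 also leaves the key unset.
    if details := getattr(data, "input_tokens_details", None):
        if cached := getattr(details, "cached_tokens", None):
            usage["cacheReadInputTokens"] = cached
    return usage


# Stand-ins for the usage payload (assumed shapes, for illustration only).
hit = SimpleNamespace(
    input_tokens=100,
    output_tokens=20,
    total_tokens=120,
    input_tokens_details=SimpleNamespace(cached_tokens=64),
)
miss = SimpleNamespace(
    input_tokens=100, output_tokens=20, total_tokens=120, input_tokens_details=None
)
zero = SimpleNamespace(
    input_tokens=100,
    output_tokens=20,
    total_tokens=120,
    input_tokens_details=SimpleNamespace(cached_tokens=0),
)

print(build_usage(hit))   # includes cacheReadInputTokens
print(build_usage(miss))  # three scalar counts only
print(build_usage(zero))  # three scalar counts only
```

One consequence of the truthiness check is that a reported `cached_tokens` of 0 is treated the same as an absent field, so consumers should read `cacheReadInputTokens` with a default rather than assume the key exists.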

Changes

src/strands/models/openai_responses.py: add Usage import; build usage_data dict first then conditionally populate cacheReadInputTokens
tests/strands/models/test_openai_responses.py: add two _format_chunk parametrize cases (with and without cache tokens); add test_stream_cache_tokens_propagated and test_stream_no_cache_tokens_when_absent end-to-end streaming tests; add explicit input_tokens_details=None to existing usage mocks so they are not ambiguous

Test plan

pytest tests/strands/models/test_openai_responses.py — 93 passed, 0 failed
New test test_stream_cache_tokens_propagated verifies cacheReadInputTokens is set when input_tokens_details.cached_tokens is present on the completed response
New test test_stream_no_cache_tokens_when_absent verifies cacheReadInputTokens is absent when input_tokens_details is None
Existing tests updated to set input_tokens_details=None explicitly on usage mocks to prevent false positives
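
The last item is easy to miss, so here is a sketch of the false positive it guards against. It assumes the usage mocks are `unittest.mock.Mock` objects (the PR text only says "usage mocks"), and it reuses a hypothetical `build_usage` helper that copies the "After" logic rather than calling the real `_format_chunk`.

```python
from typing import Any
from unittest.mock import Mock


def build_usage(data: Any) -> dict[str, Any]:
    """Hypothetical copy of the "After" logic, for illustration only."""
    usage = {
        "inputTokens": getattr(data, "input_tokens", 0),
        "outputTokens": getattr(data, "output_tokens", 0),
        "totalTokens": getattr(data, "total_tokens", 0),
    }
    if details := getattr(data, "input_tokens_details", None):
        if cached := getattr(details, "cached_tokens", None):
            usage["cacheReadInputTokens"] = cached
    return usage


# A Mock creates any attribute on first access, so input_tokens_details and
# cached_tokens both come back as truthy child mocks.
ambiguous = Mock(input_tokens=10, output_tokens=5, total_tokens=15)

# Setting the attribute to None makes the "no cache details" case explicit.
explicit = Mock(
    input_tokens=10, output_tokens=5, total_tokens=15, input_tokens_details=None
)

leaked = build_usage(ambiguous)
clean = build_usage(explicit)

print("cacheReadInputTokens" in leaked)  # True: a Mock leaked into the usage dict
print("cacheReadInputTokens" in clean)   # False
```

Without the explicit `None`, an equality assertion against the three-count usage dict fails on the extra key, and a test that only checks for the key's presence passes for the wrong reason.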

OpenAIResponsesModel._format_chunk dropped input_tokens_details when building the metadata usage dict, so cacheReadInputTokens was never set and cache hits were invisible to telemetry and cost tooling.

Mirror the fix already present in OpenAIModel.format_chunk (added in strands-agents#2115 / strands-agents#2116): read input_tokens_details.cached_tokens and set cacheReadInputTokens when the field is present.

Fixes strands-agents#2407

github-actions Bot added the size/m label Jun 2, 2026

gautham18113 requested a deployment to manual-approval June 2, 2026 06:08 — with GitHub Actions Waiting

gautham18113 wants to merge 1 commit into strands-agents:main from gautham18113:fix/openai-responses-cache-token-tracking
