fix(openai-responses): surface cache read tokens in metadata chunk #2555
Open
gautham18113 wants to merge 1 commit into
OpenAIResponsesModel._format_chunk dropped input_tokens_details when building the metadata usage dict, so cacheReadInputTokens was never set and cache hits were invisible to telemetry and cost tooling. Mirror the fix already present in OpenAIModel.format_chunk (added in strands-agents#2115 / strands-agents#2116): read input_tokens_details.cached_tokens and set cacheReadInputTokens when the field is present. Fixes strands-agents#2407
Summary
Fixes #2407.
`OpenAIResponsesModel._format_chunk` built the metadata usage dict with only the three scalar token counts (`inputTokens`, `outputTokens`, `totalTokens`). The `input_tokens_details.cached_tokens` field returned by the Responses API on cache hits was silently dropped, so `cacheReadInputTokens` was never populated and cache reads were invisible to telemetry and cost tooling.

The fix mirrors what `OpenAIModel.format_chunk` already does for the Chat Completions path (added in #2115 / #2116): read `input_tokens_details.cached_tokens` and set `cacheReadInputTokens` when it is present.

Root cause (before):
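As an illustration only (not the actual diff), the pre-fix behavior can be sketched as follows. `build_usage_before` is a hypothetical stand-in for the relevant part of `_format_chunk`, and the `SimpleNamespace` object only imitates the shape of a Responses API usage payload; the token values are made up.

```python
from types import SimpleNamespace


def build_usage_before(usage):
    """Hypothetical sketch of the pre-fix logic: copy only the scalar counts."""
    return {
        "inputTokens": usage.input_tokens,
        "outputTokens": usage.output_tokens,
        "totalTokens": usage.total_tokens,
    }


# Stand-in for a usage payload on a cache hit (values are illustrative).
usage = SimpleNamespace(
    input_tokens=1200,
    output_tokens=50,
    total_tokens=1250,
    input_tokens_details=SimpleNamespace(cached_tokens=1024),
)

result = build_usage_before(usage)
# The cached-token count is present on the payload but never reaches the dict.
cache_key_present = "cacheReadInputTokens" in result
```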
After:
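A minimal sketch of the described change, under stated assumptions: `build_usage_after` is a hypothetical helper rather than the code in `openai_responses.py`, the plain `dict` stands in for the `Usage` type, and whether a `cached_tokens` value of zero is reported is an assumption of this sketch, not something the description specifies.

```python
from types import SimpleNamespace
from typing import Any, Dict


def build_usage_after(usage: Any) -> Dict[str, int]:
    """Hypothetical sketch: build the scalar counts first, then add cache reads."""
    usage_data = {
        "inputTokens": usage.input_tokens,
        "outputTokens": usage.output_tokens,
        "totalTokens": usage.total_tokens,
    }
    details = getattr(usage, "input_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details is not None else None
    # Assumption: only a missing value (None) is treated as "not present".
    if cached is not None:
        usage_data["cacheReadInputTokens"] = cached
    return usage_data


# Illustrative payloads: one with a cache hit, one without details.
with_cache = SimpleNamespace(
    input_tokens=1200,
    output_tokens=50,
    total_tokens=1250,
    input_tokens_details=SimpleNamespace(cached_tokens=1024),
)
without_cache = SimpleNamespace(
    input_tokens=1200,
    output_tokens=50,
    total_tokens=1250,
    input_tokens_details=None,
)

hit = build_usage_after(with_cache)
miss = build_usage_after(without_cache)
```

In this sketch `cacheWriteInputTokens` is never set, matching the behavior the description states for the Responses API path.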
Note: `cacheWriteInputTokens` is not set. The OpenAI Responses API does not return cache-write counts, consistent with the Chat Completions fix in #2115.

Changes
- `src/strands/models/openai_responses.py`: add `Usage` import; build the `usage_data` dict first, then conditionally populate `cacheReadInputTokens`
- `tests/strands/models/test_openai_responses.py`: add two `_format_chunk` parametrize cases (with and without cache tokens); add `test_stream_cache_tokens_propagated` and `test_stream_no_cache_tokens_when_absent` end-to-end streaming tests; add explicit `input_tokens_details=None` to existing usage mocks so they are not ambiguous

Test plan
- `pytest tests/strands/models/test_openai_responses.py`: 93 passed, 0 failed
- `test_stream_cache_tokens_propagated` verifies `cacheReadInputTokens` is set when `input_tokens_details.cached_tokens` is present on the completed response
- `test_stream_no_cache_tokens_when_absent` verifies `cacheReadInputTokens` is absent when `input_tokens_details` is `None`
- `input_tokens_details=None` is set explicitly on usage mocks to prevent false positives
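Why the explicit `input_tokens_details=None` matters can be sketched as follows, assuming the usage mocks are `unittest.mock.MagicMock` objects (the test file may construct them differently). `read_cached_tokens` is a hypothetical helper that imitates the attribute lookup, not a function from the codebase.

```python
from unittest.mock import MagicMock


def read_cached_tokens(usage):
    """Hypothetical helper imitating the lookup of input_tokens_details.cached_tokens."""
    details = getattr(usage, "input_tokens_details", None)
    return getattr(details, "cached_tokens", None) if details is not None else None


# A mock that never mentions input_tokens_details: MagicMock auto-creates the
# attribute on access, so the lookup yields another MagicMock instead of None.
ambiguous = MagicMock(input_tokens=10, output_tokens=5, total_tokens=15)

# A mock with the attribute pinned to None: the lookup reports "no cache data".
explicit = MagicMock(
    input_tokens=10, output_tokens=5, total_tokens=15, input_tokens_details=None
)

leaked = read_cached_tokens(ambiguous)
absent = read_cached_tokens(explicit)
```

With the ambiguous mock, a placeholder object would flow into the usage dict as if it were a cache-read count, which is the ambiguity the explicit `None` removes.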