fix: joined tool_results to prevent context lost by poshinchen · Pull Request #239 · strands-agents/evals

poshinchen · 2026-06-02T17:57:01Z

Description

Fixes a bug where multi-block toolResult.content was silently dropped to the first block, causing FaithfulnessEvaluator (and other evaluators) to flag values from content[1+] as hallucinated.

Root cause: Three sites in StrandsInMemorySessionMapper and two in CloudWatchSessionMapper read content[0] (or response[0]) only:

StrandsInMemorySessionMapper._process_tool_results (legacy convention)
StrandsInMemorySessionMapper._convert_inference_messages (latest convention — had a ## To-do comment)
StrandsInMemorySessionMapper._convert_tool_execution_span (latest convention)
CloudWatchSessionMapper._extract_tool_results / _process_tool_results (via _extract_tool_result_text)

Fix: Two shared helpers in mappers/utils.py, used by both mappers:

serialize_tool_result_block(block) — serializes a single Bedrock-style content block. Handles text, json (via json.dumps), and image/document/video (placeholder marker so the judge knows non-text data exists).
join_tool_result_content(content) — joins all blocks with \n, filtering empties.

This addresses both the reported text-block bug and the latent same-shape bug for json blocks (common via Strands' @tool decorator returning dicts).

Related Issues

Fixes #235

Ref: aws/agentcore-cli#1393

Documentation PR

N/A

Type of Change

Bug fix

Testing

4 new regression tests in test_strands_in_memory_mapper.py covering: legacy multi-text, legacy text+json, latest-convention multi-text inference, latest-convention multi-text tool execution span.
1 new regression test in test_cloudwatch_session_mapper.py for multi-block tool result reaching the ToolExecutionSpan.
All 172 existing mapper tests pass unchanged.
I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2026-06-02T18:00:42Z

+    return ""
+
+
+def join_tool_result_content(content: Any) -> str:


Issue: serialize_tool_result_block and join_tool_result_content have multiple code paths (text, json, image/document/video, non-dict, None, str, list) but no direct unit tests in test_utils.py. The existing regression tests only exercise the text and json paths through the mapper integration layer.

Suggestion: Add focused unit tests in tests/strands_evals/mappers/test_utils.py for these functions covering edge cases:

serialize_tool_result_block with: None, non-dict, {"text": ""}, {"json": {...}}, {"json": <unserializable>}, {"image": ...}, {"document": ...}, {"video": ...}, empty dict

join_tool_result_content with: None, [], "", a plain string, a list with mixed block types, a non-list/non-str value

github-actions · 2026-06-02T18:00:43Z

Assessment: Approve

Clean, well-scoped bug fix that correctly consolidates duplicated content[0]-only logic into shared helpers. The approach is DRY, handles edge cases thoughtfully, and the regression tests prove the fix works end-to-end.

Review Details

Testing: The new helpers have good indirect coverage through mapper tests, but would benefit from direct unit tests in test_utils.py to lock down each branch (image/document/video placeholders, json serialization fallback, non-dict inputs, etc.)
Code quality: Implementation is clean — good use of early returns, proper fallback handling for json.dumps, and sensible placeholder markers for binary content types.

Nice consolidation that eliminates the ## To-do comment and fixes both text and json block handling in one pass.

poshinchen · 2026-06-02T20:01:33Z

closing it since #240 is preferred.

fix: joined tool_results to prevent context lost

0e7383f

poshinchen temporarily deployed to auto-approve June 2, 2026 17:57 — with GitHub Actions Inactive

github-actions Bot added the strands-running label Jun 2, 2026

github-actions Bot reviewed Jun 2, 2026

View reviewed changes

github-actions Bot removed the strands-running label Jun 2, 2026

poshinchen mentioned this pull request Jun 2, 2026

fix(mappers): join all toolResult.content blocks to fix faithfulness false negatives #240

Merged

poshinchen closed this Jun 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: joined tool_results to prevent context lost#239

fix: joined tool_results to prevent context lost#239
poshinchen wants to merge 1 commit into
strands-agents:mainfrom
poshinchen:fix/fix-tool-results

poshinchen commented Jun 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

poshinchen commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

poshinchen commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Documentation PR

Type of Change

Testing

Checklist

Uh oh!

github-actions Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 2, 2026

Uh oh!

poshinchen commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

poshinchen commented Jun 2, 2026 •

edited

Loading