Skip to content

fix: joined tool_results to prevent context lost#239

Closed
poshinchen wants to merge 1 commit into
strands-agents:mainfrom
poshinchen:fix/fix-tool-results
Closed

fix: joined tool_results to prevent context lost#239
poshinchen wants to merge 1 commit into
strands-agents:mainfrom
poshinchen:fix/fix-tool-results

Conversation

@poshinchen

@poshinchen poshinchen commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Description

Fixes a bug where multi-block toolResult.content was silently dropped to the first block, causing FaithfulnessEvaluator (and other evaluators) to flag values from content[1+] as hallucinated.

Root cause: Three sites in StrandsInMemorySessionMapper and two in CloudWatchSessionMapper read content[0] (or response[0]) only:

  • StrandsInMemorySessionMapper._process_tool_results (legacy convention)
  • StrandsInMemorySessionMapper._convert_inference_messages (latest convention — had a ## To-do comment)
  • StrandsInMemorySessionMapper._convert_tool_execution_span (latest convention)
  • CloudWatchSessionMapper._extract_tool_results / _process_tool_results (via _extract_tool_result_text)

Fix: Two shared helpers in mappers/utils.py, used by both mappers:

  • serialize_tool_result_block(block) — serializes a single Bedrock-style content block. Handles text, json (via json.dumps), and image/document/video (placeholder marker so the judge knows non-text data exists).
  • join_tool_result_content(content) — joins all blocks with \n, filtering empties.

This addresses both the reported text-block bug and the latent same-shape bug for json blocks (common via Strands' @tool decorator returning dicts).

Related Issues

Fixes #235

Ref: aws/agentcore-cli#1393

Documentation PR

N/A

Type of Change

Bug fix

Testing

  • 4 new regression tests in test_strands_in_memory_mapper.py covering: legacy multi-text, legacy text+json, latest-convention multi-text inference, latest-convention multi-text tool execution span.

  • 1 new regression test in test_cloudwatch_session_mapper.py for multi-block tool result reaching the ToolExecutionSpan.

  • All 172 existing mapper tests pass unchanged.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

return ""


def join_tool_result_content(content: Any) -> str:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: serialize_tool_result_block and join_tool_result_content have multiple code paths (text, json, image/document/video, non-dict, None, str, list) but no direct unit tests in test_utils.py. The existing regression tests only exercise the text and json paths through the mapper integration layer.

Suggestion: Add focused unit tests in tests/strands_evals/mappers/test_utils.py for these functions covering edge cases:

  • serialize_tool_result_block with: None, non-dict, {"text": ""}, {"json": {...}}, {"json": <unserializable>}, {"image": ...}, {"document": ...}, {"video": ...}, empty dict
  • join_tool_result_content with: None, [], "", a plain string, a list with mixed block types, a non-list/non-str value

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown

Assessment: Approve

Clean, well-scoped bug fix that correctly consolidates duplicated content[0]-only logic into shared helpers. The approach is DRY, handles edge cases thoughtfully, and the regression tests prove the fix works end-to-end.

Review Details
  • Testing: The new helpers have good indirect coverage through mapper tests, but would benefit from direct unit tests in test_utils.py to lock down each branch (image/document/video placeholders, json serialization fallback, non-dict inputs, etc.)
  • Code quality: Implementation is clean — good use of early returns, proper fallback handling for json.dumps, and sensible placeholder markers for binary content types.

Nice consolidation that eliminates the ## To-do comment and fixes both text and json block handling in one pass.

@poshinchen

Copy link
Copy Markdown
Contributor Author

closing it since #240 is preferred.

@poshinchen poshinchen closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] FaithfulnessEvaluator drops multi-part toolResult.content → false negatives

1 participant