fix: join tool_results to prevent context loss #239
    return ""


def join_tool_result_content(content: Any) -> str:
Issue: serialize_tool_result_block and join_tool_result_content have multiple code paths (text, json, image/document/video, non-dict, None, str, list) but no direct unit tests in test_utils.py. The existing regression tests only exercise the text and json paths through the mapper integration layer.
Suggestion: Add focused unit tests in tests/strands_evals/mappers/test_utils.py for these functions covering edge cases:
- serialize_tool_result_block with: None, non-dict, {"text": ""}, {"json": {...}}, {"json": <unserializable>}, {"image": ...}, {"document": ...}, {"video": ...}, empty dict
- join_tool_result_content with: None, [], "", a plain string, a list with mixed block types, a non-list/non-str value
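To make the listed code paths concrete, here is a minimal sketch of how the two helpers might behave. It is not the PR's actual implementation: the placeholder format (`[image content]`), the `str()` fallback for unserializable JSON, and the empty-string result for non-list/non-str input are all assumptions made for illustration.

```python
import json
from typing import Any

# Assumed set of non-text block kinds, taken from the PR description.
_MEDIA_KEYS = ("image", "document", "video")


def serialize_tool_result_block(block: Any) -> str:
    """Sketch: turn one Bedrock-style content block into text."""
    if not isinstance(block, dict):
        # Covers None and any other non-dict value.
        return ""
    if "text" in block:
        return str(block["text"] or "")
    if "json" in block:
        try:
            return json.dumps(block["json"])
        except (TypeError, ValueError):
            # Assumed fallback when the payload cannot be JSON-encoded.
            return str(block["json"])
    for key in _MEDIA_KEYS:
        if key in block:
            # Hypothetical placeholder so a judge knows non-text data exists.
            return f"[{key} content]"
    return ""


def join_tool_result_content(content: Any) -> str:
    """Sketch: join every block with a newline, dropping empty results."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        # Assumed behavior for unexpected types.
        return ""
    parts = [serialize_tool_result_block(block) for block in content]
    return "\n".join(part for part in parts if part)
```

Unit tests along the lines suggested above would then call each function directly with one edge case per assertion, rather than going through the mapper layer.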
Assessment: Approve
Clean, well-scoped bug fix that correctly consolidates duplicated
Review Details
Nice consolidation that eliminates the
Closing it since #240 is preferred.
Description
Fixes a bug where multi-block toolResult.content was silently dropped to the first block, causing FaithfulnessEvaluator (and other evaluators) to flag values from content[1+] as hallucinated.

Root cause: Three sites in StrandsInMemorySessionMapper and two in CloudWatchSessionMapper read content[0] (or response[0]) only:
- StrandsInMemorySessionMapper._process_tool_results (legacy convention)
- StrandsInMemorySessionMapper._convert_inference_messages (latest convention; had a ## To-do comment)
- StrandsInMemorySessionMapper._convert_tool_execution_span (latest convention)
- CloudWatchSessionMapper._extract_tool_results / _process_tool_results (via _extract_tool_result_text)

Fix: Two shared helpers in mappers/utils.py, used by both mappers:
- serialize_tool_result_block(block): serializes a single Bedrock-style content block. Handles text, json (via json.dumps), and image/document/video (placeholder marker so the judge knows non-text data exists).
- join_tool_result_content(content): joins all blocks with \n, filtering empties.

This addresses both the reported text-block bug and the latent same-shape bug for json blocks (common via Strands' @tool decorator returning dicts).

Related Issues
Fixes #235
Ref: aws/agentcore-cli#1393
Documentation PR
N/A
Type of Change
Bug fix
Testing
4 new regression tests in test_strands_in_memory_mapper.py covering: legacy multi-text, legacy text+json, latest-convention multi-text inference, latest-convention multi-text tool execution span.
1 new regression test in test_cloudwatch_session_mapper.py for multi-block tool result reaching the ToolExecutionSpan.
All 172 existing mapper tests pass unchanged.
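For readers who want the shape of such a regression test, the sketch below contrasts the first-block-only read with a joined read on a two-block tool result. Both functions and the sample payload are hypothetical stand-ins written for this example; they are not the mappers' real methods or the PR's test fixtures.

```python
from typing import Any, Dict, List


def first_block_only(content: List[Dict[str, Any]]) -> str:
    """Hypothetical stand-in for the old pattern: read content[0] only."""
    if content and isinstance(content[0], dict):
        return content[0].get("text", "")
    return ""


def join_text_blocks(content: List[Dict[str, Any]]) -> str:
    """Hypothetical stand-in for the fixed pattern: keep every text block."""
    return "\n".join(
        block["text"]
        for block in content
        if isinstance(block, dict) and block.get("text")
    )


def test_multi_text_blocks_are_all_preserved() -> None:
    # Illustrative payload: the second block holds a value an evaluator
    # would otherwise never see in the tool result.
    content = [{"text": "order_id=42"}, {"text": "status=shipped"}]

    # Old behavior: the second block is silently lost.
    assert "status=shipped" not in first_block_only(content)

    # Fixed behavior: both blocks survive, separated by a newline.
    assert join_text_blocks(content) == "order_id=42\nstatus=shipped"
```

Asserting on the old behavior as well as the new one documents why the test exists: a value present only in the second block is exactly what an evaluator would have flagged as hallucinated.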
I ran hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.