chore(redteam): redteam multi-agent session #251
8017d6e to 2c1e34d
Review Summary
Assessment: Comment (approve-leaning)
Solid, well-reasoned addition. I checked out the branch, ran the touched tests (29 passed), confirmed …
Nicely done. The regression tests around restore ordering are exactly the kind of guard this change needed.
2c1e34d to c98f5a4
c98f5a4 to d8b5aa1
Re-review of
jjbuck left a comment
I think there's a silent failure here: Swarm cases crash but are scored "defended".
The relevant locations are target_session.py:391-397 (_restore), triggered via task.py:75 and crescendo:270.
The assumption in this PR's _restore function is that both Graph.deserialize_state and Swarm.deserialize_state reset leaves "when the payload has no next_nodes_to_execute", but I think this is true only for Graph.
Graph.deserialize_state uses a truthiness check (if not payload.get("next_nodes_to_execute")). Concretely, an empty list takes the reset path.
Swarm.deserialize_state (swarm.py:1020) uses a membership check ("next_nodes_to_execute" in payload), and Swarm.serialize_state (swarm.py:991) always emits that key (an empty list for a settled swarm). So a round-tripped Swarm always takes the resume branch → sets _resume_from_session=True, and _from_dict leaves current_node=None. The next invoke dereferences current_node.node_id at swarm.py:416.
So after reset, _resume_from_session is True and current_node is None, and the next invoke raises AttributeError: 'NoneType' object has no attribute 'node_id' at swarm.py:416.
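A minimal sketch of the two checks and the resulting crash, assuming only the behavior described in this comment. ToySwarm and the two helper functions are hypothetical stand-ins, not the real strands Graph or Swarm classes; they model just next_nodes_to_execute, _resume_from_session and current_node.

```python
def graph_takes_reset_path(payload: dict) -> bool:
    # Truthiness check: a missing key and an empty list both take the reset path.
    return not payload.get("next_nodes_to_execute")


def swarm_takes_reset_path(payload: dict) -> bool:
    # Membership check: any payload containing the key takes the resume path,
    # even when the list is empty.
    return "next_nodes_to_execute" not in payload


class ToySwarm:
    """Hypothetical stand-in; models only the fields named in the review."""

    def __init__(self) -> None:
        self.current_node = None
        self._resume_from_session = False

    def serialize_state(self) -> dict:
        # Per the review, the key is always emitted (empty for a settled swarm).
        return {"status": "completed", "next_nodes_to_execute": []}

    def deserialize_state(self, payload: dict) -> None:
        if swarm_takes_reset_path(payload):
            self._resume_from_session = False
        else:
            # Resume branch: nothing rebuilds current_node for a settled swarm.
            self._resume_from_session = True
            self.current_node = None

    def invoke(self) -> str:
        if self._resume_from_session:
            # Dereferences None here, mirroring the failure at swarm.py:416.
            return self.current_node.node_id
        return "fresh start"
```

Round-tripping ToySwarm through serialize_state/deserialize_state leaves it in the resume branch, so the first invoke raises AttributeError. The same empty-list payload sends the Graph-style check down the reset path.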
Because task.py:75 calls session.reset() before every case, the first invoke crashes, the base Experiment's per-case try/except records score=0 / "defended," and every Swarm case is silently mislabeled safe. (Secondarily, a settled-checkpoint restore also leaves a stale state.task, so Crescendo's next attacker message is dropped.)
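The masking effect can be sketched with a hypothetical per-case loop. run_case, CaseResult and the "not defended" label are illustrative assumptions, not the real Experiment code. The point is only that a blanket except that records score 0 makes a crashing target look identical to one that refused the attack.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaseResult:
    score: float
    label: str


def run_case(
    reset: Callable[[], None],
    invoke: Callable[[str], str],
    score_fn: Callable[[str], float],
    prompt: str,
) -> CaseResult:
    """Hypothetical per-case loop, not the real Experiment implementation."""
    try:
        reset()  # the session is reset before every case
        score = score_fn(invoke(prompt))
    except Exception:
        # A target crash lands here and is indistinguishable from a refusal.
        score = 0.0
    return CaseResult(score=score, label="defended" if score == 0.0 else "not defended")
```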
I think the fix is to stop treating deserialize_state as an idempotent reset across all MultiAgentBase subclasses. After orch.deserialize_state(...), force _resume_from_session=False when the restored status is PENDING/COMPLETED/FAILED so the next invoke re-initializes cleanly (leaf rollback still comes from load_snapshot), or explicitly scope the feature to Graph until Swarm is handled.
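A sketch of the first suggested fix, under stated assumptions. Status, ToyOrchestrator and restore_orchestrator are hypothetical. EXECUTING is an assumed name for an in-flight status, added only to have a non-settled case. ToyOrchestrator reuses the Swarm-style membership check from above. The guard runs after deserialize_state and clears the resume flag for settled statuses, while leaf rollback stays with load_snapshot.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical mirror of the statuses named in the review; not the SDK enum.
    PENDING = "pending"
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"


SETTLED = {Status.PENDING, Status.COMPLETED, Status.FAILED}


class ToyOrchestrator:
    """Hypothetical stand-in that uses the Swarm-style membership check."""

    def __init__(self) -> None:
        self.status = Status.PENDING
        self._resume_from_session = False

    def deserialize_state(self, payload: dict) -> None:
        self.status = Status(payload["status"])
        self._resume_from_session = "next_nodes_to_execute" in payload


def restore_orchestrator(orch: ToyOrchestrator, payload: dict) -> None:
    """Sketch of the suggested guard after deserialize_state (hypothetical helper)."""
    orch.deserialize_state(payload)
    if orch.status in SETTLED:
        # Nothing to resume: let the next invoke re-initialize cleanly.
        # Leaf rollback is still handled separately via load_snapshot.
        orch._resume_from_session = False
```

A settled checkpoint no longer leaves the orchestrator in the resume branch, and a genuinely in-flight payload keeps it.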
I think this tricky issue may have slipped through because most of the tests are tailored to Graph targets, not Swarms.
d8b5aa1 to fb0bbe9
Re-review of
Description
Adds a StrandsMultiAgentSession so red-team strategies can target a MultiAgentBase (Graph, Swarm, nested orchestrator) in addition to a single Agent, and routes such targets through the task builder.
The session walks the orchestrator tree once at init into two path indexes (leaf agents and orchestrators) and uses them to drive a composite snapshot/restore:
- snapshot() → one Agent.take_snapshot(preset="session") per leaf + one serialize_state() per orchestrator, wrapped in _MultiAgentSnapshot and stored opaquely inside TargetCheckpoint.agent_snapshot.
- restore() → orchestrators first via deserialize_state, leaves last via load_snapshot, then trace truncation. The order matters: Graph.deserialize_state / Swarm.deserialize_state reset every node's executor state to its GraphBuilder build-time state when the payload has no next_nodes_to_execute (the normal between-turn state for a PENDING/COMPLETED orchestrator). Restoring orchestrators first lets the per-leaf snapshots be the final writers; otherwise every Crescendo backtrack against a Graph or Swarm target would silently restart the conversation from build time and break the "escalate from accumulated context" property.
- reset() → replays the baseline composite (or, with no baseline, best-effort clears each leaf's messages).
- invoke() → diffs each leaf's messages tail across root(message) and runs _tool_uses_in over each diff, so tool uses anywhere in the tree are captured.
- _multi_agent_result_text flattens MultiAgentResult.results via NodeResult.get_agent_results into a single string for the strategy; it falls back to str(result) rather than raising into the per-case try/except.
- task.py and _build_session now accept Agent | MultiAgentBase | TargetSession. The build-time baseline for a MultiAgentBase target is captured via a throwaway StrandsMultiAgentSession(agent).snapshot().agent_snapshot, so this layer never has to import _MultiAgentSnapshot directly.

Related Issues
N/A
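The composite snapshot/restore ordering from the Description can be sketched with toy stand-ins. The sketch assumes an orchestrator that wipes leaf messages when next_nodes_to_execute is empty (the Graph-style truthiness check). ToyLeaf, ToyGraph, snapshot, restore and tail_diff are hypothetical names. The real session uses Agent.take_snapshot(preset="session"), load_snapshot, serialize_state and deserialize_state, and also truncates traces, which this sketch skips.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyLeaf:
    """Hypothetical stand-in for a leaf Agent; only messages are modeled."""

    messages: list = field(default_factory=list)

    def take_snapshot(self) -> list:
        return copy.deepcopy(self.messages)

    def load_snapshot(self, snap: list) -> None:
        self.messages = copy.deepcopy(snap)


@dataclass
class ToyGraph:
    """Hypothetical orchestrator whose deserialize_state mimics the reset path."""

    nodes: list

    def serialize_state(self) -> dict:
        # Between turns a settled orchestrator has nothing queued.
        return {"status": "completed", "next_nodes_to_execute": []}

    def deserialize_state(self, payload: dict) -> None:
        if not payload.get("next_nodes_to_execute"):
            for leaf in self.nodes:
                leaf.messages = []  # back to the (empty) build-time state


def snapshot(leaves: list, orchs: list) -> dict:
    # One snapshot per leaf plus one serialized state per orchestrator.
    return {
        "leaves": [leaf.take_snapshot() for leaf in leaves],
        "orchs": [orch.serialize_state() for orch in orchs],
    }


def restore(leaves: list, orchs: list, snap: dict, orchestrators_first: bool = True) -> None:
    def restore_orchs() -> None:
        for orch, state in zip(orchs, snap["orchs"]):
            orch.deserialize_state(state)

    def restore_leaves() -> None:
        for leaf, state in zip(leaves, snap["leaves"]):
            leaf.load_snapshot(state)

    if orchestrators_first:
        restore_orchs()
        restore_leaves()  # leaf snapshots are the final writers
    else:
        restore_leaves()
        restore_orchs()  # the orchestrator reset wipes the restored leaves


def tail_diff(leaves: list, lengths_before: list) -> list:
    # Messages each leaf gained during one root invoke.
    return [leaf.messages[n:] for leaf, n in zip(leaves, lengths_before)]
```

With orchestrators_first=False the toy reproduces the build-time restart the Description warns about: the leaves come back empty instead of holding their snapshotted history. tail_diff shows the per-leaf tail comparison that invoke() relies on to find tool uses anywhere in the tree.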
Documentation PR
N/A
Type of Change
New feature (multi-agent target support for the experimental red-team slice).
Testing
- New tests/strands_evals/experimental/redteam/test_multi_agent_session.py (16 tests) covering tree indexing (top-level, nested, unsupported-executor skip), invoke + tail-diff trace capture, snapshot/restore round trips, wrong-payload TypeError, and reset() with and without a baseline. Uses real Agent instances so the SDK take_snapshot/load_snapshot paths run for real.
- A dedicated TestRestoreOrderVsGraphReset class with a _GraphLikeOrchestrator whose deserialize_state mimics the real Graph/Swarm reset path (calls executor-state reset on every node when next_nodes_to_execute is empty). Both regression tests fail if the loops in _restore are reverted to leaf-first ordering and pass with orchestrators-first.
- Existing tests/strands_evals/experimental/redteam/test_task.py extended to verify a MultiAgentBase target is routed to StrandsMultiAgentSession and the baseline is captured once at build time.
- Full red-team suite green: hatch test tests/strands_evals/experimental/redteam/ → 133 passed.
- hatch fmt --linter clean on the touched files.
- I ran hatch run prepare

Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.