Skip to content

chore(redteam): redteam multi-agent session#251

Open
poshinchen wants to merge 2 commits into
strands-agents:mainfrom
poshinchen:chore/redteam-multiagent
Open

chore(redteam): redteam multi-agent session#251
poshinchen wants to merge 2 commits into
strands-agents:mainfrom
poshinchen:chore/redteam-multiagent

Conversation

@poshinchen

@poshinchen poshinchen commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Description

Adds a StrandsMultiAgentSession so red-team strategies can target a MultiAgentBase (Graph, Swarm, nested orchestrator) in addition to a single Agent, and routes such targets through the task builder.

The session walks the orchestrator tree once at init into two path indexes — leaf agents and orchestrators — and uses them to drive a composite snapshot/restore:

  • snapshot() → one Agent.take_snapshot(preset="session") per leaf + one serialize_state() per orchestrator, wrapped in _MultiAgentSnapshot and stored opaquely inside TargetCheckpoint.agent_snapshot.
  • restore() → orchestrators first via deserialize_state, leaves last via load_snapshot, then trace truncation. The order matters: Graph.deserialize_state / Swarm.deserialize_state reset every node's executor state to GraphBuilder-build-time when the payload has no next_nodes_to_execute (the normal between-turn state for a PENDING / COMPLETED orchestrator). Restoring orchestrators first lets the per-leaf snapshots be the final writers; otherwise every Crescendo backtrack against a Graph or Swarm target would silently restart the conversation from build-time and break the "escalate from accumulated context" property.
  • reset() → replays the baseline composite (or, with no baseline, best-effort clears each leaf's messages).
  • invoke() → diffs each leaf's messages tail across root(message) and runs _tool_uses_in over each diff, so tool uses anywhere in the tree are captured.
  • _multi_agent_result_text flattens MultiAgentResult.results via NodeResult.get_agent_results to a single string for the strategy; falls back to str(result) rather than raising into the per-case try/except.

task.py and _build_session now accept Agent | MultiAgentBase | TargetSession. The build-time baseline for a MultiAgentBase target is captured via a throwaway StrandsMultiAgentSession(agent).snapshot().agent_snapshot so this layer never has to import _MultiAgentSnapshot directly.

Related Issues

N/A

Documentation PR

N/A

Type of Change

New feature (multi-agent target support for the experimental red-team slice).

Testing

  • New tests/strands_evals/experimental/redteam/test_multi_agent_session.py (16 tests) covering tree indexing (top-level, nested, unsupported-executor skip), invoke + tail-diff trace capture, snapshot/restore round trips, wrong-payload TypeError, and reset() with and without a baseline. Uses real Agent instances so the SDK take_snapshot/load_snapshot paths run for real.

  • A dedicated TestRestoreOrderVsGraphReset class with a _GraphLikeOrchestrator whose deserialize_state mimics the real Graph / Swarm reset path (calls executor-state reset on every node when next_nodes_to_execute is empty). Both regression tests fail if the loops in _restore are reverted to leaf-first ordering and pass with orchestrators-first.

  • Existing tests/strands_evals/experimental/redteam/test_task.py extended to verify a MultiAgentBase target is routed to StrandsMultiAgentSession and the baseline is captured once at build time.

  • Full red-team suite green: hatch test tests/strands_evals/experimental/redteam/ → 133 passed.

  • hatch fmt --linter clean on the touched files.

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Comment thread src/strands_evals/experimental/redteam/strategies/target_session.py Outdated
Comment thread src/strands_evals/experimental/redteam/strategies/target_session.py
Comment thread src/strands_evals/experimental/redteam/strategies/target_session.py
Comment thread tests/strands_evals/experimental/redteam/test_multi_agent_session.py Outdated
@github-actions

Copy link
Copy Markdown

Review Summary

Assessment: Comment (approve-leaning)

Solid, well-reasoned addition. I checked out the branch, ran the touched tests (29 passed), confirmed ruff is clean, and verified every SDK surface this PR depends on against the installed strands-agents 1.42.0 (Agent.take_snapshot/load_snapshot, MultiAgentBase.serialize_state/deserialize_state, NodeResult.get_agent_results, node.executor) — all match. The orchestrators-first restore ordering is the subtle correctness crux of the PR and it's backed by a dedicated regression test (TestRestoreOrderVsGraphReset) that genuinely fails if the loops are reverted. No litellm/Jinja2/_internal introduced; annotations and structured logging follow the repo conventions.

Review Categories
  • Correctness: Restore ordering and trace-truncation logic are sound and well-tested; the build-time-reset hazard from Graph/Swarm.deserialize_state is correctly handled.
  • Documentation accuracy: One docstring points at a non-existent StrandsAgentSession._restore_state and attributes a deepcopy pattern that StrandsAgentSession doesn't actually use — worth correcting since the rest of the docstrings are unusually precise.
  • API surface: StrandsMultiAgentSession is in the module __all__ but not re-exported from strategies/__init__.py the way StrandsAgentSession is — asymmetric; intentional or oversight?
  • Edge cases: Tail-diff trace capture double-counts if a single Agent instance is shared across two node paths (out of scope for Graph/Swarm, but undocumented).
  • Tests: Comprehensive and use real Agent instances so the SDK snapshot paths run for real; one try/except/else raise-assertion could use pytest.raises (nit).

Nicely done — the regression tests around restore ordering are exactly the kind of guard this change needed.

@poshinchen poshinchen force-pushed the chore/redteam-multiagent branch from 2c1e34d to c98f5a4 Compare June 10, 2026 21:33
@poshinchen poshinchen force-pushed the chore/redteam-multiagent branch from c98f5a4 to d8b5aa1 Compare June 10, 2026 21:45
@yonib05 yonib05 added area-redteam Red teaming: adversarial generation, attack strategies, attack success evaluation chore Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact labels Jun 11, 2026
@github-actions

Copy link
Copy Markdown

Re-review of d8b5aa1 — Approve

Thanks for the quick turnaround. All four items from the previous review are addressed, and I re-verified the branch end-to-end:

  • Docstring accuracy — the _restore_state / "same pattern" cross-reference is gone; _restore now explains the deepcopy rationale and the orchestrators-first ordering on their own terms.
  • API surfaceStrandsMultiAgentSession is now re-exported from strategies/__init__.py (and its __all__), symmetric with StrandsAgentSession.
  • Tool-use double-counting — documented as out-of-scope in the invoke docstring, with the correct note that Graph/Swarm give each node its own executor.
  • pytest.raises — adopted.
  • ✅ The AGENTS.md addition codifying the single-backtick docstring convention is a nice touch that explains the broader RST→backtick cleanup across the diff.
Verification performed
  • Checked out d8b5aa1, installed the package, ran the touched suites → 29 passed.
  • ruff check on all five touched files → clean.
  • Stress-tested the correctness crux: reverted _restore to leaf-first (orchestrators-last) ordering and confirmed both TestRestoreOrderVsGraphReset tests fail (assert 1 == 2), then pass again once restored — the regression guard genuinely protects the build-time-reset hazard.

No remaining blockers from my side. Nicely iterated.

@jjbuck jjbuck left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a silent issue where Swarm cases crash but are silently scored "defended".

The relevant locations are target_session.py:391-397 (_restore), triggered via task.py:75 and crescendo:270

The assumption in this PR's _restore function is that both Graph and Swarm deserialize_state reset leaves "when the payload has no next_nodes_to_execute., but I think this is true only for Graph.

  • Graph.deserialize_state uses a truthiness check (if not payload.get("next_nodes_to_execute")). Concretely, an empty list takes the reset path.
  • Swarm.deserialize_state (swarm.py:1020) uses a membership check ("next_nodes_to_execute" in payload), and Swarm.serialize_state (swarm.py:991) always emits that key (empty list for a
    settled swarm). So a round-tripped Swarm always takes the resume branch → sets _resume_from_session=True, and _from_dict leaves current_node=None. The next invoke dereferences current_node.node_id at swarm.py:416.

So after reset: _resume_from_session = True | current_node = None, we run into an exception where AttributeError - 'NoneType' object has no attribute 'node_id' at swarm.py:416

Because task.py:75 calls session.reset() before every case, the first invoke crashes, the base Experiment's per-case try/except records score=0 / "defended," and every Swarm case is silently mislabeled safe. (Secondarily, a settled-checkpoint restore also leaves a stale state.task, so Crescendo's next attacker message is dropped.)

I think the fix is to not treat deserialize_state as an idempotent reset across all MultiAgentBase. After orch.deserialize_state(...), force _resume_from_session=False when the restored status is
PENDING/COMPLETED/FAILED so the next invoke re-initializes cleanly (leaf rollback still comes from load_snapshot); or explicitly scope the feature to Graph until Swarm is handled.

I think this tricky issue may have snuck because most of the tests are tailored to Graph agents, not swarms.

@github-actions

Copy link
Copy Markdown

Re-review of fb0bbe9 (force-pushed over d8b5aa1) — Approve

The force-push folded in the prior fixes and added a new, substantive correctness fix: _force_fresh_invoke_if_settled, which clears the SDK's private _resume_from_session flag after restoring a settled-status orchestrator. I reviewed this fresh against strands-agents 1.43.0 (the env bumped from 1.42.0 since my last pass) and verified every claim in the new docstring directly against SDK source rather than taking it on faith.

The new fix is real, not speculative

I reproduced the exact bug it guards against against a real Swarm:

serialized status: completed
next_nodes_to_execute present: True = []          # serialize_state ALWAYS emits the key
after deserialize, _resume_from_session = True     # branch is on key PRESENCE, not truthiness
after deserialize, current_node = None             # → next invoke derefs None → AttributeError

Confirmed in SDK source:

  • Swarm.serialize_state always includes next_nodes_to_execute (empty list via the else branch for a settled swarm).
  • Swarm.deserialize_state sets self._resume_from_session = "next_nodes_to_execute" in payload — membership, not truthiness — so a settled swarm wrongly takes the resume branch and skips the node reset, leaving current_node=None.
  • Buried by the experiment's per-case try/except, this would silently mislabel every Swarm case as "defended" (score 0) — exactly the failure mode the docstring calls out.

Verification performed

Checks (all green)
  • Checked out fb0bbe9, installed the package against SDK 1.43.0, ran the touched suites → 35 passed (was 29; +6 for the new guard).
  • ruff check on redteam/ + touched tests → clean.
  • Status set is correct: settled = {PENDING, COMPLETED, FAILED} correctly excludes EXECUTING and INTERRUPTED (verified the live enum).
  • Guard tests are non-vacuous: disabled the _force_fresh_invoke_if_settled call site and confirmed 4 of 6 TestSwarmShapedResumeFlagGuard tests fail (AttributeError: 'NoneType'... + resume-flag assertions); the 2 that still pass correctly assert the guard leaves INTERRUPTED/non-settled payloads untouched.
  • "No-op-or-better for Graph" claim holds: a GraphBuilder-built Graph carries _resume_from_session (set in __init__), so the guard never emits a spurious rename warning on Graph restores — confirmed on a live instance.
  • The missing-attribute warning path is itself tested (test_settled_payload_without_resume_attr_logs_warning), so a future SDK rename surfaces loudly instead of silently regressing.

Notes (non-blocking)

  • Reaching into the private _resume_from_session is the right call here given the SDK bug, and it's well-defended: the warning-on-missing-attribute + a dedicated test means a rename won't silently rot. Worth a short upstream issue against strands-agents so the key-presence-vs-truthiness branch in Swarm.deserialize_state gets fixed at the source and this workaround can eventually be retired.

All four earlier comments remain addressed. No remaining blockers — this is a clean, well-tested fix to a genuinely subtle SDK interaction. Nicely done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-redteam Red teaming: adversarial generation, attack strategies, attack success evaluation chore Maintenance tasks, dependency updates, CI changes, refactoring with no user-facing impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants