Merged
Changes from 8 commits
46 commits
3951d3d
Add pending message queue and background tool execution
DouweM Apr 4, 2026
f530e2a
Address review feedback and update snapshots after main rebase
DouweM Apr 24, 2026
11addf5
Merge remote-tracking branch 'origin/main' into background-tools
DouweM Apr 24, 2026
b1217e5
Use list instead of deque for pending_messages to fix Temporal serial…
DouweM Apr 24, 2026
648ca4c
Fix coverage gaps: ContextVar-based per-run state + test simplifications
DouweM Apr 24, 2026
98207fb
Remove incorrect 'pragma: no cover' on background task cleanup paths
DouweM Apr 24, 2026
eed6298
Add test exercising background task cancellation on run abort
DouweM Apr 24, 2026
cb91f8c
Refocus PR on pending message queue; drop background tools to harness
DouweM Apr 25, 2026
034d2c6
Merge remote-tracking branch 'origin/main' into background-tools
DouweM Apr 25, 2026
ff9e760
Thread pending_messages through Agent.system_prompt_parts
DouweM Apr 28, 2026
f8ba158
Address auto-review feedback on PR #4980
DouweM Apr 29, 2026
97ebbd2
Stamp steering ModelRequest with timestamp + run_id at construction
DouweM Apr 29, 2026
0d8ab31
Address auto-review round 2
DouweM Apr 29, 2026
ebfa2d0
Switch agent_run.enqueue example to follow_up
DouweM Apr 29, 2026
1a546ef
Merge remote-tracking branch 'origin/main' into background-tools
DouweM Apr 29, 2026
c4e3250
Round 4: cover system-prompt enqueue path + warn about follow-up loops
DouweM Apr 29, 2026
9b4a747
Round 5: dedup enqueue validation, consolidate UsageLimits warning
DouweM Apr 29, 2026
04317a6
Merge main into background-tools
DouweM May 11, 2026
a8cdef0
Accept strings and UserContent in `enqueue`
DouweM May 11, 2026
f362019
Allow `enqueue` to accept a full `ModelRequest`
DouweM May 11, 2026
0cfcaac
Merge parts-style enqueues at drain, keep ModelRequest passthrough di…
DouweM May 11, 2026
c3e7341
Merge main into branch
DouweM May 13, 2026
2efb668
Rename priorities to `'asap'` / `'when_idle'`, drop `SystemPromptPart…
DouweM May 13, 2026
12089ff
Add snapshot tests for `'asap'` end-of-run drain + rich message_histo…
DouweM May 14, 2026
9a5f4ab
Fix coverage gaps in enqueue tests
DouweM May 14, 2026
62f57f1
Merge remote-tracking branch 'origin/main' into background-tools
DouweM May 14, 2026
ce543f4
Suppress reportPrivateUsage on `_clean_message_history` import in wir…
DouweM May 14, 2026
9503185
Fix lint D417 + apply ruff format / autofixes after merge
DouweM May 14, 2026
d226586
Address auto-review: fix merge bugs + inline `coerce_enqueue_item`
DouweM May 14, 2026
a18e4c6
Address auto-review borderline items
DouweM May 15, 2026
9b1c6c2
Stamp `conversation_id` on drain-created `ModelRequest`s (Devin review)
DouweM May 15, 2026
070bdcc
Split 'asap' and 'when_idle' into separate ModelRequests at end-of-ru…
DouweM May 15, 2026
6c89afe
Cover the producer-supplied `conversation_id` branch in `_flatten_dra…
DouweM May 15, 2026
6965079
Document thread-safety contract on `enqueue` (Devin review)
DouweM May 15, 2026
b59f305
Merge remote-tracking branch 'origin/main' into background-tools
DouweM May 18, 2026
9fcc3ea
Fix RunUsage import to use canonical source
DouweM May 18, 2026
b60ebb1
Move enqueue helpers to `_enqueue.py`, pre-package to `ModelRequest`
DouweM May 18, 2026
f1d57bb
Apply auto-review fixes: drop redundant comment + migrate wire-merge …
DouweM May 19, 2026
dc32827
Address PR review: reinject pending_messages, bare-iteration warning,…
DouweM May 21, 2026
9d97861
Merge remote-tracking branch 'origin/main' into background-tools
DouweM May 21, 2026
ce266a0
Expand enqueue to accept request parts and interleaved message sequences
DouweM May 21, 2026
328aa88
Drop `pending_messages` param from `system_prompt_parts`; raise on en…
DouweM May 21, 2026
3618943
test: drop unreachable line in enqueue-without-queue test
DouweM May 22, 2026
6cea37b
Address review: use `UserError`/`list` for enqueue, centralize run-me…
DouweM May 22, 2026
d9cf1a9
Make `enqueue` flat-variadic over `UserContent`; keep run-metadata he…
DouweM May 22, 2026
4e6a3fd
Raise `UndrainedPendingMessagesError` on undrained bare-iteration que…
DouweM May 22, 2026
69 changes: 69 additions & 0 deletions docs/message-history.md
@@ -334,6 +334,75 @@ print(result2.all_messages())
"""
```

## Injecting messages mid-run
Review comment (Collaborator Author):

This deserves mention in at least one more place, like in the tool and hooks docs, where RunContext is available, telling people they can enqueue from there.


Tools, capability hooks, and external code driving an agent run can inject extra
[`ModelRequestPart`][pydantic_ai.messages.ModelRequestPart]s into the conversation
mid-run via a pending message queue. Use this when something happens during a run
that the agent should know about — a tool wants to add follow-up context, an external
event needs to redirect the agent's plan, or background work needs to reach the agent
when it completes.

Enqueued parts are bundled into a [`PendingMessage`][pydantic_ai.messages.PendingMessage]
and drained automatically based on a `priority`:

- `'steering'` (default): drained into the next [`ModelRequest`][pydantic_ai.messages.ModelRequest] before the model call. Use when the new context should influence the agent's *next* step.
- `'follow_up'`: drained only when the agent would otherwise end. The agent run continues with a new model request that includes the follow-up parts. Use when the agent shouldn't stop while there's still pending work.
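
The difference between the two priorities can be sketched with a toy model that does not use pydantic_ai at all. `ToyRun`, `model_request`, and `try_finish` are hypothetical names invented for this illustration, and the sketch simplifies by placing steering items next to the prompt rather than in a separate `ModelRequest`:

```python
from dataclasses import dataclass, field
from typing import Literal

Priority = Literal['steering', 'follow_up']


@dataclass
class ToyRun:
    """Hypothetical stand-in for an agent run; not a pydantic_ai class."""

    queue: list[tuple[str, Priority]] = field(default_factory=list)
    sent: list[list[str]] = field(default_factory=list)

    def enqueue(self, text: str, priority: Priority = 'steering') -> None:
        self.queue.append((text, priority))

    def _take(self, priority: Priority) -> list[str]:
        # Remove and return every queued item with the given priority.
        taken = [text for text, p in self.queue if p == priority]
        self.queue[:] = [(text, p) for text, p in self.queue if p != priority]
        return taken

    def model_request(self, prompt: str) -> None:
        # Steering items go out with the very next request.
        self.sent.append([prompt, *self._take('steering')])

    def try_finish(self) -> bool:
        # Follow-up items only surface when the run is about to end.
        follow_ups = self._take('follow_up')
        if follow_ups:
            self.sent.append(follow_ups)
            return False
        return True


run = ToyRun()
run.enqueue('remember to cite sources', priority='follow_up')
run.enqueue('focus on Q3 first')
run.model_request('draft the report')
first_finish = run.try_finish()
second_finish = run.try_finish()
```

Here the steering item travels with the first request, while the follow-up item is held back until the run first tries to end and then forces one more request.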

### From inside a tool or hook

Use [`RunContext.enqueue`][pydantic_ai.tools.RunContext.enqueue] when you have a
`RunContext` in scope:

```python {title="enqueue_from_tool.py"}
from pydantic_ai import Agent, RunContext, SystemPromptPart

agent = Agent('test')
Review comment (Contributor):

All other examples in this file use Agent('openai:gpt-5.2') — these two new examples should follow the same pattern instead of using the 'test' model. Real model names help users understand how to apply the feature in their own code and are consistent with the documentation guidelines.

Review comment (Contributor):

Both new examples use Agent('test'), but every other example in this file uses a real model name like Agent('openai:gpt-5.2'). Please switch to a real model name for consistency — real model names help users understand how to set up their own code and prevent them from cargo-culting 'test' into their projects.

(Already flagged in an earlier comment but still unaddressed.)



@agent.tool
def trigger_alert(ctx: RunContext[None]) -> str:
    ctx.enqueue(SystemPromptPart('Alert: production is degraded, prioritize triage.'))
    return 'alert raised'
```

The steering message is appended to the agent's message history and is visible to the
model on the next request, alongside any tool returns from the same step.

### From external code driving `agent.iter()`

Use [`AgentRun.enqueue`][pydantic_ai.run.AgentRun.enqueue] when you're driving a run
from outside (e.g. forwarding events from a webhook, chat platform, or job queue):

```python {title="enqueue_from_agent_run.py"}
from pydantic_ai import Agent, UserPromptPart

agent = Agent('test')


async def main():
    async with agent.iter('Start drafting the report') as agent_run:
        agent_run.enqueue(
            UserPromptPart('Change of plan: focus on Q3 revenue first.'),
            priority='steering',
        )
        async for _ in agent_run:
            ...
```
Review comment (Contributor) on lines +461 to +473:

This example uses async for _ in agent_run: to iterate, but the Limitations box immediately below warns that follow-up messages aren't drained inside bare async for loops. While this example uses priority='steering' (which does work), showing async for right before calling out its limitation with follow-ups could mislead users into thinking async for works for all priorities.

Consider either:

  • Using agent_run.next() in this example (showing the recommended pattern that works for both priorities), or
  • Adding a brief inline note that this works because steering messages are drained before model requests regardless of iteration style


[`AgentRun.pending_messages`][pydantic_ai.run.AgentRun.pending_messages] exposes the
current queue for inspection.

!!! info "Limitations"
    - Follow-up messages need [`Agent.run`][pydantic_ai.agent.AbstractAgent.run] or
      explicit [`AgentRun.next()`][pydantic_ai.run.AgentRun.next] driving; they
Review comment (Collaborator Author):

"Agent.iter + AgentRun.next driving"

Can we detect the "misconfigured" usage, and raise an error when we had pending messages?

      aren't drained inside a bare `async for node in agent_run:` loop. Steering
      messages work in either case.
    - Inside a [Temporal](durable_execution/temporal.md) workflow, tools run in
      activities and don't share state with the workflow, so `ctx.enqueue` from a
      tool doesn't currently propagate back to the run. Enqueue from the workflow
      context (e.g. via `AgentRun.enqueue`) instead.

Review comment (Contributor):

A note about infinite follow-up loops would be helpful: if something keeps enqueuing follow-up messages (e.g. a tool that always enqueues a follow-up and gets called on every iteration), the agent will loop indefinitely unless usage_limits are configured. A brief mention of UsageLimits as the safety net would help users avoid this pitfall — something like "Set usage_limits to guard against unbounded follow-up cycles."

Review comment (Contributor):

The Limitations box should warn about infinite follow-up loops. If a tool always enqueues a follow-up and gets called on every iteration, the agent will loop indefinitely. A brief mention of UsageLimits as the safety net would help users avoid this pitfall — something like:

Set [usage_limits][pydantic_ai.usage.UsageLimits] to guard against unbounded follow-up cycles.
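
The loop hazard raised in these two comments can be illustrated with a small simulation. This is a sketch under stated assumptions, not pydantic_ai code: `simulate_run` and `RequestLimitExceeded` are hypothetical names, and the request cap stands in for whatever limit `usage_limits` enforces in a real run.

```python
class RequestLimitExceeded(RuntimeError):
    """Hypothetical stand-in for a usage-limit error; not a pydantic_ai class."""


def simulate_run(request_limit: int, enqueue_every_step: bool) -> int:
    """Return the number of model requests made before the toy run ends."""
    pending: list[str] = []
    requests = 0
    while True:
        if requests >= request_limit:
            raise RequestLimitExceeded(f'request_limit of {request_limit} reached')
        requests += 1
        if enqueue_every_step:
            # Misbehaving tool: every step queues another follow-up.
            pending.append(f'follow-up after request {requests}')
        if not pending:
            return requests  # nothing queued, so the run really ends
        # End-of-run drain: queued follow-ups trigger one more model request.
        pending.clear()


well_behaved = simulate_run(request_limit=5, enqueue_every_step=False)
try:
    simulate_run(request_limit=5, enqueue_every_step=True)
    outcome = 'finished'
except RequestLimitExceeded as exc:
    outcome = str(exc)
```

Without the cap, the second call would never return, which is the case the suggested documentation note is meant to warn about.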

## Processing Message History

Sometimes you may want to modify the message history before it's sent to the model. This could be for privacy
4 changes: 4 additions & 0 deletions pydantic_ai_slim/pydantic_ai/__init__.py
@@ -93,6 +93,8 @@
PartDeltaEvent,
PartEndEvent,
PartStartEvent,
PendingMessage,
PendingMessagePriority,
RetryPromptPart,
SystemPromptPart,
TextContent,
@@ -232,6 +234,8 @@
'PartDeltaEvent',
'PartEndEvent',
'PartStartEvent',
'PendingMessage',
'PendingMessagePriority',
'RetryPromptPart',
'SystemPromptPart',
'TextContent',
3 changes: 3 additions & 0 deletions pydantic_ai_slim/pydantic_ai/_agent_graph.py
@@ -84,6 +84,8 @@ class GraphAgentState:
    """Last-resolved `max_tokens` from model settings, used only in error messages."""
    last_model_request_parameters: models.ModelRequestParameters | None = None
    """Last-resolved model request parameters, used for OTel span attributes."""
    pending_messages: list[_messages.PendingMessage] = dataclasses.field(default_factory=list[_messages.PendingMessage])
    """Queue of messages waiting to be injected into the conversation."""

    def check_incomplete_tool_call(self) -> None:
        """Raise `IncompleteToolCall` if the last model response was truncated mid-tool-call."""
@@ -1267,6 +1269,7 @@ def build_run_context(ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT
run_id=ctx.state.run_id,
metadata=ctx.state.metadata,
tool_manager=ctx.deps.tool_manager,
pending_messages=ctx.state.pending_messages,
)
validation_context = build_validation_context(ctx.deps.validation_context, run_context)
run_context = replace(run_context, validation_context=validation_context)
24 changes: 24 additions & 0 deletions pydantic_ai_slim/pydantic_ai/_run_context.py
@@ -13,6 +13,7 @@
from pydantic_ai._instrumentation import DEFAULT_INSTRUMENTATION_VERSION

from . import _utils, messages as _messages
from .messages import PendingMessage, PendingMessagePriority

if TYPE_CHECKING:
    from .agent.abstract import AbstractAgent
@@ -92,6 +93,14 @@ class RunContext(Generic[RunContextAgentDepsT]):
    `after_model_request`). Currently `None` in tool hooks, output validators,
    and during agent construction.
    """
    pending_messages: list[PendingMessage] = field(default_factory=list[PendingMessage])
Review comment (Collaborator Author):

Ah now I see where this is public. It does make sense here if users want to inspect it, but do they really need to be able to? Is that required for the harness features we've worked on? I prefer starting with things private, and making them public only if we have a use case. So I'm ok dropping this pending_messages as a public field. Maybe enqueue should not write onto the RunContext, but rather write right onto the graph deps/state?

    """Queue of messages waiting to be injected into the conversation.

    Messages are drained automatically: `'steering'` messages before the next model
    request, `'follow_up'` messages when the agent would otherwise end.

    Use [`enqueue`][pydantic_ai.tools.RunContext.enqueue] to add messages.
    """

    tool_manager: ToolManager[RunContextAgentDepsT] | None = None
    """The tool manager for the current run step.
@@ -109,6 +118,21 @@ def last_attempt(self) -> bool:
        """Whether this is the last attempt at running this tool before an error is raised."""
        return self.retry == self.max_retries

    def enqueue(
        self,
        *parts: _messages.ModelRequestPart,
        priority: PendingMessagePriority = 'steering',
    ) -> None:
        """Enqueue message parts to be injected into the conversation.

        Args:
            *parts: One or more message parts (e.g. `SystemPromptPart`, `UserPromptPart`).
            priority: When to inject:
                `'steering'` (default): before the next model request.
                `'follow_up'`: when the agent would otherwise end.
        """
        self.pending_messages.append(PendingMessage(parts=parts, priority=priority))
Review comment (Contributor):

Consider validating that at least one part is passed. Currently ctx.enqueue() with no args creates PendingMessage(parts=()), which the drain would turn into an empty ModelRequest(parts=[]). For steering, that empty request gets appended to message history; for follow-up, it creates a ModelRequestNode with an empty request that would trigger the UserError('No message history, user prompt, or instructions provided') check at _agent_graph.py:784 if there's nothing else in the history.

A simple guard would prevent confusing behavior:

if not parts:
    raise ValueError('enqueue() requires at least one ModelRequestPart')

Same applies to AgentRun.enqueue in run.py:419.
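
A minimal sketch of the suggested guard, using a toy context rather than the real `RunContext`. `ToyContext` is a hypothetical name, plain strings stand in for `ModelRequestPart`s, and the exception type simply follows the reviewer's suggestion:

```python
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Hypothetical stand-in for RunContext that holds only the queue."""

    pending: list[tuple[tuple[str, ...], str]] = field(default_factory=list)

    def enqueue(self, *parts: str, priority: str = 'steering') -> None:
        if not parts:
            # Reject the empty call up front instead of queuing an empty message.
            raise ValueError('enqueue() requires at least one part')
        self.pending.append((parts, priority))


ctx = ToyContext()
ctx.enqueue('Alert: production is degraded.')
try:
    ctx.enqueue()
    rejected = False
except ValueError:
    rejected = True
```

Failing at the call site keeps the error next to the code that caused it, rather than surfacing later as an empty request during a drain.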


__repr__ = _utils.dataclasses_no_defaults_repr


12 changes: 10 additions & 2 deletions pydantic_ai_slim/pydantic_ai/agent/__init__.py
@@ -47,7 +47,11 @@
from .._output import OutputToolset
from .._template import TemplateStr, validate_from_spec_args
from ..builtin_tools import AbstractBuiltinTool
-from ..capabilities import AbstractCapability, CombinedCapability
+from ..capabilities import (
+    AbstractCapability,
+    CombinedCapability,
+    PendingMessageDrainCapability,
+)
from ..capabilities._ordering import has_capability_type
from ..capabilities._tool_search import ToolSearch as ToolSearchCap
from ..capabilities.builtin_tool import BuiltinTool as BuiltinToolCap
@@ -1157,6 +1161,7 @@ def _merged_meta(ctx: RunContext[AgentDepsT]) -> dict[str, Any]:
messages=state.message_history,
tracer=tracer,
run_step=0,
pending_messages=state.pending_messages,
)

# Determine root capability: override > agent default
@@ -2665,7 +2670,10 @@ async def run_mcp_servers(
)
"""AgentSpec fields that are not supported at run/override time."""

-_AUTO_INJECT_CAPABILITY_TYPES: tuple[type[AbstractCapability[Any]], ...] = (ToolSearchCap,)
+_AUTO_INJECT_CAPABILITY_TYPES: tuple[type[AbstractCapability[Any]], ...] = (
+    ToolSearchCap,
+    PendingMessageDrainCapability,
+)
"""Infrastructure capabilities auto-injected when not already present."""


2 changes: 2 additions & 0 deletions pydantic_ai_slim/pydantic_ai/capabilities/__init__.py
@@ -1,5 +1,6 @@
from typing import Any

from ._pending_messages import PendingMessageDrainCapability
Review comment (Contributor):

PendingMessageDrainCapability is imported here and added to __all__ (line 69), but the other auto-injected internal capability (ToolSearch) is neither imported nor exported from this module. Since PendingMessageDrainCapability is similarly internal and auto-injected, it should follow the same pattern: remove the import and the __all__ entry. Users have no reason to reference this class directly.

from .abstract import (
AbstractCapability,
AgentNode,
@@ -64,6 +65,7 @@

__all__ = [
'AbstractCapability',
'PendingMessageDrainCapability',
Review comment (Contributor):

The other auto-injected internal capability (ToolSearch in _tool_search.py) is not imported or exported here. Since PendingMessageDrainCapability is similarly internal and auto-injected (living in _pending_messages.py with the underscore-prefixed module), it probably shouldn't be in __all__ either.

If users need to reference it for CapabilityOrdering constraints (e.g. wrapped_by=[PendingMessageDrainCapability]), they can import it directly from the private module — but that seems unlikely to be a common need.

'AgentNode',
'CapabilityOrdering',
'CapabilityPosition',
92 changes: 92 additions & 0 deletions pydantic_ai_slim/pydantic_ai/capabilities/_pending_messages.py
@@ -0,0 +1,92 @@
"""Auto-injected capability that drains the pending message queue at appropriate times."""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

from pydantic_ai.capabilities.abstract import AbstractCapability, CapabilityOrdering
from pydantic_ai.messages import ModelRequest, PendingMessage, PendingMessagePriority
from pydantic_ai.tools import RunContext

if TYPE_CHECKING:
    from pydantic_ai import _agent_graph
    from pydantic_ai.models import ModelRequestContext
    from pydantic_ai.result import FinalResult
    from pydantic_graph import End


def _drain_by_priority(
    queue: list[PendingMessage],
    priority: PendingMessagePriority,
) -> list[PendingMessage]:
    """Remove and return all messages with the given priority from the queue."""
    drained: list[PendingMessage] = []
    remaining: list[PendingMessage] = []
    for msg in queue:
        if msg.priority == priority:
            drained.append(msg)
        else:
            remaining.append(msg)
    queue[:] = remaining
    return drained
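
The slice assignment `queue[:] = remaining` matters because the same list object is referenced from more than one place (the diff passes `state.pending_messages` straight into `RunContext`). A standalone sketch of that aliasing behavior, with `Msg` and `drain_by_priority` as simplified stand-ins for the real types:

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Minimal stand-in for PendingMessage."""

    text: str
    priority: str


def drain_by_priority(queue: list[Msg], priority: str) -> list[Msg]:
    drained = [m for m in queue if m.priority == priority]
    # Slice assignment mutates the existing list object rather than rebinding it.
    queue[:] = [m for m in queue if m.priority != priority]
    return drained


state_queue = [Msg('a', 'steering'), Msg('b', 'follow_up'), Msg('c', 'steering')]
context_queue = state_queue  # second reference to the same list
drained = drain_by_priority(context_queue, 'steering')
```

Because the list is mutated in place, both `state_queue` and `context_queue` see only the remaining follow-up message afterwards. Rebinding with `queue = remaining` inside the function would leave the caller's list untouched.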
Review comment (Contributor, devin-ai-integration bot, May 15, 2026) on lines +22 to +35:

📝 Info: Thread safety relies on GIL atomicity of list.append and temporal separation of drain

The docstring on RunContext.enqueue claims thread safety for sync tools (auto-wrapped in a thread executor). This relies on two guarantees: (1) CPython's GIL makes list.append atomic, so a concurrent append from a worker thread won't corrupt the list, and (2) the drain (_drain_by_priority with its queue[:] = remaining pattern) only runs between graph nodes (before_model_request and after_node_run), never concurrently with tool execution. These guarantees hold for the current architecture but are CPython-specific (GIL) and rely on the graph lifecycle not changing. The docs correctly warn about cross-thread/cross-loop callers needing loop.call_soon_threadsafe.
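
The `loop.call_soon_threadsafe` pattern mentioned above can be sketched with the standard library alone. The plain list and `enqueue_on_loop` are hypothetical stand-ins for the pending queue and an enqueue call; the point is only that the foreign thread schedules the mutation on the loop instead of touching the list itself:

```python
import asyncio
import threading


async def main() -> list[str]:
    queue: list[str] = []
    loop = asyncio.get_running_loop()
    done = asyncio.Event()

    def enqueue_on_loop(text: str) -> None:
        # Runs on the event loop thread, so it cannot race with a drain there.
        queue.append(text)
        done.set()

    def worker() -> None:
        # Foreign thread: hand the mutation to the loop instead of appending directly.
        loop.call_soon_threadsafe(enqueue_on_loop, 'result from worker thread')

    thread = threading.Thread(target=worker)
    thread.start()
    await done.wait()
    thread.join()
    return queue


result = asyncio.run(main())
```

This removes the dependence on `list.append` being atomic under the GIL, since every mutation then happens on a single thread.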




class PendingMessageDrainCapability(AbstractCapability[Any]):
Review comment (Contributor):

This needs a get_serialization_name override returning None, matching the pattern established by ToolSearch in _tool_search.py:29-30. Without it, AgentSpec serialization would emit 'PendingMessageDrainCapability' which isn't registered in CAPABILITY_TYPES, causing deserialization to fail.

(Already flagged in earlier comments but still unaddressed; wanted to make sure it doesn't get lost.)

    """Drains the pending message queue at appropriate times.

    - Steering messages are injected before each model request.
    - Follow-up messages are injected when the agent would otherwise end,
      redirecting to a new ModelRequestNode to continue the conversation.

    This capability is always auto-injected and placed outermost via
    [`CapabilityOrdering`][pydantic_ai.capabilities.abstract.CapabilityOrdering]
    so it wraps around other capabilities. This ensures steering messages are
    drained into the model request before user capabilities see it, and follow-up
    redirection runs after all other `after_node_run` hooks (which run in reverse).
    """

    def get_ordering(self) -> CapabilityOrdering:
        # Outermost so steering messages are drained into the request before other
        # capabilities see it, and follow-up redirection runs after all other
        # after_node_run hooks (which run in reverse order).
        return CapabilityOrdering(position='outermost')
Review comment (Contributor):

PendingMessageDrainCapability inherits the default get_serialization_name() which returns the class name ('PendingMessageDrainCapability'). Since this is an auto-injected internal capability — like ToolSearch which returns None — it should opt out of spec-based construction:

@classmethod
def get_serialization_name(cls) -> str | None:
    return None  # not spec-constructible (auto-injected)

Without this, if get_serialization_name is ever called on this class (e.g. during spec serialization), it would incorrectly appear as a user-configurable capability.

Review comment (Contributor):

This comment repeats what the class docstring already explains at lines 74-79. Per project style, comments should only be added when the WHY is non-obvious — the class-level documentation already covers why outermost is chosen and how the ordering interacts with the drain lifecycle.

Review comment (Contributor) on lines +78 to +79:

This comment repeats the explanation already in the class docstring at lines 74–79. Per the project's no-comments-unless-WHY-is-non-obvious convention, it can be dropped — the class docstring is the canonical place for this.

Review comment (Contributor) on lines +78 to +79:

🚩 Two outermost capabilities may have non-deterministic relative ordering

Both PendingMessageDrainCapability (line 83) and Instrumentation (added separately via CombinedCapability in agent/__init__.py:1368-1370) request position='outermost' in their CapabilityOrdering. The relative order between two outermost capabilities depends on the ordering pass's tie-breaking logic. For before_model_request, the drain wants to fire first so subsequent hooks see drained messages, while Instrumentation wants to wrap everything for tracing. If the ordering pass doesn't guarantee a stable relative order between two outermost capabilities, the drain might fire after Instrumentation's before_model_request, meaning the Instrumentation span wouldn't capture the drained messages in its initial view. Tests pass, so the current ordering appears correct, but the implicit dependency on tie-breaking order is fragile.



    async def before_model_request(
        self,
        ctx: RunContext[Any],
        request_context: ModelRequestContext,
    ) -> ModelRequestContext:
        """Drain steering messages into the model request.

        Appends to both `request_context.messages` (so the model sees them in this
        request) and `ctx.messages` (so they persist in the agent's message history).
        """
        drained = _drain_by_priority(ctx.pending_messages, 'steering')
        if drained:
            parts = [part for msg in drained for part in msg.parts]
            steering_request = ModelRequest(parts=parts)
Review comment (Contributor):

ModelRequest(parts=parts) will have timestamp=None and run_id=None since both default to None on ModelRequest. Unlike follow-up messages (which go through ModelRequestNode.run() where self.request.timestamp and self.request.run_id are set at _agent_graph.py:773-775), steering messages are appended directly to the message history here and bypass that processing.

This means the steering ModelRequest in the history will lack metadata that every other ModelRequest has. This could cause issues for code that expects timestamp/run_id to be set on all requests, and it also makes the test snapshot at test_enqueue_steering_message_from_tool likely incorrect — it asserts timestamp=IsDatetime() and run_id=IsStr(), which should fail against None values.

Suggest setting these explicitly:

steering_request = ModelRequest(parts=parts, timestamp=_utils.now_utc(), run_id=ctx.run_id)

(with from pydantic_ai import _utils added to the imports)
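
The stamping issue can be shown in isolation with a toy request type. `ToyRequest` and `build_steering_request` are hypothetical names for illustration only; in the PR itself the suggestion is to pass `_utils.now_utc()` and `ctx.run_id` to `ModelRequest`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ToyRequest:
    """Hypothetical stand-in whose metadata defaults to None, like the comment describes."""

    parts: list[str]
    timestamp: Optional[datetime] = None
    run_id: Optional[str] = None


def build_steering_request(parts: list[str], run_id: str) -> ToyRequest:
    # Stamp at construction: this request is appended to history directly,
    # so no later step would fill in the metadata.
    return ToyRequest(parts=parts, timestamp=datetime.now(timezone.utc), run_id=run_id)


unstamped = ToyRequest(parts=['steer'])
stamped = build_steering_request(['steer'], run_id='run-1')
```

Any consumer that assumes every request in the history carries a timestamp and run ID would trip over the unstamped variant.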

            request_context.messages.append(steering_request)
            ctx.messages.append(steering_request)
        return request_context

    async def after_node_run(
        self,
        ctx: RunContext[Any],
        *,
        node: _agent_graph.AgentNode[Any, Any],
        result: _agent_graph.AgentNode[Any, Any] | End[FinalResult[Any]],
    ) -> _agent_graph.AgentNode[Any, Any] | End[FinalResult[Any]]:
        """Drain follow-up messages when the agent would otherwise end."""
        from pydantic_ai._agent_graph import ModelRequestNode
        from pydantic_graph import End
Review comment (Collaborator Author):

Do we have to import these here inline?


        if not isinstance(result, End):
            return result

        follow_ups = _drain_by_priority(ctx.pending_messages, 'follow_up')
        if not follow_ups:
            return result

        parts = [part for msg in follow_ups for part in msg.parts]
        request = ModelRequest(parts=parts)
        return ModelRequestNode(request=request)
Review comment (Contributor, devin-ai-integration bot, Apr 4, 2026):

🚩 Follow-up drain creates ModelRequestNode without run_step increment or usage limit check

When PendingMessageDrainCapability.after_node_run redirects End to a ModelRequestNode (_pending_messages.py:83-84), this creates a new model request that bypasses the normal UserPromptNode flow. The new ModelRequestNode will trigger a model call, consuming tokens and incrementing usage. However, there's no explicit usage_limits check before this redirect — the check happens inside ModelRequestNode.run(). If the agent is near its usage limit, this redirect could cause UsageLimitExceeded to be raised during the follow-up model call, which would lose the follow-up context. This is arguably correct behavior (limits should be respected), but users may find it surprising that background tool results are silently lost when usage limits are reached.


Review comment (Contributor, devin-ai-integration bot, Apr 24, 2026):

🚩 No usage limit / max iteration guard for follow-up message loops

The follow-up message drain mechanism (_pending_messages.py:86-92) converts an End to a ModelRequestNode whenever follow-up messages exist. If a tool continuously enqueues follow-up messages (e.g., a background tool that spawns more background work on completion), this creates an unbounded loop. The existing usage_limits mechanism would eventually catch this if token limits are set, but there's no direct guard against infinite follow-up cycling. The standard run_step limit in the graph may also help, but it's worth verifying this is bounded.


Review comment (Contributor):

The steering path (line 77) has a detailed comment explaining why explicit timestamp/run_id stamping is needed. A brief mirror comment here would help future maintainers understand why the follow-up ModelRequest intentionally omits them — because it's wrapped in a ModelRequestNode that goes through the full graph lifecycle where ModelRequestNode._prepare_request stamps self.request at _agent_graph.py:758-760.

Without this, someone reading both paths might think the follow-up case is a bug and "fix" it to match the steering case.

Something like:

# No explicit timestamp/run_id needed: ModelRequestNode.run() stamps
# self.request during the graph lifecycle (_agent_graph.py:758-760).
request = ModelRequest(parts=parts)

32 changes: 32 additions & 0 deletions pydantic_ai_slim/pydantic_ai/messages.py
@@ -2036,6 +2036,38 @@ def provider_request_id(self) -> str | None:
ModelMessage = Annotated[ModelRequest | ModelResponse, pydantic.Discriminator('kind')]
"""Any message sent to or returned by a model."""


PendingMessagePriority = Literal['steering', 'follow_up']
"""Priority level for a pending message.

- `'steering'`: Drained into the next model request (before the model call).
- `'follow_up'`: Drained only when the agent would otherwise end, preventing
premature termination while follow-up work is pending.
"""


@dataclass
class PendingMessage:
Contributor

Minor: PendingMessage is a plain dataclass in a module where all other message types (ModelRequest, ModelResponse, parts) are Pydantic BaseModels. That's a reasonable choice since PendingMessage is transient runtime state (not serialized), but it's worth noting explicitly — e.g. via a brief comment or by mentioning in the docstring that this type is intentionally not a Pydantic model since it's never persisted or serialized. This helps future readers understand the design choice and prevents someone from "upgrading" it to a BaseModel for consistency.

Collaborator Author

Does this need to be a public type? Do we expose pending_messages to the user? graph deps/state doesn't count, that's internal. I'd prefer to keep private if we can.

    """A message queued for injection into the agent conversation.

    Pending messages are enqueued via [`RunContext.enqueue`][pydantic_ai.tools.RunContext.enqueue]
    or [`AgentRun.enqueue`][pydantic_ai.run.AgentRun.enqueue] and are
    automatically drained at the appropriate time during the agent run.
    """

    parts: Sequence[ModelRequestPart]
    """The message parts to inject."""

    _: KW_ONLY

    priority: PendingMessagePriority = 'steering'
    """When to drain this message:

    - `'steering'`: injected before the next model request.
    - `'follow_up'`: injected only when the agent would otherwise finish.
    """


ModelMessagesTypeAdapter = pydantic.TypeAdapter(
list[ModelMessage], config=pydantic.ConfigDict(defer_build=True, ser_json_bytes='base64', val_json_bytes='base64')
)
24 changes: 24 additions & 0 deletions pydantic_ai_slim/pydantic_ai/run.py
Contributor

@devin-ai-integration devin-ai-integration Bot Apr 4, 2026

📝 Info: when_idle drain only works with agent.run() or agent_run.next(), not bare async for

The when_idle drain fires in after_node_run, which is a capability hook invoked by _run_node_with_hooks (used by AgentRun.next() and Agent.run()). The bare async for node in agent_run: path uses __anext__ which calls the graph runner directly without firing capability hooks. This means when_idle messages are never drained in bare iteration mode. The asap drain still works because it fires in before_model_request which runs inside ModelRequestNode.run() regardless of the driving mode. This limitation is clearly documented in the PR at docs/message-history.md:465-469, including the recommendation to use AgentRun.next() for when_idle messages.
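
The difference between the two driving modes can be shown with a toy model. This is a sketch under assumptions, not the real `AgentRun`: `ToyRun` and `hook_calls` are hypothetical, and unlike the real `AgentRun.next()` this `next()` takes no node argument. It only demonstrates that a hook wired into `next()` never fires when the same object is driven through `__anext__`.

```python
import asyncio


class ToyRun:
    """Hypothetical run object: hooks fire in next(), not in __anext__()."""

    def __init__(self, nodes: list[str]) -> None:
        self._nodes = list(nodes)
        self.hook_calls: list[str] = []

    def _advance(self) -> str:
        # Shared stepping logic used by both driving modes.
        if not self._nodes:
            raise StopAsyncIteration
        return self._nodes.pop(0)

    async def next(self) -> str:
        node = self._advance()
        self.hook_calls.append(node)  # stands in for an after_node_run hook
        return node

    def __aiter__(self) -> 'ToyRun':
        return self

    async def __anext__(self) -> str:
        return self._advance()  # no hooks on this path


async def drive_with_next(run: ToyRun) -> list[str]:
    """Step the run manually so the hook runs after every node."""
    seen: list[str] = []
    while True:
        try:
            seen.append(await run.next())
        except StopAsyncIteration:
            return seen


async def drive_bare(run: ToyRun) -> list[str]:
    """Step the run with bare async iteration; the hook is bypassed."""
    return [node async for node in run]
```

Both drivers visit the same nodes, but only the `next()` driver records hook calls, which is why a drain implemented in such a hook is skipped under bare iteration.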

@@ -394,6 +394,30 @@ def run_id(self) -> str:
        """The unique identifier for the agent run."""
        return self._graph_run.state.run_id

    @property
    def pending_messages(self) -> list[_messages.PendingMessage]:
        """Queue of messages waiting to be injected into the conversation.

        Messages are drained automatically: `'steering'` messages before the next model
        request, `'follow_up'` messages when the agent would otherwise end.
        """
        return self._graph_run.state.pending_messages

    def enqueue(
Collaborator Author

@adtyavrdhn Here's a reason to bring the AgentEventStream class back: we're adding an AgentRun.enqueue method here that's useful to call in the middle of a streaming agent run, but it only works with iter, not run_stream_events. As a follow-up, consider bringing that class back (even though we just removed it from v2-main) with the enqueue method, and possibly cancel as well at some point, right?

        self,
        *parts: _messages.ModelRequestPart,
        priority: _messages.PendingMessagePriority = 'steering',
    ) -> None:
        """Enqueue message parts to be injected into the conversation.

        Args:
            *parts: One or more message parts (e.g. `SystemPromptPart`, `UserPromptPart`).
            priority: When to inject:
                `'steering'` (default) — before the next model request.
                `'follow_up'` — when the agent would otherwise end.
        """
        self._graph_run.state.pending_messages.append(_messages.PendingMessage(parts=parts, priority=priority))
Contributor

The validation + append logic here is duplicated verbatim from RunContext.enqueue (_run_context.py:134-136). Consider extracting a shared helper (e.g. on PendingMessage itself, or a module-level function next to PendingMessage) to keep the validation in one place. Minor, but per the project's "extract duplicated logic into shared helpers after 2+ occurrences" guideline.


    def __repr__(self) -> str:  # pragma: no cover
        result = self._graph_run.output
        result_repr = '<run not finished>' if result is None else repr(result.output)