feat: input and output guardrails (block, redact, retry) by dsfaccini · Pull Request #249 · pydantic/pydantic-ai-harness

dsfaccini · 2026-05-21T20:12:19Z

Adds two guardrail capabilities with a minimal, callable-based API.

Supersedes #219. Original work by @DEENUU1 (Vstorm) — this PR is #219 plus a main merge and a review-driven redesign, opened from a branch on this repo so it can land while the original fork branch is unreachable for push. Full credit to @DEENUU1; review input from @Kludex and @adtyavrdhn carried over.

What it does

InputGuard(guard, parallel=False) — runs before the first model request.
OutputGuard(guard) — runs as the model output is processed (after_output_process).

A guard is any sync/async callable. It receives the inspected value — the prompt, or the output — optionally preceded by a RunContext (signature-detected, like pydantic-ai's output validators). It returns a bare bool (True = allow) or a GuardResult.

Guard outcomes

GuardResult, built via classmethods:

| Outcome | `InputGuard` | `OutputGuard` |
| --- | --- | --- |
| `allow()` | send the prompt | return the output |
| `block(message=None)` | skip the model call; the refusal message becomes the response (`SkipModelRequest`) | raise `OutputBlocked` |
| `replace(value)` | rewrite the prompt sent to the model (redaction) | substitute a sanitized output |
| `retry(message)` | not supported (usage error) | send the output back to the model (`ModelRetry`) |

A guard that raises propagates the exception as a hard failure.

Observability

replace and block emit spans on the run's OpenTelemetry tracer (guardrail redacted input, guardrail blocked output, …) with guardrail.* attributes, so redactions and refusals are visible in Logfire. Redacted content is attached only when RunContext.trace_include_content is set. retry needs no special tracing — the retried request appears in the trace on its own.

Notable design points

OutputGuard uses after_output_process (not after_run) so it can redact and trigger ModelRetry; it runs on the final output only, not streaming partials.
InputGuard replace requires sequential mode (a parallel guard races a model call already started with the original prompt); retry is rejected for input.
parallel=True trades tokens for latency — sequential never calls the model on a blocked prompt.

Follow-ups

Prepackaged LLM-based guardrails and Presidio/Azure/OpenAI moderation docs are tracked in #248.

Checks

make format && make lint && make typecheck clean. Guardrails: 49 tests, 100% branch coverage; full suite green.

block_message now accepts a callable so the refusal text can reflect the prompt/output that tripped the guard, rather than being frozen at construction time. InputGuard's sequential path moves from before_model_request into wrap_model_request, so a single hook covers both sequential and parallel modes instead of two hooks each branching on `parallel`. Tests move to tests/guardrails/ to match the tests/<capability>/ layout.

A guard now returns either a bare bool or a GuardResult carrying a refusal message, replacing the separate block_message constructor field. The message is produced when the guard decides, so it can reflect the guard's own reasoning rather than a string frozen at construction time. Guards (and the GuardResult path) may optionally take a RunContext as a first parameter, detected from the signature like pydantic-ai's output validators, so deps- and history-aware guards are possible without closing over globals. Prompt/output-only guards are unchanged.

A guard now reports one of four outcomes via GuardResult classmethods (bool shorthand still works): allow, block, replace, retry.
- replace lets a guard redact rather than refuse: InputGuard rewrites the prompt sent to the model, OutputGuard substitutes the output.
- retry lets OutputGuard send a bad output back to the model; OutputGuard moves from after_run to after_output_process so it can raise ModelRetry and return a modified output.
- replace and block emit spans on the run tracer so a redaction or refusal is visible in Logfire; redacted content is included only when RunContext.trace_include_content is set.

InputGuard replace requires sequential mode and retry is rejected as a usage error, since neither is meaningful for input.

The coverage gate measures test files too; the _prompt_text helper had unreached branches. Drop it and assert on message parts inline.

A pydantic-ai-correctness review surfaced two streaming points. Verified both empirically:
- InputGuard(parallel=True) works under run_stream(): no deadlock.
- OutputGuard GuardResult.retry() is unsupported under run_stream(): pydantic-ai does not retry output while streaming, so a retry verdict surfaces as UnexpectedModelBehavior.

Document the retry limitation and that OutputGuard screens only the final output (partial chunks reach the caller first while streaming). Note that input redaction also rewrites persisted history and targets text prompts. Add streaming tests for both guards to lock the behavior in.

The test proving InputGuard(parallel=True) does not deadlock under run_stream() had no timeout — a reintroduced deadlock would hang CI instead of failing. Wrap it in asyncio.wait_for and document the reviewed concern it guards against.
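The timeout pattern described in that commit can be sketched as follows. `run_guarded_stream` is a hypothetical placeholder for the streaming run the real test exercises, and `hangs_forever` simulates the deadlock the timeout is meant to catch; neither name comes from the PR.

```python
import asyncio


async def run_guarded_stream() -> str:
    """Hypothetical stand-in for the streaming run under test."""
    await asyncio.sleep(0)  # yield to the event loop once
    return 'ok'


async def hangs_forever() -> None:
    """Simulates a deadlock: waits on an event that is never set."""
    await asyncio.Event().wait()


async def check_no_deadlock(timeout: float = 5.0) -> str:
    # If the awaited coroutine never finishes, wait_for cancels it and raises
    # TimeoutError, so a deadlock fails the test instead of hanging CI.
    return await asyncio.wait_for(run_guarded_stream(), timeout=timeout)
```

The timeout value is a judgment call: long enough that a slow CI machine does not produce false failures, short enough that a real hang is reported quickly.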

adtyavrdhn · 2026-05-22T04:50:56Z

+"""
+
+
+def _takes_ctx(func: Callable[..., object]) -> bool:


We could import this from pydantic-ai. Although it is private, we run the suite, so we won't break the harness.

Claude here: pydantic-ai does not expose a shared takes_ctx helper — _output.py::OutputValidator and _system_prompt.py each inline the same len(inspect.signature(...).parameters) > N check. Our _takes_ctx mirrors that convention exactly. Happy to swap if a public helper lands.

Oh yeah, it doesn't. We'll need to import from a private module, which I am fine with in this case given that it is our own private module.
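For readers following the thread, the parameter-count convention under discussion looks roughly like this. It is a sketch of the idea only: the threshold of one parameter is an assumption for guards that take either `(value)` or `(ctx, value)`, and `plain_guard` / `ctx_guard` are made-up examples.

```python
import inspect
from collections.abc import Callable


def takes_ctx(func: Callable[..., object]) -> bool:
    """Return True when `func` appears to accept a context before the value.

    Assumption for this sketch: a guard has one parameter (the inspected
    value) or two (context, then value), so "more than one" means it wants
    the context.
    """
    return len(inspect.signature(func).parameters) > 1


def plain_guard(prompt: str) -> bool:
    return 'forbidden' not in prompt


def ctx_guard(ctx: object, prompt: str) -> bool:
    return bool(prompt)
```

Counting parameters is cheap but coarse: a guard with an extra defaulted parameter would be treated as context-aware, which is one reason a shared helper would be nicer than several inlined copies.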

adtyavrdhn · 2026-05-22T04:51:38Z

+
+
+GuardOutcome = bool | GuardResult
+"""What a guard callable returns: a bare `bool` (`True` = allow), or a `GuardResult`."""


The public docstrings could use another look; they could be simpler.

adtyavrdhn · 2026-05-22T04:57:10Z

+    outcome = guard(ctx, value) if _takes_ctx(guard) else guard(value)
+    if inspect.isawaitable(outcome):
+        outcome = await outcome
+    if isinstance(outcome, GuardResult):


Isn't this then enforced based on the type?

Claude here: the type annotation is just a hint — runtime discrimination of the bool | GuardResult union needs the isinstance check to narrow. Python doesn't enforce union types at runtime.

adtyavrdhn · 2026-05-22T04:58:07Z

+    original prompt.
+
+    Scope: the guard runs exactly once per run — on the first model request —
+    and evaluates the original user prompt. Subsequent model requests in the


Claude has become increasingly verbose. I am sure it could explain the same thing in fewer words. I dunno if they wanna increase their token costs with this ://

adtyavrdhn · 2026-05-26T11:35:32Z

+            raise UserError('InputGuard could not find a user prompt to redact in the request.')
+        _trace_redaction(ctx, direction='input', original=prompt, replacement=replacement)
+
+    async def wrap_model_request(


I see something tricky here. I think this capability needs to be positioned innermost so that no user-morphed messages can sneak by, and we can be confident it runs only after all the other capabilities are done.

OutputGuard similarly would need to be outermost, but there are outermost capabilities already, so it needs to be wrapped by instrumentation or something? Look at the deferred caps PR (pydantic/pydantic-ai#5230) for reference; it ended up doing something similar without breaking instrumentation.

Claude here: applied. InputGuard.get_ordering() → position='innermost' so any message-morphing capability runs first and the guard sees the final prompt. OutputGuard.get_ordering() → position='outermost', wrapped_by=[Instrumentation] so the guard's block/redact spans are always captured by an enclosing Instrumentation span regardless of user list order. Tests added (TestInputGuardOrdering, TestOutputGuardOrdering).

adtyavrdhn · 2026-05-26T11:39:28Z

+        if ctx.partial_output:
+            return output
+        verdict = await _evaluate(self.guard, ctx, output)
+        if verdict.action == 'allow':


Can we use match instead?

Also, there are multiple if conditions, but this is a verdict, right? More than one cannot be true.

adtyavrdhn · 2026-05-26T11:40:35Z

+    """Record a zero-duration span marking a guardrail refusal."""
+    ctx.tracer.start_span(
+        f'guardrail blocked {direction}',
+        attributes={'guardrail.direction': direction, 'guardrail.action': 'block', 'guardrail.message': message},


Is it all right for us to dump the message as is? Is there a risk that the LLM explains in the message why it blocked it and ends up leaking it anyway?

Claude here: applied — _trace_block now gates guardrail.message behind RunContext.trace_include_content, matching how _trace_redaction handles content. Default traces stay content-free; debugging can opt in via include_content. The user-facing refusal text (the SkipModelRequest response / OutputBlocked exception) is unchanged.
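The gating described in this reply can be sketched as a pure function that builds the span attributes. The attribute keys match the diff above; the function name and the `include_content` parameter are hypothetical stand-ins for the `RunContext.trace_include_content` flag.

```python
def block_span_attributes(direction: str, message: str, include_content: bool) -> dict[str, str]:
    """Build attributes for a 'guardrail blocked' span.

    The refusal message is attached only when content tracing is enabled,
    because it may quote the very text the guard refused.
    """
    attributes = {'guardrail.direction': direction, 'guardrail.action': 'block'}
    if include_content:
        attributes['guardrail.message'] = message
    return attributes
```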

adtyavrdhn · 2026-05-26T11:41:57Z

+_DEFAULT_OUTPUT_RETRY_MESSAGE = 'Output rejected by output guardrail.'
+
+
+@dataclass


Let us use frozen and kw_only

adtyavrdhn · 2026-05-26T11:47:18Z

+
+    replacement: object | None = None
+    """For `replace`, the value substituted for the inspected one."""
+


We should add a __post_init__ to verify the integrity of this result

adtyavrdhn · 2026-05-26T11:51:09Z

+        finally:
+            for task in (guard_task, handler_task):
+                if not task.done():
+                    task.cancel()


Worth letting Claude check whether this needs a shield: anyio-level cancellations from the graph can seep in and throw cancellations here, which would interrupt the cleanup. I am still wrapping my head around it, so I am not sure that is 100% possible.

Claude here: investigated with a regression test (test_no_dangling_tasks_on_outer_cancellation) that cancels the outer task mid-flight and asserts no leftover guard/handler tasks. It passes without asyncio.shield — the outer cancel is consumed by the asyncio.wait above the finally, so task.cancel() + await asyncio.gather(...) complete cleanly. Adding shield would be defensive but could mask a genuine re-cancellation; the test locks the current behavior in.
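For context, the cancel-and-drain shape being discussed looks roughly like this. It is a simplified sketch, not the PR's implementation (which, per the reply above, uses `asyncio.wait`); `race_guard_and_handler`, `slow_model`, `allow`, and `deny` are hypothetical names, and returning `None` stands in for the refusal response.

```python
import asyncio


async def race_guard_and_handler(guard, handler):
    """Start guard and handler together; keep the handler result only if allowed."""
    guard_task = asyncio.create_task(guard())
    handler_task = asyncio.create_task(handler())
    try:
        if not await guard_task:
            return None  # blocked: the in-flight handler is cancelled below
        return await handler_task
    finally:
        # Cancel whatever is still running, then drain both tasks so no
        # cancelled task or unretrieved exception is left behind.
        for task in (guard_task, handler_task):
            if not task.done():
                task.cancel()
        await asyncio.gather(guard_task, handler_task, return_exceptions=True)


async def slow_model() -> str:
    await asyncio.sleep(0.05)
    return 'model response'


async def allow() -> bool:
    return True


async def deny() -> bool:
    return False
```

This also illustrates the trade-off noted in the description: in parallel mode the model call has already started by the time a guard refuses, so the tokens may be spent even though the response is discarded.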

adtyavrdhn · 2026-05-26T11:55:03Z

+        if verdict.action == 'block':
+            message = verdict.message or _DEFAULT_INPUT_BLOCK_MESSAGE
+            _trace_block(ctx, direction='input', message=message)
+            raise SkipModelRequest(ModelResponse(parts=[TextPart(content=message)]))


If we do this, the message remains in history and the model can see it in the next turn, because we only go through the messages and pick the last prompt for InputGuard?

Claude here: this is intended SkipModelRequest behavior in pydantic-ai — the canned ModelResponse is the assistant's turn, and a multi-turn chat that skips it would be broken. Per the SkipModelRequest docstring, the response is preserved in history. Callers that don't want the refusal carried forward can filter result.all_messages() or start a fresh agent for the next turn.

adtyavrdhn

Structurally great, have a few concerns.

@adtyavrdhn

Address @adtyavrdhn's review on #249.
- `InputGuard.get_ordering()` → `position='innermost'` so any message-morphing capability runs first and the guard sees the final prompt the model will receive.
- `OutputGuard.get_ordering()` → `position='outermost', wrapped_by=[Instrumentation]` so the guard's block/redact spans are always captured by an enclosing `Instrumentation` span regardless of user list order.
- `GuardResult` is `frozen=True, kw_only=True` with a `__post_init__` that rejects field combinations the four-outcome contract does not allow (e.g. `replace` without a replacement). `block` with no message stays valid: the default kicks in at the use site.
- `_run_guard` and `after_output_process` dispatch via `match action:` with `assert_never` exhaustiveness guards.
- `_trace_block` gates the refusal `message` attribute behind `ctx.trace_include_content`, matching `_trace_redaction`, because the message can quote sensitive content from the guarded value.
- New tests: ordering declarations, `__post_init__` validation, frozen enforcement, outer-cancellation no-leak regression guard (which confirms `asyncio.shield` around the cleanup `gather` is not needed, since the outer cancel is already consumed by `asyncio.wait`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

adtyavrdhn

Approving post discussion

DEENUU1 and others added 11 commits April 24, 2026 12:05

feat: add input and output guardrails

619b692

chore: use single backticks

fcbe3ae

fix: run InputGuard only on the first model request

22f4999

fix: replace list[Any] with Sequence[ModelMessage]

b9b8040

fix: pass raw output to OutputGuard, not str(result.output)

b27c6a3

refactor: organize tests into TestCapabilityName classes

abf5259

fix: drain cancelled tasks in InputGuard parallel finally

dd4c1b8

fix: re-raise task exceptions via await instead of .exception()

57d8e2c

refactor: consolidate parallel cancel/drain into single finally

38c3185

Merge remote-tracking branch 'origin/main' into vstorm-co--vstorm/gua…

876eca1

…rdrails

dsfaccini mentioned this pull request May 21, 2026

feat: add input and output guardrails #219

Closed

5 tasks

dsfaccini added 3 commits May 21, 2026 15:51

test: cover guardrail test helpers for the 100% gate

9d20480

The coverage gate measures test files too; the _prompt_text helper had unreached branches. Drop it and assert on message parts inline.

dsfaccini changed the title ~~feat: add input and output guardrails~~ feat: input and output guardrails (block, redact, retry) May 21, 2026

dsfaccini added 2 commits May 21, 2026 17:33

dsfaccini mentioned this pull request May 22, 2026

feat: prepackaged LLM guardrails + Presidio/moderation integration docs #248

Open

adtyavrdhn reviewed May 22, 2026

adtyavrdhn reviewed May 26, 2026

adtyavrdhn requested changes May 26, 2026

adtyavrdhn mentioned this pull request Jun 1, 2026

feat(guardrails): add InputGuard, OutputGuard, and LLM-based guardrails #264

Closed

adtyavrdhn approved these changes Jun 1, 2026

dsfaccini mentioned this pull request Jun 1, 2026

feat: add FileSystem and Shell capabilities #260

Merged


