refactor(backends): move generate_from_raw hook firing into Backend base class by ajbozarth · Pull Request #1264 · generative-computing/mellea

ajbozarth · 2026-06-12T20:50:00Z

Pull Request

Issue

Fixes #1183. Folds the additive half of #1218 Part 1.

Description

Brings the raw (batch) generation path in line with the chat path's
existing wrapper pattern: Backend.generate_from_raw becomes a @final
wrapper that owns hook firing, and backends implement only the new
_generate_from_raw abstract. The three generation_batch_* hooks now
fire from one place instead of being duplicated inline in all five
backends.
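A minimal sketch of the wrapper pattern described above, not the actual mellea code. The toy `Backend`, the `_fire` dispatcher, the `plugins` registry, and the payload keys are hypothetical stand-ins, and `generation_batch_post_call` is an assumed name for the post hook. It shows a `@final` public method owning the three hook firings while subclasses implement only `_generate_from_raw` and return `(results, usage)`.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, final

Hook = Callable[[dict[str, Any]], None]


class Backend(ABC):
    """Toy base class for illustration; not mellea's actual Backend."""

    def __init__(self, model_id: str, provider: str) -> None:
        self._model_id = model_id
        self._provider = provider
        self.fired: list[str] = []          # names of hooks fired, in order
        self.plugins: dict[str, Hook] = {}  # hypothetical plugin registry

    def _fire(self, name: str, payload: dict[str, Any]) -> dict[str, Any]:
        # Hypothetical dispatcher: record the hook, let a plugin mutate the payload.
        self.fired.append(name)
        if name in self.plugins:
            self.plugins[name](payload)
        return payload

    @final
    async def generate_from_raw(
        self, prompts: list[str], model_options: dict[str, Any] | None = None
    ) -> list[str]:
        base = {"model": self._model_id, "provider": self._provider}
        pre = self._fire(
            "generation_batch_pre_call",
            {**base, "model_options": dict(model_options or {})},
        )
        model_options = pre["model_options"]  # honor plugin mutations
        try:
            results, usage = await self._generate_from_raw(prompts, model_options)
        except BaseException as exc:  # also catches CancelledError / KeyboardInterrupt
            self._fire("generation_batch_error", {**base, "error": exc})
            raise
        self._fire("generation_batch_post_call", {**base, "usage": usage})
        return results  # callers still see only the results

    @abstractmethod
    async def _generate_from_raw(
        self, prompts: list[str], model_options: dict[str, Any]
    ) -> tuple[list[str], dict[str, Any] | None]:
        ...


class EchoBackend(Backend):
    async def _generate_from_raw(self, prompts, model_options):
        suffix = model_options.get("suffix", "")
        return [p + suffix for p in prompts], {"total_tokens": len(prompts)}


class CancelledBackend(Backend):
    async def _generate_from_raw(self, prompts, model_options):
        raise asyncio.CancelledError()


async def demo() -> tuple[list[str], list[str], list[str]]:
    ok = EchoBackend("toy-model", "toy")
    # A plugin mutates model_options on the pre_call payload.
    ok.plugins["generation_batch_pre_call"] = lambda p: p["model_options"].update(suffix="!")
    results = await ok.generate_from_raw(["a", "b"])
    bad = CancelledBackend("toy-model", "toy")
    try:
        await bad.generate_from_raw(["a"])
    except asyncio.CancelledError:
        pass  # the error hook has already fired before propagation
    return results, ok.fired, bad.fired


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the plugin's mutation reaches the impl because the wrapper reassigns `model_options` from the post-hook payload, and the cancelled call still records an error hook because the wrapper catches `BaseException` rather than `Exception`.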

Reviewer call-outs:

1. Tuple return on _generate_from_raw. Backends return
   tuple[list[ModelOutputThunk], dict | None], i.e. (results, usage). The
   wrapper unpacks the tuple, fires post_call with the aggregate, and
   returns just results to callers (public signature unchanged).
   Considered moving usage aggregation into the wrapper, but openai and
   litellm only report whole-batch usage at the response root, so the
   impl is the only place that knows the right shape. Matches the
   existing _generate_from_context -> tuple[MOT, Context] precedent.

2. Standardized self._model_id and self._provider instead of new
   abstracts. The wrapper needs model and provider for hook
   payloads (pre_call has no MOT yet; error may have no MOT). Considered
   adding _provider_name / _resolved_model_id() abstract members.
   Instead, finished the partial convention that 3 of 5 backends already
   used: every backend now sets self._model_id: str and
   self._provider: str in __init__. Inlines model-id resolution into
   the ollama and huggingface __init__ bodies and deletes the now-unused
   _get_*_model_id helpers. Renames LocalHFBackend._hf_model_id to
   _model_id for consistency. ~10 chat-path setter sites switch to
   reading these attributes: no value changes, just one canonical
   source per backend.

3. BaseException on the raw-path error wrapper, matching chat. The
   chat-path wrapper's BaseException was added in #1181
   (refactor(telemetry)!: move backend tracing onto plugin/hook
   pattern); the raw path was tentatively left at except Exception.
   The same gap exists: synchronous KeyboardInterrupt /
   asyncio.CancelledError / SystemExit inside the impl currently bypass
   the error hook silently. This PR aligns to BaseException. Behavior
   change: raw cancellation/interrupts now fire generation_batch_error
   before propagating.

4. Batch pre_call payload mutations now propagate. When the batch
   hooks were added in #1218 (fix(backends): unify raw-path token usage
   on mot.generation.usage; guard eval_count=None) they were wired for
   telemetry-only observation: a plugin could mutate model_options /
   format / tool_calls on the pre_call payload, and the chat path would
   honor those mutations, but the batch path silently dropped them. Same
   plugin, same intent, different result depending on which API the
   user reached. The wrapper now captures the post-hook payload and
   reassigns the locals before calling _generate_from_raw, identical
   to the chat-path idiom. Adds model_options to
   GenerationBatchPreCallPayload (it didn't exist on the batch
   payload at all). Test inverted from _are_not_propagated to
   _propagates, mirroring the chat-path test.

5. Folded the additive half of #1218 Part 1. Backends with per-MOT
   token counts (ollama, huggingface, watsonx) now also populate
   mot.generation.usage per MOT. Backends with whole-batch-only usage
   (openai, litellm) leave per-MOT mot.generation.usage = None and
   surface the aggregate via the tuple return. Null-token policy is
   all-or-nothing: if any of prompt_tokens / completion_tokens /
   total_tokens cannot be determined for a MOT, that MOT's
   generation.usage stays None (matches dict | None typing; ollama
   docs document Optional[int] as "not yet available" rather than
   "zero"). Watsonx's previous undocumented "or 0" coercion switches to
   the same policy.

   The existing mot._meta["usage"] writes are preserved everywhere;
   budget_forcing_alg.py still reads from them. #1218's remaining
   scope (consumer-side reads in budget_forcing_alg.py and the
   mot._meta["usage"] deprecation path) stays open for follow-up.

   Originally folded in to support generic usage aggregation in the
   wrapper; kept after the openai/litellm pivot to call-out 1's tuple
   return because the per-MOT writes are still the right thing for the
   3 backends whose APIs expose them.

6. Custom-backend doc updated.
   docs/docs/community/building-extensions.md was teaching users to
   override the public generate_from_context / generate_from_raw
   directly; those are @final now. The example was updated to
   implement _generate_from_context / _generate_from_raw and set
   _model_id / _provider.

7. Test coverage. Adds TestGenerationBatchHookCallSites in
   test/plugins/test_hook_call_sites.py mirroring the chat-path
   firing-site tests (9 tests). _MockBackend extended to cover both
   paths with hardcoded behavior; error case via inline subclass per
   the existing RecordingBackend(_MockBackend) pattern. Mocks in
   test/stdlib/test_streaming.py, test/core/test_logger_plugin_hooks.py,
   and test/stdlib/frameworks/test_react_framework.py updated to the
   new abstract contract.
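The all-or-nothing null-token policy from the usage call-out can be sketched as below. This is an illustration under stated assumptions, not the code in this PR: `build_usage` is a hypothetical helper, and deriving `total_tokens` as the sum when a backend omits it is an assumption of the sketch.

```python
from __future__ import annotations


def build_usage(
    prompt_tokens: int | None,
    completion_tokens: int | None,
    total_tokens: int | None = None,
) -> dict[str, int] | None:
    """Return a usage dict only when every count is known, else None.

    Hypothetical helper. An unknown count (None) is never coerced to 0,
    because 0 is a legitimate count while None means "not yet available".
    """
    if prompt_tokens is None or completion_tokens is None:
        return None
    if total_tokens is None:
        # Assumption of this sketch: the total is derivable from its parts.
        total_tokens = prompt_tokens + completion_tokens
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": total_tokens,
    }


if __name__ == "__main__":
    print(build_usage(3, 5))     # every count known
    print(build_usage(None, 5))  # one count unknown, so no usage at all
```

The explicit `is None` checks matter here: a truthiness test such as `if not prompt_tokens` would treat a real count of 0 as missing, which is the same conflation the "or 0" coercion made in the other direction.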

Testing

Tests added to the respective file if code was changed
New code has 100% coverage if code was added
Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

AI coding assistants used

Adding a new component, requirement, sampling strategy, or tool?

If your PR adds or modifies one of the types below, check the matching box. A checklist of type-specific review items will be posted as a comment.

Component
Requirement
Sampling Strategy
Tool

NOTE: Please ensure you have an issue that has been acknowledged by a core contributor and routed you to open a pull request against this repository. Otherwise, please open an issue before continuing with this pull request.

…ase class

Closes generative-computing#1183. Folds the additive half of
generative-computing#1218 Part 1.

- `generate_from_raw` becomes a `@final` wrapper on `Backend` that owns
  pre/post/error hook firing; backends implement `_generate_from_raw`
  returning `(results, usage)`.
- All five backends drop the inline gen_id, latency timing, and three
  hook-fire blocks; wrapper catches `BaseException` (matches chat-path
  generative-computing#1181).
- `generation_batch_pre_call` payload mutations now propagate to the
  backend impl (model_options/format/tool_calls), matching the chat
  path. Adds `model_options` field to `GenerationBatchPreCallPayload`.
  Closes a gap from generative-computing#1218 where the batch hook was
  wired for telemetry observation only.
- Standardizes `self._model_id` and `self._provider` on every backend.
  Inlines model-id resolution into ollama/huggingface `__init__`s and
  deletes the `_get_*_model_id` helpers; renames `_hf_model_id` to
  `_model_id`.
- Backends with per-MOT token counts (ollama, hf, watsonx) now populate
  `mot.generation.usage` per MOT; openai/litellm leave it `None` since
  their APIs only report whole-batch usage. `mot._meta["usage"]` writes
  preserved for generative-computing#1218's follow-up.
- Adds `TestGenerationBatchHookCallSites` mirroring the chat-path tests;
  updates the custom-backend doc snippet.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>

ajbozarth requested review from a team, jakelorocco and nrfulton as code owners June 12, 2026 20:50

ajbozarth self-assigned this Jun 12, 2026

ajbozarth requested a review from planetf1 June 12, 2026 20:50

github-actions Bot added the enhancement New feature or request label Jun 12, 2026

ajbozarth wants to merge 1 commit into generative-computing:main from ajbozarth:refactor/1183-move-generate-from-raw-hook
