refactor(P3a): move per-category behavior flags into a ModalityRegistry #253

Merged
divyasinghds merged 1 commit into develop from refactor/p3a-modality-registry-flags on Jun 15, 2026

@LukasWodka

Collaborator

Summary

Structural refactor — phase P3a (epic backend#796; design spike here). First slice of the ModalityRegistry — the single source of truth for per-category behavior.

This slice introduces the modalities/ package and migrates the three hand-maintained frozensets in ingestors/base.py onto it. Smallest, lowest-risk slice — it establishes the pattern; P3b (validators) and P3c (transfer plan) follow.

Changes

  • modalities/spec.py: ModalitySpec dataclass. P3a carries the 3 behavior flags (is_file_bearing, is_tabular_family, is_self_supervised); it grows in P3b–P3d with the validator factory, the sidecar transfer plan, and the conventions defaults.
  • modalities/registry.py: REGISTRY (one spec per category) + spec_for() + the three category sets derived from the specs, so they can't drift from the source.
  • ingestors/base.py: deletes the 3 frozenset literals (~50 lines) and imports the derived sets under their previous names. Every `category in <set>` check, including its None/unknown → False semantics, is unchanged.
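
The shape described in the list above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the category names, the flag values assigned to them, the public set names, the `_derive` helper, and the choice to have `spec_for()` return None for unknown input are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModalitySpec:
    """Per-category behavior flags (the P3a subset only)."""

    category: str
    is_file_bearing: bool = False
    is_tabular_family: bool = False
    is_self_supervised: bool = False


# Hypothetical categories and flag values, for illustration only.
_SPECS = (
    ModalitySpec("image_classification", is_file_bearing=True),
    ModalitySpec("instance_segmentation", is_file_bearing=True),
    ModalitySpec("tabular_classification", is_tabular_family=True),
    ModalitySpec("masked_language_modeling", is_self_supervised=True),
)

# One spec per category, keyed by category name.
REGISTRY = {spec.category: spec for spec in _SPECS}


def spec_for(category: Optional[str]) -> Optional[ModalitySpec]:
    # Assumption: None or an unknown category yields None rather than raising.
    return REGISTRY.get(category)


def _derive(flag: str) -> frozenset:
    # Hypothetical helper: collect every category whose spec has `flag` set.
    return frozenset(c for c, spec in REGISTRY.items() if getattr(spec, flag))


# The sets are computed from the specs, so they cannot disagree with them.
FILE_BEARING_CATEGORIES = _derive("is_file_bearing")
TABULAR_FAMILY_CATEGORIES = _derive("is_tabular_family")
SELF_SUPERVISED_CATEGORIES = _derive("is_self_supervised")
```

Because the sets are built by filtering REGISTRY, adding a category means adding one spec; there is no second literal to keep in step.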

Why it's behaviour-preserving

The derived sets equal the old frozensets exactly: 7 file-bearing / 4 tabular / 1 self-supervised. base.py's logic didn't change — only where the sets come from.

  • Full suite: 1033 passed, 1 xfailed, coverage 97.3%, modalities/ 100%.
  • tests/test_category_congruence.py and tests/test_ingestor_base.py (which import _FILE_BEARING_CATEGORIES from base) pass unchanged — the aliased import keeps the name resolvable.
  • The characterization goldens from #247 (test(e2e): characterization harness pinning clean-ingest behavior per modality) are unaffected (no data-path change); CI e2e re-confirms on real MySQL.

The new invariant

tests/test_modality_registry.py pins registry ↔ TaskCategory ↔ schema-enum equality and that the derived sets match the spec flags. This is the structural upgrade over #240's congruence test: once P3 completes, the per-category data has one source the call sites are derived from — making the instance_segmentation half-wired-zombie class (#240 / #99) unrepresentable rather than test-caught.
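
A rough sketch of what such an invariant test might look like, under stated assumptions: the `TaskCategory` members, the `SCHEMA_CATEGORIES` list, the dict-based registry, and the `congruence_gaps` helper are all stand-ins invented for this example, not the contents of tests/test_modality_registry.py.

```python
from enum import Enum


class TaskCategory(Enum):
    # Hypothetical subset of the real enum.
    IMAGE_CLASSIFICATION = "image_classification"
    TABULAR_CLASSIFICATION = "tabular_classification"


# Stand-in for the category enum declared in the schema (assumed: a list of strings).
SCHEMA_CATEGORIES = ["image_classification", "tabular_classification"]

# Stand-in registry: category -> flags. The real one holds ModalitySpec objects.
REGISTRY = {
    "image_classification": {"is_file_bearing": True},
    "tabular_classification": {"is_file_bearing": False},
}

FILE_BEARING = frozenset(c for c, flags in REGISTRY.items() if flags["is_file_bearing"])


def congruence_gaps(registry, enum_cls, schema_categories):
    """Report categories that any one source lacks relative to the other two."""
    registered = set(registry)
    enum_values = {member.value for member in enum_cls}
    schema_values = set(schema_categories)
    return {
        "missing_from_registry": (enum_values | schema_values) - registered,
        "missing_from_enum": (registered | schema_values) - enum_values,
        "missing_from_schema": (registered | enum_values) - schema_values,
    }


def test_registry_enum_schema_agree():
    gaps = congruence_gaps(REGISTRY, TaskCategory, SCHEMA_CATEGORIES)
    assert not any(gaps.values()), gaps


def test_derived_set_matches_flags():
    for category, flags in REGISTRY.items():
        assert (category in FILE_BEARING) == flags["is_file_bearing"]
```

A category added to the schema but not to the registry or the enum (the half-wired case) would show up in `missing_from_registry` and `missing_from_enum`, failing the first test.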

Scope

P3a is flags only — map_validators, map_file_transfer, and the conventions groupings are untouched (migrated in P3b/P3c/P3d). No conflict with other work.

🤖 Generated with Claude Code

First slice of P3 (backend#796): introduce tracebloc_ingestor/modalities/ —
one ModalitySpec per task category as the single source of truth — and
migrate the three hand-maintained frozensets in ingestors/base.py
(_TABULAR_FAMILY_CATEGORIES, _FILE_BEARING_CATEGORIES,
_SELF_SUPERVISED_CATEGORIES) onto it.

- modalities/spec.py: ModalitySpec dataclass (the 3 behavior flags; grows in
  P3b-P3d with the validator factory, sidecar transfer plan, conventions
  defaults).
- modalities/registry.py: REGISTRY (one spec per category) + spec_for() + the
  three category sets DERIVED from the specs, so they can't drift.
- base.py: deletes the 3 frozenset literals and imports the derived sets under
  their previous names — every `category in <set>` check (and its
  None/unknown -> False semantics) is unchanged.

Behaviour-preserving: the derived sets equal the old frozensets (7 file-bearing
/ 4 tabular / 1 self-supervised). Full suite 1033 passed, 97.3% coverage,
modalities/ 100%. tests/test_modality_registry.py pins the registry <->
TaskCategory <-> schema-enum invariant (making the instance_segmentation
half-wired-zombie class, #240/#99, unrepresentable); the existing
test_category_congruence + base tests pass unchanged via the aliased imports.

P3b (validators) and P3c (transfer plan; folds in semseg / #136) follow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@LukasWodka

Collaborator Author

👋 Heads-up — Code review queue is at 14 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

@divyasinghds divyasinghds merged commit c1a3201 into develop Jun 15, 2026
6 checks passed
@divyasinghds divyasinghds deleted the refactor/p3a-modality-registry-flags branch June 15, 2026 12:48