refactor(P3a): move per-category behavior flags into a ModalityRegistry #253
Merged
First slice of P3 (backend#796): introduce tracebloc_ingestor/modalities/, with one ModalitySpec per task category as the single source of truth, and migrate the three hand-maintained frozensets in ingestors/base.py (_TABULAR_FAMILY_CATEGORIES, _FILE_BEARING_CATEGORIES, _SELF_SUPERVISED_CATEGORIES) onto it.

- modalities/spec.py: ModalitySpec dataclass (the 3 behavior flags; grows in P3b-P3d with the validator factory, sidecar transfer plan, conventions defaults).
- modalities/registry.py: REGISTRY (one spec per category) + spec_for() + the three category sets DERIVED from the specs, so they can't drift.
- base.py: deletes the 3 frozenset literals and imports the derived sets under their previous names. Every `category in <set>` check (and its None/unknown -> False semantics) is unchanged.

Behaviour-preserving: the derived sets equal the old frozensets (7 file-bearing / 4 tabular / 1 self-supervised). Full suite 1033 passed, 97.3% coverage, modalities/ 100%.

tests/test_modality_registry.py pins the registry <-> TaskCategory <-> schema-enum invariant (making the instance_segmentation half-wired-zombie class, #240/#99, unrepresentable); the existing test_category_congruence + base tests pass unchanged via the aliased imports.

P3b (validators) and P3c (transfer plan; folds in semseg / #136) follow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
👋 Heads-up: the Code review queue is at 14 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
This was referenced Jun 15, 2026
Merged
Merged
fix(label): strip whitespace from label values to prevent silent class duplication (issue #261)
#262
Merged
fix(dataset rm): delete staging files from a uid-65532 pod, not jobs-manager (#259)
tracebloc/cli#78
Open
divyasinghds approved these changes Jun 15, 2026
Summary
Structural refactor, phase P3a (epic backend#796; design spike here). First slice of the `ModalityRegistry`, the single source of truth for per-category behavior.

This slice introduces the `modalities/` package and migrates the three hand-maintained frozensets in `ingestors/base.py` onto it. It is the smallest, lowest-risk slice and establishes the pattern; P3b (validators) and P3c (transfer plan) follow.

Changes
- `modalities/spec.py`: `ModalitySpec` dataclass. P3a carries the 3 behavior flags (`is_file_bearing`, `is_tabular_family`, `is_self_supervised`); it grows in P3b–P3d with the validator factory, the sidecar transfer plan, and the conventions defaults.
- `modalities/registry.py`: `REGISTRY` (one spec per category) + `spec_for()` + the three category sets derived from the specs, so they can't drift from the source.
- `ingestors/base.py`: deletes the 3 frozenset literals (~50 lines) and imports the derived sets under their previous names. Every `category in <set>` check, including its `None`/unknown → `False` semantics, is unchanged.

Why it's behaviour-preserving
The derived sets equal the old frozensets exactly: 7 file-bearing / 4 tabular / 1 self-supervised. `base.py`'s logic didn't change, only where the sets come from.

Full suite: 1033 passed, 97.3% coverage, `modalities/` 100%. `tests/test_category_congruence.py` and `tests/test_ingestor_base.py` (which import `_FILE_BEARING_CATEGORIES` from base) pass unchanged; the aliased import keeps the name resolvable.

The new invariant
`tests/test_modality_registry.py` pins registry ↔ `TaskCategory` ↔ schema-enum equality and that the derived sets match the spec flags. This is the structural upgrade over #240's congruence test: once P3 completes, the per-category data has one source the call sites are derived from, making the `instance_segmentation` half-wired-zombie class (#240 / #99) unrepresentable rather than test-caught.

Scope
P3a is flags only: `map_validators`, `map_file_transfer`, and the `conventions` groupings are untouched (migrated in P3b/P3c/P3d). No conflict with other work.

🤖 Generated with Claude Code
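For reviewers who want a concrete picture of the pattern, here is a minimal sketch of "one spec per category, sets derived from the specs". It is not the code from this PR. The category names, the `TASK_CATEGORIES` tuple (a stand-in for `TaskCategory`), the public set names, and the helpers `_categories_where` and `registry_matches` are all assumptions made for illustration; only `ModalitySpec`, `REGISTRY`, `spec_for()`, and the three flag names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalitySpec:
    """Per-category behavior flags (field layout is an assumption)."""
    category: str
    is_file_bearing: bool = False
    is_tabular_family: bool = False
    is_self_supervised: bool = False


# Hypothetical category names standing in for the real TaskCategory values.
TASK_CATEGORIES = (
    "image_classification",
    "tabular_classification",
    "masked_language_modeling",
)

_SPECS = (
    ModalitySpec("image_classification", is_file_bearing=True),
    ModalitySpec("tabular_classification", is_tabular_family=True),
    ModalitySpec(
        "masked_language_modeling",
        is_file_bearing=True,
        is_self_supervised=True,
    ),
)

# One spec per category: the single place the flags are written down.
REGISTRY = {spec.category: spec for spec in _SPECS}


def spec_for(category):
    """Return the spec for a category, or None when it is None or unknown."""
    return REGISTRY.get(category)


def _categories_where(flag):
    """Derive a category set from one boolean flag on the specs."""
    return frozenset(
        name for name, spec in REGISTRY.items() if getattr(spec, flag)
    )


# Derived, never hand-maintained, so they cannot drift from the specs.
FILE_BEARING_CATEGORIES = _categories_where("is_file_bearing")
TABULAR_FAMILY_CATEGORIES = _categories_where("is_tabular_family")
SELF_SUPERVISED_CATEGORIES = _categories_where("is_self_supervised")


def registry_matches(categories):
    """Invariant check: the registry covers exactly the given categories."""
    return set(REGISTRY) == set(categories)
```

Because the derived sets are plain frozensets, a `category in <set>` check still evaluates to `False` for `None` or an unknown name, which is the behavior the call sites rely on. A category added to `TASK_CATEGORIES` without a spec makes `registry_matches(TASK_CATEGORIES)` return `False`, which is the kind of half-wired state the invariant test is meant to catch.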