test(e2e): characterization harness pinning clean-ingest behavior per modality #247
Merged
… modality

Adds e2e/test_characterization.py, the safety net for the upcoming structural refactor. For each bundled template dataset it runs the real engine into real MySQL and pins three observable dimensions the refactor must preserve:

1. MySQL rows: count == source manifest; standard-column semantics (data_intent, single non-null ingestor_id, unique non-null data_id); feature-value round-trip for the tabular family (catches type corruption).
2. DEST_PATH file manifest: the sidecar files copied for file-bearing categories (catches the insert-rows-but-copy-no-files silent-half-ingest class).
3. Backend payloads: the records + metadata handed to the APIClient, captured by spying on the APIClient method args (CLIENT_ENV=local short-circuits the HTTP call before the payload is serialised).

Covers all 9 cleanly-ingesting modalities. Expectations are DERIVED from the source files (row counts, sidecar listings, schema), with no hardcoded magic values, so the harness stays honest if a template changes.

It characterises CLEAN-input behavior, which the in-flight fix PRs (#242-#245) don't change (they only touch malformed-input handling), so the goldens hold on develop today and become the contract the refactor is checked against.

Building it already surfaced two behaviors worth pinning: MySQL FLOAT is 32-bit (so a ~1e-6 round-trip delta is correct, not corruption), and the label column is mapped onto the standard `label` column (so it's absent from the physical feature schema).

Runs in the e2e job (real MySQL); auto-skipped in the unit suite when no MySQL is reachable (existing e2e/conftest mechanism).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
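The spy in item 3 can be illustrated with the standard library's `unittest.mock`. This is a minimal sketch, not the harness's code: `APIClient.send_records`, its `(records, metadata)` signature and `run_engine` are hypothetical stand-ins for the engine's real client and entry point. The idea is that `patch.object(..., autospec=True, side_effect=original)` records every call's arguments while still letting the real method run, so in local mode it returns before any HTTP and the captured arguments are the engine's intent to send.

```python
from __future__ import annotations

from unittest import mock


class APIClient:
    """Hypothetical stand-in for the engine's backend client."""

    def __init__(self, env: str) -> None:
        self.env = env

    def send_records(self, records: list[dict], metadata: dict) -> bool:
        # Local mode returns before any payload serialisation or HTTP call.
        if self.env == "local":
            return True
        raise RuntimeError("HTTP path is out of scope for this sketch")


def run_engine(client: APIClient) -> None:
    """Hypothetical engine entry point that hands one batch to the client."""
    records = [{"data_id": "a1", "label": 0}, {"data_id": "a2", "label": 1}]
    client.send_records(records, {"data_intent": "train"})


def capture_payloads() -> list[tuple[list[dict], dict]]:
    """Run the engine with a spy on send_records; return (records, metadata) per call."""
    original = APIClient.send_records
    # autospec=True keeps the real signature and passes `self` as the first arg;
    # side_effect=original lets the real (short-circuiting) method still run.
    with mock.patch.object(
        APIClient, "send_records", autospec=True, side_effect=original
    ) as spy:
        run_engine(APIClient(env="local"))  # mirrors CLIENT_ENV=local
    # Because of autospec, each call.args is (self, records, metadata).
    return [(call.args[1], call.args[2]) for call in spy.call_args_list]
```

Spying on the arguments rather than the wire format is what makes the check mode-independent: the payload is captured whether or not the HTTP call actually happens.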
👋 Heads-up: the Code review queue is at 15 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
divyasinghds approved these changes on Jun 15, 2026
divyasinghds pushed a commit that referenced this pull request on Jun 15, 2026:
…onsoleRenderer (#248)

Structural refactor phase P2 (backend#796): pull presentation out of the ingestion logic. The summary box (success rate, per-channel counts, status banner) was ~114 lines of print + ANSI + emoji living inside the 1,196-line BaseIngestor god class.

- New tracebloc_ingestor/reporting.py with ConsoleRenderer.render_summary, the single home for that presentation. The body moved VERBATIM, so the customer-facing output is byte-for-byte identical.
- BaseIngestor._log_summary becomes a thin delegate (kept as a method so the existing callers/tests that reference BaseIngestor._log_summary keep working). base.py: -119/+10 lines.
- Drops the now-unused BLUE import from base.py.
- IngestionSummary stays the pure data object the renderer consumes; the renderer type-hints it via TYPE_CHECKING to avoid a runtime import cycle.

Payoff: the presentation is now unit-testable without a DB or a full ingest. tests/test_reporting.py (8 cases, reporting.py 100% covered) locks the output contract, including #234's "dropped records count toward the failure total / disqualify the success banner" behavior.

Behaviour-preserving: full unit suite 1026 passed (97.3% coverage); the characterization harness goldens (#247: DB cells / DEST_PATH manifest / backend payloads) are unaffected (they don't assert the summary box), and the CI e2e job re-confirms against real MySQL.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
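The renderer/delegate split in that commit can be sketched as follows. Everything here is illustrative rather than the contents of reporting.py: the summary fields (`total`, `succeeded`, `failed`, `dropped`), the banner strings, the optional `out` stream parameter and the `tracebloc_ingestor.summary` import path are assumptions. It shows the two structural ideas named in the commit: a `TYPE_CHECKING`-only import, so the renderer module never imports ingestion code at runtime, and `_log_summary` reduced to a one-line delegate, plus the #234 rule that drops count as failures.

```python
from __future__ import annotations

import sys
from typing import TYPE_CHECKING, TextIO

if TYPE_CHECKING:
    # Hypothetical module path. Imported for type checkers only, so this
    # module never imports ingestion code at runtime (no import cycle).
    from tracebloc_ingestor.summary import IngestionSummary


class ConsoleRenderer:
    """Presentation only: turns a summary object into console text."""

    def render_summary(self, summary: IngestionSummary, out: TextIO | None = None) -> None:
        out = sys.stdout if out is None else out
        # Dropped records count toward the failure total (the #234 rule).
        failures = summary.failed + summary.dropped
        rate = 100.0 * summary.succeeded / summary.total if summary.total else 0.0
        # Any failure, including a drop, disqualifies the success banner.
        banner = "SUCCESS" if failures == 0 else "COMPLETED WITH FAILURES"
        out.write(f"Success rate: {rate:.1f}%\n")
        out.write(f"Failures: {failures}\n")
        out.write(banner + "\n")


class BaseIngestor:
    """Stand-in for the ingestor; only the summary hook is shown."""

    def __init__(self, summary: IngestionSummary) -> None:
        self.summary = summary

    def _log_summary(self, out: TextIO | None = None) -> None:
        # Thin delegate, kept so existing callers of BaseIngestor._log_summary still work.
        ConsoleRenderer().render_summary(self.summary, out)
```

Writing to an injectable stream is one way to make the output contract testable with `io.StringIO` instead of capturing stdout; the real renderer may use a different seam.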
This was referenced Jun 15, 2026
Summary
Phase 0 of the refactor: the characterization harness (safety net). Adds e2e/test_characterization.py: for each bundled templates/ dataset it runs the real engine into real MySQL and pins the observable behavior the upcoming structural refactor must preserve. Validated locally against MySQL (docker-compose) across all 9 modalities.

What it pins (three dimensions, per modality)
1. MySQL rows: row count == source manifest; standard-column semantics (data_intent == configured intent, a single non-null ingestor_id, unique non-null data_id); and for the tabular family a feature-value round-trip (catches type corruption: leading zeros, NA handling, numeric coercion).
2. DEST_PATH file manifest: the sidecar files copied for file-bearing categories (catches the insert-rows-but-copy-no-files silent-half-ingest class; … the instance_segmentation zombie pattern).
3. Backend payloads: the records + metadata handed to the APIClient. Since CLIENT_ENV=local short-circuits the HTTP call before serialising the payload, the harness spies on the APIClient method args (passed regardless of mode), capturing the engine's intent to send.

Covers all 9 cleanly-ingesting modalities (tabular clf/reg, image clf, text clf, time-to-event, time-series, keypoint, object detection, MLM). Expectations are derived from the source files (row counts, sidecar listings, schema), with no hardcoded magic values, so the harness stays honest if a template changes.
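The MySQL-row checks in item 1 can be sketched in plain Python, assuming rows are read back as dicts in source order. The helper names (`check_rows`, `values_match`, `as_mysql_float`) are hypothetical, not the harness's. `as_mysql_float` uses `struct` to reproduce a 32-bit FLOAT round-trip, which shows why the comparison needs a tolerance: single precision keeps roughly 7 significant digits (relative rounding error at most 2**-24), so a relative tolerance of 1e-6 absorbs storage rounding while still failing on real corruption such as a shifted decimal point or a leading zero lost to numeric coercion.

```python
from __future__ import annotations

import math
import struct
from typing import Any


def as_mysql_float(x: float) -> float:
    """Round-trip x through IEEE-754 single precision, as a MySQL FLOAT column stores it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def values_match(source: Any, stored: Any) -> bool:
    """Compare a source cell with the value read back from MySQL."""
    if isinstance(source, float):
        # float32 rounding is at most 2**-24 relative; 1e-6 absorbs it but
        # still fails on real corruption such as 12.5 -> 125.0.
        return math.isclose(source, stored, rel_tol=1e-6)
    # Everything else must survive exactly, e.g. "007" must not become 7.
    return source == stored


def check_rows(
    rows: list[dict[str, Any]],
    source_rows: list[dict[str, Any]],
    intent: str,
    feature_cols: list[str],
) -> None:
    """Assert standard-column semantics and the feature round-trip.

    Assumes `rows` are fetched in source order (e.g. ORDER BY an insertion key).
    """
    assert len(rows) == len(source_rows), "row count != source manifest"
    assert all(r["data_intent"] == intent for r in rows), "wrong data_intent"
    ingestor_ids = {r["ingestor_id"] for r in rows}
    assert len(ingestor_ids) == 1 and None not in ingestor_ids, "ingestor_id"
    data_ids = [r["data_id"] for r in rows]
    assert None not in data_ids and len(set(data_ids)) == len(data_ids), "data_id"
    for src, row in zip(source_rows, rows):
        for col in feature_cols:
            assert values_match(src[col], row[col]), (col, src[col], row[col])
```

Because `source_rows` comes from the template file itself, the expected count and cell values stay derived rather than hardcoded, in line with the "no magic values" rule above.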
Why it's stable on develop today

It characterises clean-input behavior. The in-flight fix PRs (#242–#245) only change malformed-input handling (bad cells, NA tokens, traversal filenames, dropped-record accounting), so these goldens hold now and become the contract the refactor is checked against. When those PRs merge, any golden that legitimately shifts documents the intended change.
It already earned its keep
Building it surfaced two behaviors worth pinning explicitly:
- MySQL FLOAT is 32-bit: a ~1e-6 feature round-trip delta is correct storage, not corruption (the round-trip tolerates float32 while still catching real corruption).
- The label column is mapped onto the standard label column, so e.g. regression's price is correctly absent from the physical feature schema.

Operational
Runs in the e2e job (real MySQL service). Auto-skipped in the unit suite when no MySQL is reachable (existing e2e/conftest.py collect_ignore_glob mechanism), so the default pytest + coverage gate is unaffected. New file only: no conflict with #242–#246.

🤖 Generated with Claude Code
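The operational auto-skip relies on pytest's `collect_ignore_glob`: a module-level list in a `conftest.py` whose glob patterns, resolved relative to that conftest's directory, are excluded from collection. A minimal sketch of how an `e2e/conftest.py` could drive it from a TCP reachability probe. The `MYSQL_HOST`/`MYSQL_PORT` variable names and the 1-second timeout are assumptions, and the repository's existing mechanism may probe differently.

```python
from __future__ import annotations

import os
import socket


def mysql_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure
        return False


# pytest reads this module-level list from conftest.py; the glob patterns are
# resolved relative to this file's directory and matching files are not collected.
collect_ignore_glob: list[str] = []

if not mysql_reachable(
    os.environ.get("MYSQL_HOST", "127.0.0.1"),  # hypothetical variable name
    int(os.environ.get("MYSQL_PORT", "3306")),  # hypothetical variable name
):
    # No database: skip every e2e module so the unit suite and coverage gate run clean.
    collect_ignore_glob.append("test_*.py")
```

Ignoring at collection time, rather than marking each test skipped, keeps the e2e modules from being imported at all when no database is present.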