test(e2e): characterization harness pinning clean-ingest behavior per modality by LukasWodka · Pull Request #247 · tracebloc/data-ingestors

LukasWodka · 2026-06-12T21:55:53Z

Summary

Phase 0 of the refactor — the characterization harness (safety net). Adds e2e/test_characterization.py: for each bundled templates/ dataset it runs the real engine into real MySQL and pins the observable behavior the upcoming structural refactor must preserve. Validated locally against MySQL (docker-compose) across all 9 modalities.

What it pins (three dimensions, per modality)

MySQL rows — row count == source manifest; standard-column semantics (data_intent == configured intent, a single non-null ingestor_id, unique non-null data_id); and for the tabular family a feature-value round-trip (catches type corruption — leading zeros, NA handling, numeric coercion).
DEST_PATH file manifest — exactly the sidecar files copied for file-bearing categories. This directly catches the insert-rows-but-copy-no-files silent-half-ingest class (the instance_segmentation zombie pattern).
Backend payloads — the records + metadata the engine hands the APIClient. Since CLIENT_ENV=local short-circuits the HTTP call before serialising the payload, the harness spies on the APIClient method args (passed regardless of mode) — capturing the engine's intent to send.
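The payload-spying idea in the third dimension can be sketched with `unittest.mock`. This is a minimal illustration, not the harness's actual code: the `APIClient` class body, the `send_batch` method name, and `run_engine` are hypothetical stand-ins, since the real signatures are not shown here. Only the technique (wrap the method, record the arguments, let the real method run) is the point.

```python
from unittest import mock


class APIClient:
    """Hypothetical stand-in for the real client; method name is assumed."""

    def __init__(self, env: str) -> None:
        self.env = env

    def send_batch(self, records: list[dict], metadata: dict) -> bool:
        if self.env == "local":
            # Short-circuit: nothing is serialised and no HTTP call is made.
            return True
        raise RuntimeError("network access is disabled in this sketch")


def run_engine(client: APIClient) -> None:
    """Hypothetical engine step that hands records + metadata to the client."""
    client.send_batch(
        records=[{"data_id": "a1", "label": "cat"}],
        metadata={"category": "image_classification"},
    )


def capture_payloads(client: APIClient) -> list[dict]:
    """Record the keyword arguments of every send_batch call."""
    # wraps= keeps the real method running, so engine behavior is unchanged;
    # the mock only records what it was called with.
    with mock.patch.object(client, "send_batch", wraps=client.send_batch) as spy:
        run_engine(client)
    return [call.kwargs for call in spy.call_args_list]
```

Because the arguments are recorded at the call boundary, the capture works even when the local mode returns before any payload is built.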

Covers all 9 cleanly-ingesting modalities (tabular clf/reg, image clf, text clf, time-to-event, time-series, keypoint, object detection, MLM). Expectations are derived from the source files (row counts, sidecar listings, schema) — no hardcoded magic values — so the harness stays honest if a template changes.
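As a rough sketch of what "derived from the source files" can look like, the helpers below compute the expected row count and sidecar set from a CSV manifest and compare them with what landed under a destination directory. The function names and the `filename` column default are assumptions for illustration; the real harness may derive its expectations differently.

```python
import csv
from pathlib import Path


def expected_row_count(manifest_csv: Path) -> int:
    """Number of data rows in the source manifest (header excluded)."""
    with manifest_csv.open(newline="") as fh:
        return sum(1 for _ in csv.DictReader(fh))


def expected_sidecars(manifest_csv: Path, column: str = "filename") -> set[str]:
    """Sidecar file names the manifest refers to (column name is assumed)."""
    with manifest_csv.open(newline="") as fh:
        return {row[column] for row in csv.DictReader(fh)}


def copied_files(dest: Path) -> set[str]:
    """Relative paths of every regular file found under the destination."""
    return {p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()}


def manifest_diff(manifest_csv: Path, dest: Path) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) files; both empty means an exact match."""
    expected = expected_sidecars(manifest_csv)
    actual = copied_files(dest)
    return expected - actual, actual - expected
```

A non-empty `missing` set alongside a correct row count is the signature of the insert-rows-but-copy-no-files case described above.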

Why it's stable on `develop` today

It characterises clean-input behavior. The in-flight fix PRs (#242–#245) only change malformed-input handling (bad cells, NA tokens, traversal filenames, dropped-record accounting), so these goldens hold now and become the contract the refactor is checked against. When those PRs merge, any golden that legitimately shifts documents the intended change.

It already earned its keep

Building it surfaced two behaviors worth pinning explicitly:

MySQL FLOAT is 32-bit (single precision), so a ~1e-6 delta on a feature round-trip is expected storage rounding, not corruption. The round-trip check tolerates float32 precision while still catching real corruption.
The label column is mapped onto the standard label column — so e.g. regression's price is correctly absent from the physical feature schema.
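The float32 point can be illustrated with the standard library alone. The sketch below rounds a Python float to single precision via `struct` and compares with a tolerance; the helper names and the `1e-6` tolerances are illustrative assumptions, not the values the harness uses.

```python
import math
import struct


def as_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_ok(source: float, stored: float, tol: float = 1e-6) -> bool:
    """True when `stored` matches `source` up to single-precision rounding.

    float32 keeps about 7 significant decimal digits (relative error at most
    2**-24), so a relative tolerance of 1e-6 absorbs storage rounding while
    still rejecting values that changed in a meaningful digit.
    """
    return math.isclose(source, stored, rel_tol=tol, abs_tol=tol)
```

For example, `as_float32(19.99)` differs from `19.99` in the eighth significant digit, which `round_trip_ok` accepts, while a value coerced to `20.0` is rejected.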

Operational

Runs in the e2e job (real MySQL service). Auto-skipped in the unit suite when no MySQL is reachable (existing e2e/conftest.py collect_ignore_glob mechanism), so the default pytest + coverage gate is unaffected. New file only — no conflict with #242–#246.
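A skip-when-unreachable guard of this kind can be sketched as follows. `collect_ignore_glob` is a real pytest `conftest.py` variable; the reachability probe, the helper names, and the glob pattern are assumptions, and the repository's existing e2e/conftest.py may implement the check differently.

```python
import socket


def mysql_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ignore_globs(host: str, port: int) -> list[str]:
    """Glob patterns pytest should skip collecting in this directory."""
    # With no database to talk to, ignore every test module here so the
    # unit suite and its coverage gate are unaffected.
    return [] if mysql_reachable(host, port) else ["test_*.py"]


# In a conftest.py this would be assigned at module level, for example:
# collect_ignore_glob = ignore_globs("127.0.0.1", 3306)
```

A TCP probe only shows that something is listening on the port; a stricter check would open a real database connection.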

🤖 Generated with Claude Code

… modality

Adds e2e/test_characterization.py, the safety net for the upcoming structural refactor. For each bundled template dataset it runs the real engine into real MySQL and pins three observable dimensions the refactor must preserve:

1. MySQL rows: count == source manifest; standard-column semantics (data_intent, single non-null ingestor_id, unique non-null data_id); feature-value round-trip for the tabular family (catches type corruption).
2. DEST_PATH file manifest: the sidecar files copied for file-bearing categories (catches the insert-rows-but-copy-no-files silent-half-ingest class).
3. Backend payloads: the records + metadata handed to the APIClient, captured by spying on the APIClient method args (CLIENT_ENV=local short-circuits the HTTP call before the payload is serialised).

Covers all 9 cleanly-ingesting modalities. Expectations are DERIVED from the source files (row counts, sidecar listings, schema), with no hardcoded magic values, so the harness stays honest if a template changes.

It characterises CLEAN-input behavior, which the in-flight fix PRs (#242-#245) don't change (they only touch malformed-input handling), so the goldens hold on develop today and become the contract the refactor is checked against.

Building it already surfaced two behaviors worth pinning: MySQL FLOAT is 32-bit (so a ~1e-6 round-trip delta is correct, not corruption), and the label column is mapped onto the standard `label` column (so it's absent from the physical feature schema).

Runs in the e2e job (real MySQL); auto-skipped in the unit suite when no MySQL is reachable (existing e2e/conftest mechanism).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

LukasWodka · 2026-06-12T21:56:52Z

👋 Heads-up — Code review queue is at 15 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

client#254 — Sync develop → main for v1.7.1 chart release (egress-enforcement preflight helm test, inert) · author: @saadqbal · no reviewer assigned
client#255 — fix(egress-proxy): enforcement check probes TCP reachability, not HTTP code [image_validator falsely rejects images when target_size matches exactly #104] · author: @saadqbal · no reviewer assigned
data-ingestors#233 — fix(validators): accept single-dict JSON top-level shape (issue DataValidator rejects single-dict JSON inputs that JSONIngestor.read_data is designed to handle #232) · author: @divyasinghds · no reviewer assigned
data-ingestors#240 — fix: remove instance_segmentation zombie category, add dispatch-site congruence guard · author: @LukasWodka · no reviewer assigned
data-ingestors#241 — fix: report real MySQL column types from get_table_schema (dialect-type reflection) · author: @LukasWodka · no reviewer assigned
data-ingestors#242 — fix(coercion): single source of truth for NA policy + int64 range (bug: huge integer in INT column raises cryptic numpy "ufunc 'isinf' not supported" instead of a clean overflow error #236, bug: NA/null/None tokens crash non-tabular CSV ingest but are NULL for tabular — validator passes them (validate-pass → ingest-crash) #237) · author: @LukasWodka · no reviewer assigned
data-ingestors#243 — fix(accounting): dropped records fail the run; JSON read-layer fails fast (bug: records dropped by skip-paths don't fail the run — exit 0 / Job 'Succeeded' + self-contradictory summary (follow-up to #99/#230) #234, bug: CSV aborts the whole ingest on one bad cell while JSON silently skips the row — make the policy consistent (follow-up to #189) #235) · author: @LukasWodka · no reviewer assigned
data-ingestors#244 — fix(security): block path traversal via manifest filename/mask_id (security/bug: path traversal via unsanitised 'filename' column in file_transfer (read + write outside SRC_PATH/DEST_PATH) #239) · author: @LukasWodka · no reviewer assigned
data-ingestors#245 — fix(ux): four ingestion validation/UX papercuts (bug: ingestion validation/UX papercuts (delimiter hint, NUL truncation, table-name message, Config numeric coercion) #238) · author: @LukasWodka · no reviewer assigned
data-ingestors#246 — refactor(packaging): split runtime vs dev requirements; declare numpy · author: @LukasWodka · no reviewer assigned

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

…onsoleRenderer (#248)

Structural refactor phase P2 (backend#796): pull presentation out of the ingestion logic. The summary box (success rate, per-channel counts, status banner) was ~114 lines of print + ANSI + emoji living inside the 1,196-line BaseIngestor god class.

- New tracebloc_ingestor/reporting.py with ConsoleRenderer.render_summary, the single home for that presentation. The body moved VERBATIM, so the customer-facing output is byte-for-byte identical.
- BaseIngestor._log_summary becomes a thin delegate (kept as a method so the existing callers/tests that reference BaseIngestor._log_summary keep working). base.py: -119/+10 lines.
- Drops the now-unused BLUE import from base.py.
- IngestionSummary stays the pure data object the renderer consumes; the renderer type-hints it via TYPE_CHECKING to avoid a runtime import cycle.

Payoff: the presentation is now unit-testable without a DB or a full ingest. tests/test_reporting.py (8 cases, reporting.py 100% covered) locks the output contract, including #234's "dropped records count toward the failure total / disqualify the success banner" behavior.

Behaviour-preserving: full unit suite 1026 passed (97.3% coverage); the characterization harness goldens (#247: DB cells / DEST_PATH manifest / backend payloads) are unaffected, since they don't assert the summary box, and the CI e2e job re-confirms against real MySQL.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
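The delegate-plus-`TYPE_CHECKING` shape described in this commit message might look roughly like the sketch below. Only the names `ConsoleRenderer`, `render_summary`, `IngestionSummary`, `BaseIngestor` and `_log_summary` come from the text; the import path, the method signatures, the summary fields (`inserted`, `failed`, `dropped`), and the output format are all hypothetical and far simpler than the real summary box.

```python
import io
from dataclasses import dataclass
from typing import TYPE_CHECKING, TextIO

if TYPE_CHECKING:
    # Seen only by type checkers, never executed, so no runtime import cycle.
    # The module path is an assumption.
    from tracebloc_ingestor.base import IngestionSummary


class ConsoleRenderer:
    """Presentation only: turns a summary object into text."""

    def render_summary(self, summary: "IngestionSummary", out: TextIO) -> None:
        # Dropped records count toward the failure total (field names assumed).
        failures = summary.failed + summary.dropped
        out.write(f"inserted={summary.inserted} failures={failures}\n")
        out.write("SUCCESS\n" if failures == 0 else "COMPLETED WITH FAILURES\n")


class BaseIngestor:
    """Only the delegating method is shown."""

    def _log_summary(self, summary: "IngestionSummary", out: TextIO) -> None:
        # Thin delegate: existing callers keep working, logic lives elsewhere.
        ConsoleRenderer().render_summary(summary, out)


@dataclass
class FakeSummary:
    """Duck-typed test double standing in for IngestionSummary."""

    inserted: int
    failed: int
    dropped: int
```

With this split, a test can feed the renderer a small data object and an `io.StringIO()` buffer and assert on the text, with no database and no ingest run.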

divyasinghds approved these changes Jun 15, 2026


divyasinghds merged commit deca72e into develop Jun 15, 2026
6 checks passed

divyasinghds deleted the test/characterization-harness branch June 15, 2026 06:39

LukasWodka mentioned this pull request Jun 15, 2026

refactor(P2): extract ingestion-summary renderer into reporting.ConsoleRenderer #248

Merged

This was referenced Jun 15, 2026

refactor(P3a): move per-category behavior flags into a ModalityRegistry #253

Merged

refactor(P3c): move sidecar-transfer factories into the ModalityRegistry #256

Merged

divyasinghds merged 1 commit into develop from test/characterization-harness