
Release v0.3.10 (ingestion hardening + path-traversal fix + single-label preflight)#259

Merged
divyasinghds merged 1 commit into develop from release/v0.3.10 on Jun 15, 2026



divyasinghds (Contributor) commented Jun 15, 2026

Summary

Release v0.3.10. Bumps __version__ 0.3.9 → 0.3.10. Bundles the fixes merged to develop since v0.3.9.

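Security

  • Manifest filename/mask_id values are joined via _safe_join so reads and writes cannot escape SRC_PATH/DEST_PATH (#239).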

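Ingestion correctness & accounting

  • Dropped records fail the run; the JSON read layer fails fast (#230, #234, #235).
  • Single source of truth for NA policy and int64 range (#236, #237).
  • Real MySQL column types, CHAR(N) mapping, instance_segmentation dropped from the enum, empty-CSV fast fail (#240, #241, #249, #250).
  • DataValidator accepts single-dict JSON and filters non-dict entries (#232, #233).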

Single-label classification preflight (#251, #252)

  • New LabelDiversityValidator catches unlearnable single-class datasets at preflight and surfaces the backend's reason, instead of the misleading "Backend failed to prepare the dataset" error that previously appeared after rows had already landed in MySQL.
  • Hardened through six rounds of Cursor Bugbot review so the validator reads the label column exactly as CSVIngestor does: fail-closed on read errors, pandas-native header parse, build_csv_na_values NA sentinels, string-dtype pin, full (unstripped) schema, and whitespace-insensitive column resolution.
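A minimal sketch of the single-class check, to show the logic in one place. It is not the project's LabelDiversityValidator: it uses the standard-library csv module instead of pandas, and PreflightError, NA_SENTINELS, resolve_column and check_label_diversity are hypothetical names. It mirrors three of the hardening points listed above: it fails closed on read errors, treats NA sentinels as missing, and resolves the label column while ignoring surrounding whitespace. Label values stay strings, so no numeric coercion happens.

```python
import csv
import io
from typing import Iterable, Optional, Set

# Hypothetical NA sentinels; the real list comes from build_csv_na_values,
# whose exact contents are not shown in this PR.
NA_SENTINELS = frozenset({"", "NA", "N/A", "NaN", "null"})


class PreflightError(Exception):
    """Hypothetical error type: the dataset is rejected before any row is written."""


def resolve_column(header: Iterable[str], wanted: str) -> Optional[str]:
    """Find a header entry equal to `wanted` once surrounding whitespace is ignored."""
    target = wanted.strip()
    for name in header:
        if name.strip() == target:
            return name  # return the raw (unstripped) name so row lookups still match
    return None


def check_label_diversity(csv_text: str, label_column: str) -> Set[str]:
    """Return the distinct non-NA labels, or raise if fewer than two exist."""
    try:
        reader = csv.DictReader(io.StringIO(csv_text))
        column = resolve_column(reader.fieldnames or [], label_column)
        if column is None:
            raise PreflightError(f"label column {label_column!r} not found")
        # Values stay strings: no numeric coercion, in the spirit of a string-dtype pin.
        labels = {
            row[column]
            for row in reader
            if row[column] is not None and row[column] not in NA_SENTINELS
        }
    except csv.Error as exc:
        # Fail closed: an unreadable file is rejected, never treated as valid.
        raise PreflightError(f"could not read label column: {exc}") from exc
    if len(labels) < 2:
        raise PreflightError(
            f"classification needs at least 2 distinct labels, found {sorted(labels)}"
        )
    return labels
```

A file whose label column holds only "cat" and "NA" raises PreflightError here, before anything is written to a database.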

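UX / CLI / packaging

  • UX papercuts: delimiter hint, NUL truncation, table-name message, Config numeric coercion (#238).
  • CLI: schema descriptions surfaced in validation errors (#254).
  • Reporting: ConsoleRenderer extraction; packaging split (#248, #246).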

Test plan

  • pytest green on release/v0.3.10 (local: 1043 passed, 1 xfailed)
  • CI green
  • python setup.py sdist bdist_wheel builds cleanly
  • pip install dist/tracebloc_ingestor-0.3.10-*.whl, then python -c "import tracebloc_ingestor; print(tracebloc_ingestor.__version__)" → 0.3.10

🤖 Generated with Claude Code


Note

High Risk
Changes touch security-sensitive path joining, core ingest/validate/DB/API paths, and packaging/CI install graphs across many modalities; the blast radius is high despite strong test coverage.

Overview
v0.3.10 bumps the package version and bundles correctness, security, and UX fixes with a large expansion of tests and CI wiring.

Security & file handling: Manifest filename/mask_id values are joined via _safe_join so reads/writes cannot escape SRC_PATH/DEST_PATH (#239).
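The containment rule behind _safe_join can be sketched as follows. This is an illustration under assumptions, not the project's code: safe_join is a hypothetical stand-in that resolves both paths with pathlib and then requires the result to sit strictly inside the base directory. That single check rejects ".." climbs, absolute manifest values, and symlinks that point outside the base.

```python
from pathlib import Path


def safe_join(base: str, untrusted: str) -> Path:
    """Join a manifest-supplied name onto `base`, refusing anything that escapes it.

    Hypothetical stand-in for the project's _safe_join.
    """
    root = Path(base).resolve()
    # Joining an absolute path discards `root`, and ".." segments climb out of it;
    # resolve() normalises both cases (and follows symlinks) before the check.
    candidate = (root / untrusted).resolve()
    if root not in candidate.parents:
        raise ValueError(f"{untrusted!r} resolves outside {root}")
    return candidate
```

Requiring the base to be a strict parent also rejects "." and empty names, so a manifest entry can never refer to the base directory itself.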

Ingestion behavior: Dropped or invalid records and empty CSVs now fail the run instead of succeeding silently; JSON read/validate paths align with CSV (fail-fast, single-dict JSON accepted, non-object entries filtered). Shared NA coercion and int64 overflow checks keep the validator and ingest layers in agreement. get_table_schema maps reflected MySQL dialect types correctly, and CHAR(N) is supported in DDL. On a mid-batch DB failure, only the rows actually inserted are sent to the API, and skipped_records now count toward has_failures.
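The accounting rule in the paragraph above (empty input fails, only rows that actually landed are sent to the API, and any skipped or failed row marks the run as failed) can be sketched like this. BatchResult, ingest_batch, insert_row and send_to_api are hypothetical names; the real ingestor's types and error handling are not shown in this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class BatchResult:
    """Hypothetical per-batch accounting record."""

    inserted: List[dict] = field(default_factory=list)
    skipped_records: int = 0  # rows rejected before reaching the DB
    failed_records: int = 0  # failing row plus every row after it in the batch
    db_error: Optional[str] = None

    @property
    def has_failures(self) -> bool:
        # Any row that did not land marks the run as failed: nothing is dropped silently.
        return bool(self.skipped_records or self.failed_records or self.db_error)


def ingest_batch(
    records: Sequence[object],
    insert_row: Callable[[dict], None],
    send_to_api: Callable[[List[dict]], None],
) -> BatchResult:
    """Insert records one by one, then forward only the rows that were inserted."""
    if not records:
        raise ValueError("no records to ingest; an empty input fails the run")
    result = BatchResult()
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            result.skipped_records += 1  # non-object entries are counted, not ignored
            continue
        try:
            insert_row(record)
        except Exception as exc:  # a real ingestor would catch its DB driver's error type
            result.db_error = str(exc)
            result.failed_records = len(records) - index
            break
        result.inserted.append(record)
    # Rows after a mid-batch failure never reach the API.
    if result.inserted:
        send_to_api(list(result.inserted))
    return result
```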

Preflight: New LabelDiversityValidator rejects single-class classification datasets locally; prepare_dataset stashes last_prepare_error for clearer errors (#251). instance_segmentation is removed from the schema; test_category_congruence guards full dispatch wiring.
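A congruence test in the spirit of test_category_congruence can be sketched as a two-way set comparison between the categories the schema accepts and the categories the dispatch table can handle. SCHEMA_CATEGORIES, DISPATCH and category_mismatches are hypothetical, and the category and handler names are placeholders rather than the project's actual enum or wiring.

```python
from typing import Iterable, List, Mapping, Tuple

# Placeholder values: the real schema enum and dispatch table are not shown in this PR.
SCHEMA_CATEGORIES = {"image_classification", "tabular_classification", "object_detection"}
DISPATCH = {
    "image_classification": "image_handler",
    "tabular_classification": "tabular_handler",
    "object_detection": "detection_handler",
}


def category_mismatches(
    schema_categories: Iterable[str], dispatch: Mapping[str, object]
) -> Tuple[List[str], List[str]]:
    """Return (categories with no handler, handlers for categories the schema rejects)."""
    accepted, handled = set(schema_categories), set(dispatch)
    return sorted(accepted - handled), sorted(handled - accepted)


def test_category_congruence() -> None:
    missing, orphaned = category_mismatches(SCHEMA_CATEGORIES, DISPATCH)
    assert not missing, f"schema accepts categories with no handler: {missing}"
    assert not orphaned, f"handlers exist for categories the schema rejects: {orphaned}"
```

Checking both directions means that removing a category from the schema without removing its handler, or the reverse, fails the test instead of leaving unreachable or dangling wiring.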

Templates & packaging: All template scripts delegate to the exported run_ingestion; the summary UI moves to ConsoleRenderer. Runtime deps stay in requirements.txt (now listing numpy explicitly); test/lint tools move to requirements-dev.txt, and setup.py skips comment lines when reading requirements. CI installs requirements-dev.txt.
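The comment filtering in setup.py can be sketched as below; parse_requirements and read_requirements are hypothetical helper names, not the project's. The sketch handles only blank lines and whole-line "#" comments; inline comments and pip options such as -r are out of scope, since the PR mentions comment lines only.

```python
from pathlib import Path
from typing import Iterable, List


def parse_requirements(lines: Iterable[str]) -> List[str]:
    """Keep requirement specifiers, skipping blank lines and whole-line '#' comments."""
    requirements = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        requirements.append(stripped)
    return requirements


def read_requirements(path: str) -> List[str]:
    """Read a requirements file (e.g. requirements.txt) for use as install_requires."""
    return parse_requirements(Path(path).read_text(encoding="utf-8").splitlines())
```

In setup.py this would feed install_requires=read_requirements("requirements.txt"), so only runtime dependencies ship with the wheel while requirements-dev.txt stays a CI and developer concern.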

Tests: New e2e characterization harness over bundled modalities; expanded unit/e2e coverage for the above; CLI schema errors prefer rule descriptions over raw JSON Schema mechanics (#254).
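Preferring rule descriptions over raw JSON Schema messages can be sketched as a small formatter. It is duck-typed on the message, schema and path attributes that jsonschema's ValidationError exposes; describe_validation_error is a hypothetical name, and the CLI's actual wording is not shown in this PR.

```python
from typing import Any


def describe_validation_error(error: Any) -> str:
    """Format a schema validation error, preferring the schema's own description.

    `error` needs `.message`, `.schema` and `.path`, as on jsonschema.ValidationError.
    """
    location = "/".join(str(part) for part in error.path) or "<root>"
    # Boolean schemas (true/false) carry no description, so only inspect dicts.
    schema = error.schema if isinstance(error.schema, dict) else {}
    description = schema.get("description")
    if description:
        return f"{location}: {description}"
    # Fall back to the validator's raw message when the rule is undocumented.
    return f"{location}: {error.message}"
```

Which subschema's description reads best depends on the failing keyword (a "required" failure, for example, is attached to the parent object schema); this sketch simply uses the one attached to the error.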

Reviewed by Cursor Bugbot for commit aabfa20. Bugbot is set up for automated code reviews on this repo.

Release bundles the fixes merged to develop since 0.3.9 (#230–#254):
- Ingestion accounting: dropped records fail the run; JSON read-layer fails
  fast (#230, #234, #235)
- Coercion: single source of truth for NA policy + int64 range (#236, #237)
- Security: block path traversal via manifest filename/mask_id (#239)
- UX papercuts: delimiter hint, NUL truncation, table-name message, Config
  numeric coercion (#238)
- Schema/DB: real MySQL column types, CHAR(N) mapping, drop
  instance_segmentation from enum, empty-CSV fast fail (#240, #241, #249, #250)
- DataValidator: accept single-dict / filter non-dict JSON (#232, #233)
- Single-label classification caught at preflight + friendly backend reason,
  with the full bugbot-hardened LabelDiversityValidator (#251, #252)
- CLI: schema descriptions surfaced in validation errors (#254)
- Reporting: ConsoleRenderer extraction; packaging split (#248, #246)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
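The coercion bullet above (#236, #237) describes the validator and ingestor sharing one NA policy and one int64 range check. A minimal sketch of that single-source-of-truth shape follows; NA_STRINGS, coerce_int64 and validate_int64_cell are hypothetical names, and the sentinel list is a placeholder, not the output of build_csv_na_values.

```python
from typing import Optional

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

# Placeholder sentinel list shared by both layers; the project's real list is not shown.
NA_STRINGS = frozenset({"", "NA", "N/A", "NaN", "null"})


def coerce_int64(raw: Optional[str]) -> Optional[int]:
    """Parse a cell destined for an int64 column; None means NA.

    Raises ValueError for non-integer text and OverflowError outside the int64 range.
    """
    if raw is None or raw in NA_STRINGS:
        return None
    value = int(raw)  # Python ints are unbounded, so the range check below is explicit
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} does not fit in int64")
    return value


def validate_int64_cell(raw: Optional[str]) -> Optional[str]:
    """Validator side: same rules as the ingestor, reported as a message, not raised."""
    try:
        coerce_int64(raw)
    except (ValueError, OverflowError) as exc:
        return str(exc)
    return None
```

Because both layers go through coerce_int64, a value such as 9223372036854775808 is rejected identically at preflight and at insert time instead of passing one and failing the other.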
divyasinghds self-assigned this Jun 15, 2026
LukasWodka (Collaborator) commented

👋 Heads-up: the Code review queue is at 16 / 8 (16 PRs against a WIP limit of 8).

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

aptracebloc previously approved these changes Jun 15, 2026
divyasinghds changed the base branch from master to develop on June 15, 2026 09:27
divyasinghds dismissed aptracebloc’s stale review on June 15, 2026 09:27

The base branch was changed.
