feat(dataset): add CreateML format support to DetectionDataset #2284

Open
madhavcodez wants to merge 4 commits into roboflow:develop from madhavcodez:feat/dataset-createml

Description

Adds CreateML object-detection format support to DetectionDataset, with
from_createml() for loading and as_createml() for exporting. supervision
already supports COCO, YOLO, and Pascal VOC; this fills the remaining common
format with a symmetric loader/exporter that mirrors those implementations.

Type of Change

  • ✨ New feature (non-breaking change which adds functionality)

Motivation and Context

CreateML is a widely used object-detection annotation format (Apple Create ML,
and one of Roboflow's dataset export options). DetectionDataset can already
round-trip COCO, YOLO, and Pascal VOC, but not CreateML, so users exporting in
that format have to convert manually before loading into supervision. This adds
first-class support following the existing from_<format> / as_<format>
convention.

No existing tracking issue — opening as a feature addition; happy to file one if
the maintainers prefer.

Changes Made

  • src/supervision/dataset/formats/createml.py — new module:
    load_createml_annotations, save_createml_annotations, and the helpers
    createml_annotations_to_detections / detections_to_createml_annotations.
    Boxes use CreateML's pixel-space centre + width/height and are converted
    to/from xyxy. Class names are inferred from the labels present in the file
    and assigned sorted, zero-based ids. Image paths are validated against the
    images directory (rejecting .. traversal, absolute paths, the directory
    itself, and directory targets), matching the COCO loader's protection.
  • src/supervision/dataset/core.py: adds DetectionDataset.from_createml() and
    DetectionDataset.as_createml(), plus the format import. Method docstrings
    render automatically in the API docs.
  • tests/dataset/formats/test_createml.py — unit tests for the conversion
    helpers, loader, exporter, save→load round-trip (integer and float
    coordinates), global class-id consistency across images, and the path-safety
    guards.
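
For reviewers unfamiliar with the format, here is a minimal standalone sketch of the box conversion and class-id assignment described above. It is not the code from this PR: the sample data and the helper names (center_to_xyxy, xyxy_to_center, infer_classes) are hypothetical, and it uses only the standard library rather than supervision's own types.

```python
import json

# Hypothetical sample in the CreateML object-detection layout: a list of
# images, each with labelled boxes given as pixel-space centre + size.
SAMPLE = """
[
  {"image": "a.jpg", "annotations": [
    {"label": "dog", "coordinates": {"x": 50, "y": 40, "width": 20, "height": 10}},
    {"label": "cat", "coordinates": {"x": 10.5, "y": 20.0, "width": 5.0, "height": 8.0}}
  ]},
  {"image": "b.jpg", "annotations": [
    {"label": "cat", "coordinates": {"x": 30, "y": 30, "width": 10, "height": 10}}
  ]}
]
"""


def center_to_xyxy(coords):
    """Convert a centre + width/height box to (x_min, y_min, x_max, y_max)."""
    half_w = coords["width"] / 2
    half_h = coords["height"] / 2
    return (
        coords["x"] - half_w,
        coords["y"] - half_h,
        coords["x"] + half_w,
        coords["y"] + half_h,
    )


def xyxy_to_center(box):
    """Inverse of center_to_xyxy, as needed when exporting."""
    x_min, y_min, x_max, y_max = box
    return {
        "x": (x_min + x_max) / 2,
        "y": (y_min + y_max) / 2,
        "width": x_max - x_min,
        "height": y_max - y_min,
    }


def infer_classes(entries):
    """Collect every label in the file and return them sorted."""
    labels = {ann["label"] for entry in entries for ann in entry["annotations"]}
    return sorted(labels)


entries = json.loads(SAMPLE)
classes = infer_classes(entries)
# Zero-based ids follow the sorted order, so they are the same for every image.
class_to_id = {name: index for index, name in enumerate(classes)}
boxes = {
    entry["image"]: [
        (class_to_id[ann["label"]], center_to_xyxy(ann["coordinates"]))
        for ann in entry["annotations"]
    ]
    for entry in entries
}
```

Because the ids are derived from the labels of the whole file rather than per image, "cat" maps to the same id in a.jpg and b.jpg, which is the global class-id consistency the tests check.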

Testing

  • I have tested this code locally
  • I have added unit tests that prove my feature works
  • All new and existing tests pass

Local run: pytest tests/dataset/ passes (including the new test_createml.py),
and ruff check / ruff format --check are clean on the changed files.

madhavcodez requested a review from SkalskiP as a code owner on May 31, 2026, 18:22
codecov Bot commented May 31, 2026

Codecov Report

❌ Patch coverage is 86.95652% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 79%. Comparing base (7d22596) to head (0d2ae43).

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2284   +/-   ##
=======================================
  Coverage       79%     79%           
=======================================
  Files           66      67    +1     
  Lines         8569    8638   +69     
=======================================
+ Hits          6806    6866   +60     
- Misses        1763    1772    +9     

Add DetectionDataset.from_createml and as_createml plus a new
formats/createml.py module (load/save helpers), mirroring the existing
COCO, YOLO, and Pascal VOC format support. Boxes use CreateML's
pixel-space centre + width/height and are converted to/from xyxy; class
names are inferred from the labels present in the file. Image paths are
validated against the images directory, matching the COCO loader's
path-traversal protection. Adds unit tests for the helpers, loader,
exporter, integer/float round-trip, global class-id consistency, and the
path-safety guards.
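
The path-safety guards mentioned above can be illustrated with a small standalone sketch. This is an assumption about how such checks could be written with pathlib, not the implementation in this PR or in the COCO loader; resolve_image_path is a hypothetical name.

```python
import tempfile
from pathlib import Path


def resolve_image_path(images_dir: Path, image_name: str) -> Path:
    """Return the resolved image path, or raise ValueError if it is unsafe."""
    root = images_dir.resolve()
    if Path(image_name).is_absolute():
        raise ValueError(f"absolute image path not allowed: {image_name!r}")
    candidate = (root / image_name).resolve()
    # Rejects the images directory itself and anything that escapes it via "..".
    if candidate == root or root not in candidate.parents:
        raise ValueError(f"image path escapes images directory: {image_name!r}")
    if candidate.is_dir():
        raise ValueError(f"image path points to a directory: {image_name!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp) / "images"
    (images_dir / "nested").mkdir(parents=True)
    (images_dir / "a.jpg").touch()

    accepted = resolve_image_path(images_dir, "a.jpg").name
    rejected = []
    for name in ["../secret.txt", "/etc/passwd", ".", "nested"]:
        try:
            resolve_image_path(images_dir, name)
        except ValueError:
            rejected.append(name)
```

Resolving both the root and the candidate before comparing them means symlinked or relative images directories are compared in the same canonical form.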

Cast the JSON payload read via read_json_file to list[CreateMLDict] and
the data passed to save_json_file to dict[str, Any] (both helpers are
annotated for dict only), and iterate xyxy/class_id arrays directly so
the class_id None-guard narrows the loop variable for mypy.
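
A rough sketch of the typing pattern this commit message describes, under stated assumptions: Entry stands in for CreateMLDict, load_payload stands in for a reader annotated as returning a dict, and labels_for is a hypothetical function. None of these are the PR's actual code.

```python
import json
from typing import Any, Optional, TypedDict, cast


class Entry(TypedDict):
    """Hypothetical stand-in for the PR's CreateMLDict."""

    image: str
    annotations: list[dict[str, Any]]


def load_payload(text: str) -> dict[str, Any]:
    # Stand-in for a reader annotated as returning a dict, even though a
    # CreateML file holds a top-level list.
    return json.loads(text)


def labels_for(
    boxes: list[tuple[float, float, float, float]],
    class_ids: Optional[list[int]],
    classes: list[str],
) -> list[str]:
    if class_ids is None:
        raise ValueError("class ids are required")
    # After the guard, class_ids is no longer Optional for the type checker,
    # so iterating it directly yields plain ints without per-index checks.
    return [classes[class_id] for _box, class_id in zip(boxes, class_ids)]


# cast() only informs the type checker; it performs no runtime conversion.
entries = cast(list[Entry], load_payload('[{"image": "a.jpg", "annotations": []}]'))
names = labels_for([(0, 0, 1, 1), (1, 1, 2, 2)], [1, 0], ["cat", "dog"])
```

Since cast() does no validation, a loader relying on it still needs its own runtime checks on the payload shape.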