feat: add show_progress to dataset load/save operations #2275

Open
murillo-ro-silva wants to merge 3 commits into roboflow:develop from murillo-ro-silva:feat/dataset-progress-bars

@murillo-ro-silva

@murillo-ro-silva murillo-ro-silva commented May 25, 2026

Summary

Closes #183.

Note: PR #2181 by @satishkc7 addresses the same issue but currently has merge conflicts and has been inactive since March. This is a fresh implementation with the same approach — happy to close this if #2181 is preferred or to help resolve its conflicts instead.

Adds an optional show_progress: bool = False parameter to all DetectionDataset loading and saving methods. When enabled, a tqdm progress bar is displayed during time-consuming operations.

Requirements from maintainers (comment)

  • Progress bar works in terminal and notebook (tqdm.auto)
  • Progress bar is optional (show_progress=False by default)

Supported methods

| Method | Description |
| --- | --- |
| DetectionDataset.from_coco() | Loading COCO annotations |
| DetectionDataset.from_yolo() | Loading YOLO annotations |
| DetectionDataset.from_pascal_voc() | Loading Pascal VOC annotations |
| DetectionDataset.as_coco() | Saving to COCO format |
| DetectionDataset.as_yolo() | Saving to YOLO format |
| DetectionDataset.as_pascal_voc() | Saving to Pascal VOC format |
| save_dataset_images() | Saving images to directory |

Usage

import supervision as sv

# Loading with progress bar
ds = sv.DetectionDataset.from_yolo(
    images_directory_path="train/images",
    annotations_directory_path="train/labels",
    data_yaml_path="data.yaml",
    show_progress=True,
)

# Saving with progress bar
ds.as_coco(
    images_directory_path="output/images",
    annotations_path="output/annotations.json",
    show_progress=True,
)

Design decisions

  • Defaults to False — fully backward compatible, no behavior change for existing code
  • Uses tqdm.auto — works in both terminal and Jupyter notebooks (same pattern as supervision.utils.video)
  • tqdm is already a dependency — no new dependencies added

Test plan

  • 13 new tests in tests/dataset/test_progress.py
  • All 157 existing dataset tests pass with no regressions
  • pre-commit run --all-files passes (ruff, mypy, codespell, etc.)
  • Verified show_progress=False (default) keeps tqdm disabled
  • Verified show_progress=True enables tqdm for each format
  • Backward compatibility: existing calls without show_progress work unchanged
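
One way to check the two "Verified" items above is to wrap tqdm in a mock and inspect the `disable` keyword it receives. The sketch below is a self-contained illustration, not the PR's test code: `save_images` and `disable_flag_seen_by_tqdm` are hypothetical names, the `ImportError` fallback is there only so the sketch runs where tqdm is not installed, and patching the function's own globals is a single-file shortcut. A real test suite would more likely patch the name in the module under test, such as `supervision.dataset.formats.yolo.tqdm`.

```python
from unittest.mock import MagicMock, patch

try:
    from tqdm.auto import tqdm
except ImportError:
    # Assumption for portability only: pass-through stand-in for tqdm.
    def tqdm(iterable, **kwargs):
        return iterable


def save_images(names, show_progress=False):
    """Hypothetical helper that 'saves' images inside a tqdm-wrapped loop."""
    saved = []
    for name in tqdm(names, disable=not show_progress, desc="Saving images"):
        saved.append(name)
    return saved


def disable_flag_seen_by_tqdm(show_progress):
    """Run save_images with tqdm wrapped in a mock; return its `disable` kwarg."""
    mock_tqdm = MagicMock(wraps=tqdm)  # calls still pass through to tqdm
    # Swap the `tqdm` name that save_images looks up, then restore it.
    with patch.dict(save_images.__globals__, {"tqdm": mock_tqdm}):
        save_images(["a.jpg", "b.jpg"], show_progress=show_progress)
    return mock_tqdm.call_args.kwargs["disable"]


if __name__ == "__main__":
    print(disable_flag_seen_by_tqdm(False))
    print(disable_flag_seen_by_tqdm(True))
```

Because the mock wraps the real callable, the loop still iterates normally, so the same test can also assert on the function's return value.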

@CLAassistant

CLAassistant commented May 25, 2026

CLA assistant check
All committers have signed the CLA.

@satishkc7
Contributor

Just a heads up - PR #2181 (my earlier implementation) has been rebased onto current develop and is now conflict-free. Happy to close #2181 if you prefer this fresh implementation, or defer to maintainer preference on which to merge.

@codecov

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79%. Comparing base (fb2dec9) to head (5d1ba15).

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2275   +/-   ##
=======================================
+ Coverage       78%     79%   +1%     
=======================================
  Files           66      66           
  Lines         8410    8415    +5     
=======================================
+ Hits          6552    6636   +84     
+ Misses        1858    1779   -79     

Contributor

Copilot AI left a comment

Pull request overview

Adds an optional show_progress: bool = False flag to Supervision’s DetectionDataset import/export pathways (COCO/YOLO/Pascal VOC) and related helpers, enabling tqdm.auto progress bars for long-running dataset operations while keeping default behavior unchanged.

Changes:

  • Thread show_progress through DetectionDataset.from_* / as_* methods and underlying format load/save functions.
  • Wrap load/save loops with tqdm(..., disable=not show_progress, desc=...) for consistent terminal/notebook progress display.
  • Add a dedicated test suite validating that tqdm is disabled by default and enabled when requested.
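
The threading and wrapping described in these bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `load_annotations` and `from_format` are hypothetical stand-ins for a format-level loader and a public `DetectionDataset.from_*` method, and the `ImportError` fallback exists only so the sketch runs where tqdm is not installed.

```python
try:
    from tqdm.auto import tqdm  # selects a notebook or terminal bar
except ImportError:
    # Assumption for portability only: pass-through stand-in for tqdm.
    def tqdm(iterable, **kwargs):
        return iterable


def load_annotations(image_names, show_progress=False):
    """Hypothetical format-level loader: one loop iteration per image."""
    annotations = {}
    for name in tqdm(
        image_names,
        disable=not show_progress,  # no bar unless the caller asks for one
        desc="Loading annotations",
    ):
        annotations[name] = []  # placeholder for parsed detections
    return annotations


def from_format(image_names, show_progress=False):
    """Hypothetical public entry point that only forwards the flag."""
    return load_annotations(image_names, show_progress=show_progress)


if __name__ == "__main__":
    result = from_format(["a.jpg", "b.jpg"], show_progress=True)
    print(sorted(result))
```

Keeping the flag as a plain keyword with a `False` default means existing callers see no change, which matches the backward-compatibility goal stated in the PR description.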

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/dataset/test_progress.py | Adds tests asserting tqdm is disabled by default and enabled when show_progress=True. |
| src/supervision/dataset/utils.py | Adds show_progress to save_dataset_images() and wraps image saving in tqdm. |
| src/supervision/dataset/formats/yolo.py | Adds show_progress to YOLO load/save annotation functions and wraps loops in tqdm. |
| src/supervision/dataset/formats/pascal_voc.py | Adds show_progress to Pascal VOC loader and wraps the image loop in tqdm. |
| src/supervision/dataset/formats/coco.py | Adds show_progress to COCO load/save functions and wraps iteration in tqdm. |
| src/supervision/dataset/core.py | Exposes show_progress on DetectionDataset public APIs and propagates it to helpers/format functions. |

Comment on lines +114 to +117
with patch(
    "supervision.dataset.formats.yolo.tqdm",
    wraps=__import__("tqdm").auto.tqdm,
) as mock_tqdm:
Comment on lines +268 to +275
) as mock_tqdm:
    ds.as_pascal_voc(
        images_directory_path=os.path.join(out_dir, "images"),
        annotations_directory_path=os.path.join(out_dir, "annotations"),
        show_progress=True,
    )
    call_kwargs = mock_tqdm.call_args
    assert call_kwargs[1]["disable"] is False
@Borda
Member

Borda commented May 26, 2026

Murillo Rodrigues seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@murillo-ro-silva it seems your commits were authored with a different email than the primary one on your GitHub account. Could you please share a screenshot showing that you signed the CLA? 🦝

Add `show_progress` parameter to all dataset loading and saving methods.
When enabled, displays a tqdm progress bar during time-consuming
operations like loading/saving COCO, YOLO, and Pascal VOC datasets.

- Defaults to `False` for full backward compatibility
- Uses `tqdm.auto` for terminal and Jupyter notebook support
- Includes 13 new tests covering all formats and backward compatibility
@murillo-ro-silva murillo-ro-silva force-pushed the feat/dataset-progress-bars branch from 5d1ba15 to 0f080fc Compare May 27, 2026 18:32
@murillo-ro-silva murillo-ro-silva force-pushed the feat/dataset-progress-bars branch from 16dbd2a to 79d1926 Compare May 27, 2026 18:38
@murillo-ro-silva
Author

Hi @Borda, thanks for catching that!

The merge commit had a different author name/email (Murillo Rodrigues <murillo.rodrigues@branchingminds.com>) from my local git config, which didn't link to my GitHub account. I've force-pushed with the corrected author (murillo-ro-silva <murillo@datoga.io>) on all commits now.

The CLA assistant confirms all committers have signed: CLA assistant check

Here's the signed CLA confirmation page: https://cla-assistant.io/roboflow/supervision?pullRequest=2275

Let me know if you need anything else!

@murillo-ro-silva
Author

murillo-ro-silva commented May 27, 2026

> Just a heads up - PR #2181 (my earlier implementation) has been rebased onto current develop and is now conflict-free. Happy to close #2181 if you prefer this fresh implementation, or defer to maintainer preference on which to merge.

Hey @satishkc7, thanks for the heads up and for being so collaborative about this! 🙏

I actually built on the direction you started with #2181, and your earlier work helped shape the approach here. I think the best path is to let the maintainers decide which one fits the project best. Either way, the important thing is that #183 gets resolved!

Appreciate the sportsmanship 🤝

@kounelisagis

@murillo-ro-silva, ClassificationDataset.from_folder_structure and as_folder_structure look like the same bare-loop pattern this PR is fixing. #183 reads as broad enough to include them - worth pulling in here?
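
For illustration only, the same wrapping applied to a folder-per-class layout might look like the sketch below. It is not the library's implementation of `ClassificationDataset.from_folder_structure`: `load_folder_structure` and `demo` are hypothetical names, the directory layout (one subdirectory per class) is an assumption, and the `ImportError` fallback exists only so the sketch runs where tqdm is not installed.

```python
import tempfile
from pathlib import Path

try:
    from tqdm.auto import tqdm
except ImportError:
    # Assumption for portability only: pass-through stand-in for tqdm.
    def tqdm(iterable, **kwargs):
        return iterable


def load_folder_structure(root, show_progress=False):
    """Hypothetical loader: each subdirectory of `root` is a class label."""
    root = Path(root)
    # Listing files first gives tqdm a length, so the bar can show a total.
    image_paths = sorted(p for p in root.glob("*/*") if p.is_file())
    labels = {}
    for path in tqdm(image_paths, disable=not show_progress, desc="Loading images"):
        labels[path.name] = path.parent.name
    return labels


def demo():
    """Build a tiny two-class tree in a temp directory and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        for class_name, file_name in [("cats", "a.jpg"), ("dogs", "b.jpg")]:
            class_dir = Path(tmp) / class_name
            class_dir.mkdir()
            (class_dir / file_name).touch()
        return load_folder_structure(tmp, show_progress=True)


if __name__ == "__main__":
    print(demo())
```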

Development

Successfully merging this pull request may close these issues.

Show Progress in time consuming tasks

6 participants