Add adaptive TP/FP/FN validation mosaic export by K-saif · Pull Request #2271 · roboflow/supervision

K-saif · 2026-05-25T10:11:12Z

Before submitting

Self-reviewed the code
Updated documentation
Added/updated tests
All tests pass locally

Description

Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.

Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of:

Ground Truth
True Positives
False Positives
False Negatives

Type of Change

✨ New feature
📝 Documentation update
🧪 Test update

Motivation and Context

Current benchmarking utilities provide aggregate metrics but limited per-image visual inspection support.

This feature adds optional qualitative visualization exports to simplify:

model debugging
localization error inspection
false positive analysis
false negative analysis

without affecting existing benchmark behavior.

Changes Made

added optional save_result_images and save_directory_path support in ConfusionMatrix.benchmark(...)
added per-image GT/TP/FP/FN mosaic export under a result/ directory
added class-consistent bounding-box coloring across panels
added adaptive label and bounding-box scaling based on image resolution
added white outer borders and center dividers for readability
updated benchmark documentation
added regression coverage in test_detection.py
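
As a rough illustration of the class-consistent coloring item above, one simple approach is to index a fixed palette by class id, so the same class gets the same color in every panel. This is only a sketch: `PALETTE` and `color_for_class` are hypothetical names and the colors are arbitrary, not values taken from the PR.

```python
from typing import Tuple

# Hypothetical BGR palette; the colors actually used in the PR are not shown here.
PALETTE = [(255, 64, 64), (64, 255, 64), (64, 64, 255), (64, 255, 255)]


def color_for_class(class_id: int) -> Tuple[int, int, int]:
    """Map a class id to a fixed color so GT/TP/FP/FN panels agree."""
    # The modulo keeps the lookup valid when there are more classes than colors.
    return PALETTE[class_id % len(PALETTE)]


# The same class id always yields the same color, regardless of panel.
gt_color = color_for_class(2)
fp_color = color_for_class(2)
```

With more classes than palette entries, colors repeat, which is the usual trade-off of a fixed palette.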

Testing

I have tested this code locally
I have added unit tests that prove the feature works
All new and existing tests pass

Additional Notes

Existing benchmark behavior remains unchanged unless save_result_images=True is enabled.

CLAassistant · 2026-05-25T10:11:22Z

All committers have signed the CLA.

codecov · 2026-05-26T16:42:25Z

Codecov Report

❌ Patch coverage is 86.48649% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 78%. Comparing base (befdb7c) to head (4285062).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop   #2271    +/-   ##
========================================
  Coverage       78%     78%            
========================================
  Files           66      66            
  Lines         8410    8520   +110     
========================================
+ Hits          6552    6653   +101     
- Misses        1858    1867     +9

Copilot

Pull request overview

This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.

Changes:

Added save_result_images and save_directory_path options to ConfusionMatrix.benchmark(...) to export validation mosaics under result/.
Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
Added a regression test for image export and updated benchmarking documentation to mention the feature.

Assessment (n/5):

Code quality: 3/5
Tests: 2/5
Docs: 4/5

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `src/supervision/metrics/detection.py` | Adds the visualization export pipeline and new `benchmark(...)` parameters to save per-image GT/TP/FP/FN mosaics. |
| `tests/metrics/test_detection.py` | Adds a regression test that exercises `save_result_images=True` and checks a saved mosaic is created/readable. |
| `docs/how_to/benchmark_a_model.md` | Documents the new `save_result_images` option and what gets written to `result/`. |

K-saif · 2026-05-27T07:50:49Z

+    targets: Detections,
+    conf_threshold: float,
+    iou_threshold: float,
+) -> tuple[Detections, Detections, Detections]:


Implemented in commit

K-saif · 2026-05-27T08:21:19Z

+    fp_indices: list[int] = []
+    fn_indices: list[int] = []
+
+    class_ids = np.unique(
+        np.concatenate((filtered_predictions.class_id, targets.class_id)).astype(int)
+    )
+
+    for class_id in class_ids:
+        prediction_indices = np.flatnonzero(filtered_predictions.class_id == class_id)
+        target_indices = np.flatnonzero(targets.class_id == class_id)
+
+        if len(prediction_indices) == 0:
+            fn_indices.extend(target_indices.tolist())
+            continue
+
+        if len(target_indices) == 0:
+            fp_indices.extend(prediction_indices.tolist())
+            continue
+
+        if filtered_predictions.confidence is None:
+            ordered_prediction_indices = prediction_indices
+        else:
+            prediction_confidence = filtered_predictions.confidence[prediction_indices]
+            ordered_prediction_indices = prediction_indices[
+                np.argsort(prediction_confidence)[::-1]
+            ]
+
+        iou_matrix = box_iou_batch(
+            filtered_predictions.xyxy[ordered_prediction_indices],
+            targets.xyxy[target_indices],
+        )
+        matched_targets: npt.NDArray[np.bool_] = np.zeros(
+            len(target_indices), dtype=bool
+        )
+
+        for row_index, prediction_index in enumerate(ordered_prediction_indices):
+            available_target_indices = np.flatnonzero(~matched_targets)
+            if len(available_target_indices) == 0:
+                fp_indices.append(prediction_index)
+                continue
+
+            best_available_target = available_target_indices[
+                np.argmax(iou_matrix[row_index, available_target_indices])
+            ]
+            best_iou = iou_matrix[row_index, best_available_target]
+
+            if best_iou >= iou_threshold:
+                tp_indices.append(prediction_index)
+                matched_targets[best_available_target] = True
+            else:
+                fp_indices.append(prediction_index)
+
+        fn_indices.extend(target_indices[~matched_targets].tolist())


Updated the TP/FP/FN matching flow to use global IoU-based matching with same-class prioritization, aligning the visualization behavior more closely with ConfusionMatrix.evaluate_detection_batch.
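
For readers following the quoted hunk, the per-class greedy matching it shows can be sketched in plain Python as below. This is a simplified illustration, not the PR's implementation: it uses lists instead of `Detections` and NumPy arrays, and `box_iou` and `split_tp_fp_fn` are hypothetical names. It mirrors the `>=` comparison of the quoted hunk.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two xyxy boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def split_tp_fp_fn(
    pred_boxes: Sequence[Box],
    pred_classes: Sequence[int],
    pred_scores: Sequence[float],
    gt_boxes: Sequence[Box],
    gt_classes: Sequence[int],
    iou_threshold: float,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (tp, fp) as prediction indices and fn as target indices."""
    tp: List[int] = []
    fp: List[int] = []
    fn: List[int] = []
    for class_id in sorted(set(pred_classes) | set(gt_classes)):
        preds = [i for i, c in enumerate(pred_classes) if c == class_id]
        gts = [j for j, c in enumerate(gt_classes) if c == class_id]
        # Highest-confidence predictions claim targets first.
        preds.sort(key=lambda i: pred_scores[i], reverse=True)
        matched = set()
        for i in preds:
            free = [j for j in gts if j not in matched]
            if not free:
                fp.append(i)  # every target of this class is already taken
                continue
            best = max(free, key=lambda j: box_iou(pred_boxes[i], gt_boxes[j]))
            if box_iou(pred_boxes[i], gt_boxes[best]) >= iou_threshold:
                tp.append(i)
                matched.add(best)
            else:
                fp.append(i)
        # Targets that no prediction claimed are false negatives.
        fn.extend(j for j in gts if j not in matched)
    return tp, fp, fn
```

Each target can be claimed at most once, so a duplicate prediction over an already matched object is counted as a false positive.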

K-saif · 2026-05-27T08:22:30Z

+            best_available_target = available_target_indices[
+                np.argmax(iou_matrix[row_index, available_target_indices])
+            ]
+            best_iou = iou_matrix[row_index, best_available_target]
+
+            if best_iou >= iou_threshold:
+                tp_indices.append(prediction_index)
+                matched_targets[best_available_target] = True
+            else:
+                fp_indices.append(prediction_index)


Aligned the IoU comparison operator with ConfusionMatrix.evaluate_detection_batch by changing the threshold check from >= to > for consistent attribution behavior.
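
The practical effect of switching from `>=` to `>` only shows up when a prediction's IoU lands exactly on the threshold. A minimal sketch, where `is_match` is a hypothetical helper and not a function from the library:

```python
def is_match(iou: float, iou_threshold: float, strict: bool = True) -> bool:
    """Decide whether a prediction counts as a true positive.

    strict=True uses `>` (the behavior the comment above describes);
    strict=False uses `>=` (the behavior in the quoted hunk).
    """
    return iou > iou_threshold if strict else iou >= iou_threshold


# A prediction sitting exactly on the threshold flips between TP and FP.
boundary_strict = is_match(0.5, 0.5, strict=True)
boundary_inclusive = is_match(0.5, 0.5, strict=False)
```

Away from the boundary both operators agree, so the change affects attribution only for exact ties.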

K-saif · 2026-05-27T08:08:21Z

+    cv2.putText(
+        panel,
+        title,
+        (40, 100),
+        cv2.FONT_HERSHEY_SIMPLEX,
+        float(max(1.0, font_size / 18.0)),
+        (240, 240, 240),
+        max(2, round(font_size / 8)),


Implemented in the latest commit
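
The two scaling expressions passed to `cv2.putText` in the hunk above can be isolated to see how they behave. In this sketch `title_text_style` is a hypothetical wrapper; how `font_size` itself is derived from the image resolution is not shown in the hunk and is not assumed here.

```python
from typing import Tuple


def title_text_style(font_size: float) -> Tuple[float, int]:
    """Return (font_scale, thickness) using the expressions from the hunk."""
    # The scale never drops below 1.0, so titles stay legible on small images.
    font_scale = float(max(1.0, font_size / 18.0))
    # The thickness never drops below 2 pixels.
    thickness = max(2, round(font_size / 8))
    return font_scale, thickness


small = title_text_style(9)   # both floors apply
large = title_text_style(40)  # both values scale with font_size
```

Note that Python's `round` rounds halves to the nearest even integer, so a `font_size` of 36 gives a thickness of 4 rather than 5.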

K-saif · 2026-05-27T08:07:53Z

+    cv2.rectangle(
+        result,
+        (0, 0),
+        (result.shape[1] - 1, result.shape[0] - 1),
+        (255, 255, 255),
+        thickness=8,
+    )
+
+    center_x = result.shape[1] // 2
+    center_y = result.shape[0] // 2
+    cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
+    cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)


Implemented in the latest commit
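
The divider coordinates in the hunk above mix two conventions: `result.shape` is `(height, width, ...)`, while OpenCV drawing functions take points as `(x, y)`. A small sketch of the same arithmetic, with `grid_lines` as a hypothetical helper name:

```python
from typing import Tuple

Point = Tuple[int, int]  # (x, y), the order OpenCV drawing calls expect


def grid_lines(height: int, width: int) -> Tuple[Tuple[Point, Point], Tuple[Point, Point]]:
    """Endpoints of the vertical and horizontal center dividers of a 2x2 mosaic."""
    center_x = width // 2
    center_y = height // 2
    vertical = ((center_x, 0), (center_x, height - 1))
    horizontal = ((0, center_y), (width - 1, center_y))
    return vertical, horizontal


# A 64x64 mosaic, matching the image size checked in the PR's regression test.
vertical, horizontal = grid_lines(64, 64)
```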

K-saif · 2026-05-27T08:07:01Z

+    cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
+    cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)
+
+    cv2.imwrite(str(save_path), result)


In the updated commit, a failed write now emits a UserWarning instead of failing hard:

    write_success = cv2.imwrite(str(save_path), result)
    if not write_success:
        warnings.warn(
            f"Failed to write validation image to '{save_path}'.",
            UserWarning,
            stacklevel=2,
        )
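
The warn-instead-of-fail behavior can be exercised without OpenCV by passing the writer in as a callable. This is a sketch under that assumption: `write_or_warn` is a hypothetical helper, and the lambda stands in for `cv2.imwrite`, which returns a boolean success flag.

```python
import warnings
from typing import Callable


def write_or_warn(writer: Callable[[str], bool], save_path: str) -> bool:
    """Call the writer and warn, rather than raise, when it reports failure."""
    write_success = writer(save_path)
    if not write_success:
        warnings.warn(
            f"Failed to write validation image to '{save_path}'.",
            UserWarning,
            stacklevel=2,
        )
    return write_success


# Simulate a failed write and capture the resulting warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok = write_or_warn(lambda path: False, "result/example.jpg")
```

Because the failure is a warning, a single unwritable image does not abort the rest of the benchmark run.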

K-saif · 2026-05-27T08:01:50Z

+                if Path(image_filename).suffix == "":
+                    image_filename = f"{image_filename}.jpg"
+
+                save_path = save_directory / image_filename


DetectionDataset.from_yolo only lists files at the top level of the images directory (via list_files_with_extensions, which uses Path.glob), so images inside subfolders are never found or processed. Filenames therefore cannot collide across subfolders, and the current code is correct.
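
The two points in this reply can be checked with the standard library alone: a non-recursive `Path.glob` pattern skips subfolders, and the hunk's suffix check appends `.jpg` only when no extension is present. The helper names below are hypothetical, and the glob pattern is an assumption about how the listing works, based on the description above.

```python
import tempfile
from pathlib import Path
from typing import List


def with_default_suffix(image_filename: str, default: str = ".jpg") -> str:
    """Append a default extension only when the filename has none."""
    if Path(image_filename).suffix == "":
        return f"{image_filename}{default}"
    return image_filename


def top_level_images(directory: Path) -> List[str]:
    """List .jpg files directly inside the directory (no recursion)."""
    return sorted(p.name for p in directory.glob("*.jpg"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").touch()
    (root / "nested").mkdir()
    (root / "nested" / "b.jpg").touch()
    # Only the top-level file is found; nested/b.jpg is skipped.
    found = top_level_images(root)
```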

K-saif · 2026-05-27T08:00:44Z

+        saved_image = cv2.imread(str(saved_image_path))
+        assert saved_image is not None
+        assert saved_image.shape[:2] == (64, 64)
+        assert np.any(saved_image[:32, :32] != 0)
+        assert np.any(saved_image[:32, 32:] != 0)
+        assert np.any(saved_image[32:, :32] != 0)
+        assert np.any(saved_image[32:, 32:] != 0)


Implemented in a new commit. The test now checks detection rendering, panel logic, and visualization correctness.

K-saif · 2026-05-31T05:41:32Z

Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? I'm happy to address any feedback or make further improvements if needed.

feat(metrics): add TP/FP/FN validation mosaic export

18e5e94

K-saif requested a review from SkalskiP as a code owner May 25, 2026 10:11

pre-commit-ci Bot and others added 4 commits May 25, 2026 10:13

fix(pre_commit): 🎨 auto format pre-commit hooks

89de32e

Fix lint and typing issues

e492af0

fix(pre_commit): 🎨 auto format pre-commit hooks

f1bbd84

fix mypy issue

4285062

Borda requested a review from Copilot May 26, 2026 16:40

Copilot started reviewing on behalf of Borda May 26, 2026 16:40

Copilot AI reviewed May 26, 2026

K-saif and others added 3 commits May 27, 2026 13:28

fix all issues and suggestions

08e5b60

fix(pre_commit): 🎨 auto format pre-commit hooks

6695d7f

Align validation visualization matching with confusion matrix

5b31583

K-saif wants to merge 8 commits into roboflow:develop from K-saif:feat-validation-visualization
