Add adaptive TP/FP/FN validation mosaic export #2271

Open
K-saif wants to merge 8 commits into roboflow:develop from K-saif:feat-validation-visualization

Conversation

@K-saif K-saif commented May 25, 2026

Before submitting
  • Self-reviewed the code
  • Updated documentation
  • Added/updated tests
  • All tests pass locally

Description

Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.

Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of:

  • Ground Truth
  • True Positives
  • False Positives
  • False Negatives

Type of Change

  • ✨ New feature
  • 📝 Documentation update
  • 🧪 Test update

Motivation and Context

Current benchmarking utilities provide aggregate metrics but limited per-image visual inspection support.

This feature adds optional qualitative visualization exports to simplify:

  • model debugging
  • localization error inspection
  • false positive analysis
  • false negative analysis

without affecting existing benchmark behavior.

Changes Made

  • added optional save_result_images and save_directory_path support in ConfusionMatrix.benchmark(...)
  • added per-image GT/TP/FP/FN mosaic export under a result/ directory
  • added class-consistent bounding-box coloring across panels
  • added adaptive label and bounding-box scaling based on image resolution
  • added white outer borders and center dividers for readability
  • updated benchmark documentation
  • added regression coverage in test_detection.py
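
Two of the items above, class-consistent coloring and resolution-adaptive scaling, can be illustrated with a short sketch. This is only an illustration under assumptions: `PALETTE`, `color_for_class`, `adaptive_thickness`, and the 640-pixel reference size are hypothetical names and constants, not taken from the PR's implementation.

```python
# Hypothetical sketch; names and constants are illustrative, not from the PR.
PALETTE = [
    (255, 64, 64),
    (64, 255, 64),
    (64, 64, 255),
    (255, 200, 0),
]


def color_for_class(class_id: int) -> tuple[int, int, int]:
    """Map a class id to a fixed color so every panel draws it the same way."""
    return PALETTE[class_id % len(PALETTE)]


def adaptive_thickness(width: int, height: int, reference: int = 640) -> int:
    """Scale line thickness with the shorter image side, never below 1 pixel."""
    return max(1, round(2 * min(width, height) / reference))
```

Because the color depends only on the class id, the same class gets the same color in the GT, TP, FP, and FN panels. Clamping the thickness at 1 keeps boxes visible on very small images.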

Testing

  • I have tested this code locally
  • I have added unit tests that prove the feature works
  • All new and existing tests pass

Additional Notes

Existing benchmark behavior remains unchanged unless save_result_images=True is enabled.

@K-saif K-saif requested a review from SkalskiP as a code owner May 25, 2026 10:11

CLAassistant commented May 25, 2026

CLA assistant check
All committers have signed the CLA.

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 86.48649% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 78%. Comparing base (befdb7c) to head (4285062).
⚠️ Report is 1 commit behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #2271    +/-   ##
========================================
  Coverage       78%     78%            
========================================
  Files           66      66            
  Lines         8410    8520   +110     
========================================
+ Hits          6552    6653   +101     
- Misses        1858    1867     +9     

Copilot AI left a comment

Pull request overview

This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.

Changes:

  • Added save_result_images and save_directory_path options to ConfusionMatrix.benchmark(...) to export validation mosaics under result/.
  • Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
  • Added a regression test for image export and updated benchmarking documentation to mention the feature.

Assessment (n/5):

  • Code quality: 3/5
  • Tests: 2/5
  • Docs: 4/5

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File | Description
src/supervision/metrics/detection.py | Adds the visualization export pipeline and new benchmark(...) parameters to save per-image GT/TP/FP/FN mosaics.
tests/metrics/test_detection.py | Adds a regression test that exercises save_result_images=True and checks a saved mosaic is created/readable.
docs/how_to/benchmark_a_model.md | Documents the new save_result_images option and what gets written to result/.

targets: Detections,
conf_threshold: float,
iou_threshold: float,
) -> tuple[Detections, Detections, Detections]:
Author

Implemented in commit

Comment thread src/supervision/metrics/detection.py Outdated
Comment on lines +105 to +157
fp_indices: list[int] = []
fn_indices: list[int] = []

class_ids = np.unique(
    np.concatenate((filtered_predictions.class_id, targets.class_id)).astype(int)
)

for class_id in class_ids:
    prediction_indices = np.flatnonzero(filtered_predictions.class_id == class_id)
    target_indices = np.flatnonzero(targets.class_id == class_id)

    if len(prediction_indices) == 0:
        fn_indices.extend(target_indices.tolist())
        continue

    if len(target_indices) == 0:
        fp_indices.extend(prediction_indices.tolist())
        continue

    if filtered_predictions.confidence is None:
        ordered_prediction_indices = prediction_indices
    else:
        prediction_confidence = filtered_predictions.confidence[prediction_indices]
        ordered_prediction_indices = prediction_indices[
            np.argsort(prediction_confidence)[::-1]
        ]

    iou_matrix = box_iou_batch(
        filtered_predictions.xyxy[ordered_prediction_indices],
        targets.xyxy[target_indices],
    )
    matched_targets: npt.NDArray[np.bool_] = np.zeros(
        len(target_indices), dtype=bool
    )

    for row_index, prediction_index in enumerate(ordered_prediction_indices):
        available_target_indices = np.flatnonzero(~matched_targets)
        if len(available_target_indices) == 0:
            fp_indices.append(prediction_index)
            continue

        best_available_target = available_target_indices[
            np.argmax(iou_matrix[row_index, available_target_indices])
        ]
        best_iou = iou_matrix[row_index, best_available_target]

        if best_iou >= iou_threshold:
            tp_indices.append(prediction_index)
            matched_targets[best_available_target] = True
        else:
            fp_indices.append(prediction_index)

    fn_indices.extend(target_indices[~matched_targets].tolist())
Author

@K-saif K-saif May 27, 2026

Updated the TP/FP/FN matching flow to use global IoU-based matching with same-class prioritization, aligning the visualization behavior more closely with ConfusionMatrix.evaluate_detection_batch.
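
For readers following this thread, the per-class greedy matching in the quoted (now outdated) diff can be sketched without NumPy. This is a simplified illustration, not the PR's code and not the updated global matching described in the reply: `iou` and `match_single_class` are hypothetical helpers, the sketch handles a single class, and it keeps the `>=` comparison that the quoted diff used.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_single_class(
    predictions: list[tuple[Box, float]],
    targets: list[Box],
    iou_threshold: float,
) -> tuple[list[int], list[int], list[int]]:
    """Return (tp, fp, fn) indices for one class using greedy matching.

    tp and fp index into `predictions`; fn indexes into `targets`.
    """
    # Visit predictions from highest to lowest confidence.
    order = sorted(
        range(len(predictions)), key=lambda i: predictions[i][1], reverse=True
    )
    matched = [False] * len(targets)
    tp: list[int] = []
    fp: list[int] = []
    for p in order:
        free = [t for t in range(len(targets)) if not matched[t]]
        if not free:
            fp.append(p)  # no unmatched target left to claim
            continue
        best = max(free, key=lambda t: iou(predictions[p][0], targets[t]))
        if iou(predictions[p][0], targets[best]) >= iou_threshold:
            tp.append(p)
            matched[best] = True  # each target is claimed at most once
        else:
            fp.append(p)
    fn = [t for t in range(len(targets)) if not matched[t]]
    return tp, fp, fn


predictions = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),    # overlaps target 0 exactly
    ((0.0, 0.0, 10.0, 10.0), 0.8),    # duplicate of the first prediction
    ((50.0, 50.0, 60.0, 60.0), 0.7),  # overlaps nothing
]
targets = [(0.0, 0.0, 10.0, 10.0), (100.0, 100.0, 110.0, 110.0)]
tp, fp, fn = match_single_class(predictions, targets, iou_threshold=0.5)
```

In this toy input the duplicate prediction becomes a false positive because its target was already claimed by the higher-confidence prediction, and the second target, which no prediction overlaps, becomes a false negative.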

Comment thread src/supervision/metrics/detection.py Outdated
Comment on lines +146 to +155
best_available_target = available_target_indices[
    np.argmax(iou_matrix[row_index, available_target_indices])
]
best_iou = iou_matrix[row_index, best_available_target]

if best_iou >= iou_threshold:
    tp_indices.append(prediction_index)
    matched_targets[best_available_target] = True
else:
    fp_indices.append(prediction_index)
Author

Aligned the IoU comparison operator with ConfusionMatrix.evaluate_detection_batch by changing the threshold check from >= to > for consistent attribution behavior.
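
The two operators only disagree when the IoU lands exactly on the threshold. A minimal sketch of that boundary case follows; `is_match` is a hypothetical helper written for illustration and is not part of the library.

```python
def is_match(iou_value: float, iou_threshold: float, strict: bool) -> bool:
    """Decide whether an IoU counts as a match under `>` (strict) or `>=`."""
    if strict:
        return iou_value > iou_threshold
    return iou_value >= iou_threshold


# Box (0, 0, 2, 2) against box (0, 0, 2, 1): intersection 2, union 4.
boundary_iou = 2.0 / 4.0

inclusive_result = is_match(boundary_iou, 0.5, strict=False)
strict_result = is_match(boundary_iou, 0.5, strict=True)
```

With the inclusive check the boundary prediction counts as a match; with the strict check it does not, so the same detection would be drawn in different panels depending on the operator.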

Comment thread src/supervision/metrics/detection.py Outdated
Comment on lines +235 to +242
cv2.putText(
    panel,
    title,
    (40, 100),
    cv2.FONT_HERSHEY_SIMPLEX,
    float(max(1.0, font_size / 18.0)),
    (240, 240, 240),
    max(2, round(font_size / 8)),
Author

Implemented in the latest commit

Comment thread src/supervision/metrics/detection.py Outdated
Comment on lines +299 to +310
cv2.rectangle(
    result,
    (0, 0),
    (result.shape[1] - 1, result.shape[0] - 1),
    (255, 255, 255),
    thickness=8,
)

center_x = result.shape[1] // 2
center_y = result.shape[0] // 2
cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)
Author

Implemented in the latest commit

Comment thread src/supervision/metrics/detection.py Outdated
cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)

cv2.imwrite(str(save_path), result)
Author

In the updated commit, a failed write now emits a UserWarning instead of failing hard:

write_success = cv2.imwrite(str(save_path), result)
if not write_success:
    warnings.warn(
        f"Failed to write validation image to '{save_path}'.",
        UserWarning,
        stacklevel=2,
    )

if Path(image_filename).suffix == "":
    image_filename = f"{image_filename}.jpg"

save_path = save_directory / image_filename
Author

DetectionDataset.from_yolo only lists files at the top level (via list_files_with_extensions, which uses Path.glob), so images inside subfolders are not found and are never processed. Therefore the current code is correct.
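
The pathlib behavior this reply relies on can be checked in isolation. The sketch below does not call supervision at all: `list_top_level_images` and `with_default_suffix` are hypothetical helpers, and whether `list_files_with_extensions` uses the same glob pattern is the author's statement, not something verified here.

```python
import tempfile
from pathlib import Path


def list_top_level_images(directory: Path, extension: str = "jpg") -> list[str]:
    """List matching files directly inside `directory`, ignoring subfolders."""
    return sorted(path.name for path in directory.glob(f"*.{extension}"))


def with_default_suffix(image_filename: str, default: str = ".jpg") -> str:
    """Append a default extension when the filename has none."""
    if Path(image_filename).suffix:
        return image_filename
    return f"{image_filename}{default}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").touch()
    (root / "nested").mkdir()
    (root / "nested" / "b.jpg").touch()

    # A plain "*.jpg" pattern does not descend into subdirectories.
    top_level = list_top_level_images(root)
    # rglob walks the whole tree and also finds the nested file.
    recursive = sorted(path.name for path in root.rglob("*.jpg"))
```

If the dataset loader only ever yields top-level files, two images with the same name in different subfolders cannot collide in the output directory, which is the point the reply makes.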

Comment thread tests/metrics/test_detection.py Outdated
Comment on lines +1078 to +1084
saved_image = cv2.imread(str(saved_image_path))
assert saved_image is not None
assert saved_image.shape[:2] == (64, 64)
assert np.any(saved_image[:32, :32] != 0)
assert np.any(saved_image[:32, 32:] != 0)
assert np.any(saved_image[32:, :32] != 0)
assert np.any(saved_image[32:, 32:] != 0)
Author

Implemented in the new commit; the test now checks detection rendering, panel logic, and visualization correctness.

K-saif commented May 31, 2026

Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? I'm happy to address any feedback or make further improvements if needed.

3 participants