Add adaptive TP/FP/FN validation mosaic export #2271
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

Coverage Diff

| | develop | #2271 | +/- |
|---|---|---|---|
| Coverage | 78% | 78% | |
| Files | 66 | 66 | |
| Lines | 8410 | 8520 | +110 |
| Hits | 6552 | 6653 | +101 |
| Misses | 1858 | 1867 | +9 |
Pull request overview
This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.
Changes:
- Added `save_result_images` and `save_directory_path` options to `ConfusionMatrix.benchmark(...)` to export validation mosaics under `result/`.
- Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
- Added a regression test for image export and updated benchmarking documentation to mention the feature.
Assessment (n/5):
- Code quality: 3/5
- Tests: 2/5
- Docs: 4/5
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/supervision/metrics/detection.py` | Adds the visualization export pipeline and new `benchmark(...)` parameters to save per-image GT/TP/FP/FN mosaics. |
| `tests/metrics/test_detection.py` | Adds a regression test that exercises `save_result_images=True` and checks a saved mosaic is created/readable. |
| `docs/how_to/benchmark_a_model.md` | Documents the new `save_result_images` option and what gets written to `result/`. |

        targets: Detections,
        conf_threshold: float,
        iou_threshold: float,
    ) -> tuple[Detections, Detections, Detections]:
        fp_indices: list[int] = []
        fn_indices: list[int] = []

        class_ids = np.unique(
            np.concatenate((filtered_predictions.class_id, targets.class_id)).astype(int)
        )

        for class_id in class_ids:
            prediction_indices = np.flatnonzero(filtered_predictions.class_id == class_id)
            target_indices = np.flatnonzero(targets.class_id == class_id)

            if len(prediction_indices) == 0:
                fn_indices.extend(target_indices.tolist())
                continue

            if len(target_indices) == 0:
                fp_indices.extend(prediction_indices.tolist())
                continue

            if filtered_predictions.confidence is None:
                ordered_prediction_indices = prediction_indices
            else:
                prediction_confidence = filtered_predictions.confidence[prediction_indices]
                ordered_prediction_indices = prediction_indices[
                    np.argsort(prediction_confidence)[::-1]
                ]

            iou_matrix = box_iou_batch(
                filtered_predictions.xyxy[ordered_prediction_indices],
                targets.xyxy[target_indices],
            )
            matched_targets: npt.NDArray[np.bool_] = np.zeros(
                len(target_indices), dtype=bool
            )

            for row_index, prediction_index in enumerate(ordered_prediction_indices):
                available_target_indices = np.flatnonzero(~matched_targets)
                if len(available_target_indices) == 0:
                    fp_indices.append(prediction_index)
                    continue

                best_available_target = available_target_indices[
                    np.argmax(iou_matrix[row_index, available_target_indices])
                ]
                best_iou = iou_matrix[row_index, best_available_target]

                if best_iou >= iou_threshold:
                    tp_indices.append(prediction_index)
                    matched_targets[best_available_target] = True
                else:
                    fp_indices.append(prediction_index)

            fn_indices.extend(target_indices[~matched_targets].tolist())

Updated the TP/FP/FN matching flow to use global IoU-based matching with same-class prioritization, aligning the visualization behavior more closely with ConfusionMatrix.evaluate_detection_batch.
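As an illustration of what global IoU-based matching with same-class prioritization could look like, here is a minimal pure-Python sketch. It is not the code from this PR or from `ConfusionMatrix.evaluate_detection_batch`; the names `box_iou` and `match_detections` are hypothetical, and treating a cross-class match as one FP plus one FN is an assumption made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    pred_boxes: Sequence[Box],
    pred_classes: Sequence[int],
    target_boxes: Sequence[Box],
    target_classes: Sequence[int],
    iou_threshold: float,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (tp, fp, fn); tp and fp index predictions, fn indexes targets."""
    # Collect every prediction/target pair that clears the threshold (strict >).
    candidates = []
    for p, pred_box in enumerate(pred_boxes):
        for t, target_box in enumerate(target_boxes):
            iou = box_iou(pred_box, target_box)
            if iou > iou_threshold:
                same_class = pred_classes[p] == target_classes[t]
                candidates.append((same_class, iou, p, t))

    # Same-class pairs are considered first, then higher IoU first.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)

    used_preds, used_targets, tp_targets = set(), set(), set()
    tp: List[int] = []
    for same_class, _, p, t in candidates:
        if p in used_preds or t in used_targets:
            continue  # one-to-one assignment
        used_preds.add(p)
        used_targets.add(t)
        if same_class:
            tp.append(p)
            tp_targets.add(t)

    tp_set = set(tp)
    fp = [p for p in range(len(pred_boxes)) if p not in tp_set]
    fn = [t for t in range(len(target_boxes)) if t not in tp_targets]
    return sorted(tp), fp, fn


# One prediction overlapping a different-class target (IoU 1.0)
# and a same-class target (IoU 0.8).
tp, fp, fn = match_detections(
    pred_boxes=[(0, 0, 10, 10)],
    pred_classes=[0],
    target_boxes=[(0, 0, 10, 10), (0, 0, 10, 8)],
    target_classes=[1, 0],
    iou_threshold=0.5,
)
print(tp, fp, fn)  # [0] [] [0]
```

In this example the prediction overlaps the different-class target more strongly, yet it is attributed as a TP because same-class pairs are considered first; the different-class target is left as an FN.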

    best_available_target = available_target_indices[
        np.argmax(iou_matrix[row_index, available_target_indices])
    ]
    best_iou = iou_matrix[row_index, best_available_target]

    if best_iou >= iou_threshold:
        tp_indices.append(prediction_index)
        matched_targets[best_available_target] = True
    else:
        fp_indices.append(prediction_index)

Aligned the IoU comparison operator with ConfusionMatrix.evaluate_detection_batch by changing the threshold check from >= to > for consistent attribution behavior.
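The difference between `>=` and `>` only shows up when the IoU lands exactly on the threshold. A small sketch with a hand-picked pair of boxes whose IoU is exactly 0.5 (the boxes and the `box_iou` helper are illustrative and not taken from the PR):

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


prediction = (0.0, 0.0, 2.0, 2.0)  # area 4
target = (0.0, 0.0, 2.0, 1.0)      # area 2, fully inside the prediction
iou = box_iou(prediction, target)  # intersection 2 / union 4

iou_threshold = 0.5
matches_inclusive = iou >= iou_threshold  # old check: counts as a match
matches_strict = iou > iou_threshold      # new check: does not count
print(iou, matches_inclusive, matches_strict)  # 0.5 True False
```

With the strict comparison, a prediction sitting exactly on the threshold is attributed as an FP (and its target as an FN) rather than a TP.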

    cv2.putText(
        panel,
        title,
        (40, 100),
        cv2.FONT_HERSHEY_SIMPLEX,
        float(max(1.0, font_size / 18.0)),
        (240, 240, 240),
        max(2, round(font_size / 8)),

Implemented in the latest commit

    cv2.rectangle(
        result,
        (0, 0),
        (result.shape[1] - 1, result.shape[0] - 1),
        (255, 255, 255),
        thickness=8,
    )

    center_x = result.shape[1] // 2
    center_y = result.shape[0] // 2
    cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
    cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)

Implemented in the latest commit

    cv2.line(result, (center_x, 0), (center_x, result.shape[0] - 1), (255, 255, 255), 8)
    cv2.line(result, (0, center_y), (result.shape[1] - 1, center_y), (255, 255, 255), 8)

    cv2.imwrite(str(save_path), result)

In the updated commit, a failed write now emits a `UserWarning` instead of failing hard:

    write_success = cv2.imwrite(str(save_path), result)
    if not write_success:
        warnings.warn(
            f"Failed to write validation image to '{save_path}'.",
            UserWarning,
            stacklevel=2,
        )


    if Path(image_filename).suffix == "":
        image_filename = f"{image_filename}.jpg"

    save_path = save_directory / image_filename

`DetectionDataset.from_yolo` only lists files at the top level (via `list_files_with_extensions`, which uses `Path.glob`), so images inside subfolders are not found and are never processed. The current code is therefore correct.
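The reply rests on the fact that a non-recursive `Path.glob` pattern does not descend into subdirectories. A minimal standard-library sketch of that behavior, together with the suffix fallback from the quoted hunk; `list_top_level_images` and `resolve_save_path` are hypothetical helpers written for this example and are not supervision functions:

```python
import tempfile
from pathlib import Path


def list_top_level_images(directory: Path, extensions=("jpg", "png")):
    """Hypothetical helper: list images directly inside `directory` only."""
    found = []
    for extension in extensions:
        # "*.jpg" matches direct children; it does not recurse into subfolders.
        found.extend(directory.glob(f"*.{extension}"))
    return sorted(found)


def resolve_save_path(save_directory: Path, image_filename: str) -> Path:
    """Mirror of the quoted logic: append .jpg when the name has no suffix."""
    if Path(image_filename).suffix == "":
        image_filename = f"{image_filename}.jpg"
    return save_directory / image_filename


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").touch()
    (root / "nested").mkdir()
    (root / "nested" / "b.jpg").touch()

    top_level = [path.name for path in list_top_level_images(root)]
    recursive = sorted(path.name for path in root.rglob("*.jpg"))

print(top_level)  # ['a.jpg']
print(recursive)  # ['a.jpg', 'b.jpg']
print(resolve_save_path(Path("result"), "frame_001").name)  # frame_001.jpg
```

If dataset loading were ever switched to a recursive listing (for example `rglob`), images with the same file name in different subfolders could map to the same `save_path`, so the flat naming would need revisiting at that point.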

    saved_image = cv2.imread(str(saved_image_path))
    assert saved_image is not None
    assert saved_image.shape[:2] == (64, 64)
    assert np.any(saved_image[:32, :32] != 0)
    assert np.any(saved_image[:32, 32:] != 0)
    assert np.any(saved_image[32:, :32] != 0)
    assert np.any(saved_image[32:, 32:] != 0)

Implemented in a new commit; the test now checks detection rendering, panel logic, and visualization correctness.
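The quadrant assertions in the quoted test can be read as "each panel of the 2x2 mosaic contains at least one non-zero pixel". A rough sketch of that idea in plain Python, using a nested list as a stand-in for the image array; the helper names are hypothetical, and the panel placement (GT top-left, TP top-right, FP bottom-left, FN bottom-right) is an assumption based on the GT / TP / FP / FN order in the overview, not something the thread confirms:

```python
def quadrant_bounds(height: int, width: int) -> dict:
    """Return (y0, y1, x0, x1) bounds for each panel of a 2x2 mosaic."""
    mid_y, mid_x = height // 2, width // 2
    return {
        "GT": (0, mid_y, 0, mid_x),           # assumed top-left
        "TP": (0, mid_y, mid_x, width),       # assumed top-right
        "FP": (mid_y, height, 0, mid_x),      # assumed bottom-left
        "FN": (mid_y, height, mid_x, width),  # assumed bottom-right
    }


def quadrant_has_content(image_rows, bounds) -> bool:
    """True if any pixel inside the bounds is non-zero."""
    y0, y1, x0, x1 = bounds
    return any(
        image_rows[y][x] != 0 for y in range(y0, y1) for x in range(x0, x1)
    )


# 4x4 single-channel stand-in for a saved mosaic.
image = [
    [9, 0, 0, 7],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
]
bounds = quadrant_bounds(4, 4)
empty_panels = [
    name for name, panel in bounds.items() if not quadrant_has_content(image, panel)
]
print(empty_panels)  # ['FN']
```

Naming the panels this way lets a test report which quadrant is empty instead of failing on an anonymous slice.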
Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? I'm happy to address any feedback or make further improvements if needed.
Before submitting
Description
Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.
Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of:
Type of Change
Motivation and Context
Current benchmarking utilities provide aggregate metrics but limited per-image visual inspection support.
This feature adds optional qualitative visualization exports to simplify:
without affecting existing benchmark behavior.
Changes Made
- `save_result_images` and `save_directory_path` support in `ConfusionMatrix.benchmark(...)`
- `result/` directory
- `test_detection.py`
Testing
Additional Notes
Existing benchmark behavior remains unchanged unless `save_result_images=True` is enabled.