Fix MeanAverageRecall: compute mAR@K using top-K detections per image (COCO-compliant)#2136
stop1one wants to merge 8 commits into roboflow:develop
Conversation
@Borda I accidentally closed the previous PR (#1967). Regarding the implementation, please check my last question.
Codecov Report

✅ All modified and coverable lines are covered by tests.
❌ Your project check has failed because the head coverage (72%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Coverage diff (develop vs. #2136):

| | develop | #2136 | +/- |
|---|---|---|---|
| Coverage | 72% | 72% | |
| Files | 61 | 61 | |
| Lines | 7249 | 7246 | -3 |
| Hits | 5246 | 5249 | +3 |
| Misses | 2003 | 1997 | -6 |
I've finished the implementation and resolved all conflicts.
Pull request overview
This PR fixes a critical bug in the MeanAverageRecall metric calculation to comply with the COCO evaluation protocol. Previously, the implementation selected the top-K predictions globally across all images, rather than per image as specified by COCO. This fix ensures that mAR@K is calculated by considering the top-K highest-confidence detections for each image independently.
Changes:
- Modified the `_compute` method to track prediction positions within each image using indices instead of confidence scores
- Updated `_compute_average_recall_for_classes` to filter predictions by per-image rank before computing the confusion matrix
- Removed the `max_detections` parameter from `_compute_confusion_matrix`, since filtering now happens upstream
- Added comprehensive integration tests with 15 test images covering various scenarios
- Fixed a duplicate error-handling statement (code cleanup)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/supervision/metrics/mean_average_recall.py | Core logic fix: Changed from global top-K to per-image top-K by tracking prediction indices within each image and filtering before confusion matrix computation |
| tests/metrics/test_mean_average_recall.py | Added comprehensive test suite with 15 test images covering various detection scenarios including perfect detections, mismatches, and empty predictions |
Supersedes #1967
Description
This PR fixes the calculation of mAR@K in `MeanAverageRecall` to comply with the COCO evaluation protocol. Previously, the implementation selected the top-K predictions globally across all images, rather than per image.
According to the COCO evaluation protocol, mAR@K should be calculated by considering the top-K highest-confidence detections for each image.
This issue is tracked in #1966.
To resolve this, I modified the `_compute` and `_compute_average_recall_for_classes` functions to first filter the statistics by per-image confidence rank before concatenating them and computing the confusion matrix. No new dependencies are required for this change.
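To make the difference concrete, here is a small self-contained example (the scores are made up) showing how global and per-image top-K selection diverge when images have uneven numbers of predictions:

```python
import numpy as np

# Hypothetical scores: image 0 has three predictions, image 1 has one; K = 2.
image_ids = np.array([0, 0, 0, 1])
confidences = np.array([0.9, 0.8, 0.7, 0.6])
k = 2

# Previous (buggy) behavior: one global top-K keeps the two highest scores
# overall -- both from image 0 -- so image 1 contributes nothing.
global_top_k = np.argsort(-confidences)[:k]
print(global_top_k.tolist())  # [0, 1]

# COCO-compliant behavior: rank within each image, keep up to K per image.
kept = []
for image_id in np.unique(image_ids):
    (indices,) = np.nonzero(image_ids == image_id)
    kept.extend(indices[np.argsort(-confidences[indices])][:k])
print(sorted(int(i) for i in kept))  # [0, 1, 3] -- image 1's detection survives
```

With global selection, recall for image 1 is artificially zero; the per-image cut-off fixes that.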
Type of change
- Bug fix (non-breaking change which fixes an issue)
How has this change been tested? Please provide a test case or an example of how you tested the change.
I tested the change by running the metric on a dataset with varying numbers of predictions per image and verified that, for each image, only the top-K predictions (by confidence) were used in the mAR@K calculation.
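For reference, a usage sketch under stated assumptions: the import path is inferred from the PR's file layout (`src/supervision/metrics/mean_average_recall.py`), and the `update`/`compute` pattern mirrors supervision's other metrics, so exact signatures may differ from the released API.

```python
import numpy as np
import supervision as sv
from supervision.metrics import MeanAverageRecall  # import path assumed from the PR's file layout

# One hypothetical image: a single ground-truth box and two predictions.
targets = sv.Detections(
    xyxy=np.array([[10.0, 10.0, 50.0, 50.0]]),
    class_id=np.array([0]),
)
predictions = sv.Detections(
    xyxy=np.array([[11.0, 11.0, 49.0, 49.0], [200.0, 200.0, 240.0, 240.0]]),
    class_id=np.array([0, 0]),
    confidence=np.array([0.95, 0.30]),
)

# Assumed update/compute API, mirroring supervision's other metrics.
# With this PR, the @K cut-off is applied per image, not across the whole dataset.
metric = MeanAverageRecall()
metric.update(predictions, targets)
result = metric.compute()
print(result)
```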
Any specific deployment considerations
No special deployment considerations are required.
Docs