fixed precision converting annotations with "force_mask=True"#1746

Open
0xD4rky wants to merge 8 commits into roboflow:develop from 0xD4rky:develop

Conversation

@0xD4rky 0xD4rky commented Dec 16, 2024

Description

When supervision loads YOLO annotations with force_masks=True, it converts the normalized polygon coordinates from the YOLO text files into pixel coordinates (multiplying by the image width and height), and converts them back into normalized coordinates when saving them out. During this round trip, integer casting or rounding may occur, causing slight shifts in the polygon coordinates. This leads to "crooked" or misaligned masks.
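The size of the shift described above can be illustrated in isolation. The following is a minimal sketch, not supervision's actual code path: it models only the normalized-to-pixel quantization step (the mask rasterization and mask-to-polygon steps are left out), and the image size and random vertices are assumptions chosen for illustration. It compares an early integer cast, which truncates, against rounding to the nearest pixel.

```python
import numpy as np

# Hypothetical image size and random normalized vertices, for illustration only.
resolution_wh = np.array([640, 480], dtype=np.float64)
rng = np.random.default_rng(0)
normalized = rng.random((1000, 2))

pixels = normalized * resolution_wh      # float pixel positions, no precision lost yet
truncated = pixels.astype(int)           # early cast: drops the fractional part
rounded = np.round(pixels).astype(int)   # rounds to the nearest pixel instead

# Per-coordinate error, in pixels, introduced by each quantization.
trunc_err = np.abs(truncated - pixels)
round_err = np.abs(rounded - pixels)

print("truncation: mean", trunc_err.mean(), "max", trunc_err.max())
print("rounding:   mean", round_err.mean(), "max", round_err.max())
```

Truncation can move a coordinate by almost a full pixel, always in the same direction, while rounding is bounded by half a pixel, so a truncating cast is the more likely source of visibly shifted edges.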

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Minimal Reproducible Code:

import cv2
import numpy as np

resolution_wh = (640, 480)
relative_polygon = np.array([
    [0.25, 0.4],
    [0.25, 0.6],
    [0.45, 0.6],
    [0.45, 0.4]
], dtype=np.float32)

def polygon_to_mask(polygon: np.ndarray, resolution_wh: tuple[int, int]) -> np.ndarray:
    """
    New approach: Convert to int at the last moment.
    """
    polygon_int = np.round(polygon).astype(np.int32)
    mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon_int], 1)
    return mask

def old_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    Old (problematic) approach: Cast to int too early.
    """
    polygons = (relative_polygon * np.array(resolution_wh)).astype(int)
    return polygon_to_mask(polygons, resolution_wh)

def new_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    New (improved) approach: Keep floats until mask creation.
    """
    polygons = relative_polygon * np.array(resolution_wh, dtype=np.float32)
    return polygon_to_mask(polygons, resolution_wh)

old_mask = old_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("old_mask.png", old_mask.astype(np.uint8)*255)  

new_mask = new_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("new_mask.png", new_mask.astype(np.uint8)*255) 

difference = np.bitwise_xor(old_mask, new_mask)
print("Number of differing pixels:", difference.sum())

# Instructions for Analysis:
# 1. Open old_mask.png and new_mask.png.
# 2. Check if the polygon edges appear more accurate in new_mask.png.
# 3. A reduced "Number of differing pixels" may indicate less distortion if comparing to a ground-truth mask.

Docs

The docs haven't been updated yet; I need to check the validity of the PR with the maintainers first!

@CLAassistant

CLAassistant commented Dec 16, 2024

CLA assistant check
All committers have signed the CLA.

@0xD4rky 0xD4rky changed the title: Resolving Issue #368 ["force_mask = True"] → fixed issue in precision converting annotations with "force_mask=True" Dec 17, 2024
@SkalskiP
Collaborator

Hi @0xD4rky 👋🏻 thanks a lot for your interest in our library. It's true that the YOLO format requires normalized box coordinates and masks, and that loading and re-saving a dataset can introduce distortions. We would like to minimize those distortions.

However, before we decide to introduce any changes to supervision datasets, I need to see that your proposed solution actually minimizes the distortions. The test you attached only shows that the masks processed in two different ways are different. However, there is no reference point to the source polygon. That is, we don't know if and by how much the output polygon differs from the input one.

I would like to see a test where we have the source .txt file with annotations. This file is loaded and then saved back to disk. We can then compare the level of distortion.

@0xD4rky
Author

0xD4rky commented Dec 17, 2024

Thanks @SkalskiP for pointing out the need to verify the change; I forgot to include the verification. I created a sample label file to show how the polygon's coordinates shifted before the change and how the change handles the polygon rounding.

Below is the code I used to analyze the changes in the polygon's coordinates.

import os

import cv2
import numpy as np
import supervision as sv

# Build a throwaway YOLO dataset: one image, one label file, one data.yaml.
test_dir = "test_annotation"
os.makedirs(test_dir, exist_ok=True)
images_dir = os.path.join(test_dir, "images")
labels_dir = os.path.join(test_dir, "labels")
os.makedirs(images_dir, exist_ok=True)
os.makedirs(labels_dir, exist_ok=True)

data_yaml_path = os.path.join(test_dir, "data.yaml")

with open(data_yaml_path, "w") as f:
    f.write("train: ./\nval: ./\nnames: ['class0']\n")

# Blank 640x480 image; only its size matters for the coordinate conversion.
image_name = "example.jpg"
image_path = os.path.join(images_dir, image_name)
dummy_img = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.imwrite(image_path, dummy_img)

original_polygon = [
    "0 0.25 0.4 0.25 0.6 0.45 0.6 0.45 0.4\n"
]

label_path = os.path.join(labels_dir, "example.txt")
with open(label_path, "w") as f:
    f.writelines(original_polygon)

ds = sv.DetectionDataset.from_yolo(
    images_directory_path=images_dir,
    annotations_directory_path=labels_dir,
    data_yaml_path=data_yaml_path,
    force_masks=True
)

ds.as_yolo(annotations_directory_path=labels_dir)
with open(label_path, "r") as f:
    processed_lines = f.readlines()
processed_polygon_line = processed_lines[0].strip()

def parse_yolo_polygon(line):
    vals = line.split()
    cls = vals[0]
    coords = list(map(float, vals[1:]))
    return cls, np.array(coords, dtype=float).reshape(-1, 2)

orig_cls, orig_coords = parse_yolo_polygon(original_polygon[0])
proc_cls, proc_coords = parse_yolo_polygon(processed_polygon_line)

print("Original Polygon Coordinates (Normalized):")
print(orig_coords)
print("Processed Polygon Coordinates (Normalized):")
print(proc_coords)

differences = np.linalg.norm(orig_coords - proc_coords, axis=1)
avg_difference = np.mean(differences)
max_difference = np.max(differences)

print("Average per-point difference:", avg_difference)
print("Max per-point difference:", max_difference)

We start with a known polygon in normalized YOLO coordinates. After loading and saving via supervision, we compare the polygon coordinates before and after. By computing the numeric difference, we get a quantitative measure of how much the polygon has been distorted.

  • the results before the changes are as follows:
Screenshot 2024-12-17 at 11 23 05 PM
  • the results after the changes are as follows:
Screenshot 2024-12-17 at 11 23 39 PM

You can see that, with the changes applied, the processed polygon coordinates stay close to the original coordinates.
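The differences computed by the script are in normalized units, where the same offset corresponds to a different number of pixels along x and y. A small sketch of expressing the distortion in pixels instead follows; the "processed" polygon here is a hypothetical value chosen for illustration (one vertex moved one pixel to the left), not output from supervision.

```python
import numpy as np

resolution_wh = np.array([640, 480], dtype=np.float64)

original = np.array([[0.25, 0.4], [0.25, 0.6], [0.45, 0.6], [0.45, 0.4]])
# Hypothetical "processed" polygon: third vertex lands on x = 287 px instead of 288 px.
processed = original.copy()
processed[2, 0] = 287 / 640

# Scale the normalized offsets to pixels before taking the per-vertex distance.
pixel_offsets = (original - processed) * resolution_wh
pixel_errors = np.linalg.norm(pixel_offsets, axis=1)

print("Per-vertex error (px):", pixel_errors)
print("Max error (px):", pixel_errors.max())
```

Reporting the error in pixels makes it easier to judge whether a shift is within the half-pixel bound expected from rounding.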

  • One extra point: I will make one more change in the _polygons_to_masks function, i.e. mask = mask[None, ...], so that the mask has shape (1, H, W) instead of (H, W).
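For reference, the indexing mentioned in this point behaves as follows in NumPy. This is a short illustrative sketch with an assumed 640x480 resolution; it also shows that stacking several (H, W) masks yields an (N, H, W) array.

```python
import numpy as np

mask = np.zeros((480, 640), dtype=np.uint8)  # a single mask, shape (H, W)
batched = mask[None, ...]                    # adds a leading axis: (1, H, W)
print(mask.shape, batched.shape)

# With several masks, stacking gives one leading entry per mask: (N, H, W).
stacked = np.stack([mask, mask, mask])
print(stacked.shape)
```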

@0xD4rky
Author

0xD4rky commented Jan 24, 2025

hello @SkalskiP please review the changes once you have time, thanks!

Member

@Borda Borda left a comment

@0xD4rky, thank you for your contribution. To help us land this fix, could you pls add a test that demonstrates the previous issue and shows that your fix resolves it? 🐰

@0xD4rky
Author

0xD4rky commented Jan 20, 2026

@0xD4rky, thank you for your contribution. To help us land this fix, could you pls add a test that demonstrates the previous issue and shows that your fix resolves it? 🐰

Sure thing @Borda, will add them today. Although will have to get a lot of context back haha

@Borda Borda added the bug Something isn't working label Jan 26, 2026
@Borda
Member

Borda commented Feb 2, 2026

Although will have to get a lot of context back haha

sure, I understand that :)

@Borda Borda changed the title: fixed issue in precision converting annotations with "force_mask=True" → fixed precision converting annotations with "force_mask=True" Feb 4, 2026
@Borda Borda requested review from Copilot and removed request for onuralpszr February 4, 2026 02:40
Contributor

Copilot AI left a comment

Pull request overview

This PR attempts to fix precision loss when loading YOLO polygon annotations with force_mask=True. The issue occurs when normalized polygon coordinates are converted to pixel coordinates and back, causing rounding errors that result in misaligned masks. The proposed solution is to delay integer conversion until the last moment before calling cv2.fillPoly.

Changes:

  • Modified _polygons_to_masks function to inline the mask creation logic instead of calling polygon_to_mask
  • Removed import of polygon_to_mask from converters module
  • Added blank line in docstring formatting

polygon_int = np.round(polygon).astype(np.int32)
mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)

cv2.fillPoly(mask, [polygon_int], 1)

Copilot AI Feb 4, 2026

The module cv2 is used here but not imported. An import statement for cv2 is required at the top of the file.

Copilot uses AI. Check for mistakes.
Comment on lines 49 to +57
 def _polygons_to_masks(
-    polygons: list[np.ndarray], resolution_wh: tuple[int, int]
+    polygon: list[np.ndarray], resolution_wh: tuple[int, int]
 ) -> np.ndarray:
-    return np.array(
-        [
-            polygon_to_mask(polygon=polygon, resolution_wh=resolution_wh)
-            for polygon in polygons
-        ],
-        dtype=bool,
-    )
+    polygon_int = np.round(polygon).astype(np.int32)
+    mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)
+
+    cv2.fillPoly(mask, [polygon_int], 1)
+    mask = mask[None, ...]
+    return mask.astype(bool)

Copilot AI Feb 4, 2026

The function signature indicates it accepts a list of polygons (polygon: list[np.ndarray]), but the implementation treats it as a single polygon. The function attempts to call np.round(polygon) and cv2.fillPoly with [polygon_int], which assumes polygon is a single array, not a list of arrays. This will fail when multiple polygons are passed. The function should either loop over all polygons in the list to create multiple masks, or the parameter type should be changed to np.ndarray if only single polygons are expected.

@@ -120,7 +119,7 @@ def yolo_annotations_to_detections(
np.round(polygon * np.array(resolution_wh, dtype=np.float32)).astype(int)

Copilot AI Feb 4, 2026

The rounding and casting to int happen before passing polygons to _polygons_to_masks, which defeats the purpose of this PR. The PR description states the fix is to "keep floats until mask creation" to avoid precision loss, but here the conversion to int occurs at line 119 before the mask creation function is called. The rounding should be removed from this location and only performed inside _polygons_to_masks just before calling cv2.fillPoly.

Suggested change
- np.round(polygon * np.array(resolution_wh, dtype=np.float32)).astype(int)
+ polygon * np.array(resolution_wh, dtype=np.float32)

Comment on lines +56 to +57
+    mask = mask[None, ...]
+    return mask.astype(bool)

Copilot AI Feb 4, 2026

The function always returns a mask array with shape (1, H, W) regardless of how many polygons are in the input list. The mask should have shape (N, H, W) where N is the number of polygons. The original implementation correctly handled multiple polygons by iterating over them and stacking the results. The new implementation needs to loop over the list of polygons and create a mask for each one, then stack them into a single array.


 def _polygons_to_masks(
-    polygons: list[np.ndarray], resolution_wh: tuple[int, int]
+    polygon: list[np.ndarray], resolution_wh: tuple[int, int]

Copilot AI Feb 4, 2026

The parameter name "polygon" is misleading since it accepts a list of polygons, not a single polygon. The parameter should be renamed to "polygons" to match the expected input type and improve code clarity.

Labels

bug Something isn't working

4 participants