fixed precision converting annotations with `"force_mask=True"` by 0xD4rky · Pull Request #1746 · roboflow/supervision

0xD4rky · 2024-12-16T17:56:34Z

Description

When we use supervision to load YOLO annotations with force_masks=True, it internally converts normalized polygon coordinates from your YOLO text files into pixel coordinates (multiplying by image width/height) and then back into normalized coordinates when saving them out. During this round-trip, integer casting or rounding may occur, causing slight shifts in the polygon coordinates. This leads to “crooked” or misaligned masks.
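The round trip described above can be sketched with NumPy alone. This is a minimal illustration, not supervision's actual code path: it assumes the early conversion behaves like a plain `astype(int)` truncation, and the names `pixel_truncated`, `pixel_rounded`, `error_truncated` and `error_rounded` are made up for the example.

```python
import numpy as np

resolution_wh = (640, 480)
relative_polygon = np.array(
    [[0.25, 0.4], [0.25, 0.6], [0.45, 0.6], [0.45, 0.4]], dtype=np.float32
)
scale = np.array(resolution_wh, dtype=np.float64)

# Normalized -> pixel coordinates, kept as floats.
pixel_float = relative_polygon.astype(np.float64) * scale

# Assumption: the early integer cast truncates toward zero,
# whereas rounding picks the nearest pixel.
pixel_truncated = pixel_float.astype(np.int64)
pixel_rounded = np.round(pixel_float).astype(np.int64)

# Pixel -> normalized again, as happens when the annotations are saved.
back_truncated = pixel_truncated / scale
back_rounded = pixel_rounded / scale

# Largest per-coordinate deviation from the source polygon.
error_truncated = np.abs(back_truncated - relative_polygon).max()
error_rounded = np.abs(back_rounded - relative_polygon).max()

print("x of third vertex (truncated, rounded):", pixel_truncated[2, 0], pixel_rounded[2, 0])
print("max round-trip error, truncated:", error_truncated)
print("max round-trip error, rounded:  ", error_rounded)
```

In this sketch the float32 value 0.45 sits slightly below 0.45, so 0.45 * 640 lands just under 288 and truncation moves the vertex a full pixel to 287, while rounding keeps it at 288. In general truncation can shift a coordinate by almost one pixel, and rounding by at most half a pixel.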

Type of change

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Minimal Reproducible Code:

import numpy as np
import cv2

resolution_wh = (640, 480)  
relative_polygon = np.array([
    [0.25, 0.4],
    [0.25, 0.6],
    [0.45, 0.6],
    [0.45, 0.4]
], dtype=np.float32)

def polygon_to_mask(polygon: np.ndarray, resolution_wh: tuple[int, int]) -> np.ndarray:
    """
    New approach: Convert to int at the last moment.
    """
    polygon_int = np.round(polygon).astype(np.int32)
    mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon_int], 1)
    return mask

def old_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    Old (problematic) approach: Cast to int too early.
    """
    polygons = (relative_polygon * np.array(resolution_wh)).astype(int)
    return polygon_to_mask(polygons, resolution_wh)

def new_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    New (improved) approach: Keep floats until mask creation.
    """
    polygons = relative_polygon * np.array(resolution_wh, dtype=np.float32)
    return polygon_to_mask(polygons, resolution_wh)

old_mask = old_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("old_mask.png", old_mask.astype(np.uint8)*255)  

new_mask = new_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("new_mask.png", new_mask.astype(np.uint8)*255) 

difference = np.bitwise_xor(old_mask, new_mask)
print("Number of differing pixels:", difference.sum())

# Instructions for Analysis:
# 1. Open old_mask.png and new_mask.png.
# 2. Check if the polygon edges appear more accurate in new_mask.png.
# 3. A reduced "Number of differing pixels" may indicate less distortion if comparing to a ground-truth mask.

Docs

The docs haven't been updated yet; I need to check the validity of the PR with the maintainers first!

CLAassistant · 2024-12-16T17:56:40Z

All committers have signed the CLA.

SkalskiP · 2024-12-17T16:25:26Z

Hi @0xD4rky 👋🏻 thanks a lot for your interest in our library. It's true that the YOLO format requires normalization of box coordinates and masks, and loading and re-saving the dataset can lead to distortions, and we would like to minimize the level of these distortions.

However, before we decide to introduce any changes to supervision datasets, I need to see that your proposed solution actually minimizes the distortions. The test you attached only shows that the masks processed in two different ways are different. However, there is no reference point to the source polygon. That is, we don't know if and by how much the output polygon differs from the input one.

I would like to see a test where we have the source .txt file with annotations. This file is loaded and then saved back to disk. We can then compare the level of distortion.

0xD4rky · 2024-12-17T18:03:05Z

Thanks @SkalskiP for pointing out the need to verify the change; I forgot to include the verification. I created a sample label file to show how the polygon's coordinates shifted before the change and how the change handles the polygon rounding.

Below is the code I used to analyze the changes in the polygon's coordinates.

import os

import cv2
import numpy as np
import supervision as sv

test_dir = "test_annotation"
os.makedirs(test_dir, exist_ok=True)
images_dir = os.path.join(test_dir, "images")
labels_dir = os.path.join(test_dir, "labels")
os.makedirs(images_dir, exist_ok=True)
os.makedirs(labels_dir, exist_ok=True)

data_yaml_path = os.path.join(test_dir, "data.yaml")

with open(data_yaml_path, "w") as f:
    f.write("train: ./\nval: ./\nnames: ['class0']\n")

# Write a blank 640x480 image so the label file has an image to pair with.
image_name = "example.jpg"
image_path = os.path.join(images_dir, image_name)
dummy_img = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.imwrite(image_path, dummy_img)

original_polygon = [
    "0 0.25 0.4 0.25 0.6 0.45 0.6 0.45 0.4\n"
]

label_path = os.path.join(labels_dir, "example.txt")
with open(label_path, "w") as f:
    f.writelines(original_polygon)

ds = sv.DetectionDataset.from_yolo(
    images_directory_path=images_dir,
    annotations_directory_path=labels_dir,
    data_yaml_path=data_yaml_path,
    force_masks=True
)

ds.as_yolo(annotations_directory_path=labels_dir)
with open(label_path, "r") as f:
    processed_lines = f.readlines()
processed_polygon_line = processed_lines[0].strip()

def parse_yolo_polygon(line):
    vals = line.split()
    cls = vals[0]
    coords = list(map(float, vals[1:]))
    return cls, np.array(coords, dtype=float).reshape(-1, 2)

orig_cls, orig_coords = parse_yolo_polygon(original_polygon[0])
proc_cls, proc_coords = parse_yolo_polygon(processed_polygon_line)

print("Original Polygon Coordinates (Normalized):")
print(orig_coords)
print("Processed Polygon Coordinates (Normalized):")
print(proc_coords)

differences = np.linalg.norm(orig_coords - proc_coords, axis=1)
avg_difference = np.mean(differences)
max_difference = np.max(differences)

print("Average per-point difference:", avg_difference)
print("Max per-point difference:", max_difference)

We start with a known polygon in normalized YOLO coordinates. After loading and saving via supervision, we compare the polygon coordinates before and after. By computing the numeric difference, we get a quantitative measure of how much the polygon has been distorted.

The results before the changes are as follows:

The results after the changes are as follows:

You can see that, with the changes applied, the processed polygon coordinates closely match the original coordinates.

One more point: I will also change the _polygons_to_masks function to add mask = mask[None, ...], so that each mask has shape (1, H, W) instead of (H, W).
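As a quick illustration of that indexing step, the sketch below shows what `mask[None, ...]` does to a single NumPy mask. The rectangle used to fill the mask and the names `batched` and `stacked` are only illustrative; how `_polygons_to_masks` combines the masks afterwards is not shown in this thread, so the concatenation here is an assumption.

```python
import numpy as np

height, width = 480, 640

# A single boolean mask of shape (H, W) with a filled rectangle.
mask = np.zeros((height, width), dtype=bool)
mask[192:289, 160:289] = True

# Indexing with None adds a leading axis without copying the data.
batched = mask[None, ...]
print(batched.shape)

# Assumption: masks with a leading axis are combined along axis 0 into (N, H, W).
stacked = np.concatenate([batched, batched], axis=0)
print(stacked.shape)
```

The leading axis is what lets several per-polygon masks be combined into one (N, H, W) array.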

0xD4rky · 2025-01-24T16:19:18Z

hello @SkalskiP please review the changes once you have time, thanks!

Borda

@0xD4rky, thank you for your contribution. To help us land fixing PR, could you pls add a test to demonstrate the previous issue and that your fix resolves it? 🐰

0xD4rky · 2026-01-20T06:49:13Z

> @0xD4rky, thank you for your contribution. To help us land fixing PR, could you pls add a test to demonstrate the previous issue and that your fix resolves it? 🐰

Sure thing @Borda, will add them today. Although will have to get a lot of context back haha

Borda · 2026-02-02T13:10:18Z

> Although will have to get a lot of context back haha

sure, I understand that :)

Copilot

Pull request overview

This PR attempts to fix precision loss when loading YOLO polygon annotations with force_masks=True. The issue occurs when normalized polygon coordinates are converted to pixel coordinates and back, causing rounding errors that result in misaligned masks. The proposed solution is to delay integer conversion until the last moment before calling cv2.fillPoly.

Changes:

Modified _polygons_to_masks function to inline the mask creation logic instead of calling polygon_to_mask
Removed import of polygon_to_mask from converters module
Added blank line in docstring formatting

Copilot · 2026-02-04T02:43:53Z