
FEATURE: New Video API#1924

Open
Ashp116 wants to merge 139 commits into roboflow:develop from Ashp116:bug/process-video-audio

Conversation

@Ashp116
Contributor

@Ashp116 Ashp116 commented Jul 30, 2025

Description

This PR introduces a new Video API that streamlines video processing and rendering workflows. It addresses both issues #1923 and #1929 by enabling more flexible backend support and improved audio-video synchronization.

With this update, the video processing function supports multiple backends, including PyAV and OpenCV. PyAV is currently the only backend that supports audio rendering, so it is the one to use when the output needs to keep the source audio.

The PyAV rendering backend requires PyAV, which is an optional dependency.
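
Because PyAV is optional, calling code may want to check for it before requesting the `pyav` backend. The following is a minimal sketch, assuming only that PyAV is importable as `av`; the helper `pick_backend` and its fallback policy are hypothetical and not part of this PR.

```python
import importlib.util


def pick_backend(need_audio: bool = False) -> str:
    """Return a backend name based on what is installed (hypothetical helper)."""
    # PyAV is published on PyPI as "av" and imported under the same name.
    has_pyav = importlib.util.find_spec("av") is not None
    if has_pyav:
        return "pyav"
    if need_audio:
        # Per the description above, only the PyAV backend renders audio.
        raise RuntimeError("Audio output needs PyAV; install it with `pip install av`.")
    # Assumed fallback: OpenCV handles video-only output.
    return "opencv"


print(pick_backend())
```

The returned string could then be passed as the `backend` argument, so a missing PyAV install fails early with a clear message instead of silently dropping audio.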

Tags:
Fixes #1923
Fixes #1929

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Please refer to #1923 and #1929

Any specific deployment considerations

Ensure that PyAV is installed in the environment to test the PyAV backend.

Docs

  • Docs updated? What were the changes

@Ashp116 Ashp116 requested a review from SkalskiP as a code owner July 30, 2025 19:29
@Ashp116 Ashp116 changed the title from "ADD: Added audio stream for process_video" to "BUG: Added audio stream for process_video" Jul 30, 2025
@SkalskiP
Collaborator

Hi @Ashp116 👋🏻 Another great idea! Video processing is probably the oldest part of supervision, written over two years ago, and I’ve been wanting to update its API for a while. Would you be open to not only adding audio support but also helping me with the update?

@Ashp116
Contributor Author

Ashp116 commented Jul 31, 2025

Hi @SkalskiP, yea, I would like to help update the API. I was thinking of changing how videos are written in process_video. The original compression is lost when annotations are added and the file is written to a target_path. But yea, I would like to help out with the update.

@SkalskiP
Collaborator

SkalskiP commented Aug 1, 2025

Hi @Ashp116 I'm really glad you want to help me! Let's goooo! 🔥 🔥 🔥

I want the functionalities currently found in supervision.utils.video to be reorganized around a new Video class. Importantly, all features previously available in the old API must still be supported in the new one. Ideally, the new API should be more consistent and expressive.

  • get video info (works for files, RTSP, webcams)

    import supervision as sv
     
    # static video
    sv.Video("source.mp4").info
    
    # video stream
    sv.Video("rtsp://...").info
    
    # webcam
    sv.Video(0).info
  • simple frame iteration (object is iterable)

    import supervision as sv
    
    video = sv.Video("source.mp4")
    for frame in video:
        ...
  • advanced frame iteration (stride, sub-clip, on-the-fly resize)

    import supervision as sv
    
    for frame in sv.Video("source.mp4").frames(stride=5, start=100, end=500, resolution_wh=(1280, 720)):
        ...
  • process the video

    import cv2
    import supervision as sv
    
    def blur(frame, i):
        return cv2.GaussianBlur(frame, (11, 11), 0)
    
    sv.Video("source.mp4").save(
        "blurred.mp4",
        callback=blur,
        show_progress=True
    )
  • overwrite target video parameters

    import supervision as sv
    
    sv.Video("source.mp4").save(
        "timelapse.mp4",
        fps=60,
        callback=lambda f, i: f,
        show_progress=True
    )
  • complete manual control with explicit VideoInfo

    import cv2
    from supervision import Video, VideoInfo
    
    source = Video("source.mp4")
    target_info = VideoInfo(width=800, height=800, fps=24)
    
    with src.sink("square.mp4", info=target_info) as sink:
        for f in src.frames():
            f = cv2.resize(f, target_info.resolution_wh)
            sink.write(f)
  • multi-backend support (decode/encode)

    import supervision as sv
    
    video = sv.Video("source.mkv", backend="pyav")
    
    video = sv.Video("source.mkv", backend="opencv")

    suggested minimal protocol

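    # postpone annotation evaluation so Backend can refer to Writer before it is defined
    from __future__ import annotations

    from typing import Any, Protocol

    import numpy as np
    from supervision import VideoInfo

    class Backend(Protocol):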
        def open(self, path: str) -> Any: ...
        def info(self, handle: Any) -> VideoInfo: ...
    
        def read(self, handle: Any) -> tuple[bool, np.ndarray]: ...
        def grab(self, handle: Any) -> bool: ...
        def seek(self, handle: Any, frame_idx: int) -> None: ...
    
        def writer(self, path: str, info: VideoInfo, codec: str) -> Writer: ...
    
    class Writer(Protocol):
        def write(self, frame: np.ndarray) -> None: ...
        def close(self) -> None: ...
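
How `read`, `grab`, and `seek` from the suggested protocol could serve a call like `frames(stride=..., start=..., end=...)` can be sketched with a toy backend. This is only an illustration under assumptions: `ListBackend`, `Info`, and this `Video` wrapper are hypothetical stand-ins (frames live in a Python list rather than a file), `end` is treated as exclusive, and the writer side is left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional, Sequence, Tuple


@dataclass
class Info:
    """Stand-in for VideoInfo, reduced to what this sketch needs."""
    fps: int
    total_frames: int


class ListBackend:
    """Toy backend that 'decodes' from a Python list instead of a file."""

    def open(self, path: Sequence[Any]) -> Dict[str, Any]:
        # `path` is already the frame list here; a real backend would open a file.
        return {"frames": list(path), "pos": 0}

    def info(self, handle: Dict[str, Any]) -> Info:
        return Info(fps=30, total_frames=len(handle["frames"]))

    def read(self, handle: Dict[str, Any]) -> Tuple[bool, Optional[Any]]:
        # Return the current frame and advance by one.
        if handle["pos"] >= len(handle["frames"]):
            return False, None
        frame = handle["frames"][handle["pos"]]
        handle["pos"] += 1
        return True, frame

    def grab(self, handle: Dict[str, Any]) -> bool:
        # Advance without returning a frame, so skipped frames cost nothing.
        if handle["pos"] >= len(handle["frames"]):
            return False
        handle["pos"] += 1
        return True

    def seek(self, handle: Dict[str, Any], frame_idx: int) -> None:
        handle["pos"] = frame_idx


class Video:
    """Hypothetical wrapper that delegates all I/O to a backend object."""

    def __init__(self, source: Sequence[Any], backend: Optional[ListBackend] = None) -> None:
        self.source = source
        self.backend = backend or ListBackend()

    @property
    def info(self) -> Info:
        return self.backend.info(self.backend.open(self.source))

    def frames(self, stride: int = 1, start: int = 0, end: Optional[int] = None) -> Iterator[Any]:
        if stride < 1:
            raise ValueError("stride must be >= 1")
        handle = self.backend.open(self.source)
        self.backend.seek(handle, start)
        idx = start
        while end is None or idx < end:
            ok, frame = self.backend.read(handle)
            if not ok:
                break
            yield frame
            # Skip the next stride - 1 frames; if the source runs out,
            # the following read() fails and ends the loop.
            for _ in range(stride - 1):
                self.backend.grab(handle)
            idx += stride

    def __iter__(self) -> Iterator[Any]:
        return self.frames()


video = Video(list(range(10)))
print(list(video.frames(stride=3, start=1, end=9)))  # [1, 4, 7]
```

Keeping `Video` free of any decoding logic is what would let `backend="pyav"` and `backend="opencv"` share one iteration path; only the backend object changes.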

@Ashp116
Contributor Author

Ashp116 commented Aug 2, 2025

Hi @SkalskiP,

I’ve addressed most of the features you mentioned, but I have some thoughts on a few aspects of the implementation:

  • .save Functionality
    How would you handle .save for a video feed coming from a webcam or an RTSP stream? Currently, my implementation only supports saving from video files.

  • Writer and Backend Classes
    This is just my personal opinion, but should these classes be moved to separate scripts/modules? If we add more writers and backends in the future, keeping everything inside the main video script might become cluttered.

  • “Complete manual control with explicit VideoInfo” Functionality

    from supervision import Video, VideoInfo
    
    source = Video("source.mp4")
    target_info = VideoInfo(width=800, height=800, fps=24)
    
    with src.sink("square.mp4", info=target_info) as sink:
        for f in src.frames():
            f = cv2.resize(f, target_info.resolution_wh)
            sink.write(f)

    I’m not fully clear on what this feature is intended to do. In this snippet, the Video instance source is created but never used afterward. Is src supposed to be source? Also, is the goal to create sinks for each backend? Could you please clarify the purpose and expected usage here?
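
One possible reading of the snippet, which also touches the webcam/RTSP question, is that a sink is a context manager the caller writes frames into, and that an unbounded source needs an explicit stop condition. The sketch below illustrates that reading only. `ListSink`, `endless_source`, `record`, and the `max_frames` limit are hypothetical names, not part of the proposed API, and frames are plain integers so the example runs without a camera or encoder.

```python
import itertools
from typing import Any, Iterable, Iterator, List


class ListSink:
    """Toy sink: collects frames in memory; a real one would own an encoder."""

    def __init__(self) -> None:
        self.frames: List[Any] = []
        self.closed = False

    def __enter__(self) -> "ListSink":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Finalise the output even if the loop body raised.
        self.closed = True

    def write(self, frame: Any) -> None:
        if self.closed:
            raise ValueError("cannot write to a closed sink")
        self.frames.append(frame)


def endless_source() -> Iterator[int]:
    """Stands in for a webcam or RTSP feed that never reports end-of-stream."""
    return itertools.count()


def record(source: Iterable[Any], sink: ListSink, max_frames: int) -> int:
    """Copy at most `max_frames` frames from `source` into `sink`."""
    written = 0
    with sink:
        # islice stops the loop even though the source itself never ends.
        for frame in itertools.islice(source, max_frames):
            sink.write(frame)
            written += 1
    return written


sink = ListSink()
print(record(endless_source(), sink, max_frames=5))  # 5
print(sink.frames)  # [0, 1, 2, 3, 4]
```

A duration limit or a user-supplied stop callback would fit the same loop; the point is only that the stop condition lives with the caller rather than with the source.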

@Ashp116 Ashp116 changed the title from "BUG: Added audio stream for process_video" to "FEATURE: Versatile Video class" Aug 2, 2025


Development

Successfully merging this pull request may close these issues.

  • Reimplement video utils
  • BUG: Audio stream not captured in process_video

3 participants