Synchronization of Multiple Videos

Avihai Naaman*, Ron Shapira Weber*, Oren Freifeld
Ben-Gurion University of the Negev, Israel
ICCV 2025

*Indicates Equal Contribution
Teaser video: joint alignment of a set of seagull videos.

Temporal Prototype Learning (TPL) uses an off-the-shelf feature extractor, denoted by φ, to generate initial multichannel action-progression sequences for videos of the same action (e.g., Ball pitch). Colors indicate different, temporally-misaligned videos of the same action. TPL then produces the joint alignment together with a prototypical sequence that anchors key events (e.g., Ball Release); a minimal sketch of the first stage follows.
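Conceptually, this first stage runs every frame of a video through a frozen backbone φ and reduces the per-frame embeddings to a few comparable channels. The sketch below illustrates the idea under explicit assumptions: ResNet-18 as φ and a fixed, shared random projection are hypothetical stand-ins, not the paper's actual choices.

# A minimal sketch of the first TPL stage, assuming PyTorch/torchvision.
# ResNet-18 as the extractor phi and the fixed random projection are
# illustrative stand-ins for whatever backbone and reduction TPL uses.
import torch
import torchvision.models as models

def make_phi():
    # Pretrained backbone with its classification head removed.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    return backbone.eval()

@torch.no_grad()
def action_progression(frames, phi, out_channels=8):
    # frames: (T, 3, 224, 224) preprocessed video frames.
    feats = phi(frames)                        # (T, 512) per-frame embeddings
    g = torch.Generator()
    g.manual_seed(0)                           # same projection for every video
    proj = torch.randn(feats.shape[1], out_channels, generator=g)
    seq = feats @ proj                         # (T, out_channels) sequence
    return (seq - seq.mean(0)) / (seq.std(0) + 1e-8)

# Usage: a random 40-frame clip stands in for a real, normalized video.
phi = make_phi()
seq = action_progression(torch.rand(40, 3, 224, 224), phi)
print(seq.shape)                               # torch.Size([40, 8])

Because the projection is fixed and shared across videos, the resulting channels are directly comparable, which is what makes the joint alignment in the next stage meaningful.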

Abstract

Synchronizing videos captured simultaneously from multiple cameras in the same scene is often easy and typically requires only simple time shifts. However, synchronizing videos from different scenes or, more recently, generative AI videos, poses a far more complex challenge due to diverse subjects, backgrounds, and nonlinear temporal misalignment. We propose Temporal Prototype Learning (TPL), a prototype-based framework that constructs a shared, compact 1D representation from high-dimensional embeddings extracted by any of several off-the-shelf pretrained models. TPL robustly aligns videos by learning a unified prototype sequence that anchors key action phases, thereby avoiding exhaustive pairwise matching. Our experiments show that TPL improves synchronization accuracy, efficiency, and robustness across diverse datasets, including on fine-grained frame retrieval and phase classification tasks. Importantly, TPL is the first approach to mitigate synchronization issues in multiple generative AI videos depicting the same action.
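To make the prototype idea concrete, the following is a minimal sketch in which classic DTW barycenter averaging (DBA) stands in for TPL's learned prototype and warps; the DTW-based alignment and the frame-wise averaging here are illustrative assumptions, not the paper's algorithm. It shows the central point: each of the N input sequences is aligned only to a single prototype rather than to all the others.

# A minimal sketch of the prototype idea, assuming multichannel
# sequences of shape (T_i, C) as inputs. Classic DTW barycenter
# averaging (DBA) stands in here for TPL's learned prototype and
# warps; it is an illustrative substitute, not the paper's method.
import numpy as np

def dtw_path(a, b):
    # Standard DTW alignment path between sequences (Ta, C) and (Tb, C).
    Ta, Tb = len(a), len(b)
    D = np.full((Ta + 1, Tb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = Ta, Tb, []
    while i > 0 and j > 0:  # backtrack from the end toward (0, 0)
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]

def learn_prototype(seqs, iters=5):
    proto = seqs[0].copy()                     # init from an arbitrary video
    for _ in range(iters):
        sums = np.zeros_like(proto)
        counts = np.zeros(len(proto))
        for s in seqs:                         # align each video to the prototype only
            for i, j in dtw_path(s, proto):
                sums[j] += s[i]
                counts[j] += 1
        proto = sums / counts[:, None]         # re-estimate the prototype frame-wise
    return proto

# Usage: three nonlinearly misaligned, noisy copies of one 1-channel "action".
t = np.linspace(0, 1, 60)
seqs = [np.sin(2 * np.pi * t ** p)[:, None] + 0.05 * np.random.randn(60, 1)
        for p in (0.7, 1.0, 1.4)]
proto = learn_prototype(seqs)
print(proto.shape)                             # (60, 1)

Per iteration this costs N alignments against the prototype, whereas exhaustive pairwise matching of all videos would require N(N-1)/2 alignments.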

Video Synchronization by TPL - GEN-MVS Dataset

Top - Original Videos. Bottom - Videos After Synchronization.

Video Synchronization by TPL - Penn Action Dataset

Top - Original Videos. Bottom - Videos After Synchronization.