SpaceJAM: A Lightweight and Regularization-free Method for Fast Joint Alignment of Images

Ben-Gurion University of the Negev, Israel
ECCV 2024

*Indicates Equal Contribution
Seagulls set alignment

Our framework jointly aligns a set of images of an object category in only a few minutes.
Top-to-bottom: 1) input images; 2) learned low-dimensional representations; 3) aligned features; 4) aligned images.
The last column depicts the average representation (atlas) obtained after training.
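The atlas in the caption above is an average of aligned representations. A toy sketch of that averaging step follows. It is not the SpaceJAM implementation: the helper names `warp_translate` and `compute_atlas` are hypothetical, cyclic integer shifts stand in for the learned warps, and random arrays stand in for the learned low-dimensional representations.

```python
import numpy as np

def warp_translate(feature_map, shift):
    """Toy warp (assumption): cyclically shift a (C, H, W) map by (dy, dx) pixels."""
    dy, dx = shift
    return np.roll(feature_map, shift=(dy, dx), axis=(1, 2))

def compute_atlas(aligned_features):
    """Average a list of aligned (C, H, W) feature maps into a single atlas."""
    return np.mean(np.stack(aligned_features, axis=0), axis=0)

# Synthetic "image set": one base representation seen under different shifts.
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 8, 8))
shifts = [(0, 0), (1, 2), (-2, 3)]
inputs = [warp_translate(base, s) for s in shifts]

# Undo each shift (here the true inverse is known; a JA method must estimate it).
aligned = [warp_translate(x, (-dy, -dx)) for x, (dy, dx) in zip(inputs, shifts)]

# With perfect alignment, the atlas coincides with the shared base representation.
atlas = compute_atlas(aligned)
```

In this toy setting the inverse warps are known in advance, so the average recovers `base` exactly. In actual joint alignment the warps are unknown and must be estimated from the images themselves.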

Abstract

The unsupervised task of Joint Alignment (JA) of images is beset by challenges such as high complexity, geometric distortions, and convergence to poor local or even global optima. Although Vision Transformers (ViTs) have recently provided valuable features for JA, they fall short of fully addressing these issues. Consequently, researchers frequently depend on expensive models and numerous regularization terms, resulting in long training times and challenging hyperparameter tuning. We introduce the Spatial Joint Alignment Model (SpaceJAM), a novel approach that addresses the JA task with efficiency and simplicity. SpaceJAM leverages a compact architecture with only ∼16K trainable parameters and uniquely operates without the need for regularization or atlas maintenance. Evaluations on SPair-71K and CUB datasets demonstrate that SpaceJAM matches the alignment capabilities of existing methods while significantly reducing computational demands and achieving at least a 10x speedup. SpaceJAM sets a new standard for rapid and effective image alignment, making the process more accessible and efficient.

Joint Alignment (qualitative results)

Quantitative Results

JA comparison table

Image-to-Image Alignment (qualitative results)