Cross-Instance Gaussian Splatting Registration via
Geometry-Aware Feature-Guided Alignment

1Faculty of Computer and Information Science    2School of Electrical and Computer Engineering    3Institute for Applied AI Research
Ben-Gurion University of the Negev (BGU), Israel
CVPR 2026, Denver

Abstract

We present Gaussian Splatting Alignment (GSA), a novel method for aligning two independent 3D Gaussian Splatting (3DGS) models via a similarity transformation (rotation, translation, and scale), even when they depict different objects in the same category (e.g., different cars). In contrast, existing methods can only align 3DGS models of the same object (e.g., the same car) and often require the true scale as input, whereas we estimate it successfully. Our approach leverages viewpoint-guided spherical map features to obtain robust correspondences and introduces a two-step optimization framework that aligns the models while keeping the 3DGS models themselves fixed. First, an iterative feature-guided absolute orientation solver performs coarse registration and is robust to extremely poor initialization (e.g., a 180° misalignment or a 10× scale gap); a fine registration step then enforces multi-view feature consistency, inspired by inverse radiance-field formulations. The first step alone already achieves state-of-the-art performance, and the second improves it further. In the same-object case, GSA outperforms prior works, often by a large margin, even when those methods are given the true scale. In the harder case of different objects in the same category, GSA vastly surpasses them, providing the first effective solution for category-level 3DGS registration and unlocking new applications.
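As background (this is not the authors' code), applying a similarity transform (R, s, t) to a 3DGS model amounts to transforming each Gaussian's center, composing its orientation with the global rotation, and scaling its extent. A minimal NumPy/SciPy sketch, assuming SciPy's (x, y, z, w) quaternion order (3DGS implementations often store w first) and log-parameterized scales:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_similarity(means, quats_xyzw, log_scales, R, s, t):
    """Apply a similarity transform (R, s, t) to per-Gaussian 3DGS parameters.

    means:       (N, 3) Gaussian centers
    quats_xyzw:  (N, 4) unit quaternions in (x, y, z, w) SciPy order
    log_scales:  (N, 3) per-axis log scales of each Gaussian
    """
    # Centers are rotated, scaled, and translated.
    new_means = s * means @ R.T + t
    # Each Gaussian's orientation is composed with the global rotation.
    new_quats = (Rotation.from_matrix(R) * Rotation.from_quat(quats_xyzw)).as_quat()
    # An isotropic scale s shifts every log-scale by log(s).
    new_log_scales = log_scales + np.log(s)
    return new_means, new_quats, new_log_scales
```

A quick sanity check on this parameterization: each transformed covariance should satisfy Σ' = s² R Σ Rᵀ.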

Method overview figure

Interactive Coarse Registration Demo

Iterative Feature-Guided Absolute Orientation Solver (R, s, t)

Load two scenes, assign source/target, apply a random transformation, then run the coarse registration solver. Step through iterations to see the alignment progress.
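The closed-form core of such a solver — given putative correspondences, recover the least-squares-optimal (R, s, t) — is the classical Horn/Umeyama absolute orientation step; the paper's solver iterates it with feature-guided correspondence updates. An illustrative sketch of the inner closed-form step only (not the authors' implementation):

```python
import numpy as np

def absolute_orientation(src, dst):
    """Closed-form (Umeyama) similarity fit between corresponding points:
    minimize sum_i || s * R @ src[i] + t - dst[i] ||^2 over R, s, t.

    src, dst: (N, 3) arrays of corresponding 3D points.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    # Cross-covariance between the centered point sets.
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    # Guard against a reflection in the SVD solution.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return R, s, t
```

In the demo above, each iteration would re-estimate correspondences from the spherical-map features and re-run a step of this kind.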


Left-click drag: rotate  |  Ctrl/Shift + drag: pan  |  Scroll: zoom

One finger: rotate  |  Two fingers: pan & pinch zoom


Coarse-to-Fine Registration Results

Visualizing random initialization, coarse alignment, and fine registration from two different views.

Citation

If you find this work useful, please cite:

@inproceedings{Amoyal:CVPR:2026:GSA,
  title     = {{Cross-Instance Gaussian Splatting Registration via Geometry-Aware Feature-Guided Alignment}},
  author    = {Amoyal, Roy and Freifeld, Oren and Baskin, Chaim},
  booktitle = {CVPR},
  year      = {2026},
}

