Abstract

We introduce a technique for the pairwise registration of neural fields that extends classical optimization-based local registration (i.e., ICP) to operate on Neural Radiance Fields (NeRF) – neural 3D scene representations trained from collections of calibrated images. NeRF does not decompose illumination and color, so to make registration invariant to illumination, we introduce the concept of a “surface field” – a field distilled from a pre-trained NeRF model that measures the likelihood of a point being on the surface of an object. We then cast nerf2nerf registration as a robust optimization that iteratively seeks a rigid transformation aligning the surface fields of the two scenes. We evaluate the effectiveness of our technique by introducing a dataset of pre-trained NeRF scenes – our synthetic scenes enable quantitative evaluation and comparison to classical registration techniques, while our real scenes demonstrate the validity of our technique in real-world scenarios.

Surface Field

To enable energy-based optimization for registration between NeRF scenes with different illumination, one cannot rely on radiance; instead, one needs to extract from NeRF a geometric representation that is independent of illumination and viewing direction. To address this we introduce the surface field, a geometric representation that takes the value 1 on object surfaces and 0 elsewhere.

The surface field is built from NeRF's density field. The density at a point measures the differential probability of a ray hitting a particle at that point (view-independent). Transmittance is the probability that a ray hits no solid particle on its way to the point (view-dependent), and can be derived directly from density by integration along the ray. Using the product rule for independent events, we can then define the differential probability of hitting a surface while looking from a certain viewing direction as the product of density and transmittance (view-dependent):
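The following is a plausible reconstruction of this relation, using σ for density, T for transmittance, o for the camera origin, d for the ray direction, and t_x for the depth of x along the ray (the paper's exact symbols may differ):

    h(\mathbf{x} \mid \mathbf{o}) = \sigma(\mathbf{x}) \, T(\mathbf{o}, \mathbf{x}),
    \qquad
    T(\mathbf{o}, \mathbf{x}) = \exp\!\left( -\int_{0}^{t_{\mathbf{x}}} \sigma(\mathbf{o} + t\,\mathbf{d}) \, dt \right)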



To achieve view-independence, we define the surface field as the maximum of the likelihoods of hitting a surface given a ray travelling from any camera origin o through the point x:
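With the hit likelihood h defined above and O denoting the set of training camera origins, a plausible form of this definition is:

    S(\mathbf{x}) = \max_{\mathbf{o} \in \mathcal{O}} \; \sigma(\mathbf{x}) \, T(\mathbf{o}, \mathbf{x})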



To obtain a conservative estimate of the surface field, we can threshold the field at ε:
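Written with an indicator function (again a reconstruction in the notation above):

    S_{\varepsilon}(\mathbf{x}) = \mathbb{1}\!\left[ S(\mathbf{x}) \geq \varepsilon \right] \in \{0, 1\}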



As can be seen in the video above, random density noise in the occluded area does not affect the surface field: the surface value remains stable even when the noise amplitude is large. The video also shows that the surface moves with the signal as the signal's bandwidth changes.
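To make the construction concrete, here is a minimal sketch of the surface field in Python. It is illustrative only: density_fn, the number of integration steps, the threshold value, and the toy density field are assumptions for this sketch, not the authors' implementation.

import numpy as np

def transmittance(density_fn, origin, point, n_steps=128):
    # Probability that a ray from `origin` reaches `point` without hitting
    # a particle: exp(-integral of density along the segment).
    ts = np.linspace(0.0, 1.0, n_steps)
    samples = origin[None, :] + ts[:, None] * (point - origin)[None, :]
    seg_len = np.linalg.norm(point - origin) / n_steps
    sigma = np.array([density_fn(p) for p in samples])
    return np.exp(-np.sum(sigma) * seg_len)

def surface_field(density_fn, point, camera_origins, eps=0.1):
    # Conservative surface indicator: max over camera origins of
    # density(point) * transmittance(origin -> point), thresholded at eps.
    hit = [density_fn(point) * transmittance(density_fn, o, point)
           for o in camera_origins]
    return float(max(hit) >= eps)

# Toy usage: a density field that is high inside the unit sphere.
density_fn = lambda p: 50.0 if np.linalg.norm(p) < 1.0 else 0.0
cameras = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 3.0, 0.0])]
print(surface_field(density_fn, np.array([0.99, 0.0, 0.0]), cameras))  # 1.0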

Energy-based Optimization

By utilizing surface fields, we perform an energy-based optimization to find the optimal rigid registration aligning the target object in a pair of scenes. Our loss function consists of a keypoint energy and a matching energy. The keypoint energy provides an initial approximate solution by minimizing the distance between manually annotated keypoint coordinates, and is gradually annealed away over the course of the optimization. The robust matching energy compares the surface fields of the two scenes on an active set of samples, given the current estimate of the rigid transform. To make the comparison robust to outliers, we apply a robust kernel to our residuals.
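A plausible form of the combined objective, in notation introduced here (rigid transform T_θ, annotated keypoint pairs (p_k, q_k), annealed weight λ, sample set X, robust kernel ρ, and the residual r defined below), is:

    E(\theta) = \lambda \sum_{k} \left\lVert T_{\theta}(\mathbf{p}_k) - \mathbf{q}_k \right\rVert^2 \;+\; \sum_{\mathbf{x} \in \mathcal{X}} \rho\!\left( r(\mathbf{x}; \theta) \right)

with λ annealed toward zero as the optimization proceeds.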



Performing gradient-based optimization on a field with a co-domain of {0,1} is challenging, hence we smooth the surface field by convolving the categorical field with a zero-mean Gaussian with isotropic covariance, yielding a smoothed surface field. The residual is then defined using this smooth surface field:
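In the notation used above, a plausible reconstruction of the smoothed field and of the residual between scenes A and B under the current transform T_θ is:

    \bar{S}(\mathbf{x}) = \left( S_{\varepsilon} * \mathcal{N}(\mathbf{0}, \sigma^2 I) \right)(\mathbf{x}),
    \qquad
    r(\mathbf{x}; \theta) = \bar{S}_{A}(\mathbf{x}) - \bar{S}_{B}\!\left( T_{\theta}(\mathbf{x}) \right)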



Sampling

A Metropolis-Hastings-style sampling algorithm is used to iteratively update the set of samples. The sample set is initialized with the keypoints and, every N=20 iterations, is updated by adding uniform noise to the current set and accepting new points that are in close correspondence, are not too close to the current set of samples, and are close to a surface.
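A minimal sketch of such an update step is given below. The function name, thresholds, and the concrete acceptance tests are illustrative assumptions about how the three criteria could be implemented; they are not the authors' code.

import numpy as np

def update_samples(samples, surface_a, surface_b, transform,
                   noise=0.05, min_spacing=0.02, tau_corr=0.2, tau_surf=0.5,
                   rng=None):
    # Propose perturbed copies of the current samples and keep those that
    # (i) have similar smoothed surface-field values in both scenes under
    # the current transform, (ii) lie near a surface, and (iii) are not
    # too close to samples already in the set.
    rng = np.random.default_rng(0) if rng is None else rng
    proposals = samples + rng.uniform(-noise, noise, size=samples.shape)
    kept = list(samples)
    for p in proposals:
        s_a = surface_a(p)                 # smoothed surface field, scene A
        s_b = surface_b(transform(p))      # scene B, at the transformed point
        close_correspondence = abs(s_a - s_b) < tau_corr
        near_surface = max(s_a, s_b) > tau_surf
        spaced = all(np.linalg.norm(p - q) > min_spacing for q in kept)
        if close_correspondence and near_surface and spaced:
            kept.append(p)
    return np.asarray(kept)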


Results

Registration results (Bench, Bust, and Jar scenes):

Comparison to Fast Global Registration applied to point clouds extracted from NeRF-estimated depth maps:

Registration iterations:

Ablation Study:

Video

Citation

Acknowledgements


The website template was borrowed from Dor Verbin.