Authors: Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. Buckley, Tim Verbelen
In recent years, computer vision has made remarkable progress in understanding 3D scenes, fueled by innovations like Neural Radiance Fields (NeRF) and Gaussian Splatting. These methods are the backbone of applications ranging from virtual reality to robotics, where creating realistic 3D models from simple 2D images is key. However, one major challenge remains unsolved: how can systems continually learn from new data without forgetting what they already know?
Our new method, Variational Bayes Gaussian Splatting (VBGS), offers a promising solution. Designed with continual learning in mind, VBGS enables machines to integrate new data efficiently and adapt to dynamic environments without needing massive computational resources or memory. Let’s explore what makes VBGS special and why it’s a leap forward for 3D scene understanding.
3D Gaussian Splatting (3DGS) represents a 3D scene as a mixture of ellipsoids, or Gaussians, floating in space. These Gaussians capture details like color, position, and transparency, and can be rendered into 2D images from different viewpoints. Traditional 3DGS methods rely on differentiable rendering pipelines that backpropagate gradients to optimize the Gaussian parameters. While this works well in static settings, it struggles in real-world scenarios where data arrives continuously, as in a self-driving car collecting sensory information on the go.
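To make the representation concrete, here is a minimal sketch of the quantities a splat stores. The container and field names are illustrative choices of ours, not taken from any particular 3DGS implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GaussianSplat:
    """One scene as K Gaussians; field names are illustrative only."""
    means: np.ndarray        # (K, 3) ellipsoid centers in world space
    covariances: np.ndarray  # (K, 3, 3) ellipsoid shape and orientation
    colors: np.ndarray       # (K, 3) RGB color per component
    opacities: np.ndarray    # (K,) transparency per component
```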
Why? Because these methods tend to “catastrophically forget” older data when new information is added, a major bottleneck for real-time applications. To mitigate forgetting, gradient-based methods replay previously seen data from a replay buffer, a costly solution in terms of both memory and computation.
Instead of treating the Gaussian parameters as point estimates to be optimized, VBGS models them probabilistically. Each Gaussian’s attributes, like color, position, and scale, are treated as random variables. This gives VBGS the flexibility to maintain uncertainty about its parameters and to refine them continuously as new data arrives. It also turns “learning” into a problem of variational inference: estimating a posterior distribution over these parameters. Since a Gaussian splat is a Gaussian Mixture Model (GMM), we choose a Normal-Inverse-Wishart (NIW) prior, which is conjugate to the Gaussian likelihood and therefore enables closed-form updates via Coordinate Ascent Variational Inference (CAVI).
The use of conjugate priors ensures that each update step has a closed-form solution, avoiding the need for expensive numerical optimization. This is in stark contrast to gradient-based approaches, which require iterative backpropagation and are sensitive to learning rates and initialization.
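To illustrate what “closed-form” means here, the sketch below implements the standard NIW conjugate update for one mixture component, together with a simplified responsibility step. A full CAVI E-step would use expected log-densities under the variational posterior rather than the posterior-mean parameters; the variable names and that simplification are ours, not the paper’s.

```python
import numpy as np
from scipy.stats import multivariate_normal

def niw_update(x, r, m0, kappa0, Psi0, nu0):
    """Standard NIW conjugate update for one component.

    x: (N, D) data points; r: (N,) responsibilities for this component.
    Returns the updated posterior parameters in closed form: no
    gradients, learning rates, or iterative optimization involved.
    """
    Nk = r.sum() + 1e-9                                  # effective count
    xbar = (r[:, None] * x).sum(axis=0) / Nk             # weighted mean
    diff = x - xbar
    S = (r[:, None] * diff).T @ diff                     # weighted scatter
    kappa_n = kappa0 + Nk
    m_n = (kappa0 * m0 + Nk * xbar) / kappa_n
    nu_n = nu0 + Nk
    d = (xbar - m0)[:, None]
    Psi_n = Psi0 + S + (kappa0 * Nk / kappa_n) * (d @ d.T)
    return m_n, kappa_n, Psi_n, nu_n

def responsibilities(x, means, covs, weights):
    """Simplified E-step: soft-assign points to components using the
    current posterior-mean parameters of each Gaussian."""
    log_p = np.stack(
        [multivariate_normal.logpdf(x, mean=m, cov=c) + np.log(w)
         for m, c, w in zip(means, covs, weights)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

Because the update is just an accumulation of weighted sufficient statistics, it is deterministic and needs no learning-rate tuning.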
By turning training into an inference problem, VBGS gains a key benefit: because its update steps naturally accumulate information rather than overwriting it, VBGS can seamlessly integrate new observations while retaining prior knowledge. This eliminates the need for replay buffers and preserves performance on earlier data, making the model inherently robust to catastrophic forgetting.
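Concretely, sequential Bayesian updating means the posterior after one batch becomes the prior for the next. Because the NIW posterior depends on the data only through accumulated sufficient statistics, chaining updates this way gives the same answer as seeing all the data at once. A minimal single-component sketch, reusing the hypothetical niw_update helper above:

```python
import numpy as np

def continual_fit(batches, m0, kappa0, Psi0, nu0):
    """Stream batches through closed-form updates; no replay buffer.
    Single component for brevity; a full model would also recompute
    responsibilities for each incoming batch."""
    m, kappa, Psi, nu = m0, kappa0, Psi0, nu0
    for x in batches:                   # data arrives one batch at a time
        r = np.ones(len(x))             # all points assigned to this component
        m, kappa, Psi, nu = niw_update(x, r, m, kappa, Psi, nu)
    return m, kappa, Psi, nu            # posterior after the final batch
```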
To illustrate this, we fit a Gaussian splat on a single image. Instead of training on all pixel data at once, however, we feed the pixels in patch by patch and run a single VBGS update per patch. When we update the Gaussian splat using backpropagation without a replay buffer, we clearly see how information from the first patches is “forgotten” as the final patches are used for learning. The VBGS model, by contrast, correctly accumulates information across patches and fits the full image.
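A hypothetical driver for this experiment: treat every pixel as a 5-D point (row, column, R, G, B), so the image becomes a point cloud that a GMM can fit, and stream the patches to a continual update loop like the one above. The patch size and helper name are our own choices.

```python
import numpy as np

def iter_patches(img, size=32):
    """Yield one (n, 5) array of (row, col, r, g, b) points per patch,
    keeping global pixel coordinates so patches tile the full image."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    points = np.concatenate(
        [yy.reshape(-1, 1), xx.reshape(-1, 1), img.reshape(-1, 3)],
        axis=1,
    ).astype(float).reshape(h, w, 5)
    for i in range(0, h, size):
        for j in range(0, w, size):
            yield points[i:i + size, j:j + size].reshape(-1, 5)
```

Each yielded patch would then drive exactly one update step, e.g. by passing iter_patches(img) as the batch stream to a loop like continual_fit.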
In our experiments, VBGS performs comparably to gradient-based optimization of 3DGS on static datasets. For example, these are novel view renders from a VBGS model with 100k Gaussian components, updated in a single CAVI step.
The reconstruction quality of VBGS (measured in PSNR) is typically comparable, though not superior, to that of 3DGS on static datasets, but this comparison misses the point: VBGS was designed to excel in streaming and continual learning settings, where 3DGS struggles. Imagine training a robot to navigate a building. As the robot moves, it sees new parts of the environment and updates its internal map. With 3DGS, the robot would need to constantly reprocess past observations to avoid overwriting its memory, which not only consumes time but also requires massive amounts of storage.
VBGS eliminates this need. By leveraging the Bayesian nature of its updates, it can integrate new observations without revisiting old ones. This makes it a perfect fit for applications like Simultaneous Localization and Mapping (SLAM), where efficient, real-time updates are critical. To demonstrate this, we continually update a Gaussian splat using VBGS on RGBD data captured by a simulated robot in the Habitat Rearrange environment. As the robot navigates and captures the space, it gradually builds a full 3D representation of the room.
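In this setting, the only preprocessing needed is to turn each RGBD frame into a stream of colored 3-D points. The sketch below back-projects depth pixels with the standard pinhole camera model and pairs them with their colors, producing 6-D points (x, y, z, r, g, b) that can feed the same continual update loop as above; fx, fy, cx, cy are assumed camera intrinsics, and the function name is hypothetical.

```python
import numpy as np

def rgbd_to_points(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGBD frame to (x, y, z, r, g, b) points using
    the pinhole model; fx, fy, cx, cy are assumed camera intrinsics."""
    h, w = depth.shape
    vv, uu = np.mgrid[0:h, 0:w]
    z = depth.reshape(-1)
    x = (uu.reshape(-1) - cx) * z / fx
    y = (vv.reshape(-1) - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    valid = z > 0                        # drop pixels with no depth reading
    return np.concatenate([xyz, rgb.reshape(-1, 3)], axis=1)[valid]
```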
On the right, we reconstruct two opposing views from Room 0 in the Replica dataset, which get filled in as the robot explores the scene. As the robot turns around, it fills in the view of the windows while retaining a good representation of the door.
VBGS represents a paradigm shift in how we think about 3D learning. While still evolving, it points toward more adaptive, memory-efficient, and robust learning systems. From robots that learn as they explore to augmented reality systems that update in real time, VBGS has the potential to transform how machines perceive and interact with the world. It challenges the prevailing assumptions of static training pipelines and highlights the promise of probabilistic methods in dynamic, real-world scenarios.
The paper may be found at https://arxiv.org/abs/2410.03592