Gradient-free variational learning with conditional mixture networks

Authors: Conor Heins, Hao Wu, Dimitrije Markovic, Alexander Tschantz, Jeff Beck, Christopher Buckley

Deep learning (DL) enables remarkable capabilities such as image recognition, natural language processing, and content generation, and its results are fast and accurate. However, because DL models are typically trained with maximum likelihood estimation (MLE), their outputs are the single most likely prediction given the training data (hence "maximum likelihood"), with no accompanying measure of confidence in the answer. Quantifying uncertainty, which is critical in complex or unpredictable settings such as autonomous vehicles, financial modeling, health diagnostics, and other high-risk decision-making, is something DL does quite poorly.

A recent paper by VERSES Research Lab titled Gradient-free variational learning with conditional mixture networks (accepted at the NeurIPS 2024 Bayesian Decision-Making and Uncertainty Workshop, where it was invited for a lightning talk) explores a novel method called CAVI-CMN (Coordinate Ascent Variational Inference for Conditional Mixture Networks), which tackles the significant challenge of making machine learning models more reliable by allowing them to understand what they don't know.

Existing Bayesian methods such as HMC (Hamiltonian Monte Carlo) and BBVI (Black-Box Variational Inference) provide a way to build models that quantify uncertainty. Instead of producing a single answer, a Bayesian model generates a range of possibilities and assigns each a probability of being correct. However, because these methods are computationally expensive (the space of possibilities they must weigh grows rapidly with model complexity), they famously struggle with speed and scalability.
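To make the contrast concrete, here is a minimal NumPy sketch (with invented numbers, not the paper's models) of the difference between an MLE point prediction and a Bayesian predictive distribution for a toy binary classifier: the Bayesian version averages predictions over many plausible weight settings and reports a spread, not just a single number.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 2-feature input to classify.
x = np.array([0.5, -1.2])

# MLE: a single "best" weight vector -> one point prediction.
w_mle = np.array([1.0, 0.8])
p_mle = sigmoid(x @ w_mle)

# Bayesian: many plausible weight vectors drawn from a posterior
# (mimicked here by Gaussian samples around w_mle) -> a spread of
# predictions whose standard deviation quantifies uncertainty.
w_samples = w_mle + 0.5 * rng.standard_normal((1000, 2))
p_samples = sigmoid(w_samples @ x)

print(f"MLE prediction:        {p_mle:.3f}")
print(f"Bayesian mean:         {p_samples.mean():.3f}")
print(f"Bayesian spread (std): {p_samples.std():.3f}")
```

The point estimate and the posterior mean are often close; what the Bayesian treatment adds is the spread, which flags inputs the model is unsure about.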

Enter CAVI-CMN, which is demonstrably faster than other Bayesian methods while maintaining comparable accuracy and uncertainty quantification, offering a more efficient way to bring the benefits of Bayesian uncertainty modeling to deep learning. CAVI-CMN is a gradient-free learning algorithm designed to optimize conditional mixture networks (CMNs), which can be thought of as teams of specialized experts, each tackling a different aspect of a problem. CAVI makes CMNs more efficient by letting those teams learn quickly and effectively, even with limited computing power and limited examples, and, critically, adds the ability to express uncertainty.
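As a rough illustration of the "team of experts" idea, here is a toy forward pass for a conditional mixture network: a softmax gate decides how much to trust each linear expert for a given input, and the final prediction is the gate-weighted mixture of the experts' class probabilities. (The sizes and weights below are invented for illustration; see the paper for the actual model and its training procedure.)

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cmn_predict(x, gate_W, expert_Ws):
    """Toy conditional-mixture-network forward pass: a softmax gate
    weights K linear experts, each producing class probabilities."""
    # Gate: probability of routing input x to each of the K experts.
    gate_probs = softmax(x @ gate_W)                                      # (K,)
    # Experts: each outputs its own distribution over the C classes.
    expert_probs = softmax(np.stack([x @ W for W in expert_Ws]), axis=-1) # (K, C)
    # Mixture: gate-weighted average of the experts' predictions.
    return gate_probs @ expert_probs                                      # (C,)

# Illustrative sizes: 3 input features, 2 experts, 4 classes.
rng = np.random.default_rng(1)
x = rng.standard_normal(3)
gate_W = rng.standard_normal((3, 2))
expert_Ws = [rng.standard_normal((3, 4)) for _ in range(2)]

probs = cmn_predict(x, gate_W, expert_Ws)
print(probs, probs.sum())  # a valid probability distribution over 4 classes
```

Because the output is a genuine probability distribution, a trained CMN can report not just a label but how confident it is in that label.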

CAVI-CMN, like the other Bayesian methods, did consistently better than maximum likelihood estimation (i.e., training CMNs through backpropagation) on calibration (i.e., uncertainty quantification). And it does so while not only matching the accuracy of other top-performing methods, but also running much faster than competing Bayesian methods like black-box variational inference (BBVI) and the No-U-Turn Sampler (NUTS).
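Calibration means that a model's stated confidence matches how often it is actually right: predictions made with 80% confidence should be correct about 80% of the time. One common way to measure this (not necessarily the exact metric used in the paper) is the expected calibration error, sketched below:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error (ECE): bin predictions by confidence
    and average the |accuracy - confidence| gap per bin, weighted by
    how many predictions fall in each bin. Lower is better-calibrated."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    n = len(confidences)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()    # empirical accuracy in this bin
            conf = confidences[mask].mean()  # average stated confidence
            ece += mask.sum() / n * abs(acc - conf)
    return ece

# Perfectly calibrated toy case: 80%-confident predictions, right 8 of 10 times.
conf = np.full(10, 0.8)
corr = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
print(expected_calibration_error(conf, corr))  # → 0.0
```

An overconfident model (high confidence, low accuracy) scores a large ECE, which is the failure mode the paper reports for MLE-trained networks.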

CAVI-CMN was benchmarked on six real-world datasets from the UCI Machine Learning Repository, covering tasks like vehicle classification, breast cancer diagnosis, and sonar signal analysis. Findings across the benchmarks covered in the paper:

  • Same speed or faster (up to 3x) than MLE / DL / backpropagation
  • 10–50x faster than gradient-based Bayesian methods
  • Test accuracy comparable to architecture-matched feedforward neural networks
  • Predictions 4–40x better calibrated than those of feedforward networks (varies by dataset)
  • Convergence in 10–100x fewer iterations than other methods

CAVI-CMN is ongoing research that takes a radically different approach from conventional deep learning and signals a step towards more reliable and trustworthy machine intelligence capable of making the right decisions for a given context. It also paves the way for fast systems that smartly choose which data to sample and thus reduce their own uncertainty efficiently (i.e., active sampling).

Following are high-resolution diagrams of the benchmark results.

[Figures: CAVI-CMN performance and runtime plots for the Waveform, Connectionist Bench (sonar), Statlog (vehicle silhouettes), Banknote, Breast Cancer, and Pinwheel datasets.]

One limitation of CAVI-CMN is that performance can degrade as model complexity increases. The researchers considered several strategies for handling this scaling issue, and because each has pros and cons, the best approach can depend on the specific problem space. For a real-time self-driving car, fast decision-making may matter more than accuracy up to a point, whereas for a medical diagnosis, accuracy may matter more than speed.

The researchers used several datasets, including a particularly tricky synthetic one called Pinwheel, in which data points are arranged in a complex, interwoven spiral pattern; imagine trying to make sense of a really chaotic dance floor. CAVI-CMN was able to learn the complex relationships in the data.
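For readers curious what such data looks like, here is a common way to generate a toy pinwheel dataset: Gaussian "arms" twisted by a radius-dependent rotation so the class clusters interleave. (The parameters below are illustrative and may differ from those used in the paper.)

```python
import numpy as np

def make_pinwheel(n_classes=5, n_per_class=100, radial_std=0.3,
                  tangential_std=0.1, rate=0.25, seed=0):
    """Generate a toy 'pinwheel' dataset: Gaussian arms rotated more
    the farther they lie from the origin, producing interleaved
    spiral-shaped classes."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c in range(n_classes):
        # Sample an arm of points roughly aligned with the x-axis...
        features = rng.standard_normal((n_per_class, 2)) * \
                   np.array([radial_std, tangential_std]) + np.array([1.0, 0.0])
        # ...then rotate each point by its class angle plus a
        # radius-dependent twist, which interleaves the arms.
        angles = 2 * np.pi * c / n_classes + rate * np.exp(features[:, 0])
        rot = np.stack([np.cos(angles), -np.sin(angles),
                        np.sin(angles), np.cos(angles)], axis=-1).reshape(-1, 2, 2)
        xs.append(np.einsum("nij,nj->ni", rot, features))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

X, y = make_pinwheel()
print(X.shape, y.shape)  # (500, 2) (500,)
```

Because the arms spiral into one another, no single linear boundary separates the classes, which is what makes the dataset a good stress test for mixture models.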

Among the six UCI datasets, one standout is the Vehicle Silhouettes dataset, where the goal is to classify images of vehicles (e.g., bus, van, car). A real-world application of this task is self-driving cars, where misclassification could have dire consequences and measuring uncertainty is critical. CAVI-CMN shined here, consistently outperforming the standard approaches.

On the breast cancer diagnosis dataset, where the stakes are high and a false positive or a false negative can be a matter of life and death, CAVI-CMN provided more accurate and reliable predictions by factoring in that nuance.

The sonar dataset is applicable to ocean exploration, where a signal bounced off the ocean floor helps identify depth and formations. When dealing with low-resolution or partial data, measuring confidence is key to making informed decisions, and CAVI-CMN stood out for overall accuracy.

  • NUTS-HMC and BBVI are state-of-the-art Bayesian methods used as "baselines" to verify that CAVI-CMN retains the probabilistic benefits of being Bayesian.
  • MLE (maximum likelihood estimation) is equivalent to traditional deep learning via backpropagation. Comparing CAVI-CMN to MLE is the critical comparison for verifying that CAVI-CMN scales and trains comparably fast while also being accurate.

The paper may be found at https://arxiv.org/pdf/2408.16429