
A More Versatile, Efficient, Physics Foundation for Next-Gen AI

Written by VERSES | Aug 2, 2024 5:58:49 PM

New research led by Karl Friston showcases a new foundation for AI that achieves 99.8% accuracy with 90% less data on the popular MNIST benchmark.

A team led by VERSES Chief Scientist Professor Karl Friston has published a new paper, “From pixels to planning: scale-free active inference”, which introduces Renormalizing Generative Models (RGMs): an efficient, physics-based alternative to deep learning, reinforcement learning, and generative AI that addresses foundational problems in artificial intelligence (AI), namely versatility, efficiency, explainability, and accuracy.

“Active inference” is a framework with origins in neuroscience and physics that describes how biological systems, including the human brain, continuously generate and refine predictions based on sensory input, with the objective of becoming increasingly accurate. While the science behind active inference is well established and considered a promising alternative to state-of-the-art AI, it had not, until now, demonstrated a viable pathway to scalable commercial solutions. RGMs accomplish this using a “scale-free” technique that adapts to any scale of data.

“RGMs are more than an evolution; they’re a fundamental shift in how we think about building intelligent systems from first principles, and they could be the ‘one method to rule them all.’”

Gabriel René, CEO of VERSES

RGMs can model physics and learn the causal structure of information. This enables the development of a multimodal agent that can model information in space and time; recognize objects, sounds, and activities; and plan and make complex decisions based on real-world understanding, all from the same underlying model. “This has the potential to dramatically scale AI development, expanding its capabilities while reducing its cost,” René added.

The paper describes how Renormalizing Generative Models using active inference performed many of the fundamental learning tasks that today require multiple independent AI models, such as object recognition, image classification, natural language processing, content generation, and file compression. RGMs are a versatile “universal architecture” that can be configured and reconfigured to perform any or all of the same tasks as today’s AI, but with greater efficiency. The paper describes how an RGM achieved 99.8% accuracy on a subset of the MNIST digit-recognition task, a common benchmark in machine learning, using only 10,000 training images (90% less data). Sample and compute efficiency translates directly into cost savings and development speed for businesses building and deploying AI systems. Upcoming papers are expected to further demonstrate the effective and efficient learning of RGMs, with related research applied to MNIST and other industry-standard benchmarks such as the Atari Challenge.

"The brain is incredibly efficient at learning and adapting. The mathematics in the paper offer a proof of principle for a scale-agnostic, algorithmic approach to replicating human-like cognition in software."

Professor Karl Friston

Instead of conventional brute-force training on a massive number of examples, RGMs “grow” by learning the underlying structure and hidden causes of their observations. "The inference process itself can be cast as selecting (the right) actions that minimize the energy cost for an optimal outcome,” Friston continued.
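Friston's framing of inference as cost-minimizing action selection can be illustrated with a toy discrete example. This is a sketch only, not VERSES's implementation: it assumes a simple KL-divergence cost between an agent's predicted outcomes and its preferred outcomes, whereas the paper's formulation uses variational free energy over a full generative model.

```python
import numpy as np

# Toy sketch of action selection (illustrative assumption, not RGM code).
# The agent scores each candidate action by how far its predicted outcome
# distribution diverges from its preferred outcomes, then selects the
# action with the lowest expected cost.

preferred = np.array([0.9, 0.1])  # the agent prefers outcome 0

# Hypothetical predicted observation distributions for two candidate actions
predictions = {
    "left":  np.array([0.8, 0.2]),
    "right": np.array([0.3, 0.7]),
}

def divergence(q, p):
    """KL(q || p): how costly the predicted outcomes q are relative to
    the preferred outcomes p (lower = better aligned with preferences)."""
    return float(np.sum(q * np.log(q / p)))

# Select the action whose predictions best match the agent's preferences
best_action = min(predictions, key=lambda a: divergence(predictions[a], preferred))
print(best_action)  # "left", whose predictions align with the preference
```

Choosing actions this way makes inference and control two sides of the same computation: the agent acts so as to make its preferred predictions come true.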

Your brain doesn't process and store every pixel independently; instead, it “coarse-grains” patterns, objects, and relationships into a mental model of concepts: a door handle, a tree, a bicycle. RGMs likewise break down complex data, such as images or sounds, into simpler, compact, hierarchical components and learn to predict those components efficiently, reserving attention for the most informative or unique details. For example, driving a car becomes “second nature” once we have mastered it well enough that the brain is primarily looking for anomalies relative to our normal expectations.
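The coarse-graining described above can be sketched with a toy block-averaging example. This is an assumption for illustration only: RGMs learn their coarse-graining from data, whereas fixed 2x2 averaging is the simplest possible stand-in.

```python
import numpy as np

# Toy illustration of coarse-graining (not the paper's actual RGM code):
# each 2x2 block of pixels is summarized by its mean, discarding fine
# detail while preserving larger-scale structure. Repeating the step
# yields a hierarchy of increasingly abstract views of the same input.

def coarse_grain(img, block=2):
    h, w = img.shape
    # Group pixels into block x block tiles, then average within each tile
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)  # a synthetic 8x8 "image"
level1 = coarse_grain(img)      # 4x4 summary of the image
level2 = coarse_grain(level1)   # 2x2 summary of the summary
```

Each level of the hierarchy is smaller and more abstract than the one below it, which is what makes the same machinery applicable at any scale of data.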

By way of analogy, Google Maps comprises an enormous amount of data, estimated at many thousands of terabytes, yet it renders viewports in real time even as users zoom in and out across levels of resolution. Rather than rendering the entire data set at once, Google Maps serves up a small portion at the appropriate level of detail. Similarly, RGMs are designed to structure and traverse data so that scale (the amount, diversity, and complexity of data) is not expected to be a limiting factor.

“Within Genius, developers will be able to create a variety of composable RGM agents with diverse skills that can be fitted to any sized problem space, from a single room to an entire supply network, all from a single architecture,” says Hari Thiruvengada, VERSES' Chief Product Officer.

Further validation of the paper’s findings is required and is expected to be presented in future papers slated for publication this year. Thiruvengada adds, “We’re optimistic that RGMs are a strong contender for replacing deep learning, reinforcement learning, and generative AI.”