
Meta Launches a Multi-Sensor Generative Artificial Intelligence Model


Meta has announced a new open-source AI model that ties together multiple streams of data, including text, audio, visual data, thermal (temperature) readings, and motion data.

For now, the model is only a research project with no direct consumer or practical application. But it points toward future generative AI systems that can create immersive, multi-sensory experiences, and it shows that Meta continues to share AI research at a time when competitors such as OpenAI and Google are becoming increasingly closed.

The basic concept of the research is to combine multiple data types into a single multidimensional index (an "embedding space" in AI parlance). The idea may sound abstract, but it is exactly this concept that underpins the current boom in generative AI systems.

For example, AI image generators such as DALL-E, Stable Diffusion, and Midjourney rely on systems that link text and images together during training. These systems look for patterns in visual data and associate that information with descriptions of the images, which is what allows them to generate pictures that match users' text prompts. Many AI tools that generate video or audio work the same way.
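The shared embedding space described above can be sketched in a few lines. This is a toy illustration, not a real model: the vectors below are hard-coded stand-ins for the outputs of trained text and image encoders, and the file names and captions are hypothetical.

```python
# A minimal sketch of a shared text-image embedding space, with hard-coded
# hypothetical vectors standing in for the outputs of trained encoders.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Pretend these came from encoders trained to place matching
# text-image pairs close together in the same vector space.
text_emb = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
image_emb = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "cat.jpg": [0.2, 0.8, 0.1],
}

# Cross-modal retrieval: find the image whose embedding is closest to a query.
query = text_emb["a photo of a dog"]
best = max(image_emb, key=lambda name: cosine(query, image_emb[name]))
print(best)  # dog.jpg
```

Because both modalities live in one space, a text query can be compared directly against image embeddings; generation systems exploit the same alignment in the other direction.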

Meta says its ImageBind model is the first to combine six data types into a single embedding space: visual (both image and video); thermal (infrared images); text; audio; depth information; and, most intriguingly, motion readings from an inertial measurement unit, or IMU. (IMUs are found in phones and smartwatches, where they handle a range of tasks, from switching your screen between landscape and portrait to distinguishing between different types of physical activity.)
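The six-modality idea can be sketched as follows. This is not Meta's actual architecture: each "encoder" here is a hypothetical random linear projection, whereas a real model like ImageBind learns its encoders jointly so that related inputs from different modalities land near each other. The point of the sketch is only that every modality ends up as a vector of the same size in one shared space.

```python
# A minimal sketch of binding six modalities into one embedding space.
# Each "encoder" is a hypothetical random linear projection to a shared
# dimension; real models learn these encoders jointly.
import random

random.seed(0)
SHARED_DIM = 4  # every modality is mapped into a space of this size

# Raw inputs from each modality have different native feature sizes.
raw_dims = {"vision": 8, "thermal": 6, "text": 5,
            "audio": 7, "depth": 6, "imu": 3}

# One projection matrix per modality: raw features -> shared space.
encoders = {
    m: [[random.gauss(0, 1) for _ in range(d)] for _ in range(SHARED_DIM)]
    for m, d in raw_dims.items()
}

def embed(modality, raw):
    """Project a raw feature vector into the shared space and unit-normalize."""
    v = [sum(w * x for w, x in zip(row, raw)) for row in encoders[modality]]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

embeddings = {m: embed(m, [random.gauss(0, 1) for _ in range(d)])
              for m, d in raw_dims.items()}

# Every modality now lives in the same space, so any pair -- say,
# an audio clip and an IMU trace -- can be compared directly.
print({m: len(v) for m, v in embeddings.items()})
```

Once all six modalities share one space, cross-modal lookups (audio to image, IMU to text, and so on) reduce to ordinary nearest-neighbor search among the vectors.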

The idea is that future AI systems will be able to cross-reference this data in the same way that current AI systems handle text input. Imagine, for example, a futuristic virtual reality device that generates not only audio and visual output but also your surroundings and movement on a physical platform. You could ask it to simulate a long sea voyage, and it would not only place you on a ship with the sound of waves in the background but also add the swaying of the deck under your feet and the cool breeze of the ocean air.

In a blog post, Meta notes that other streams of sensory information could be added to future models, including "touch, speech, smell, and brain MRI signals." It also claims that the research "brings machines closer to humans' ability to learn simultaneously, holistically, and directly from many different forms of information."

Opponents of open sourcing, such as OpenAI, say the practice harms creators because competitors can copy their work, and that it is potentially dangerous, allowing bad actors to exploit state-of-the-art AI models. Proponents respond that open sourcing lets third parties scrutinize systems for flaws and fix some of their shortcomings. They argue it can even bring commercial benefits, since it effectively lets companies recruit third-party developers as unpaid contributors who improve their work.


Read also:

Heads of the Largest IT Companies Were Called to the White House to Discuss the Security of AI

The Creator of Artificial Intelligence Warned of its Danger and Quit Google