Russia's first multimodal AI model has been presented: the AIRI Artificial Intelligence Institute developed OmniFusion 1.1 and opened its source code. The language model, which supports visual dialogue and answers image-based questions, can also be used for commercial purposes, the AIRI press service reported.
OmniFusion is a multimodal artificial intelligence model. It is designed to extend conventional language processing systems with images and, in the future, audio, 3D and video content.
OmniFusion 1.1 Multimodal Specifics
The model architecture combines a large pre-trained LLM with special visual encoders that convert image information into a numerical vector, known as an embedding.
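To make the idea of an embedding concrete, here is a minimal sketch of how image vectors could be placed next to text vectors in a single input sequence for a language model. It is an illustration only, not OmniFusion's actual code: the dimensions, the random stand-in data, the `project` helper and the linear "adapter" layer are all assumptions introduced for this example.

```python
import random

def project(features, weights):
    """Multiply each feature vector by a weight matrix (a simple linear layer)."""
    return [
        [sum(f * w for f, w in zip(vec, col)) for col in zip(*weights)]
        for vec in features
    ]

random.seed(0)

VISION_DIM = 4  # size of each vector from the visual encoder (toy value)
LLM_DIM = 6     # size of the language model's token embeddings (toy value)

# Stand-in for the visual encoder output: 3 image patches, one vector each.
image_features = [[random.random() for _ in range(VISION_DIM)] for _ in range(3)]

# Stand-in for a text prompt: 2 tokens already embedded in the LLM's space.
text_embeddings = [[random.random() for _ in range(LLM_DIM)] for _ in range(2)]

# Hypothetical adapter: maps VISION_DIM-sized vectors to LLM_DIM-sized ones.
adapter_weights = [
    [random.gauss(0, 0.1) for _ in range(LLM_DIM)] for _ in range(VISION_DIM)
]

# Project image vectors into the LLM's embedding space, then join the sequences.
visual_tokens = project(image_features, adapter_weights)
llm_input = visual_tokens + text_embeddings

print(len(llm_input), len(llm_input[0]))  # prints: 5 6
```

In this sketch the language model would receive five vectors of the same size, three derived from the image and two from the text, and could process them as one sequence.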
Foreign counterparts of OmniFusion include LLaVA, Gemini and GPT-4 Vision, as well as the Chinese Qwen, DeepSeek and LVIS.
Features of OmniFusion 1.1
The model recognizes and describes images. For example, a user can upload a photo of a dish and the system will provide a recipe for it. The model can also analyze a floor plan or work out how to assemble a device from photographs of its individual components.
The model recognizes text out of the box and can also solve logic problems. With it, you can solve a math problem written on a board, or recognize a formula and get its representation in LaTeX format.
How OmniFusion 1.1 Was Trained
The quality of the model was evaluated across different versions of its architecture on eight benchmarks (specialized tests that measure how well AI models answer visual questions).
The tests showed that on the major benchmarks OmniFusion performs no worse than its foreign competitors.
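As a rough illustration of how a score on a visual question answering benchmark can be computed, the sketch below measures exact-match accuracy: the share of questions where the model's answer matches the reference answer after light normalization. The article does not say which metrics the eight benchmarks use, so the metric, the benchmark names and the answers here are all made up for the example and are not OmniFusion results.

```python
def normalize(answer):
    """Lowercase, drop punctuation and collapse spaces so 'A cat.' matches 'a cat'."""
    cleaned = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def accuracy(predictions, references):
    """Share of questions where the normalized prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Made-up model answers and reference answers for two hypothetical benchmarks.
benchmarks = {
    "toy_benchmark_a": (["A cat.", "two", "red"], ["a cat", "2", "red"]),
    "toy_benchmark_b": (["Paris", "x^2"], ["paris", "x^2"]),
}

# One accuracy score per benchmark, which is how models are usually compared.
scores = {name: accuracy(preds, refs) for name, (preds, refs) in benchmarks.items()}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Comparing such per-benchmark scores between models is what a claim like "no worse than competitors" would rest on. Real benchmarks often use more forgiving matching rules than the strict comparison shown here.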
The model's open source code is published on GitHub.
Author:
Natalia Gormaleva
Source: RB
