Researchers believe that multimodal AI, which combines different types of input such as text, audio, images, and video, is an important step toward general-purpose AI that can perform tasks at a human level.
Microsoft trained Kosmos-1 on data from the Internet. After training, the researchers evaluated Kosmos-1 on a variety of tasks, including language understanding, language generation, OCR-free text classification, image captioning, visual question answering, and web page question answering.
According to Microsoft, Kosmos-1 outperformed existing models in most of these tests.
Source: Ferra

I am a professional journalist and content creator with extensive experience writing for news websites. I currently work as an author at Gadget Onus, where I specialize in covering hot news topics. My written pieces have been published on some of the biggest media outlets around the world, including The Guardian and BBC News.