Russian and British scientists have developed a Russian- and English-language tool to evaluate the real performance of artificial intelligence models working with large amounts of data. The AIRI Institute explained that the development will help find optimal settings for language systems, simplifying their training and configuration. Yuri Kuratov, head of the AIRI research group, stated that the new benchmark makes it possible to evaluate how well models cope with tasks over long texts and in which aspects they need improvement.
The tool includes approximately twenty different tasks that require the AI to connect disparate facts, draw logical conclusions, work with datasets, and perform basic calculations. Researchers from MIPT, the London Institute of Mathematical Sciences, and SberDevices were involved in the development. For testing, the researchers used extracts from fiction together with the popular bAbI dataset, which was originally designed to test understanding of logic and arithmetic in short texts.
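The construction described above, where key facts must be found and connected inside a much longer text, can be illustrated with a minimal sketch. The function name, data, and structure below are hypothetical, not the authors' actual code; they only show the general idea of scattering bAbI-style facts among distractor sentences drawn from a long document:

```python
import random

def build_long_context_task(facts, question, answer, filler_sentences):
    """Hypothetical sketch: hide bAbI-style facts inside distractor text
    so a model must locate and connect them across a long context."""
    context = list(filler_sentences)
    # Insert each fact at a random position among the distractor sentences.
    for fact in facts:
        context.insert(random.randrange(len(context) + 1), fact)
    return {
        "context": " ".join(context),
        "question": question,
        "answer": answer,
    }

# Illustrative use with made-up filler standing in for fiction extracts.
filler = [f"Unrelated sentence number {i}." for i in range(50)]
task = build_long_context_task(
    facts=["Mary went to the kitchen.", "Mary picked up the apple."],
    question="Where is the apple?",
    answer="kitchen",
    filler_sentences=filler,
)
```

As the filler grows, the relevant facts occupy an ever-smaller fraction of the context, which is exactly the regime in which the researchers observed accuracy dropping.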
Experiments have shown that popular AI models use only 10-20% of the available context, and that their accuracy decreases as task complexity and text volume increase. The scientists emphasize that this points to the need to develop better methods for processing contextual information, which would help create more efficient language models in the future.
Source: Ferra
