The model focuses on tasks where context and detail matter: it does not just "look" at the screen, it understands what is happening on it.

Eagle 2.5 lets the user find the right moment in a video using a text description. For example, given the query "Show the moment a person takes a key out of his pocket," the model will locate the matching segment on its own.

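In practice, a query like this could be sent to a video-language model through a standard inference stack. The sketch below is illustrative only: the Hugging Face checkpoint name (nvidia/Eagle2.5-8B), the processor call with a `videos` argument, and the prompt format are assumptions, not NVIDIA's documented API; consult the model card for the real interface.

```python
# Hypothetical sketch: asking a video-language model to locate a moment
# described in text. Checkpoint name, processor signature, and prompt
# template are assumptions for illustration.
import cv2
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "nvidia/Eagle2.5-8B"  # assumed checkpoint name

def sample_frames(path: str, num_frames: int = 16):
    """Uniformly sample RGB frames from a video file with OpenCV."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)
    frames = []
    for i in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames[:num_frames]

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

frames = sample_frames("clip.mp4")
prompt = "Show the moment a person takes a key out of his pocket."

# Assumed call: real video checkpoints each define their own frame/chat
# template, so this line is a stand-in for the model-specific one.
inputs = processor(text=prompt, videos=frames, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```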
In benchmark comparisons it posted strong results: 74.8 points on MVBench, 77.6 on MLVU, and 66.4 on LongVideoBench. On image-understanding tasks, Eagle 2.5 scored 94.1 on DocVQA, 87.5 on ChartQA, and 80.4 on InfoVQA.

According to NVIDIA, the model scales well and competes with giants such as GPT-4o from OpenAI and Qwen2.5-VL-72B from Alibaba. It is another step for the "green" company toward strong multimodal solutions.

Source: Ferra
