As part of the collaboration, Apple integrated ReDrafter with Nvidia TensorRT-LLM, a framework designed to accelerate LLM inference on Nvidia GPUs. In tests on a model with tens of billions of parameters, the integration increased token generation speed by 2.7 times. The new system reduces latency and, because it needs fewer GPUs, power consumption as well.
Nvidia has updated TensorRT-LLM's operators to better support ReDrafter, so machine learning developers can take advantage of the faster token generation.
Source: Ferra
