This accelerator is designed to increase data center computing performance, especially when working with large language models (LLMs).
According to NVIDIA, the GB200 NVL2 accelerator delivers five times higher Llama 3 inference throughput than the previous-generation H100. Vector database search runs nine times faster than on conventional CPUs, and overall data processing is 18 times faster.
The GB200 NVL2 supports up to 960 GB of LPDDR5X RAM with up to 1,024 GB/s of bandwidth, and up to 384 GB of VRAM with up to 16 TB/s of bandwidth.
Accelerator performance figures:
- FP4 (tensor cores): 40 PFLOPS
- FP8/FP6 (tensor cores): 20 PFLOPS
- INT8 (tensor cores): 20 POPS
- FP16/BF16 (tensor cores): 10 PFLOPS
- TF32 (tensor cores): 5 PFLOPS
- FP32: 180 TFLOPS
- FP64 / FP64 (tensor cores): 90 TFLOPS
One of the key features of the new product is key-value (KV) caching, which increases throughput by retaining the context and query history so they do not have to be recomputed on every step.
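To illustrate the idea behind KV caching, here is a minimal, self-contained sketch of how an autoregressive attention step can reuse the keys and values of already-processed tokens instead of re-encoding the whole context. All names, shapes, and the head dimension are illustrative assumptions, not NVIDIA's implementation.

```python
import numpy as np

D = 8  # head dimension (hypothetical)

class KVCache:
    """Stores keys/values of past tokens so each new token only
    attends over cached entries instead of recomputing them."""
    def __init__(self):
        self.keys = []    # one (D,) key vector per past token
        self.values = []  # one (D,) value vector per past token

    def step(self, q, k, v):
        # Append this token's key/value, then attend over the full cache.
        self.keys.append(k)
        self.values.append(v)
        K = np.stack(self.keys)           # (T, D)
        V = np.stack(self.values)         # (T, D)
        scores = K @ q / np.sqrt(D)       # (T,) scaled dot-product scores
        w = np.exp(scores - scores.max()) # numerically stable softmax
        w /= w.sum()
        return w @ V                      # (D,) attention output

# Usage: feed five tokens; each step reuses all cached keys/values.
rng = np.random.default_rng(0)
cache = KVCache()
for _ in range(5):
    q, k, v = rng.standard_normal((3, D))
    out = cache.step(q, k, v)
```

Without the cache, step T would recompute keys and values for all T tokens; with it, each step does only one new projection, which is where the throughput gain comes from.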
The high-speed NVLink-C2C interconnect between the CPU and GPU transfers data up to seven times faster than PCIe.
Source: Ferra
