Graphics card maker NVIDIA introduced the RTX 4000 series on September 20. During the conference, we saw the RTX 4090 and 4080 along with the architecture that brings them to life: Ada Lovelace. These GPUs promise to be the most powerful GeForce cards in history.
We have already covered everything there was to say about the RTX 4000: models, release date, prices, specifications… Now it's time to focus on the architecture, which is what makes these graphics cards unique.
The heart of the GeForce RTX 4090 is the giant AD102 silicon. This chip, built on a 4nm process, has an area of 608mm² and contains 76.3 billion transistors.
And the good news is that we can now take a closer look at the AD102 block diagram at the silicon level, which includes the introduction of several new components.
This is the new NVIDIA GeForce architecture
AD102 has a PCI-Express 4.0 x16 host interface and a 384-bit GDDR6X memory interface. The GigaThread Engine acts as the main resource-allocation component of the silicon.
Ada introduces the Optical Flow Accelerator, an essential component of DLSS 3 that lets it generate entire frames without involving the graphics rendering engine.
The chip has twice as many hardware media encoding engines as Ampere, including hardware-accelerated AV1 encoding and decoding. Having multiple engines allows several video streams to be transcoded at once (great for content creators).
The core graphics rendering components of the AD102 are the GPCs (Graphics Processing Clusters). There are 12 of them, compared to 7 in the previous-generation GA102. Each GPC contains a raster engine and render backends that are shared among its six TPCs (Texture Processing Clusters).
Each TPC contains two SMs (Streaming Multiprocessors), the indivisible compute unit of an NVIDIA GPU. The SM is where NVIDIA implements its biggest architectural innovations and extracts the performance expected of a new generation.
Each SM contains a third-generation RT core, 128 KB of L1 cache, and four TMUs, shared among four clusters. Each cluster contains 16 FP32 CUDA cores, 16 more CUDA cores, 4 load/store units, a tiny L0 cache, a register file and, most importantly, a fourth-generation Tensor core.
So each SM contains a total of 128 CUDA cores, 4 Tensor cores, and one RT core. There are 12 SMs per GPC, i.e. 1536 CUDA cores, 48 Tensor cores and 12 RT cores per GPC. That is, twelve GPCs give a total of 18,432 CUDA cores, 576 Tensor cores, and 144 RT cores.
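The totals above follow directly from multiplying through the hierarchy. A minimal sketch of that arithmetic, using only the per-level counts quoted in this article (the constant and function names are illustrative, not NVIDIA terminology):

```python
# Per-level unit counts for the full AD102 die, as quoted in the article.
CUDA_CORES_PER_CLUSTER = 16 + 16  # two groups of 16 CUDA cores per cluster
CLUSTERS_PER_SM = 4
TENSOR_CORES_PER_SM = 4           # one per cluster
RT_CORES_PER_SM = 1
SMS_PER_TPC = 2
TPCS_PER_GPC = 6
GPCS = 12


def ad102_totals():
    """Multiply the per-level counts up to whole-chip totals."""
    cuda_per_sm = CUDA_CORES_PER_CLUSTER * CLUSTERS_PER_SM
    sms_per_gpc = SMS_PER_TPC * TPCS_PER_GPC
    sms = sms_per_gpc * GPCS
    return {
        "cuda_cores_per_sm": cuda_per_sm,
        "sms_per_gpc": sms_per_gpc,
        "cuda_cores_per_gpc": cuda_per_sm * sms_per_gpc,
        "sms": sms,
        "cuda_cores": cuda_per_sm * sms,
        "tensor_cores": TENSOR_CORES_PER_SM * sms,
        "rt_cores": RT_CORES_PER_SM * sms,
    }


totals = ad102_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

These figures describe the full AD102 die as laid out in the block diagram.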
Each GPC also contributes 16 ROPs, for a total of 192 ROPs on the chip. The L2 cache serves as the meeting point between the various GPCs, the memory controllers, and the PCIe host interface.
NVIDIA did not disclose the size of this L2 cache, but it is said to be significantly larger than in the previous generation and to play an important role in easing pressure on the memory subsystem, enough for NVIDIA to keep the same 21 Gb/s, 384-bit memory configuration as the previous generation.
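To put the memory figures in perspective, here is a small sketch of the peak bandwidth they imply. It assumes the 21 Gb/s figure is the per-pin data rate, which is the usual convention for GDDR6X although the article does not spell it out; the function name is illustrative.

```python
def peak_memory_bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate (Gb/s) times bus width, over 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


# Assumption: 21 Gb/s per pin across a 384-bit bus.
bandwidth = peak_memory_bandwidth_gb_s(21, 384)
print(bandwidth)  # 1008.0
```

Under that assumption the raw bandwidth stays at roughly 1 TB/s, unchanged from the previous generation, which is why a larger L2 cache matters for keeping far more cores fed.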
NVIDIA is also introducing Shader Execution Reordering (SER), a new technology that reorganizes shading workloads so that the work assigned to each thread is grouped in a way the SIMD components can process more efficiently.
This is expected to have a particularly strong impact on the rendering of ray-traced games. In Cyberpunk 2077, with the new Overdrive graphics preset, which significantly increases per-pixel RT calculations, SER improves performance by up to 44%.
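As a rough illustration of why reordering work can help SIMD hardware, the toy model below counts how many serialized passes a group of lockstep threads needs when its threads want to run different shaders. This is a CPU-side sketch under simplifying assumptions (32-thread groups, one pass per distinct shader in a group, a made-up interleaved workload); it is not NVIDIA's hardware mechanism or API, and it says nothing about the 44% figure.

```python
WARP_SIZE = 32  # assumed number of threads executing in lockstep


def divergent_passes(shader_ids, warp_size=WARP_SIZE):
    """Count serialized passes: one pass per distinct shader in each group of threads."""
    passes = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        passes += len(set(warp))
    return passes


def reorder_by_shader(shader_ids):
    """Group threads that want the same shader next to each other."""
    return sorted(shader_ids)


# Hypothetical workload: 1024 rays hitting 8 different materials in an interleaved pattern.
NUM_SHADERS = 8
rays = [i % NUM_SHADERS for i in range(1024)]

before = divergent_passes(rays)
after = divergent_passes(reorder_by_shader(rays))
print(before, after)  # 256 32
```

In this contrived case every group originally contains all eight shaders, so grouping identical work cuts the number of passes by a factor of eight. Real ray-traced scenes are far less regular, so actual gains depend on the workload.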
NVIDIA has a hard job ahead justifying its new generation after two years of shortages, exorbitant prices and scarce information. Launching these first high-end models at official prices never seen before is certainly not the best way to start.
Source: Computer Hoy
