A recent report from David Kanter of Real World Technologies investigates why Nvidia graphics chips based on the company’s Maxwell and Pascal architectures perform better than their peak theoretical numbers suggest, and why they’re more efficient than competing graphics chips. In a nutshell, Nvidia’s GPUs use what’s called tile-based immediate-mode rasterization, which buffers pixel output on the chip and is both fast and power-efficient. The competitor’s graphics chips rely on slower, conventional full-screen immediate-mode rasterizers.
According to Kanter, tile-based rasterization has been around since the 1990s, first popping up in the PowerVR architecture and later adopted by ARM and Qualcomm in their mobile processors’ GPUs. Until Nvidia introduced the technique in its Maxwell GM20x architecture, tile-based rasterization hadn’t been successfully implemented in desktop graphics chips.
Tile-based rasterization essentially means that each triangle-based, three-dimensional scene is split up into tiles, and each tile is broken down (rasterized) into pixels on the graphics chip itself to be “printed” on a two-dimensional screen. By contrast, full-screen immediate-mode rasterizers use more memory and more power by breaking down the entire scene into pixels in one pass (or scan).
“Using tiled regions and buffering the rasterizer data on-die reduces the memory bandwidth for rendering, improving performance and power-efficiency,” Kanter explains. “Consistent with this hypothesis, our testing shows that Nvidia GPUs change the tile size to ensure that the pixel output from rasterization fits within a fixed size on-chip buffer or cache.”
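Kanter's observation that the tile size changes so the pixel output fits a fixed on-chip buffer can be illustrated with a little arithmetic. The sketch below is only an illustration: the 512 KiB buffer, the square candidate tile sizes, and the function name `pick_tile` are all hypothetical and are not figures or names from Nvidia or from the report.

```python
# Hypothetical illustration: pick the largest tile whose pixel output
# fits in a fixed-size on-chip buffer. All numbers are assumptions.

def pick_tile(buffer_bytes, bytes_per_pixel, candidates):
    """Return the largest (width, height) candidate that fits the buffer."""
    fitting = [
        (w, h) for (w, h) in candidates
        if w * h * bytes_per_pixel <= buffer_bytes
    ]
    if not fitting:
        raise ValueError("no candidate tile fits in the buffer")
    # Largest tile by pixel count wins.
    return max(fitting, key=lambda wh: wh[0] * wh[1])


BUFFER_BYTES = 512 * 1024  # assumed buffer size, not a real hardware figure
CANDIDATES = [(16, 16), (32, 32), (64, 64), (128, 128), (256, 256)]

# More bytes written per pixel means a smaller tile fits the same buffer.
small_pixels = pick_tile(BUFFER_BYTES, 4, CANDIDATES)
large_pixels = pick_tile(BUFFER_BYTES, 16, CANDIDATES)
print(small_pixels, large_pixels)
```

The point is only the trend: with the buffer held constant, heavier per-pixel output forces a smaller tile.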
Kanter explains that mobile GPUs from the likes of Apple and other device makers use a method called tile-based deferred rendering, where geometry and pixel work are done in two separate passes. The scene is divided into tiles, the triangles for each tile are processed all at once, and then pixel shading for each tile occurs after that.
However, Nvidia is reportedly using a tile-based “immediate” technique in its desktop GPUs that divides the screen up into tiles, and then rasterizes small batches of triangles within the tile. The triangles are typically buffered or cached on-chip, he says, which in turn improves performance and saves power.
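The difference between the deferred and immediate flavors of tiling comes down to the order in which work is issued. Here is a rough sketch of that ordering, assuming each triangle is stood in for by its bounding box and that tiles are visited in row-major order. The function names, the batch size, and the tile traversal order are illustrative assumptions, not details confirmed by Kanter's report.

```python
# Sketch: work ordering for tile-based deferred vs. tile-based immediate.
# Triangles are approximated by bounding boxes (x0, y0, x1, y1) with
# exclusive upper bounds. Names and batch size are hypothetical.

def tiles_touched(box, tile, width, height):
    """Return the (tx, ty) tile coordinates overlapped by a box."""
    x0, y0, x1, y1 = box
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)
    if x0 >= x1 or y0 >= y1:
        return []
    return [
        (tx, ty)
        for ty in range(y0 // tile, (y1 - 1) // tile + 1)
        for tx in range(x0 // tile, (x1 - 1) // tile + 1)
    ]


def bin_and_emit(indexed_boxes, tile, width, height):
    """Bin the given triangles per tile, then emit (tile, triangle) work."""
    bins = {}
    for index, box in indexed_boxes:
        for t in tiles_touched(box, tile, width, height):
            bins.setdefault(t, []).append(index)
    ordered_tiles = sorted(bins, key=lambda t: (t[1], t[0]))  # row-major
    return [(t, i) for t in ordered_tiles for i in bins[t]]


def deferred_order(boxes, tile, width, height):
    """Deferred: bin the whole scene first, then work tile by tile."""
    return bin_and_emit(list(enumerate(boxes)), tile, width, height)


def immediate_order(boxes, tile, width, height, batch_size):
    """Immediate: bin and rasterize one small batch of triangles at a time."""
    indexed = list(enumerate(boxes))
    work = []
    for start in range(0, len(indexed), batch_size):
        batch = indexed[start:start + batch_size]
        work.extend(bin_and_emit(batch, tile, width, height))
    return work


boxes = [(0, 0, 64, 32), (0, 0, 32, 32), (32, 0, 64, 32)]
print(deferred_order(boxes, 32, 64, 32))
print(immediate_order(boxes, 32, 64, 32, batch_size=1))
```

Both functions issue the same set of (tile, triangle) pairs; only the order changes. The deferred version finishes a tile before touching the next, while the immediate version revisits tiles as each new batch arrives.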
In a demonstration using a tool called Triangles.HLSL running on an AMD Radeon HD 6670 GPU and Windows 10, he shows how AMD’s graphics chip renders twelve identical, flat objects on the screen one at a time. Each object is drawn line by line, moving from right to left and from the top of the screen to the bottom, and each new object overwrites the one before it. He reveals this technique by moving a slider that sets the number of pixels that can be rendered on the screen. Just imagine an invisible printer going back and forth across the screen quicker than the human eye can fully detect.
After revealing AMD’s current draw technique, the demonstration moves to a different system using the same tool, Windows 10, and an Nvidia GeForce GTX 970 graphics card. Here you’ll notice that when the rendering process is paused, the twelve stacked objects are rendered simultaneously, with two completed tiles on the left and five more tiles appearing in various states in a checkerboard pattern to the right. Overall, the rasterization path is left to right, and top to bottom.
In short, Nvidia fully rasterizes one tile containing a portion of all objects before moving on to the next tile. AMD, on the other hand, rasterizes each object in a printer-type fashion from top to bottom before going back to the beginning and rendering the next object. Things get even more interesting when Nvidia’s Pascal-based GeForce GTX 1070 is installed into the test bed, revealing even larger tiles with a different pattern.
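The pixel-limit slider in the demonstration can be mimicked with a toy model. The sketch below is a simplification under stated assumptions: objects are axis-aligned rectangles that lie fully on screen, both modes scan left to right for simplicity, and tiles are visited in plain row-major order rather than the checkerboard pattern seen on real hardware. All function names are hypothetical.

```python
# Toy model of the demo's pixel-limit slider. Objects are rectangles
# (x0, y0, x1, y1) with exclusive upper bounds, assumed to be on screen.
from itertools import islice


def fullscreen_order(objects):
    """Full-screen immediate mode: finish each object before the next."""
    for i, (x0, y0, x1, y1) in enumerate(objects):
        for y in range(y0, y1):
            for x in range(x0, x1):
                yield (i, x, y)


def tiled_order(objects, width, height, tile):
    """Tiled: finish every object's share of one tile before the next tile."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for i, (x0, y0, x1, y1) in enumerate(objects):
                for y in range(max(y0, ty), min(y1, ty + tile)):
                    for x in range(max(x0, tx), min(x1, tx + tile)):
                        yield (i, x, y)


def objects_seen(order, pixel_budget):
    """Which objects have any pixels drawn once the budget runs out."""
    return sorted({i for i, _, _ in islice(order, pixel_budget)})


# Three identical, stacked 8x8 objects on an 8x8 screen with 4x4 tiles.
objects = [(0, 0, 8, 8)] * 3
print(objects_seen(fullscreen_order(objects), 48))
print(objects_seen(tiled_order(objects, 8, 8, 4), 48))
```

With the same pixel budget, the full-screen ordering has only begun the first object, while the tiled ordering has already drawn all three objects inside the first tile. That mirrors what the paused frames show in the two halves of the demonstration.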
To check out this latest investigation, be sure to hit the video embedded above for the full 19:45 demonstration.