
Analysis reveals why Nvidia graphics chips are power-saving performance beasts

A recent report from David Kanter of Real World Technologies investigates why Nvidia's graphics chips, namely those based on the company's Maxwell and Pascal architectures, perform better than their on-paper specifications would suggest, and why they're more efficient than competing graphics chips. In a nutshell, Nvidia's GPUs buffer pixel output using what's called a tile-based immediate-mode rasterizer, which is fast and power-efficient, while the competition's graphics chips rely on slower, conventional full-screen immediate-mode rasterizers.

According to Kanter, tile-based rasterization has been around since the 1990s, first popping up in the PowerVR architecture and later adopted by ARM and Qualcomm in the GPUs of their mobile processors. Until Nvidia introduced the technique in its Maxwell GM20x architecture, however, it had never been successfully implemented in desktop graphics chips.


Tile-based rasterization essentially means that each triangle-based, three-dimensional scene is split into tiles, and each tile is broken down (rasterized) into pixels on the graphics chip itself before being "printed" to the two-dimensional screen. By contrast, a full-screen immediate-mode rasterizer breaks each triangle down into pixels across the entire screen as it is submitted, writing the results out to memory, which consumes more memory bandwidth and more power.
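To picture why that on-chip buffering matters, here's a minimal Python sketch. It is not how any real GPU is implemented, and the screen size, tile size, and triangle coverage data are all made up; it simply counts off-chip framebuffer writes under each strategy to show why keeping a tile's pixels on-chip until the tile is done cuts memory traffic.

```python
"""Toy comparison of full-screen immediate-mode vs. tile-based rasterization.
Triangles are reduced to the sets of pixels they cover, and we count how many
pixel writes would have to go out to off-chip memory under each strategy."""

WIDTH, HEIGHT = 64, 64   # hypothetical framebuffer size
TILE = 16                # hypothetical tile edge length in pixels

# Fabricated "triangles": each is just the set of pixels it covers, and they
# deliberately overlap so there is overdraw.
triangles = [
    {(x, y) for x in range(0, 40) for y in range(0, 40)},
    {(x, y) for x in range(20, 60) for y in range(20, 60)},
    {(x, y) for x in range(10, 30) for y in range(40, 64)},
]

def immediate_mode_writes(tris):
    """Full-screen immediate mode: every covered pixel of every triangle is
    written straight to the framebuffer in memory, so overlapping triangles
    cause repeated off-chip writes to the same locations."""
    return sum(len(t) for t in tris)

def tiled_writes(tris, tile=TILE):
    """Tile-based: pixels accumulate in a small on-chip tile buffer, and each
    framebuffer location is flushed to memory once per tile, after all
    triangles touching that tile have been rasterized."""
    writes = 0
    for ty in range(0, HEIGHT, tile):
        for tx in range(0, WIDTH, tile):
            tile_pixels = set()  # stands in for the on-chip tile buffer
            for t in tris:
                tile_pixels |= {(x, y) for (x, y) in t
                                if tx <= x < tx + tile and ty <= y < ty + tile}
            writes += len(tile_pixels)  # one write per pixel at tile flush
    return writes

print("off-chip writes, full-screen immediate mode:", immediate_mode_writes(triangles))
print("off-chip writes, tile-based:                ", tiled_writes(triangles))
```

Running it, the tiled count comes out lower because the overdraw between overlapping triangles is absorbed in the on-chip buffer instead of hitting memory repeatedly.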

“Using tiled regions and buffering the rasterizer data on-die reduces the memory bandwidth for rendering, improving performance and power-efficiency,” Kanter explains. “Consistent with this hypothesis, our testing shows that Nvidia GPUs change the tile size to ensure that the pixel output from rasterization fits within a fixed size on-chip buffer or cache.”
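That observation is easy to reason about with some back-of-the-envelope arithmetic. The sketch below assumes a hypothetical 128KB on-chip buffer and a fixed 256-pixel tile width purely for illustration; these are not figures from Kanter's tests, but they show how heavier per-pixel output forces the tile to shrink if it has to fit a fixed-size buffer.

```python
"""Illustrative only: pick a tile height so the tile's pixel output fits in a
fixed on-chip buffer. The buffer size, tile width, and pixel formats below
are assumptions for the sake of the example, not measured values."""

ON_CHIP_BUFFER_BYTES = 128 * 1024   # hypothetical on-chip buffer/cache size
ASSUMED_TILE_WIDTH = 256            # hypothetical fixed tile width in pixels

def tile_dimensions(bytes_per_pixel):
    """Pixel budget, and the tile height that keeps the output in the buffer."""
    budget = ON_CHIP_BUFFER_BYTES // bytes_per_pixel
    return budget, budget // ASSUMED_TILE_WIDTH

# As the render-target format gets heavier, the pixel budget (and tile) shrinks.
for name, bpp in [("RGBA8, 4 B/px", 4),
                  ("RGBA16F, 8 B/px", 8),
                  ("2x RGBA16F MRT, 16 B/px", 16),
                  ("4x RGBA16F MRT, 32 B/px", 32)]:
    budget, height = tile_dimensions(bpp)
    print(f"{name:24s} -> {budget:6d} px budget, tile ~{ASSUMED_TILE_WIDTH} x {height}")
```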

Kanter explains that mobile GPUs from the likes of Apple and other device makers use a method called tile-based deferred rendering, where geometry and pixel work are handled in two separate passes: the scene is divided into tiles, all of the triangles touching each tile are processed first, and pixel shading for each tile happens only after that.
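Here's a highly simplified Python sketch of that two-pass structure. The tile size, helper names, and triangle data are invented for illustration; the point is only that shading for a tile starts after the whole frame's geometry has been binned.

```python
"""Simplified sketch of tile-based *deferred* rendering: all geometry for the
frame is binned into tiles first, and pixel shading runs per tile only after
every triangle is known. All names and numbers here are illustrative."""

from collections import defaultdict

TILE = 32  # hypothetical tile edge in pixels

def tiles_overlapping(bbox, tile=TILE):
    """Yield (tx, ty) indices of the tiles touched by a triangle's bounding box."""
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            yield (tx, ty)

def deferred_render(triangles):
    # Pass 1 (geometry): bin every triangle in the frame into the tiles it touches.
    bins = defaultdict(list)
    for tri in triangles:
        for key in tiles_overlapping(tri["bbox"]):
            bins[key].append(tri)

    # Pass 2 (shading): each tile is shaded once, with the complete triangle
    # list in hand, so hidden surfaces can be resolved before shading work.
    for key, tris in sorted(bins.items()):
        print(f"tile {key}: shading {len(tris)} triangle(s) in one pass")

# Toy frame: triangles described only by their screen-space bounding boxes.
frame = [
    {"bbox": (0, 0, 40, 40)},
    {"bbox": (20, 20, 90, 60)},
    {"bbox": (70, 10, 120, 50)},
]
deferred_render(frame)
```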

However, Nvidia is reportedly using a tile-based "immediate" technique in its desktop GPUs that divides the screen into tiles and then rasterizes small batches of triangles within each tile. The triangles are typically buffered or cached on-chip, he says, which in turn improves performance and saves power.
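The sketch below contrasts that with the deferred flow above: triangles are consumed in small batches as they arrive, each batch is binned and its tiles rasterized right away, rather than waiting for the whole frame's geometry. This is a conceptual reading of Kanter's description, not Nvidia's actual scheme, and the batch size and structure are guesses.

```python
"""Sketch of a tile-based *immediate-mode* flow along the lines Kanter
describes: small batches of triangles are binned into tiles and rasterized
as they arrive, with pixel output held in an on-chip buffer per tile.
Batch size, tile size, and all names here are illustrative assumptions."""

from collections import defaultdict

TILE = 32        # hypothetical tile edge in pixels
BATCH_SIZE = 2   # hypothetical number of triangles buffered per batch

def tiles_overlapping(bbox, tile=TILE):
    x0, y0, x1, y1 = bbox
    for ty in range(y0 // tile, y1 // tile + 1):
        for tx in range(x0 // tile, x1 // tile + 1):
            yield (tx, ty)

def flush_batch(batch):
    """Bin just this batch into tiles and rasterize those tiles right away."""
    bins = defaultdict(list)
    for tri in batch:
        for key in tiles_overlapping(tri["bbox"]):
            bins[key].append(tri)
    for key, tris in sorted(bins.items()):
        print(f"  tile {key}: rasterizing {len(tris)} triangle(s) from this batch")

def tiled_immediate_render(triangle_stream):
    batch = []
    for tri in triangle_stream:          # triangles arrive in submission order
        batch.append(tri)
        if len(batch) == BATCH_SIZE:     # process each small batch immediately
            flush_batch(batch)
            batch.clear()
    if batch:                            # leftover triangles at end of frame
        flush_batch(batch)

stream = [{"bbox": (0, 0, 40, 40)},
          {"bbox": (20, 20, 90, 60)},
          {"bbox": (70, 10, 120, 50)}]
print("processing triangle stream in small batches:")
tiled_immediate_render(stream)
```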

In a demonstration using a tool called Triangles.HLSL running on an AMD Radeon HD 6670 GPU under Windows 10, he shows how AMD's graphics chip renders twelve identical, flat objects one by one: each object is filled in, sweeping across the screen line by line from the top to the bottom, with each new object overwriting the one before it. He revealed this pattern by moving a slider that caps the number of pixels allowed to be rendered to the screen. Just imagine an invisible printer sweeping back and forth across the screen faster than the human eye can detect.
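The trick behind that slider is simply stopping the GPU partway through the frame. Kanter's actual tool is a Direct3D/HLSL program that does this on the GPU; the Python snippet below only mimics the idea with an arbitrary stand-in scanline order, to show how capping the pixel count exposes whatever order the rasterizer happens to use.

```python
"""Toy emulation of the 'cap the pixel count' trick: limit how many pixels may
land in the framebuffer, then look at which ones made it in to infer the
rasterization order. The real tool is a GPU-side Direct3D/HLSL program; this
is just the idea in Python with a made-up left-to-right, top-to-bottom order."""

WIDTH, HEIGHT = 16, 8
PIXEL_BUDGET = 60  # position of the "slider": how many pixels may be written

def scanline_order():
    """Stand-in pixel order: plain left-to-right, top-to-bottom scan."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            yield (x, y)

framebuffer = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
written = 0
for (x, y) in scanline_order():
    if written >= PIXEL_BUDGET:   # budget exhausted: freeze mid-frame
        break
    framebuffer[y][x] = "#"
    written += 1

# The visible '#' region shows how far, and in what order, the "rasterizer"
# got before the cap kicked in.
for row in framebuffer:
    print("".join(row))
```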

After revealing AMD’s current draw technique, the demonstration moves to a different system using the same tool, Windows 10, and a Nvidia GeForce GTX 970 graphics card. Here you’ll notice that when the rendering process is paused, the stacked twelve objects are rendered simultaneously, with two completed tiles on the left and five more tiles appearing in various states in a checkerboard pattern to the right. Overall, the rasterization path is left to right, and top to bottom.

Put simply, Nvidia fully rasterizes one tile, containing a portion of all of the objects, before moving on to the next tile. AMD, on the other hand, rasterizes each object in printer-like fashion from top to bottom before going back and rendering the next object. Things get even more interesting later in the demonstration, when a different Nvidia card is installed in the test bed, revealing even larger tiles with a different pattern.

To check out this latest investigation, be sure to hit the video embedded above for the full 19:45 demonstration.

Kevin Parrish