No, ChatGPT isn’t going to cause another GPU shortage

ChatGPT is exploding, and the backbone of its AI model relies on Nvidia graphics cards. One analyst said around 10,000 Nvidia GPUs were used to train ChatGPT, and as the service continues to expand, so does the need for GPUs. Anyone who lived through the rise of crypto in 2021 can smell a GPU shortage on the horizon.

I’ve seen a few reporters draw that exact connection, but it’s misguided. The days of crypto-style GPU shortages are behind us. Although we’ll likely see a surge in demand for graphics cards as AI continues to boom, that demand isn’t directed toward the best graphics cards installed in gaming rigs.

Why Nvidia GPUs are built for AI

A render of Nvidia's RTX A6000 GPU.

First, we’ll address why Nvidia graphics cards are so great for AI. Nvidia has bet on AI for the past several years, and it’s paid off with the company’s stock price soaring after the rise of ChatGPT. There are two reasons why you see Nvidia at the heart of AI training: tensor cores and CUDA.

CUDA is Nvidia’s Application Programming Interface (API) used in everything from its most expensive data center GPUs to its cheapest gaming GPUs. CUDA acceleration is supported in machine learning libraries like TensorFlow, vastly speeding up training and inference. CUDA is a big reason why AMD is so far behind Nvidia in AI.

Don’t confuse CUDA with Nvidia’s CUDA cores, however. CUDA is the platform that a ton of AI apps run on, while CUDA cores are just the cores inside Nvidia GPUs. They share a name because CUDA cores are built to run CUDA applications. Nvidia’s gaming GPUs have CUDA cores and they support CUDA apps.

Tensor cores are basically dedicated AI cores. They handle matrix multiplication, which is the core operation behind AI training. The idea here is simple. Rather than working through each multiply-and-add one at a time, Tensor cores multiply whole blocks of numbers at once, which trains AI models far faster. Most processor cores handle those calculations one after another, while Tensor cores can finish a small matrix multiplication in a single operation.
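To make the matrix math concrete, here is a minimal sketch in plain Python of the kind of operation Tensor cores accelerate: multiplying two small blocks of numbers and adding a third. The function names and the 4x4 block size are illustrative assumptions, not Nvidia's API, and the real hardware does this work in parallel rather than in loops.

```python
def tile_fma(a, b, c):
    """Return a*b + c for square blocks given as lists of lists.

    This is the multiply-and-add a Tensor core performs on a small block.
    Here it runs one step at a time, purely for illustration.
    """
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


def multiply_add_count(n):
    """Count the individual multiply-adds hidden inside one n-by-n block product."""
    return n * n * n


# Assumed 4x4 block size, chosen only to keep the example small.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
block = [[i * 4 + j for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]

# Multiplying by the identity and adding zeros returns the original block.
print(tile_fma(block, identity, zeros) == block)

# Number of multiply-adds a one-at-a-time processor would step through.
print(multiply_add_count(4))
```

Even a block this small hides dozens of individual multiply-adds, which is why hardware that handles the whole block at once pays off so quickly in training.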

Again, Nvidia’s gaming GPUs like the RTX 4080 have Tensor cores (and sometimes even more than costly data center GPUs). However, for all of the specs Nvidia cards have to accelerate AI models, none of them are as important as memory. And Nvidia’s gaming GPUs don’t have a lot of memory.

It all comes down to memory

A stack of HBM memory.
Wikimedia

“Memory size is the most important,” according to Jeffrey Heaton, author of several books on artificial intelligence and a professor at Washington University in St. Louis. “If you do not have enough GPU RAM, your model fitting/inference simply stops.”

Heaton, who has a YouTube channel dedicated to how well AI models run on certain GPUs, noted that CUDA cores are important as well, but memory capacity is the dominant factor when it comes to how a GPU functions for AI. The RTX 4090 has a lot of memory by gaming standards — 24GB of GDDR6X — but very little compared to a data center-class GPU. For instance, Nvidia’s latest H100 GPU has 80GB of HBM3 memory, as well as a massive 5,120-bit memory bus.

You can get by with less, but you still need a lot of memory. Heaton recommends beginners have no less than 12GB, while a typical machine learning engineer will have one or two 48GB professional Nvidia GPUs. According to Heaton, “most workloads will fall more in the single A100 to eight A100 range.” Nvidia’s A100 GPU has 40GB of memory.

You can see this scaling in action, too. Puget Systems shows a single A100 with 40GB of memory performing around twice as fast as a single RTX 3090 with its 24GB of memory. And that’s despite the fact that the RTX 3090 has almost twice as many CUDA cores and nearly as many Tensor cores.

Memory is the bottleneck, not raw processing power. That’s because training AI models relies on large datasets, and the more of that data you can store in memory, the faster (and more accurately) you can train a model.

Different needs, different dies

Hopper H100 graphics card.

Nvidia’s gaming GPUs generally aren’t suitable for AI due to how little video memory they have compared to enterprise-grade hardware, but there’s a separate issue here as well. Nvidia’s workstation GPUs don’t usually share a GPU die with its gaming cards.

For instance, the A100 that Heaton referenced uses the GA100 GPU, which is a die from Nvidia’s Ampere range that was never used on gaming-focused cards (including the high-end RTX 3090 Ti). Similarly, Nvidia’s latest H100 uses the Hopper architecture rather than the Ada Lovelace architecture of the RTX 40-series, meaning it uses a different die as well.

There are exceptions. Nvidia’s AD102 GPU, which is inside the RTX 4090, is also used in a small range of Ada Lovelace enterprise GPUs (the L40 and RTX 6000). In most cases, though, Nvidia can’t just repurpose a gaming GPU die for a data center card. They’re separate worlds.

There are some fundamental differences between the GPU shortage we saw due to crypto-mining and the rise in popularity of AI models. According to Heaton, the GPT-3 model required over 1,000 Nvidia A100 GPUs to train and about eight to run. These GPUs have access to the high-bandwidth NVLink interconnect as well, while Nvidia’s RTX 40-series GPUs don’t. It’s comparing a maximum of 24GB of memory on Nvidia’s gaming cards to several hundred gigabytes across GPUs like the A100 linked with NVLink.
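A rough back-of-the-envelope calculation shows why the card count climbs so quickly. The sketch below assumes GPT-3's publicly reported size of 175 billion parameters and 2 bytes per parameter (16-bit weights); neither figure comes from Heaton, and the estimate counts only the model's weights, ignoring everything else that has to sit in memory while the model runs. The function names are hypothetical.

```python
import math


def weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Memory needed just to hold the model weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def gpus_needed(num_parameters, gpu_memory_gb, bytes_per_parameter=2):
    """Smallest number of cards whose combined memory can hold the weights."""
    total_gb = weight_memory_gb(num_parameters, bytes_per_parameter)
    return math.ceil(total_gb / gpu_memory_gb)


# Assumption: 175 billion parameters, the publicly reported size of GPT-3.
ASSUMED_PARAMETERS = 175_000_000_000

# Total memory for the weights alone.
print(weight_memory_gb(ASSUMED_PARAMETERS))

# Cards needed at 40GB each (A100) versus 24GB each (RTX 4090).
print(gpus_needed(ASSUMED_PARAMETERS, gpu_memory_gb=40))
print(gpus_needed(ASSUMED_PARAMETERS, gpu_memory_gb=24))
```

Under these assumptions the weights alone come to roughly 350GB, which lands in the same ballpark as Heaton's figure of about eight A100s. Stacking up 24GB gaming cards would take even more of them, and without NVLink they couldn't share that memory efficiently anyway.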

There are some other concerns, such as memory dies being allocated for professional GPUs over gaming ones, but the days of rushing to your local Micro Center or Best Buy for the chance to find a GPU in stock are gone. Heaton summed that point up nicely: “Large language models, such as ChatGPT, are estimated to require at least eight GPUs to run. Such estimates assume the high-end A100 GPUs. My speculation is that this could cause a shortage of the higher-end GPUs, but may not affect gamer-class GPUs, with less RAM.”
