
Researchers just solved AI’s biggest conundrum


The large language models that power today’s chatbots like ChatGPT, Gemini, and Claude are immensely powerful generative AI systems, and immensely power-hungry ones to boot.

They apparently don’t need to be. Recent research out of the University of California, Santa Cruz has shown that modern LLMs with billions of parameters can operate on just 13 watts of power without a loss in performance. That’s roughly the draw of an LED light bulb (the kind sold as a 100W-incandescent replacement), and more than a 50x improvement over the 700W that a single Nvidia H100 GPU consumes.


“We got the same performance at way less cost — all we had to do was fundamentally change how neural networks work,” the paper’s lead author, Jason Eshraghian, said. “Then we took it a step further and built custom hardware.” They did so by doing away with the neural network’s reliance on matrix multiplication.

Matrix multiplication is a cornerstone of the algorithms that power today’s LLMs. Words are represented as numbers, organized into matrices, and weighted and multiplied against one another to produce language outputs that reflect the importance of each word and its relationship to the others in a sentence or paragraph.
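To make that concrete, here is a minimal NumPy sketch (an illustration for this article, not the team’s code) of the operation in question: a few toy word vectors multiplied against a weight matrix, the step a conventional transformer repeats billions of times per query.

```python
import numpy as np

# Toy example: three 4-dimensional word vectors and a 4x4 weight
# matrix (random here; learned during training in a real model).
words = np.array([
    [0.2, 0.7, 0.1, 0.5],   # e.g. "the"
    [0.9, 0.3, 0.8, 0.1],   # e.g. "cat"
    [0.4, 0.6, 0.2, 0.9],   # e.g. "sat"
])
weights = np.random.randn(4, 4)

# Dense matrix multiplication: every output entry is a chain of
# element-wise multiplications summed across a row and a column.
outputs = words @ weights
print(outputs.shape)  # (3, 4)
```

Each entry of the result is a long run of multiply-then-add steps, and in silicon it is the multiplications that dominate the area and power budget.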

These matrices are stored across hundreds of physically separate GPUs and fetched with each new query or operation. Shuttling the data to be multiplied between them costs a significant amount of electrical power, and therefore money.

To get around that issue, the UC Santa Cruz team forced the numbers within the matrices into a ternary state: every single number carried a value of either negative one, zero, or positive one. That allows the processors to simply sum the numbers instead of multiplying them, a tweak that makes no difference to the algorithm’s performance but saves a huge amount of hardware cost. To maintain performance despite the reduction in the number of operations, the team introduced time-based computation to the system, effectively creating a “memory” for the network and increasing the speed at which it could process the diminished operations.
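The arithmetic behind that trick is easy to demonstrate. In this sketch (a toy example assuming only the ternary constraint described above, not the paper’s actual kernel), a dot product against weights limited to -1, 0, and +1 collapses into selective addition and subtraction, with no multiplications at all:

```python
import numpy as np

x = np.array([0.2, 0.7, 0.1, 0.5])   # activations
w = np.array([1, 0, -1, 1])          # ternary weight column

# Conventional dot product: multiply, then sum.
dense = np.dot(x, w)

# Ternary equivalent: add where the weight is +1, subtract where
# it is -1, and skip where it is 0. No multipliers required.
ternary = x[w == 1].sum() - x[w == -1].sum()

assert np.isclose(dense, ternary)    # identical result
```

An adder occupies a small fraction of the chip area and energy of a multiplier, which is where the hardware savings come from.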

“From a circuit designer standpoint, you don’t need the overhead of multiplication, which carries a whole heap of cost,” Eshraghian said. And while the team implemented its new network on custom FPGA hardware, they remain confident that many of the efficiency improvements can be retrofitted to existing models using open-source software and minor hardware tweaks. Even on standard GPUs, the team saw a 10x reduction in memory consumption while improving operational speed by 25%.

With chip manufacturers like Nvidia and AMD continually pushing the boundaries of GPU processor performance, electrical demands (and their associated financial costs) for the data centers housing these systems have soared in recent years. With the increase in computing power comes a commensurate increase in the amount of waste heat the chips produce — waste heat that now requires resource-intensive liquid cooling systems to fully dissipate.

Arm CEO Rene Haas warned The Register in April that AI data centers could consume as much as 20-25% of the entire U.S. electrical output by the end of the decade if corrective measures are not taken, and quickly.
