
Researchers just solved AI’s biggest conundrum


The large language models that power today’s chatbots like ChatGPT, Gemini, and Claude are immensely powerful generative AI systems, and immensely power-hungry ones to boot.

They apparently don’t need to be. Recent research out of the University of California, Santa Cruz has shown that modern LLMs with billions of parameters can run on just 13 watts of power without a loss in performance. That’s roughly the draw of a modern LED light bulb, and more than a 50x improvement over the 700W that an Nvidia H100 GPU can consume.


“We got the same performance at way less cost — all we had to do was fundamentally change how neural networks work,” said Jason Eshraghian, the paper’s lead author. “Then we took it a step further and built custom hardware.” They did so by doing away with the neural network’s matrix multiplication.

Matrix multiplication is a cornerstone of the algorithms that power today’s LLMs. Words are represented as numbers, organized into matrices, and multiplied against learned weight matrices to produce language outputs that depend on the importance of certain words and their relationships to other words in the sentence or paragraph.
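The mechanics described above can be sketched in a few lines of NumPy. The array names and sizes here are purely illustrative, not taken from the paper or any real model:

```python
import numpy as np

# Toy illustration: words become number vectors, and matrix multiplication
# scores each word against every other word via learned weights.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))   # 4 words, each an 8-dim embedding
weights = rng.normal(size=(8, 8))  # learned weight matrix

# Weigh the embeddings, then compare every word against every other word.
scores = tokens @ weights @ tokens.T
print(scores.shape)  # (4, 4): one relevance score per word pair
```

A real LLM runs billions of these multiply-accumulate operations per query, which is where the power cost comes from.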

These matrices are stored on hundreds of physically separate GPUs and fetched with each new query or operation. The process of shuttling data that needs to be multiplied among the multitude of matrices costs a significant amount of electrical power, and therefore money.

To get around that issue, the UC Santa Cruz team forced the numbers within the matrices into a ternary state — every single number carried a value of either negative one, zero, or positive one. This allows the processors to simply sum the numbers instead of multiplying them, a tweak that makes no difference to the algorithm but saves a huge amount of cost in terms of hardware. To maintain performance despite the reduction in the number of operations, the team introduced time-based computation to the system, effectively creating a “memory” for the network and increasing the speed at which it could process the diminished operations.
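The ternary trick can be illustrated with a short NumPy sketch: once every weight is -1, 0, or +1, “multiplying” by a weight reduces to adding, skipping, or subtracting an input. The function below is a conceptual illustration of that idea, not the team’s actual implementation:

```python
import numpy as np

def ternary_matmul(x, w_ternary):
    """Matrix product where w_ternary holds only -1, 0, or +1.

    Each output column is a sum of the inputs whose weight is +1
    minus the inputs whose weight is -1 -- no multiplications needed.
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]))
    for j in range(w_ternary.shape[1]):
        col = w_ternary[:, j]
        out[:, j] = x[:, col == 1].sum(axis=1) - x[:, col == -1].sum(axis=1)
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(2, 5))
w = rng.choice([-1, 0, 1], size=(5, 3))

# Summation gives exactly the same answer as a full matrix multiply.
assert np.allclose(ternary_matmul(x, w), x @ w)
```

In hardware, an adder circuit is far smaller and cheaper than a multiplier, which is the source of the power savings Eshraghian describes below.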

“From a circuit designer standpoint, you don’t need the overhead of multiplication, which carries a whole heap of cost,” Eshraghian said. And while the team implemented its new network on custom FPGA hardware, it remains confident that many of the efficiency improvements can be retrofitted to existing models using open-source software and minor hardware tweaks. Even on standard GPUs, the team saw a 10 times reduction in memory consumption while improving operational speed by 25%.

With chip manufacturers like Nvidia and AMD continually pushing the boundaries of GPU processor performance, electrical demands (and their associated financial costs) for the data centers housing these systems have soared in recent years. With the increase in computing power comes a commensurate increase in the amount of waste heat the chips produce — waste heat that now requires resource-intensive liquid cooling systems to fully dissipate.

Arm CEO Rene Haas warned The Register in April that AI data centers could consume as much as 20-25% of the entire U.S. electrical output by the end of the decade if corrective measures are not taken, and quickly.

Andrew Tarantola
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…