
I tested the future of AI image generation. It’s astoundingly fast.

Imagery generated by HART.
MIT / HART

One of the core problems with AI is its notoriously high power and computing demand, especially for tasks such as media generation. On mobile phones, only a handful of pricey devices with powerful silicon can run these features natively. Even when deployed at scale in the cloud, it’s a pricey affair.

Nvidia may have quietly addressed that challenge in partnership with the folks over at the Massachusetts Institute of Technology and Tsinghua University. The team created a hybrid AI image generation tool called HART (hybrid autoregressive transformer) that essentially combines two of the most widely used AI image creation techniques. The result is a blazing-fast tool with dramatically lower compute requirements.


To give you an idea of just how fast it is, I asked it to create an image of a parrot playing a bass guitar. It returned the following picture in about a second. I could barely follow the progress bar. When I gave the same prompt to Google’s Imagen 3 model in Gemini, it took roughly 9-10 seconds on a 200 Mbps internet connection.

Image of a parrot generated by HART.
MIT / HART

A massive breakthrough

When AI images first started making waves, the diffusion technique was behind it all, powering products such as OpenAI’s Dall-E image generator, Google’s Imagen, and Stable Diffusion. This method can produce images with an extremely high level of detail. However, it is a multi-step approach to creating AI images, and as a result, it is slow and computationally expensive.

The second approach that has recently gained popularity is autoregressive models, which work in much the same fashion as chatbots: they build an image piece by piece, predicting each next chunk from the ones that came before. This is faster, but also more error-prone.

On-device demo for HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

The team at MIT fused both methods into a single package called HART. It relies on an autoregressive model to predict the compressed image as a sequence of discrete tokens, while a small diffusion model fills in the detail those tokens miss, compensating for the quality loss. The overall approach reduces the number of steps involved from over two dozen to eight.
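To make the division of labor concrete, here is a toy sketch of the idea in Python. It is not HART's code: the five-entry codebook, the function names, and the random "latent" values are all made up for illustration, and no neural network is involved. It only shows how a continuous value can be split into a coarse discrete token (the part an autoregressive model would predict) and a small leftover residual (the part a lightweight diffusion model would be asked to recover).

```python
import random

def quantize(value, codebook):
    """Return (index, code) for the codebook entry closest to value."""
    index = min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))
    return index, codebook[index]

def hybrid_encode(latents, codebook):
    """Split each latent into a discrete token and a continuous residual."""
    tokens, residuals = [], []
    for value in latents:
        index, code = quantize(value, codebook)
        tokens.append(index)            # coarse part: a discrete token
        residuals.append(value - code)  # fine detail lost by quantization
    return tokens, residuals

def hybrid_decode(tokens, residuals, codebook):
    """Rebuild the latents from coarse tokens plus residual corrections."""
    return [codebook[t] + r for t, r in zip(tokens, residuals)]

# Hypothetical codebook and latents, chosen only for this illustration.
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]
rng = random.Random(0)
latents = [rng.uniform(-1.0, 1.0) for _ in range(8)]

tokens, residuals = hybrid_encode(latents, codebook)
coarse = [codebook[t] for t in tokens]  # what the tokens alone reproduce
restored = hybrid_decode(tokens, residuals, codebook)

print("tokens:   ", tokens)
print("residuals:", [round(r, 3) for r in residuals])
```

Because the residuals are small compared with the values themselves, the model that fills them in has a much easier job than one generating the whole image from scratch, which is the intuition behind pairing a large token predictor with a small diffusion model.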

The experts behind HART claim that it can “generate images that match or exceed the quality of state-of-the-art diffusion models, but do so about nine times faster.” HART combines an autoregressive model with roughly 700 million parameters and a small diffusion model with 37 million parameters.
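As a quick back-of-the-envelope check on how small the diffusion component is, the snippet below works out its share of the total using the two parameter counts quoted above. The figures are the article's rounded numbers, so treat the result as approximate; the variable names are just for illustration.

```python
# Rounded parameter counts as quoted in the article.
AR_PARAMS = 700_000_000        # autoregressive model
DIFFUSION_PARAMS = 37_000_000  # small diffusion model

total_params = AR_PARAMS + DIFFUSION_PARAMS
diffusion_share = DIFFUSION_PARAMS / total_params

print(f"Total parameters: {total_params:,}")
print(f"Diffusion share:  {diffusion_share:.1%}")
```

By these numbers, the diffusion model accounts for only about one-twentieth of HART's parameters, with the autoregressive model making up the rest.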

Evolution of image training for HART.
MIT / HART

Solving the cost-computing crisis

Interestingly, this hybrid tool was able to create images that matched the quality of top-shelf models with 2 billion parameters. Most importantly, HART achieved that milestone while generating images nine times faster and requiring 31% less computation.

According to the team, the low-compute approach allows HART to run locally on phones and laptops, which is a huge win. So far, the most popular mass-market products such as ChatGPT and Gemini require an internet connection for image generation because the computing happens on cloud servers.

In the test video, the team showcased it running natively on an MSI laptop with Intel’s Core series processor and an Nvidia GeForce RTX graphics card. That’s a combination you can find on a majority of gaming laptops out there, without spending a fortune.

Comparative analysis of AI images.
MIT / HART

HART is capable of producing 1:1 aspect ratio images at a respectable 1024 x 1024 pixel resolution. The level of detail in these images is impressive, and so is the stylistic variation and scenery accuracy. During their tests, the team noted that the hybrid AI tool was between three and six times faster and offered over seven times higher throughput.

The future potential is exciting, especially when integrating HART’s image capabilities with language models. “In the future, one could interact with a unified vision-language generative model, perhaps by asking it to show the intermediate steps required to assemble a piece of furniture,” says the team at MIT.

They are already exploring that idea, and even plan to apply the HART approach to audio and video generation. You can try it out on MIT’s web dashboard.

Some rough edges

Before we dive into the quality debate, do keep in mind that HART is very much a research project that is still in its early stages. On the technical side, there are a few hassles highlighted by the team, such as overheads during the inference and training process.

Failures of HART.
HART / Nadeem Sarwar

The challenges can be fixed or overlooked, because they are minor in the bigger scheme of things here. Moreover, considering the sheer benefits HART delivers in terms of computing efficiency, speed, and latency, they might just persist without leading to any major performance issues.

In my brief time prompt-testing HART, I was astonished by the pace of image generation. I rarely ran into a scenario where the free web tool took more than two seconds to create an image. Even with prompts that span three paragraphs (over 200 words in length), HART was able to create images that adhere tightly to the description.

AI images sample generated with HART.
HART / Nadeem Sarwar

Aside from descriptive accuracy, there was plenty of detail in the images. However, HART suffers from the typical failings of an AI image generator. It struggles with digits, basic depictions such as people eating food, character consistency, and perspective.

Photorealism involving humans is one area where I noticed glaring failures. On a few occasions, it simply got the concept of basic objects wrong, like confusing a ring with a necklace. But overall, those errors were few and far between, and fundamentally expected. A healthy bunch of AI tools still can’t get that right, despite being out there for a while now.

Overall, I am particularly excited by the immense potential of HART. It would be interesting to see whether MIT and Nvidia create a product out of it, or simply adopt the hybrid AI image generation approach in an existing product. Either way, it’s a glimpse into a very promising future.

Nadeem Sarwar
Nadeem is the Managing Editor at Digital Trends.