
AI image generators appear to propagate gender and race stereotypes

Experts say that popular AI image generators such as Stable Diffusion pick up gender and cultural biases through the machine learning algorithms they use to create art, and reproduce those stereotypes in their output.

Many text-to-art generators let you input a phrase and get a unique image out the other end. However, these generators are often built on stereotypical biases, which can affect how their machine learning models manufacture images. Images can often be Westernized, or show favor to certain genders or races, depending on the phrases used, Gizmodo noted.


What's the difference between these two groups of people? Well, according to Stable Diffusion, the first group represents an 'ambitious CEO' and the second a 'supportive CEO'.
I made a simple tool to explore biases ingrained in this model: https://t.co/l4lqt7rTQj pic.twitter.com/xYKA8w3N8N

— Sasha Luccioni, PhD 🦋🌎✨🤗 (@SashaMTL) October 31, 2022

Sasha Luccioni, an artificial intelligence researcher at Hugging Face, created a tool that demonstrates this AI bias in action. In her Stable Diffusion Explorer, inputting the phrase “ambitious CEO” returned images of various men, while the phrase “supportive CEO” returned images of both men and women.
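If you want to probe this behavior yourself, a minimal sketch along these lines is possible with Hugging Face's open-source diffusers library. The checkpoint, sample size, and output file naming below are illustrative assumptions; this is not Luccioni's Stable Diffusion Explorer itself.

```python
# Minimal sketch: comparing who Stable Diffusion depicts for two prompts.
# The checkpoint and sample size are illustrative choices, not the
# Stable Diffusion Explorer tool itself.
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (any compatible one works).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

for prompt in ["ambitious CEO", "supportive CEO"]:
    # Generate a small batch per prompt and compare the results by eye.
    for i in range(4):
        image = pipe(prompt).images[0]
        image.save(f"{prompt.replace(' ', '_')}_{i}.png")
```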

Similarly, OpenAI's DALL-E 2 generator has shown male-centric biases for the term “builder” and female-centric biases for the term “flight attendant” in its image results, despite the existence of female builders and male flight attendants.

While many AI image generators appear to simply take a few words, apply machine learning, and pop out an image, there is a lot more going on in the background. Stable Diffusion, for example, uses the LAION image set, which hosts “billions of pictures, photos, and more scraped from the internet, including image-hosting and art sites,” Gizmodo noted.

Racial and cultural bias in online image searches was already an ongoing topic long before the rising popularity of AI image generators. Luccioni told the publication that systems such as the LAION dataset are likely to home in on the 90% of images associated with a prompt and use them for the image generator.

Fionna Agomuoh
I tested the future of AI image generation. It’s astoundingly fast.
Imagery generated by HART.

One of the core problems with AI is its notoriously high power and computing demand, especially for tasks such as media generation. On mobile phones, only a handful of pricey devices with powerful silicon can run such features natively, and even when implemented at scale in the cloud, it's a pricey affair.
Nvidia may have quietly addressed that challenge in partnership with the folks over at the Massachusetts Institute of Technology and Tsinghua University. The team created a hybrid AI image generation tool called HART (hybrid autoregressive transformer) that essentially combines two of the most widely used AI image creation techniques. The result is a blazing-fast tool with dramatically lower compute requirements.
To give you an idea of just how fast it is, I asked it to create an image of a parrot playing a bass guitar. It returned the following picture in about a second; I could barely even follow the progress bar. When I put the same prompt to Google's Imagen 3 model in Gemini, it took roughly 9 to 10 seconds on a 200Mbps internet connection.

A massive breakthrough
When AI images first started making waves, the diffusion technique was behind it all, powering products such as OpenAI’s Dall-E image generator, Google’s Imagen, and Stable Diffusion. This method can produce images with an extremely high level of detail. However, it is a multi-step approach to creating AI images, and as a result, it is slow and computationally expensive.
The second approach, which has recently gained popularity, relies on autoregressive models. These essentially work in the same fashion as chatbots, generating images with a pixel-prediction technique. It is a faster, but also more error-prone, method of creating images with AI.
On-device demo for HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
The team at MIT fused both methods into a single package called HART. It relies on an autoregressive model to predict compressed image assets as discrete tokens, while a small diffusion model handles the rest to compensate for the quality loss. The overall approach reduces the number of generation steps from over two dozen to eight.
The experts behind HART claim that it can “generate images that match or exceed the quality of state-of-the-art diffusion models, but do so about nine times faster.” HART pairs an autoregressive model of roughly 700 million parameters with a small diffusion model of about 37 million parameters.
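To make that division of labor concrete, here is a toy sketch of the hybrid recipe. Every class and function below is a hypothetical stand-in built on random arrays, illustrating the control flow rather than HART's actual models or API.

```python
# Toy sketch of the hybrid autoregressive + diffusion recipe described
# above. All classes are hypothetical stand-ins, not HART's actual code.
import numpy as np

class TokenAutoregressor:
    """Stand-in: predicts a compressed image as a sequence of discrete tokens."""
    def predict_tokens(self, prompt: str, n_tokens: int = 256) -> np.ndarray:
        rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
        return rng.integers(0, 4096, size=n_tokens)  # codebook indices

def decode_tokens(tokens: np.ndarray) -> np.ndarray:
    """Stand-in: decodes discrete tokens into a coarse image."""
    side = int(len(tokens) ** 0.5)
    return tokens[: side * side].reshape(side, side) / 4096.0

class ResidualDiffuser:
    """Stand-in: a small diffusion model that restores fine detail."""
    def refine(self, coarse: np.ndarray, num_steps: int = 8) -> np.ndarray:
        residual = np.zeros_like(coarse, dtype=float)
        for _ in range(num_steps):  # 8 steps vs. 25+ for a full diffusion model
            residual += 0.01 * np.random.standard_normal(coarse.shape)
        return residual

def generate_image(prompt: str) -> np.ndarray:
    # 1) A fast, chatbot-style autoregressive pass predicts the image
    #    as compressed discrete tokens (the bulk of the work).
    tokens = TokenAutoregressor().predict_tokens(prompt)
    coarse = decode_tokens(tokens)
    # 2) A small diffusion model runs only a few steps to add back the
    #    detail that discrete tokens lose.
    return coarse + ResidualDiffuser().refine(coarse)

print(generate_image("a parrot playing a bass guitar").shape)
```

The design point to notice is that the step-heavy diffusion process only has to clean up fine detail rather than build the whole image from noise, which is where the claimed speedup comes from.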

Read more
Microsoft nixes its Dall-E upgrade after image quality complaints
Robot holding a video camera, generated by Bing.

Microsoft has rolled back the latest update to its Bing image generation system, which installed a new iteration of OpenAI's Dall-E model called PR16, after Bing users vociferously complained about a decline in image quality.

https://x.com/JordiRib1/status/1869425938976665880

Read more
I tried out Google’s latest AI tool that generates images in a fun, new way
Google's Whisk AI tool being used with images.

Google’s latest AI tool helps you automate image generation even further. The tool is called Whisk, and it's based on Google’s latest Imagen 3 image generation model. Rather than relying solely on text prompts, Whisk helps you create your desired images using other images as the base prompt.

Whisk is currently in an experimental phase, but once set up it's fairly easy to navigate. Google detailed in a blog post introducing Whisk that it is intended for “rapid visual exploration, not pixel-perfect edits.”

Read more