MiniGPT-4: A free image-to-text AI tool you can try out today

ChatGPT is great, but right now, it’s limited to just text — text in, text out. GPT-4 was supposed to expand on this by adding image processing to allow it to generate text based on images.

OpenAI has yet to release this feature, however, which is where MiniGPT-4 comes in. This open source project gives us a preview of what the image processing in GPT-4 might be like — and it’s pretty neat.

What is MiniGPT-4?

MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system. Some examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites just from drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4. It was created by a group of Ph.D. students at the King Abdullah University of Science and Technology in Saudi Arabia. It’s also based on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). Vicuna is not quite as powerful as ChatGPT, but as graded by GPT-4 itself, it reaches roughly 90% of ChatGPT’s quality.

How to use MiniGPT-4

MiniGPT-4 is just a demo and is still in its first version. For now, it can be accessed for free at the group’s official website. To use it, just drag an image in or click “Drop Image Here.” Once it’s uploaded, type your prompt into the search box.

What kinds of things should you try out? Well, asking MiniGPT-4 to describe an image is simple enough. But maybe you need some copy for an Instagram post for your company. Or maybe you want to know the ingredients needed for an interesting dish, and even a recipe for how to cook it. MiniGPT-4 can handle these tasks surprisingly well.

The coding side is a bit rougher around the edges. Turning a simple napkin drawing into a functioning website was a trick OpenAI showed off when GPT-4 was first announced, but MiniGPT-4 doesn’t seem able to handle that quite as well just yet. ChatGPT will provide more accurate code. In fact, running the code MiniGPT-4 produces through ChatGPT or GPT-4 will net you better results.

One thing to note is that MiniGPT-4 does use your local system’s GPU. So, unless you have a fairly powerful discrete GPU, you may find the experience slow. For context, I tried it out on an M2 Max MacBook Pro, and it took around 30 seconds to generate text based on an image I uploaded.

Limitations of MiniGPT-4

The speed of MiniGPT-4 is certainly a limitation. If you’re trying to use it without decent graphics hardware, it’s too slow to feel responsive. If you’re used to the speed of cloud-based ChatGPT or even Bing Image Creator, MiniGPT-4 is going to feel painfully slow.

Beyond that, MiniGPT-4 has the same limitations as ChatGPT, Google Bard, or any other AI chatbot: it can “hallucinate,” or make up information.

Luke Larsen
Luke Larsen is the Senior editor of computing, managing all content covering laptops, monitors, PC hardware, Macs, and more.