MiniGPT-4: A free image-to-text AI tool you can try out today

ChatGPT is great, but right now, it’s limited to just text — text in, text out. GPT-4 was supposed to expand on this by adding image processing to allow it to generate text based on images.

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

OpenAI has yet to release this feature, however, which is where MiniGPT-4 comes in. This open source project gives us a preview of what the image processing in GPT-4 might be like — and it’s pretty neat.

What is MiniGPT-4?

MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system. Some examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites just from drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4. It was created by a group of Ph.D. students at the King Abdullah University of Science and Technology in Saudi Arabia. It's also based on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). Vicuna is not quite as powerful as ChatGPT, but when GPT-4 itself graded the two, Vicuna reached about 90% of ChatGPT's quality.

How to use MiniGPT-4

MiniGPT-4 is just a demo and is still in its first version. For now, it can be accessed for free at the group’s official website. To use it, just drag an image in or click “Drop Image Here.” Once it’s uploaded, type your prompt into the search box.

What kinds of things should you try out? Well, asking MiniGPT-4 to describe an image is simple enough. But maybe you need some copy for an Instagram post for your company. Or maybe you want to know the ingredients in an interesting dish, or even get a recipe for cooking it. MiniGPT-4 can handle these tasks surprisingly well.

The coding side is rougher around the edges. Turning a simple napkin drawing into a functioning website was a trick OpenAI showed off when GPT-4 was first announced, but MiniGPT-4 doesn't seem to handle that quite as well just yet. ChatGPT will provide more accurate code. In fact, running the code MiniGPT-4 produces through ChatGPT or GPT-4 will net you better results.

One thing to note is that MiniGPT-4 does use your local system's GPU. So, unless you have a fairly powerful discrete GPU, you may find the experience slow. For context, I tried it out on an M2 Max MacBook Pro, and it took around 30 seconds to generate text based on an image I uploaded.

Limitations of MiniGPT-4

The speed of MiniGPT-4 is certainly a limitation. If you're trying to use it without a decent GPU, it's too slow to feel responsive. And if you're used to the speed of cloud-based ChatGPT or even Bing Image Creator, the wait will feel especially long.

Beyond that, MiniGPT-4 shares a limitation with ChatGPT, Google Bard, and every other AI chatbot: it can “hallucinate,” or make up information.

Luke Larsen
Former Senior Editor, Computing
Luke Larsen is the Senior Editor of Computing, managing all content covering laptops, monitors, PC hardware, Macs, and more.