
MiniGPT-4: A free image-to-text AI tool you can try out today


ChatGPT is great, but right now, it’s limited to just text — text in, text out. GPT-4 was supposed to expand on this by adding image processing to allow it to generate text based on images.


OpenAI has yet to release this feature, however, which is where MiniGPT-4 comes in. This open source project gives us a preview of what the image processing in GPT-4 might be like — and it’s pretty neat.


What is MiniGPT-4?


MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system. Some examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites just from drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4. It was created by a group of Ph.D. students at the King Abdullah University of Science and Technology in Saudi Arabia. It’s built on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). Vicuna is not quite as powerful as ChatGPT, but as graded by GPT-4 itself, it reaches about 90% of ChatGPT’s quality.

How to use MiniGPT-4

MiniGPT-4 is just a demo and is still in its first version. For now, it can be accessed for free at the group’s official website. To use it, just drag an image in or click “Drop Image Here.” Once it’s uploaded, type your prompt into the search box.

What kinds of things should you try out? Well, asking MiniGPT-4 to describe an image is simple enough. But maybe you need some copy for an Instagram post for your company. Or maybe you want to know the ingredients needed for an interesting dish, and even a recipe for how to cook it. MiniGPT-4 can handle these tasks surprisingly well.

The coding side is rougher around the edges. Turning a simple napkin drawing into a functioning website was a trick OpenAI showed off when GPT-4 was first announced, but MiniGPT-4 doesn’t seem to handle that quite as well just yet. ChatGPT will provide more accurate code. In fact, running MiniGPT-4’s code output through ChatGPT or GPT-4 will net you better results.

One thing to note is that MiniGPT-4 uses your local system’s GPU. So, unless you have a fairly powerful discrete GPU, you may find the experience slow. For context, I tried it out on an M2 Max MacBook Pro, and it took around 30 seconds to generate text based on an image I uploaded.

Limitations of MiniGPT-4

The speed of MiniGPT-4 is certainly a limitation. If you’re trying to access it without a decent GPU, it’s too slow to feel responsive. If you’re used to the speed of cloud-based ChatGPT or even Bing Image Creator, MiniGPT-4 is going to feel painfully slow.

Beyond that, MiniGPT-4 has the same limitations as ChatGPT, Google Bard, or any other AI chatbot: it can “hallucinate,” or make up information.

Luke Larsen
Former Senior Editor, Computing
Luke Larsen is the Senior Editor of Computing, managing all content covering laptops, monitors, PC hardware, Macs, and more.