Skip to main content

MiniGPT-4: A free image-to-text AI tool you can try out today

ChatGPT is great, but right now, it’s limited to just text — text in, text out. GPT-4 was supposed to expand on this by adding image processing to allow it to generate text based on images.

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

OpenAI has yet to release this feature, however, which is where MiniGPT-4 comes in. This open source project gives us a preview of what the image processing in GPT-4 might be like — and it’s pretty neat.

Recommended Videos

What is MiniGPT-4?

Image used with permission by copyright holder

MiniGPT-4 is an open source project that was posted on GitHub to demonstrate vision-language capabilities in an AI system. Some examples of what it can do include generating descriptions of images, writing stories based on images, or even creating websites just from drawings.

Despite what the name implies, MiniGPT-4 is not officially connected to OpenAI or GPT-4. It was created by a group of Ph.D. students based in Saudi Arabia at the King Abdullah University of Science and Technology. It’s also based on a different large language model (LLM) called Vicuna, which itself was built on the open-source Large Language Model Meta AI (LLaMA). It’s not quite as powerful as ChatGPT, but as graded by GPT-4 itself, Vicuna gets within 90%.

How to use MiniGPT-4

MiniGPT-4 is just a demo and is still in its first version. For now, it can be accessed for free at the group’s official website. To use it, just drag an image in or click “Drop Image Here.” Once it’s uploaded, type your prompt into the search box.

What kinds of things should you try out? Well, asking MiniGPT-4 to describe an image is simple enough. But maybe you need some copy for an Instagram post for your company. Or maybe you want to knoe the ingredients needed for an interesting dish, and even a recipe for how to cook it. MiniGPT-4 can handle these tasks surprisingly well.

The coding aspects are a bit more rough around the edges. Turning a simple napkin drawing into a functioning website was a trick shown off by OpenAI when GPT-4 was first announced. But MiniGPT-4 doesn’t seem to be able to handle that quite as well just yet. ChatGPT will provide more accurate code — in fact, running whatever the MiniGPT-4 code is through ChatGPT or GPT-4 will net you better results.

One thing to note is that MiniGPT-4 does use your local system’s GPU. So, unless you have a fairly powerful discrete GPU, you may find the experience fairly slow. For context, I tried it out on a M2 Max MacBook Pro, and it took around 30 seconds to generate text based on an image I uploaded.

Limitations of MiniGPT-4

The speed of MiniGPT-4 is certainly a limitation. If you’re trying to access this without some decent graphics, it’s too slow to feel responsive. If you’re used to the speed of cloud-based ChatGPT or even Bing Image Creator, MiniGPT-4 is going to feel painfully slow.

Beyond that, MiniGPT-4 has all the same limitations that ChatGPT or Google Bard or any other AI chatbot in that it can “hallucinate” or make up information.

Luke Larsen
Luke Larsen is the Senior Editor of Computing, managing all content covering laptops, monitors, PC hardware, Macs, and more.
This massive upgrade to ChatGPT is coming in January — and it’s not GPT-5
ChatGPT on a laptop

OpenAI is set to launch a new AI agent in January, code-named Operator, that will enable ChatGPT to take action on the user's behalf. You may never have to book your own flights ever again.

The company's leadership made the announcement during a staff meeting Wednesday, reports Bloomberg. The company plans to roll out the new feature as a research preview through the company’s developer API.

Read more
Is AI already plateauing? New reporting suggests GPT-5 may be in trouble
A person sits in front of a laptop. On the laptop screen is the home page for OpenAI's ChatGPT artificial intelligence chatbot.

OpenAI's next-generation Orion model of ChatGPT, which is both rumored and denied to be arriving by the end of the year, may not be all it's been hyped to be once it arrives, according to a new report from The Information.

Citing anonymous OpenAI employees, the report claims the Orion model has shown a "far smaller" improvement over its GPT-4 predecessor than GPT-4 showed over GPT-3. Those sources also note that Orion "isn’t reliably better than its predecessor [GPT-4] in handling certain tasks," specifically coding applications, though the new model is notably stronger at general language capabilities, such as summarizing documents or generating emails.

Read more
ChatGPT monthly usage may now rival Google Chrome
A person sits in front of a laptop. On the laptop screen is the home page for OpenAI's ChatGPT artificial intelligence chatbot.

A number of popular generative AI platforms are seeing consistent growth as users are figuring out how they want to use the tools -- and ChatGPT is at the top of the list with the most visits, at 3.7 billion worldwide. So many people are visiting the AI chatbot, and its figures are rivaling browser market share. It can only be compared to Google Chrome figures in terms of monthly users, which is estimated to be around 3.45 billion.

Statistics from Similarweb indicate that ChatGPT saw a 17.2% month-over-month (MoM) growth and a 115.9% year-over-year (YoY) traffic growth. Some highlights that spurned the ChatGPT growth during 2024 include its parent company, OpenAI, updating its web address from a subdomain, chat.openai.com, to a main domain, chatgpt.com. The tool especially saw a surge of traffic in May 2024, when it hit a 2.2-billion-visit milestone, and has been growing ever since, according to Similarweb researcher David F. Carr.

Read more