Nvidia just released an open-source LLM to rival GPT-4

Nvidia CEO Jensen in front of a background.
Nvidia

Nvidia, which builds some of the most sought-after GPUs in the AI industry, has released an open-source large language model that reportedly performs on par with leading proprietary models from OpenAI, Anthropic, Meta, and Google.

The company introduced its new NVLM 1.0 family in a recently released white paper. The family is spearheaded by the 72-billion-parameter NVLM-D-72B model. “We introduce NVLM 1.0, a family of frontier-class multimodal large language models that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models,” the researchers wrote.

Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2).
Remarkably, NVLM 1.0 shows improved text-only… pic.twitter.com/yKGyOqHnsp

— Wei Ping (@_weiping) September 18, 2024

The new model family is reportedly already capable of “production-grade multimodality,” with exceptional performance across a variety of vision and language tasks, as well as better text-only responses than the base LLM on which the NVLM family is built. “To achieve this, we craft and integrate a high-quality text-only dataset into multimodal training, alongside a substantial amount of multimodal math and reasoning data, leading to enhanced math and coding capabilities across modalities,” the researchers explained.

The result is an LLM that can explain why a meme is funny as easily as it can solve complex mathematical equations, step by step. Nvidia also managed to increase the model’s text-only accuracy by an average of 4.3 points across common industry benchmarks, thanks to its multimodal training approach.

screenshot of the NVLM white paper explaining the process of explaining why a meme is funny
Nvidia

Nvidia appears serious about ensuring that this model meets the Open Source Initiative’s newest definition of “open source” by not only making its model weights available for public review, but also promising to release the model’s source code in the near future. This is a marked departure from the actions of rivals like OpenAI and Google, which jealously guard the details of their LLMs’ weights and source code. In doing so, Nvidia has positioned the NVLM family not necessarily to compete directly against GPT-4o and Gemini 1.5 Pro, but rather to serve as a foundation for third-party developers to build their own chatbots and AI applications.

Andrew Tarantola
Former Computing Writer
Andrew Tarantola is a journalist with more than a decade reporting on emerging technologies ranging from robotics and machine…