
Google's image-caption creator, based on AI technology, is now open source

Google is bringing Show and Tell to the world. No, it doesn’t want you to bring something from home to show the class — instead, it’s open-sourcing an artificially intelligent model for giving images captions.

The model was first detailed back in 2014 and was updated in 2015 to be a little more accurate. It has been improved even more since then, and is now available on GitHub as part of Google’s TensorFlow machine learning framework. Along with the code, Google is also publishing a research paper on the technology.


What makes the new system great is that it can be trained much faster than before while achieving the same caption accuracy: training previously took 3 seconds per step, but with TensorFlow it takes a measly 0.7 seconds.

“This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system,” said Google software engineer Chris Shallue in a blog post.

Show and Tell is trained by being shown images together with captions that were written for those images. Sometimes it reuses previously written captions when it recognizes something similar to what it has seen before; at other times it generates entirely new captions of its own.
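For readers curious how such a system fits together, here is a rough, illustrative sketch of an encoder-decoder captioner written with TensorFlow’s Keras API. This is not Google’s released code; the vocabulary size, caption length, layer sizes, and the choice to seed the LSTM state with the image embedding (the paper instead feeds the image as the decoder’s first input step) are simplifying assumptions made for the example.

import tensorflow as tf

VOCAB_SIZE = 10000     # assumed vocabulary size
MAX_CAPTION_LEN = 20   # assumed maximum caption length (in tokens)
EMBED_DIM = 512        # assumed embedding / LSTM state size

# Encoder: a pretrained Inception V3 backbone turns each image into a feature vector.
cnn = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")
cnn.trainable = False

image_input = tf.keras.Input(shape=(299, 299, 3), name="image")
image_features = cnn(image_input)                                   # (batch, 2048)
image_embedding = tf.keras.layers.Dense(EMBED_DIM)(image_features)  # (batch, 512)

# Decoder: an LSTM predicts the next caption word from the image and the words so far.
caption_input = tf.keras.Input(shape=(MAX_CAPTION_LEN,), dtype="int32", name="caption")
word_vectors = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)

# Simplification: the image embedding seeds the LSTM state here, rather than being
# fed as the decoder's very first input as in the original paper.
decoder = tf.keras.layers.LSTM(EMBED_DIM, return_sequences=True)
hidden = decoder(word_vectors, initial_state=[image_embedding, image_embedding])
next_word_logits = tf.keras.layers.Dense(VOCAB_SIZE)(hidden)

model = tf.keras.Model([image_input, caption_input], next_word_logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Training pairs each image with a human-written caption, shifted by one token,
# so the network learns to predict the next word at every position.

During training, the model sees image-caption pairs like the ones Google describes; at inference time, captions are generated one word at a time by feeding each predicted word back into the decoder.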

Of course, Google isn’t the only company turning to artificial intelligence for the creation of image captions, but it is one of the few companies that has a number of products that could implement the technology. For example, the tech would be able to help users find images in their Google Photos library, to assist with Google Images, and so on.

Christian de Looper
Samsung might put AI smart glasses on the shelves this year

Samsung’s Project Moohan XR headset has grabbed the spotlight in the past few months, and rightfully so. It serves as the flagship launch vehicle for a reinvigorated Android XR platform, with plenty of hype from Google’s own quarters.
But it seems Samsung has even more ambitious plans in place and is reportedly experimenting with form factors that go beyond the headset. According to Korea-based ET News, the company is working on a pair of smart glasses and aims to launch them by the end of this year.
Currently in development under the codename “HAEAN” (machine-translated name), the smart glasses are reportedly in the final stages of having their internal hardware and feature set locked down, and will come equipped with camera sensors as well.

What to expect from Samsung’s smart glasses?
The latest leak doesn’t dig into specifics about the internal hardware, but another report from Samsung’s home market sheds some light on the possibilities. As per Maeil Business Newspaper, the Samsung smart glasses will feature a 12-megapixel camera built around Sony’s IMX681 CMOS image sensor.
It is said to offer a dual-silicon architecture, similar to Apple’s Vision Pro headset. The main processor on Samsung’s smart glasses is touted to be Qualcomm’s Snapdragon AR1 platform, while the secondary processing hub is a chip supplied by NXP.
The onboard camera will open the doors for vision-based capabilities, such as scanning QR codes, gesture recognition, and facial identification. The smart glasses will reportedly tip the scales at 150 grams, while the battery size is claimed to be 155 mAh.

I saw Google’s Gemini AI erase copyright evidence. I am deeply worried

Update: Google has responded to Digital Trends’ queries. The story has been updated with the company’s statement below.
The rise of generative AI has been a fairly messy process, especially from a fair-use ethics and copyright perspective. AI giants are inking deals with publishers to avoid legal hassles, while at the same time being embroiled in copyright tussles in courts across multiple countries.
As the ravenous appetite for training AI on user data grows, we might be in for another ethical conundrum. Multiple users on X and Reddit have shared demonstrations of how Google’s latest Gemini 2.0 series AI model can remove watermarks from copyright-protected images.
Going by the before/after samples of images, it seems Gemini is doing a fairly good job of removing the watermarks. Notably, it’s not only erasing those banner-style watermarks, but also fairly complex overlays with design and stylized text elements.
https://x.com/ajiteshleo/status/1901484196972945872?s=61
The model in question is Gemini 2.0 Flash, which was released earlier this year and got a reasoning upgrade as well. It is worth noting that you cannot remove a watermark if you are using the mobile or desktop version of the Gemini 2.0 Flash model. Trying to do so returns a message like this:
“Removing watermarks from images without the explicit permission of the copyright holder is illegal in most countries. It is important to respect the copyright laws and intellectual property rights. If you want to use an image with a watermark, you should contact the copyright holder and ask for permission.”
You can, however, try and remove the watermark from images in the Google AI Studio. Digital Trends successfully removed watermarks from a variety of images using the Gemini 2.0 Flash (Image Generation) Experimental model.
 
Removing watermarks is a violation of copyright law in most places, and any use of AI-modified material without due consent could land you in legal trouble. Moreover, it is a deeply unethical act, which is also why artists and authors are fighting in court over companies using their work to train AI models without duly compensating them or seeking their explicit consent.

How are the results?
A notable aspect is that the images produced by the AI are fairly high quality. Not only does it remove the watermark artifacts, but it also fills the gap with intelligent pixel-level reconstruction. In its current iteration, it works somewhat like the Magic Eraser feature available in the Google Photos app for smartphones.
Furthermore, if the input image is low quality, Gemini not only wipes off the watermark details but also upscales the overall picture.
https://x.com/kaiju_ya/status/1901099096930496720?s=61
The output image, however, has its own Gemini watermark, although this itself can be removed with a simple crop. There are a few minor differences in the final image produced by Gemini after its watermark removal process, such as slightly different color temperatures and fuzzy surface details in photorealistic shots.

Gemini is replacing Google Assistant. How will the shift affect you?

The writing has been on the wall for a while, but the shift away from Google Assistant is now official. Google has announced that it will shift users to Gemini as the default AI assistant on their devices in the coming months. Once that happens, they will no longer be able to access the Google Assistant.
At the moment, you can switch to Google Assistant as the default option on your Android phone, even on newer phones that come with Gemini running out of the box. In addition to phones, Google will be giving a similar treatment to smartwatches, Android Auto, tablets, smart home devices, TVs, and audio gear.
“We're also bringing a new experience, powered by Gemini, to home devices like speakers, displays, and TVs,” says Google, without sharing a specific time frame for the transition. What happens to Google Assistant following the transition? Well, it will be removed from devices and will no longer be available to download from app stores.

Talking about apps, Gemini can already interact with a wide range of Google’s own as well as a few third-party apps. Users can ask it to perform chores across different products, without ever having to open those apps. In addition to in-house apps such as Docs, Drive, and Gmail, the Gemini assistant can also perform tasks in third-party apps such as WhatsApp and Spotify, alongside a bunch of Samsung apps.
