Google is bringing Show and Tell to the world. No, it doesn’t want you to bring something from home to show the class: instead, it’s open-sourcing an artificial-intelligence model that generates captions for images.
The model was first detailed back in 2014 and updated in 2015 to be a little more accurate. It has been improved further since then, and is now available on GitHub as part of Google’s TensorFlow machine learning framework. Along with the code, Google is also releasing a research paper on the technology.
What makes the new system notable is that it can be trained much faster than before while achieving the same caption accuracy: training previously took 3 seconds per step, but with TensorFlow it takes a measly 0.7 seconds.
“This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system,” said Google software engineer Chris Shallue in a blog post.
Show and Tell is trained on images paired with captions written for those images. Sometimes it reuses previously seen captions when it recognizes something similar to what it has encountered before; at other times it generates entirely new captions of its own.
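Conceptually, systems like this pair an image encoder with a language decoder that emits a caption one word at a time, each word conditioned on the image and the words so far. Below is a minimal, hypothetical sketch of that greedy decoding loop; the tiny vocabulary, the `image_features` vector, and the `score_next_word` stand-in are all invented for illustration and are not Google's actual im2txt code:

```python
# Hypothetical sketch of greedy caption decoding (not Google's im2txt code).
# In a real system a CNN encoder produces `image_features` and a trained RNN
# scores the next word; here both are replaced by hand-written toy logic.

VOCAB = ["<start>", "a", "dog", "cat", "on", "grass", "<end>"]

def score_next_word(image_features, prev_word):
    # Stand-in for the trained decoder: returns a score for each vocabulary
    # word. A real model conditions on the image and the full word history.
    scores = {w: 0.0 for w in VOCAB}
    if prev_word == "<start>":
        scores["a"] = 1.0
    elif prev_word == "a":
        scores["dog"] = image_features[0]  # toy convention: feature 0 = "dogness"
        scores["cat"] = image_features[1]  # toy convention: feature 1 = "catness"
    elif prev_word in ("dog", "cat"):
        scores["on"] = 1.0
    elif prev_word == "on":
        scores["grass"] = 1.0
    else:
        scores["<end>"] = 1.0
    return scores

def generate_caption(image_features, max_len=10):
    caption, word = [], "<start>"
    for _ in range(max_len):
        scores = score_next_word(image_features, word)
        word = max(scores, key=scores.get)  # greedy: take the top-scoring word
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)

print(generate_caption([0.9, 0.2]))  # -> "a dog on grass"
```

Because decoding picks words step by step, the same loop can reproduce a caption it effectively memorized during training or assemble a combination of words it never saw together, which matches the reuse-versus-generate behavior described above.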
Of course, Google isn’t the only company applying artificial intelligence to image captioning, but it is one of the few with a range of products that could put the technology to use. It could, for example, help users find images in their Google Photos libraries or improve Google Images search.