YouTube first introduced captions for videos in 2006. Three years later it automated the feature, a huge step forward that enabled the company to announce on Thursday that it now has a billion captioned videos on its site.
Captions appear on videos as a text overlay, transcribing dialog and other relevant sounds in the video. You can enable them by clicking the leftmost icon in the group of controls at the bottom right of the video player.
Although primarily geared toward the 300 million people in the world with hearing impairments, captions can also come in handy for a good chunk of YouTube’s global user base of more than a billion people. Consider videos where the audio is a bit ropey or you simply can’t catch what the actors are saying. The feature could also be useful when you’re in a public place without your earphones and you’re still keen to view the content.
And yes, captions are a heavily used feature, with viewers clicking the “on” button more than 15 million times a day.
But the system isn’t perfect, at least not yet. Errors show up in the text from time to time, with some YouTubers taking advantage of the slip-ups to create their own comedy videos.
However, the team has been working hard in recent years to improve the reliability of its automated captions technology. Discussing the issue in a blog post, YouTube’s Liat Kaver said significant progress has already been made in enhancing its speech recognition software and machine learning algorithms. “All together, those technological efforts have resulted in a 50 percent leap in accuracy for automatic captions in English, which is getting us closer and closer to human transcription error rates,” Kaver wrote.
The team also wants to invest more time in improving the caption accuracy of its other supported languages, which include Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
Kaver said the long-term aim is to get captions on every clip that requires them. “Ideally, every video would have an automatic caption track generated by our system and then reviewed and edited by the creator,” she wrote, adding, “With the improvements we’ve made to the automated speech recognition, this is now easier than ever.”