Here’s the quick summary:
- People have watched video with automatic captions more than 23 million times, and have automatically translated captions more than 7.6 million times.
- The number of manually created caption tracks has more than tripled, thanks largely to automatic caption timing technology.
- Just recently, we’ve reduced the error rate in our speech recognition algorithms by 20%.
Back in November we talked about how online video presents a tremendous challenge of scale. Before automatic captions, there were around 200,000 videos on YouTube with captions. It sounds like a lot, but at YouTube more than 35 hours of video is uploaded every minute. We want all videos to be accessible to everyone -- whether or not they can hear or understand the language.
Since March, people have been able to get captions for almost any video that has clearly spoken English. Less than a year later people have watched video with automatic captions more than 23 million times. Clearly, there’s a lot of demand for captioned content, and people have been really making use of our technology. They’re also using the technology to access content in their own languages, since captions can be automatically translated to more than fifty languages; we’ve seen more than 7.6 million caption translations.
Auto-captions aren’t perfect, so we’ve also been pursuing a number of initiatives to help people manually create captions. At our event a year ago, we introduced automatic caption timing, a feature that takes an ordinary text file and turns it into captions with time-codes. Since then we’ve added this feature to the YouTube Data API to make it easier for people to write scripts and apps that can upload large numbers of captions at once. More recently, we started the YouTube Ready qualification program to help video owners find professional caption vendors familiar with YouTube. Thanks to these efforts, we’ve seen the number of manually created caption tracks available on YouTube more than triple (with more than 500,000 available today).
In the past few weeks, we’ve rolled out significant improvements to our speech recognition technology that make automatic captions more accurate. YouTube's new speech recognition model reduces the overall word error rate by about 20%. Although the improvements vary from video to video, a video in which 50% of the words were recognized correctly before will now have about 60% recognized correctly, and a video that was at 75% before will now be at about 80%. We continue to make improvements, and there is much more on the way.
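To see where those two examples come from, here is a minimal sketch of the arithmetic. It assumes the 20% figure is a relative reduction in word error rate and that word accuracy is simply one minus the error rate, which is a simplification. The function name is hypothetical and is not part of any YouTube tool or API.

```python
def accuracy_after_wer_reduction(accuracy_before: float,
                                 relative_reduction: float = 0.20) -> float:
    """Estimate word accuracy after a relative cut in word error rate.

    Assumption for illustration: accuracy = 1 - word error rate.
    """
    if not 0.0 <= accuracy_before <= 1.0:
        raise ValueError("accuracy_before must be between 0 and 1")
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be between 0 and 1")

    error_before = 1.0 - accuracy_before
    # A 20% relative reduction keeps 80% of the original errors.
    error_after = error_before * (1.0 - relative_reduction)
    return 1.0 - error_after


# The two examples from the paragraph above.
for before in (0.50, 0.75):
    after = accuracy_after_wer_reduction(before)
    print(f"{before:.0%} correct before -> {after:.0%} correct after")
# 50% correct before -> 60% correct after
# 75% correct before -> 80% correct after
```

A relative reduction removes more errors from videos that had more errors to begin with, which is why the 50% video gains ten points while the 75% video gains five.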
On a personal note, it's been amazing to see the feedback, videos, blog posts, thanks (and bug reports!) sent in over the past year. Even though we can't possibly respond to them all, we love to see them, and they shape our efforts on this project. We’ve used this feedback to make a number of subtle improvements to the service, such as adding an “Always show automatic captions” setting, adding an interactive transcript button so you can see all the captions and skip through the video, and making the red button easier to find.
What's next? We’ll continue to work on accuracy, and we also want to make sure captions are available everywhere you watch YouTube: on your Internet TV, your computer, and your mobile phone. We have a few other things coming... but I don't want to spoil the surprise. You'll have to stay tuned, and I hope you'll turn the captions on when you do!