
A new program is using those annoying CAPTCHAs to help digitize old texts.
You might not know the term, but you’ve dealt with CAPTCHAs many times. They’re the annoying, fuzzy, distorted words or numbers you need to copy correctly to gain access to a site. About 60 million of them are typed every single day. Their aim is to stop automated programs from gaining access and posting ads, since it takes a human to type the code.

Researchers at Carnegie Mellon University have found a new use for the CAPTCHA, or a dual use, if you like. As well as acting as a password to a site, it can now also help with the digitizing of old books. It all stems from the problem that when old texts are scanned, computers are unable to decipher about 10% of the words, meaning the human touch is necessary to make sense of them. With literally thousands of pages scanned every month, that becomes a gigantic task.

So researchers decided to farm out the work. The words are sent out to web sites to be used as CAPTCHAs, known as reCAPTCHAs. Once they’re deciphered, the result is returned to Carnegie Mellon. But how does anyone know the answer is correct? Well, as a test, users are given two words to type, and the content of one is already known. If that is typed correctly, the assumption is that the other is correct. For extra proof, the word is sent to two different sites to be used as a CAPTCHA. If both answers are the same, then that’s good enough for the researchers.

With the proliferation of CAPTCHAs, about a million words a day are being deciphered. “There’s no danger of us running out of words,” Luis von Ahn, a professor at CMU, told the BBC. “There’s still about 100 million books to be digitized, which at the current rate will take us about 400 years to complete.”
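The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea as the article describes it, not the researchers’ actual code: the `WordVerifier` class, its method names, the site labels and the case-insensitive comparison are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(text):
    """Compare answers loosely: ignore case and surrounding spaces (an assumption)."""
    return text.strip().lower()


class WordVerifier:
    """Hypothetical sketch of the two-word, two-site check."""

    def __init__(self, sites_needed=2):
        # How many different sites must agree before a reading is accepted.
        self.sites_needed = sites_needed
        # unknown word id -> typed answer -> set of sites that gave that answer
        self.votes = defaultdict(lambda: defaultdict(set))
        # unknown word id -> accepted reading
        self.accepted = {}

    def submit(self, site, known_answer, known_typed, unknown_id, unknown_typed):
        """Return True if the user passes, judged on the known word only."""
        if normalize(known_typed) != normalize(known_answer):
            # The control word was wrong, so the other answer is not trusted.
            return False

        answer = normalize(unknown_typed)
        self.votes[unknown_id][answer].add(site)

        # Accept the reading once enough different sites agree on it.
        if len(self.votes[unknown_id][answer]) >= self.sites_needed:
            self.accepted.setdefault(unknown_id, answer)
        return True
```

In use, a word stays undecided after its first appearance and is accepted only once a second site returns the same reading; an answer that comes with a mistyped control word is ignored.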
















Other than that, it's a pretty cool idea!