Tuesday, September 11, 2007

Capturing CAPTCHAs

The inspiration for this post actually came from Monday's class. A reference was made to "you know, when you have to type in a word to show you're a human".

This is called a CAPTCHA - Completely Automated Public Turing test to tell Computers and Humans Apart. I'm sure most of you are familiar with them, but for any of you for whom this is the first time on the internet, the most common form of CAPTCHA looks like this:
A human can easily read it despite the distortion and the extraneous lines, while a computer will have more difficulty.

The use of CAPTCHAs to thwart computer mischief is kind of ironic, though, since it pits computer scientists against their own achievements. Advances are constantly being made in the field of artificial intelligence that can be used to decipher CAPTCHAs more reliably. In fact, of the three steps a computer goes through to read a CAPTCHA, only one is more easily accomplished by a human. The steps are distinguishing the text from the background, segmenting the text into individual letters, and finally interpreting each letter. The second step, segmentation, is where computers usually fail.
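To make those three steps a bit more concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the 3x3 letter shapes, the pixel values, and the function names are all assumptions, and a real CAPTCHA reader would be far more involved. It only shows the shape of the pipeline, and why letters that touch each other trip up the segmenting step.

```python
# Toy sketch of the three steps. The glyphs and pixel values are made up.
GLYPHS = {
    "L": ("100", "100", "111"),
    "O": ("111", "101", "111"),
    "T": ("111", "010", "010"),
}
LOOKUP = {rows: letter for letter, rows in GLYPHS.items()}


def render(word, gap=1):
    """Draw a word as a grayscale grid: 30 = ink, 220 = background."""
    rows = []
    for r in range(3):
        bits = ("0" * gap).join(GLYPHS[ch][r] for ch in word)
        rows.append([30 if b == "1" else 220 for b in bits])
    return rows


def binarize(gray, threshold=128):
    """Step 1: separate the text from the background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary):
    """Step 2: split into letters wherever a column has no ink."""
    letters, current = [], []
    for col in zip(*binary):
        if any(col):
            current.append(col)
        elif current:
            letters.append(current)
            current = []
    if current:
        letters.append(current)
    # Turn each run of columns back into a tuple of row strings.
    return [tuple("".join(map(str, row)) for row in zip(*cols))
            for cols in letters]


def interpret(glyph):
    """Step 3: match one segmented letter against the known shapes."""
    return LOOKUP.get(glyph, "?")


def read(gray):
    return "".join(interpret(g) for g in segment(binarize(gray)))


print(read(render("LOT")))         # letters separated by a blank column
print(read(render("LOT", gap=0)))  # letters touching: segmentation fails
```

With a blank column between the letters, the word comes back intact. With `gap=0` the letters run together, the segmenter sees one wide blob, and nothing matches, which is roughly the trick a distorted CAPTCHA plays on a computer.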

Interestingly, CAPTCHAs are providing more than just protection against bots. The reCAPTCHA system also helps digitize books. It does this by showing two words. One word is just a normal CAPTCHA, whose answer the system already knows. The other word comes from a scanned book that the computer wasn't able to make sense of. If the user types the first word correctly, it's assumed the second one is right too. In this way, books are digitized faster and more accurately.
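The two-word idea can be sketched in a few lines of Python. This is not reCAPTCHA's actual code or API: the function names, the `scan-17` id, and the sample words are hypothetical. Waiting for a couple of matching guesses before trusting a word (`min_votes`) is also an assumption added here, not something described above.

```python
from collections import Counter

votes = {}  # scanned-word id -> Counter of human guesses


def submit(control_word, control_answer, unknown_id, unknown_answer):
    """Return True if the user passes; record their guess for the scanned word."""
    if control_answer.strip().lower() != control_word.lower():
        return False  # failed the known word, so ignore the other guess
    guess = unknown_answer.strip().lower()
    votes.setdefault(unknown_id, Counter())[guess] += 1
    return True


def best_guess(unknown_id, min_votes=2):
    """Most common guess once enough people agree, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= min_votes else None


# Two users get the known word right and agree on the scanned one.
submit("morning", "morning", "scan-17", "upon")
submit("overlooks", "Overlooks", "scan-17", "upon")
# A third user misses the known word, so their guess is thrown away.
submit("between", "betwen", "scan-17", "bogus")
print(best_guess("scan-17"))
```

The point is just that the known word acts as the gatekeeper, and only guesses from people who pass it count toward reading the book.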


Cute-lil'-Yaris said...

I sometimes find them annoying. Have you ever gotten one of those CAPTCHAs that are kind of ambiguous when it comes to the 0's, o's, and O's?

Just a random thought: some CAPTCHAs do just as well at keeping humans from doing whatever on the internet, too.

For more CAPTCHA examples, our electronic music blog @ http://soulflex.blogspot.com provides links to music downloads AND CAPTCHA inputs before you can listen. :)

Jose Alvarez said...

I hate those things. They keep making them harder. They are getting so hard that I now sometimes get them wrong multiple times in a row on a website. They should also not use characters that can be mixed up, like I’s and L’s. O’s and 0’s are bad too.