Nearly every time you fill out an online form with a CAPTCHA you are helping digitize old books. Read on to understand what any of this means…
What is a CAPTCHA?
You’re already familiar with CAPTCHAs. In fact, you probably despise them, since you frequently get them wrong seven times in a row and end up spending 35 minutes filling out a 10-second form. Here are a few examples ranging in difficulty:
The basic idea behind CAPTCHAs
Mathieu Blondel explains the idea behind a CAPTCHA in his article on using internet users to do something useful:
The idea of CAPTCHA is to ask users to answer a question that only humans can answer in order to check whether they are a human or a computer program. For example, in order to prevent massive creation of email accounts by computer programs (or by malicious hackers – Jack), most email services ask new subscribers to write the text present in a image. The image is distorted so that it is not possible for a computer program to recognize it (since computers can only easily read normal text fonts, not distorted images – Jack) but it is still readable by a human.
How filling out CAPTCHAs helps digitize old books
The print in very old books is often of poor quality, and some of the text may even be handwritten. As a result, OCR (Optical Character Recognition) software, which easily converts scanned modern books into digital text, cannot do the same for older books.
A brilliant idea, the ReCAPTCHA, addresses this problem by harnessing the manpower of the millions of people filling in CAPTCHAs every day.
With ReCAPTCHA users are presented with two words to identify. The first word is used to check whether the user is a human or not – this can be tested since the website knows the correct answer already and simply checks if you got it right, thus confirming your hominid status.
The second word is from an old text. The website does not know the answer; it wants to find out the answer in order to digitize that book. This word is not there to test whether you are human, since the first word already confirmed that. You type in your answer as part of the CAPTCHA experience, and the computer then learns from your input, helping it digitize old texts. You never knew, did you?
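The two-word check described above can be sketched in a few lines of Python. This is only an illustration, not how the real service is implemented: the class name `TwoWordChallenge`, the `votes_needed` threshold, and the idea of accepting a reading once enough verified humans agree on it are all assumptions made for the example.

```python
from collections import Counter


def normalize(word):
    """Ignore case and surrounding whitespace when comparing answers."""
    return word.strip().lower()


class TwoWordChallenge:
    """Hypothetical model of one known word paired with one unknown word."""

    def __init__(self, control_answer, votes_needed=3):
        self.control_answer = normalize(control_answer)  # the site knows this word
        self.votes_needed = votes_needed                 # assumed agreement threshold
        self.votes = Counter()                           # guesses for the unknown word

    def submit(self, control_guess, unknown_guess):
        """Return True if the user passed the human test on the known word."""
        if normalize(control_guess) != self.control_answer:
            return False  # failed the known word, so the second guess is ignored
        self.votes[normalize(unknown_guess)] += 1
        return True

    def accepted_reading(self):
        """Return the unknown word once enough verified users agree, else None."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.votes_needed else None


challenge = TwoWordChallenge("morning", votes_needed=2)
challenge.submit("Morning", "overlooks")    # human, first vote recorded
challenge.submit("mourning", "overlocks")   # wrong known word, vote discarded
challenge.submit("morning ", "Overlooks")   # human, second matching vote
print(challenge.accepted_reading())
```

Only guesses from users who got the known word right are counted, which is what lets the site trust an answer it cannot check directly.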
Why harnessing users is a great idea
ReCAPTCHA is the modern equivalent of powering your bicycle’s headlamp by pedaling: an additional goal is fulfilled whilst still completing the original goal. The pedaling action provides not only forward momentum but also light. ReCAPTCHA not only identifies a user as human but also helps digitize ancient books. Recall that carnivorous old saying: “killing two birds with one stone”.
Other ways to harness users
Ask Facebook users whether they fancy someone or not.
Compare these answers with the observed data in Facebook’s database: information such as the number of times a profile was viewed, the frequency and length of visits to the profile, time spent going through photos, number of wall posts, time spent writing wall posts, etc.
There are too many variables here to work out by hand which combination of factors (such as time spent on a profile or photos looked at) indicates fancying. If, however, millions of users were asked whether they liked other users, Facebook could learn to identify the patterns that indicate fancying, using the millions of worked-out examples to extrapolate a formula. This is the basis of one of my Top 5 interests in the world: supervised machine learning.
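To make "extrapolating a formula from worked-out examples" concrete, here is a minimal sketch using logistic regression trained by gradient descent, written in plain Python. Everything in it is an assumption for illustration: the two features (profile views and time on photos, both scaled to the range 0 to 1), the six made-up training rows, and the choice of logistic regression itself. Nothing here reflects real Facebook data or methods.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that these behaviour features indicate fancying."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(examples, labels, epochs=2000, learning_rate=0.5):
    """Fit weights so that predictions match the users' own yes/no answers."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Made-up toy data: [profile views, time on photos], each scaled to 0..1.
examples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],
            [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = user said they fancy the person

weights, bias = train(examples, labels)
print(predict(weights, bias, [0.85, 0.85]))  # heavy attention: should be near 1
print(predict(weights, bias, [0.10, 0.10]))  # little attention: should be near 0
```

The learned `weights` and `bias` are the "formula": once fitted on answered examples, they can score pairs of users who were never asked.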
Be sure to add your uses in the comments – good ones will get featured alongside a link to your blog.
Credit to Amaru Villanueva Rance for pointing out the ReCAPTCHA to me