Tuesday, June 12, 2007

Books: Saving the Books - One Word at a Time

reCAPTCHA
http://recaptcha.net/


So, most of us have had to type in a word shown as warped text when signing up for a free newsletter or something similar. reCAPTCHA has found a way to use that effort to help digitize books.


The quick, very simple explanation of digitizing books is that you scan the book and have an image of each page available. To make the text searchable, you must take another step and run optical character recognition (OCR) so that the system can read all of the text. With older books and odd fonts, the OCR system may have trouble reading all of the characters correctly.


reCAPTCHA helps the system learn the correct characters by showing the words it could not read to humans as warped text. People read the text and type in the correct characters, which the system stores and adds to the text of the book. The human eye is still better at deciphering strange characters than a computer, and reCAPTCHA is taking advantage of that - one word at a time.
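
For the curious, here is a rough sketch in Python of how answers from several people might be combined to settle on one word. This is only an illustration and not reCAPTCHA's actual code: the function name `resolve_word`, the `min_votes` threshold, and the simple "most common answer wins" rule are all my own assumptions.

```python
from collections import Counter


def resolve_word(human_answers, min_votes=3):
    """Return the agreed text for one scanned word, or None if still undecided.

    Hypothetical helper: a word is accepted only once at least `min_votes`
    people have typed the same thing.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    votes = Counter(a.strip().lower() for a in human_answers if a.strip())
    if not votes:
        return None
    answer, count = votes.most_common(1)[0]
    # Accept the most common answer only if enough people agree on it.
    return answer if count >= min_votes else None


if __name__ == "__main__":
    print(resolve_word(["morning", "Morning ", "morning", "mourning"]))
    print(resolve_word(["morning", "mourning"]))
```

With the four answers in the first call, three people agree on "morning", so that word is accepted. In the second call nobody agrees yet, so the word stays undecided until more people have seen it.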


Now, that is a crude explanation, but I hope it makes sense. reCAPTCHA has a much prettier explanation on their page.




found via Chronicle: Wired Campus blog
(and Brad Baxter's post to an internal list at work)
