Gerard Salton
Lecture Series

Thursday, April 7, 2005
4:15 pm
B17 Upson Hall

Raj Reddy
Carnegie Mellon University

The Million Book Digital Library Project

Increases in storage densities and falling costs make it possible to envision a future in which all publicly available human knowledge is available to anyone, anywhere, at any time. In spite of determined and praiseworthy efforts over two decades, projects such as Project Gutenberg have only been able to make a few thousand books accessible online. At a rate of under a thousand books per year, digitizing the estimated 100 million books ever published in the world would take more than 100,000 years. And we may never be able to catch up with the ever-increasing stream of new publications.

Capturing born-digital publications at the time of creation (by requiring publishers to submit a digital copy as well as the currently mandated physical copy) and scanning all older publications at a rate of a million books per year is one of the solutions currently being explored to resolve this conundrum.

Digitizing a million books a year requires finding, scanning, processing, and storing in a web-accessible form about 5000 books every day.
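
The arithmetic behind these figures can be checked with a short sketch. The totals and rates come from the abstract; the 200 working days per year is an assumption introduced here (the abstract does not state it) to show one way the figure of about 5000 per day could arise, and the function names are purely illustrative.

```python
# Figures taken from the abstract.
TOTAL_BOOKS = 100_000_000   # estimated books ever published
CURRENT_RATE = 1_000        # books per year (upper bound on "under a thousand")
TARGET_RATE = 1_000_000     # books per year under the proposed approach

# Assumption, not stated in the abstract: about 200 working days per year.
WORKING_DAYS_PER_YEAR = 200


def years_to_digitize(total_books: int, books_per_year: int) -> float:
    """Years needed to digitize total_books at a constant yearly rate."""
    return total_books / books_per_year


def books_per_day(books_per_year: int, days_per_year: int) -> float:
    """Daily throughput needed to reach a yearly target."""
    return books_per_year / days_per_year


print(years_to_digitize(TOTAL_BOOKS, CURRENT_RATE))        # 100000.0
print(years_to_digitize(TOTAL_BOOKS, TARGET_RATE))         # 100.0
print(books_per_day(TARGET_RATE, WORKING_DAYS_PER_YEAR))   # 5000.0
print(round(books_per_day(TARGET_RATE, 365)))              # 2740
```

Under these assumptions, a million books per year reduces the backlog from roughly 100,000 years to about a century, ignoring new publications. Running every calendar day instead of 200 working days would lower the daily requirement to roughly 2740 books.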

The Million Book Project is an attempt to understand and solve the technical, economic, and social policy issues of providing online access to all creative works of the human race. This talk will provide a status report and discuss interesting research challenges arising from this work.