D is for Databases and Digital Lib’es.
In CS at Cornell, our D’s have good vibes.
The groups are not big, but their projects are,
For their data has reached the petabyte bar.
Our database/data-mining group (Alan
Demers, Johannes Gehrke, Jay Shanmugasundaram, and researchers Mirek
Riedewald and Walker White) is
doing neat things:
Demers and Gehrke collaborate with astronomer Jim Cordes on gathering data for
and managing a new petabyte-sized database of pulsars in the Milky Way.
Riedewald and machine-learning expert Rich Caruana are helping Cornell’s
renowned Lab of Ornithology with the database of volunteer-reported bird
sightings, the largest and longest-running resource of environmental
time-series data in existence. Gehrke, Demers, and Shanmugasundaram,
along with Bill Arms, Dan Huttenlocher, and Jon Kleinberg, are working
with the Internet Archive to manage and study the 40 billion Web pages
archived by the Wayback Machine, the time machine of the internet. As
you can see, we’re big on humongous, petabyte database problems.
The database people naturally talk to the digital library and Web people: Bill
Arms and researchers Dean Krafft and Carl Lagoze. These guys have been
heavily involved in work on digital publishing for years and are now
main cogs in the NSF National Science Digital Library (NSDL) project.
See letter K for mention of their work on electronic publishing.