ABC Book

	ABC Book D: Databases and Digital Lib'es


D is for Databases and Digital Lib’es.
In CS at Cornell, our D’s have good vibes.
The groups are not big, but their projects are,
For their data has reached the petabyte bar.

Our database and data-mining group (Alan Demers, Johannes Gehrke, Jay Shanmugasundaram, and researchers Mirek Riedewald and Walker White) is doing neat things:

Demers and Gehrke collaborate with astronomer Jim Cordes on data gathering and on managing a new petabyte-sized database of pulsars in the Milky Way. Riedewald and machine-learning expert Rich Caruana are helping Cornell’s renowned Lab of Ornithology with the database of volunteer-reported bird sightings, the largest and longest-running resource of environmental time-series data in existence. Gehrke, Demers, and Shanmugasundaram, along with Bill Arms, Dan Huttenlocher, and Jon Kleinberg, are working with the Internet Archive to manage and study the 40 billion Web pages archived by the Wayback Machine, the time machine of the Internet. As you can see, we’re big on humongous, petabyte database problems.

The database people naturally talk to the digital library and Web people: Bill Arms and researchers Dean Krafft and Carl Lagoze. These guys have been heavily involved in work on digital publishing for years and are now main cogs in the NSF National Science Digital Library (NSDL) project. See letter K for mention of their work on electronic publishing.