Date Posted: 9/30/2004

The NSF has awarded $1.8 million to Computer Science, Astronomy, the Program of Computer Graphics, and the Cornell Theory Center to support research in three data-intensive areas: 1. large-scale sky surveys built on Arecibo data; 2. physically accurate rendering in computer graphics; 3. the structure and evolution of the Web. The overall "Petabyte Storage Devices for Data-Driven Science" project will be directed by Al Demers. Al is joined by Prof. Jim Cordes from Astronomy and Profs. Kavita Bala, Steve Marschner, Dan Huttenlocher, Jon Kleinberg, Bill Arms, Johannes Gehrke, and Jai Shanmugasundaram from CS. Congratulations to all the involved faculty and staff! Below is the CTC press release.

The National Science Foundation (NSF) has awarded Cornell University $1.8 million to develop an information access and analysis system that will meet the data-intensive needs of three landmark research projects at Cornell University. The award, from the NSF's Directorate for Computer and Information Science and Engineering, will support projects conducted by Cornell's Computer Science and Astronomy departments and the Program of Computer Graphics, with assistance from the Cornell Theory Center (CTC). Microsoft, Unisys, and Intel Corporation are also contributing to the project. Research results will be available to a worldwide community through a proposed Web-services-based infrastructure that will allow applications to interoperate across programming languages, platforms, and operating systems.

"We were impressed with the level and depth of collaboration involved in this initiative," said University Provost Carolyn Martin . "The credentials of the departments, all recognized as leaders in their fields, and the partnership with a production facility that is skilled in providing scalable, high-performance computing, represent a powerful combination of expertise and potential."

In the first year of the grant, the projects will be supported by one Unisys ES7000/430 server with 32 Intel® Itanium® 2 processor-based nodes and 100 terabytes of online disk space. Additional storage and networking upgrades will be purchased in subsequent years to meet the growing data storage needs of the projects. By the final year of the grant, total storage will be more than a petabyte. The large-scale information access and analysis system will be housed at and maintained by CTC and will be tightly coupled with the Center's high-performance computing complex.
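As a rough consistency check on those figures, growing from 100 terabytes in year one to a petabyte by the final year implies roughly a tenfold increase over four annual upgrade cycles, or about 78% more capacity each year. A minimal sketch in Python; the even yearly upgrade schedule is our assumption, since the grant specifies only the endpoints:

# Back-of-envelope: capacity growth needed to go from 100 TB in year one
# to 1 PB by year five, assuming four even annual upgrade cycles.
start_tb = 100        # year-one online disk space (from the release)
target_tb = 1000      # "more than a petabyte" by the final year
upgrade_cycles = 4    # years two through five of the five-year grant

rate = (target_tb / start_tb) ** (1 / upgrade_cycles) - 1
print(f"required capacity growth: {rate:.0%} per year")  # ~78%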

"With the technology that this grant will fund, research has the potential to become more data-driven and exploit modeling, an activity that is central to science and engineering," said professor of computer science Alan Demers, who is the principal investigator of the five-year grant. "Service-oriented interfaces and easily accessible Web interfaces to data management and analysis tools will revolutionize how scientists conduct research and interact with data and data processing capabilities."

"The rapid growth in the generation of digital data is changing computational science in a fundamental way," said David Lifka , chief technology officer of CTC. "Traditionally, the scope of computational problems was limited by the available processing power. High-performance computing's ability to handle data- and computationally intensive problems has made this a non-issue. However, the lack of an infrastructure capable of managing, searching, and interpreting creates a new bottleneck for data-intensive problems. Modern data-intensive applications need both high-performance computing resources and an infrastructure capable of information access and analysis."

The new infrastructure supported by this grant will further research being conducted by Professor of Astronomy James Cordes; Assistant Professors of Computer Science Steve Marschner and Kavita Bala; Associate Professor of Computer Science Jon Kleinberg; and Professor of Computer Science Dan Huttenlocher.

Large-Scale Astronomical Surveys Using the Arecibo Telescope

Led by Cordes, this team of researchers will analyze data from the Arecibo Telescope to find pulsars and other exotic objects in the project titled "Large-Scale Astronomical Surveys Using the Arecibo Telescope." The Arecibo Telescope is the world's largest radio telescope in terms of collecting area and thus can conduct the most sensitive surveys for point-like objects. A new multi-beam feed system has increased the power of the facility by a factor of seven.

"The pulsar surveys will be the deepest (reaching to the greatest distances) ever undertaken and are expected to yield not only about 1000 new pulsars, but also exotic objects, including millisecond pulsars spinning near the break-up speed of a neutron star; neutron stars in compact binaries with orbital periods of a few hours or less; and companion stars that are other neutron stars or black holes," said Cordes. "These discoveries are expected to provide numerous opportunities for follow-up research on the equation of state of nuclear matter, gravitation physics, and gravitational waves."

The proposed pulsar surveys include searching the entire region of the Milky Way's Galactic plane visible from Arecibo, along with a shallower survey farther out of the plane to find millisecond pulsars and binary pulsars. To analyze this volume of raw data efficiently, the astronomers on the project will collaborate with the Department of Computer Science's database group, which has developed some of the fastest known data mining algorithms.

"We anticipate that our results will be of considerable value to astronomers and astrophysicists world-wide," said Cordes. "The service-oriented interface to the data that will be implemented will allow users from all over the world to interactively query the multidimensional search space and to allow interactive and efficient exploration of the data set."

Physically Accurate Rendering in Computer Graphics

In this project, Marschner, Bala, and their team will research how light reflects from complex objects and structures and how reflection can be handled efficiently in rendering systems. The data will originate from a spherical gantry, a versatile four-axis motion system designed for optical scattering measurement. Complex three-dimensional objects, as well as complex materials such as skin, hair, and cloth, will be illuminated from thousands of directions, and the reflected light will be measured with a camera from thousands more directions. The gantry will generate approximately 50 terabytes of data per scanned object that can be used to computationally model the actual object. This data will be used in research on the fundamental properties of materials as well as on how to represent complex objects efficiently and realistically.
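Those figures are mutually consistent at a back-of-envelope level: thousands of light directions crossed with thousands of camera directions yields millions of images per object. A minimal sketch, assuming illustrative counts of 2,000 directions on each axis and an assumed raw image size (the release says only "thousands" and "50 terabytes"):

# Back-of-envelope for the ~50 TB per scanned object.
# Direction counts and image size are illustrative assumptions.
light_dirs = 2_000     # illumination directions ("thousands")
camera_dirs = 2_000    # camera viewing directions ("thousands more")
image_mb = 12.5        # raw image size in megabytes (assumed)

images = light_dirs * camera_dirs
total_tb = images * image_mb / 1_000_000
print(f"{images:,} images, ~{total_tb:.0f} TB per object")  # ~50 TB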

"This will be the first study ever undertaken at this level of accuracy," said Marschner. "Physically accurate rendering is an important goal in computer graphics, and of interest to archaeologists, librarians, and others. One application for this kind of data is a virtual museum. Rare artifacts can be digitized into representations that are accurate enough to produce highly realistic views from any reasonable distance and under any kind of lighting. The digitized images could be placed in a single environment, creating a real-time, fully realistic, immersive experience of a museum collection that could never be assembled in reality."

Currently, Marschner is using measurements of real materials to develop better reflectance models, and Bala is exploring new rendering approaches that can use measured or precomputed data to produce interactive, physically accurate renderings. Both research thrusts require a very large and scalable storage infrastructure to achieve their full potential.

The Structure and Evolution of the World Wide Web

In the project led by Kleinberg and Huttenlocher, the research team will develop new and precise models for how the Web evolves with time. These models will distinguish measurement-independent properties from those that are influenced by the method of measurement. The research will include algorithms for understanding the structure and evolution of the Web, studies of the Deep Web, and the related areas of scientific publishing and digital libraries.

"Our goal is to develop techniques for simultaneously studying the evolution of the full Web and of specific, highly visible Web sites," the researchers said. "Because of current relationships with organizations such as Internet Archive and resident expertise, Cornell is extremely well positioned to examine these kinds of questions on large Web data sets, but is currently limited by the lack of availability of large disk storage coupled with the fast computing power necessary to run our algorithms on large data sets."

The research will define techniques to identify and analyze rapidly changing content on the Web, for example, detecting hot topics and tracking how topics evolve over time. These techniques can be used to highlight portions of the Web that are undergoing rapid change at any point in time, to archive and summarize the Web content surrounding a fast-breaking news story, and to structure the content of emerging media such as Weblogs.
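The release does not describe the detection algorithms, but one simple way to flag a "hot topic" is to compare a term's current mention count against its recent baseline. The sketch below is an illustrative heuristic, not the project's actual method:

# Toy hot-topic detector: flag terms whose daily mention count jumps
# well above their trailing average. Illustrative only; the window and
# threshold are arbitrary choices.
from collections import deque

def find_bursts(daily_counts, window=7, threshold=3.0):
    """Yield (day, count) pairs where count exceeds `threshold` times
    the mean of the previous `window` days."""
    recent = deque(maxlen=window)
    for day, count in enumerate(daily_counts):
        if len(recent) == window:
            baseline = sum(recent) / window
            if baseline > 0 and count > threshold * baseline:
                yield day, count
        recent.append(count)

# Example: a term that suddenly spikes (a fast-breaking news story).
mentions = [4, 5, 3, 6, 5, 4, 5, 40, 55, 12, 6]
print(list(find_bursts(mentions)))  # bursts on days 7 and 8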