Interdisciplinary

Digital Libraries and Information Science

A fundamental challenge in building information services on the Internet is interoperability and manageability: how to combine independently managed collections and services distributed around the world into a coherent whole that is made available in a reliable and secure manner. Cornell has developed two architectures for interoperability: Dienst, a production system, and Fedora, which is used for experimentation with new concepts of information structuring, extensibility, and security. Dienst is being used to build a prototype national digital library for science education for the National Science Foundation.

Interest in the integrity and availability of distributed information spaces has led to research in the theory and practice of metadata (data about data), and in reference linking (automatic methods to link references in one source of information to the targets that they refer to). Our aim is to develop general principles that can be applied to a wide variety of digital objects, while minimizing the costs of adoption.

Long-term availability is a key aspect of information integrity. The Cornell University Library was a pioneer in conversion of historic documents to digital form and remains one of the leaders in this area. Computer science and the library have expanded this work on long-term preservation to all classes of digital objects, especially those that originate in digital form. This is a primary focus of the Prism project, which is funded by a large grant from the NSF. In addition, we are advising the Library of Congress on preservation of the Web.

The technology of digital libraries is interwoven with social, legal, and economic issues. Cornell has a particular interest in lowering the cost of access to scientific information. This work includes experimental publications, studies of the economics of open-access publishing, reducing the cost of metadata, and the use of automated digital libraries to minimize the cost of research libraries.

Several groups within Cornell cooperate in this area, including the Department of Computer Science, the Human-Computer Interface group in the Department of Communication, and the Cornell University Library. Beyond Cornell, we have numerous joint research projects including two international research grants. Both Dienst and the Fedora concepts are widely adopted by other research groups. A recent initiative is the extension of digital libraries to mobile and wireless computing. The research is supported by the NSF, DARPA, Intel, and the Library of Congress.

 

Computational Molecular Biology & Computer Science

The recent completion of the human genome project underlines the need for new computational and theoretical tools in modern biology. The tools are essential for analyzing, understanding and manipulating the detailed information on life we now have at our disposal. Problems in computational molecular biology vary from understanding sequence data to the analysis of shapes and the prediction of biological function.

Cornell has a university-wide plan in the science of genomics, and the Department of Computer Science is playing a critical role in this initiative. Researchers in the computer science department are engaged in a wide range of computational biology projects. A few of the ongoing research endeavors are described below:

Sequence analysis

• David Shmoys is studying approximation algorithms for genetic mapping. Identifying the locations of markers on the genome (genetic linkage mapping) is a hard computational problem. The algorithms being developed significantly reduce the cost of the (wet lab) experiments and improve the accuracy of the resulting maps.
• Golan Yona, who will join the computer science department in January 2001, is working on clustering the "protein universe." The known sequences of proteins are classified into families. Interesting and new protein families are identified and are subject to further investigation.
• Ron Elber has developed a new system for threading in three dimensions, called LOOPP. Threading is the matching of a sequence to a protein shape. The system was trained on tens of millions of data points, and can detect highly remote evolutionary relationships between proteins. In a recent publication in Science, LOOPP was used to propose an evolutionary link between the biological mechanism that controls the size of the tomato fruit and the mechanism responsible for the development of cancer.

Studies of protein shapes

• Paul Chew, Klara Kedem, Jon Kleinberg, and Dan Huttenlocher develop algorithms for matching and identifying structural similarities in proteins. Ideally, we wish to manipulate three-dimensional objects, such as proteins, with the same ease with which strings (sequences) are studied. The new algorithm, URMS, is an efficient and accurate measure of protein similarities. It is used to study complete protein chains and to find similar fragments.
• Jon Kleinberg studies simple models of threading. He made the intriguing observation that an existing simplified protein model (called the H/P model) can be mapped to the Max Flow problem. The resulting exact algorithms are significantly better than those achieved by heuristic approaches used in earlier studies.

Dynamics and function of biological molecules

• Ron Elber studies algorithms for simulating the long-time behavior of biological molecules. The new SDE algorithm provides an additional link between studies of structures and studies of function.

Collaborations outside the department

Other key elements of the computational biology initiative include the Computational Genomics Institute and the NIH National Center for Research Resources at the Cornell Theory Center. The NIH NCRR provides an extensive set of software tools for computational biology and a server that identifies protein families from sequences. The tools were developed in part in the Department of Computer Science. Computational biology is also a crucial part of the recently announced $160M collaboration between Cornell and the Rockefeller and Sloan-Kettering institutes. The computer science department plays an important role in establishing the collaboration and enhancing the intellectual links between the different institutes.

Education

A new graduate program in Computational Molecular Biology that crosses colleges was initiated with the participation of the computer science field. A concentration in Structural Biology for undergraduate students majoring in computer science was also established.

 

Program of Computer Graphics

The Program of Computer Graphics is best known for pioneering work on realistic image synthesis, including the radiosity method for calculating direct and indirect illumination in synthetic scenes. Our long-term goal is to develop physically-based lighting models and perceptually-based rendering procedures to produce images that are visually and measurably indistinguishable from real-world scenes and to generate these images in real time.
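The radiosity method mentioned above is commonly written as one balance equation per surface patch, B_i = E_i + rho_i * sum_j F_ij * B_j, where B is radiosity, E is emitted light, rho is diffuse reflectance, and F_ij is the form factor between patches i and j. The following is a minimal sketch of that textbook formulation, solved by plain Jacobi iteration. The function name, the two-patch scene, and its numbers are illustrative assumptions, not the Program's own code or data.

```python
def solve_radiosity(emission, reflectance, form_factors, iterations=50):
    """Iterate B_i = E_i + rho_i * sum_j F_ij * B_j (Jacobi updates)."""
    n = len(emission)
    radiosity = list(emission)  # first guess: direct emission only
    for _ in range(iterations):
        # Each pass adds one more bounce of indirect light.
        radiosity = [
            emission[i]
            + reflectance[i]
            * sum(form_factors[i][j] * radiosity[j] for j in range(n))
            for i in range(n)
        ]
    return radiosity


# Hypothetical scene: patch 0 emits light, patch 1 does not.
# Both reflect half of the incoming light, and each patch
# receives half of the light leaving the other.
emission = [1.0, 0.0]
reflectance = [0.5, 0.5]
form_factors = [[0.0, 0.5], [0.5, 0.0]]

radiosity = solve_radiosity(emission, reflectance, form_factors)
```

For this toy scene the iteration settles at B = [16/15, 4/15]: the emitter is slightly brighter than its own emission because of light bounced back from the second patch, which is the indirect illumination that radiosity accounts for.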

Over the past two decades, we have articulated and refined a framework for global illumination research incorporating light reflection models, energy transport simulation, and visual display algorithms. Our current goal is to solve these computationally demanding simulations as fast as possible using an experimental cluster of tightly coupled processors and specialized display hardware. We are achieving this goal by taking advantage of increased on-chip processing power, distributed processing using shared memory resources, and instruction-level parallelism of algorithms.

Our graphics research also involves three-dimensional modeling of very complex environments and new approaches for modeling architectural designs. We have developed a new interaction paradigm for architectural sketch modeling that supports direct sketching with a pen on a large display surface. Traditional sketching skills are augmented through 3D interfaces which merge conceptual design with rendered 3D models and allow collaborative sketching across networks, whether in the same room or across the country. These new tools are being tested each semester in a unique undergraduate architectural design studio in our lab.

New developments in image capture are also rapidly changing the way we model and render 3D environments. By extracting depth and orientation from a series of images, we can not only reconstruct seamless panoramas for passive viewing but also merge image data into 3D models for active design manipulation. Both these research projects take full advantage of a calibrated, wide-field display system that provides a life-size, twenty-foot wide image with more than four megapixels of resolution at interactive frame rates.

Our lab has been a pioneer in distance learning through the NSF Graphics and Visualization Center, a distributed center for fundamental research in computer graphics. We have seven years of experience working together remotely, including teaching a collaborative advanced seminar in computer graphics across our five sites (Brown, Caltech, Cornell, UNC-Chapel Hill, and the University of Utah). The value of dedicated, high-bandwidth connections has been proven, but we are pushing forward to enhance the sense of direct person-to-person contact for distance learning through improved telepresence and innovative educational approaches.

 

The Cornell Theory Center

The Cornell Theory Center (CTC) is Cornell's high-performance computing and interdisciplinary research center. CTC's main technical research and development thrust is in large-scale Windows 2000-based cluster computing. Through its Advanced Cluster Computing Consortium (AC3), CTC acquired a 256-processor cluster, AC3 Velocity, that consists of 64 Dell PowerEdge servers, each with four Intel Pentium III Xeon 500 MHz processors and running Microsoft Windows 2000. The primary cluster interconnect is provided by Giganet, Inc. Cornell is one of the leading institutions for computational science and engineering in the country, due in large part to the resources and expertise available at CTC.

Researchers associated with CTC work in some of the most computationally challenging fields. CTC acquired a second cluster, Velocity+, which consists of 64 Dell dual PowerEdge servers and is dedicated to the strategic applications of protein folding/structural biology and multiscale materials modeling, both of which require huge amounts of computing resources.

Additional interdisciplinary research focus areas include:

• Computational Finance: projects such as investigating new optimization algorithms for large-scale portfolio analysis and value-at-risk calculations.
• Computational Genomics: development of highly advanced tools for large-scale data acquisition and analysis to understand the origins of life and the molecular processes that underlie life.
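To make the value-at-risk term above concrete, here is a minimal sketch of one standard way to estimate it, historical simulation: take past portfolio returns, turn them into losses, and read off the loss level that was not exceeded in the chosen fraction of periods. This is a generic textbook approach, not a description of the CTC projects. The function name, the quantile convention, and the sample returns are all assumptions made for illustration.

```python
import math


def historical_var(returns, confidence=0.95):
    """Loss level not exceeded in `confidence` of the observed periods."""
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)  # positive values are losses
    # Smallest loss with at least `confidence` of the sample at or below it.
    # round() guards against floating-point noise in confidence * n.
    rank = math.ceil(round(confidence * len(losses), 9))
    return losses[max(rank, 1) - 1]


# Hypothetical per-period portfolio returns (fractions, not percent).
sample_returns = [-0.05, -0.02, -0.01, 0.00, 0.01,
                  0.01, 0.02, 0.02, 0.03, 0.04]

var_90 = historical_var(sample_returns, confidence=0.90)
```

With these ten sample periods, the 90% value-at-risk is a loss of 0.02: nine of the ten observed losses are at or below that level. Large-scale portfolio work replaces this sorting step with much heavier simulation and optimization, which is where cluster computing becomes relevant.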

These projects also benefit from CTC's extensive visualization expertise and resources, including a three-wall CAVE virtual reality environment, where scientists can "immerse" themselves in their application. CTC is an integral part of Cornell's new Computing and Information Sciences initiative, and is active in attracting new communities, such as business, the arts, and the social sciences, to advanced computing and information technologies. CTC works closely with its AC3 infrastructure members, including Dell Computer Corporation, Intel Corporation, Microsoft Corporation, and Giganet, Inc., and with a range of corporations interested in implementing state-of-the-art cluster environments and in having a strategic window into future technologies.

 

Development Outreach to the Ithaca Community

Recently there has been increasing national and international attention focused on computer literacy; avoiding the development of a socially or financially defined computer underclass is especially important as computer-based information becomes ubiquitous. Through part of a HUD grant (of about $400K) awarded to Professor Patricia Pollak of the Department of Policy Analysis and Management, Cornell's Department of Computer Science is working actively with the Southside Community Center to build a computer lab to help develop computing skills across the "digital divide." This program was one of about 10 funded nationally, and through Professor Graeme Bailey is drawing together undergraduates from the Cornell computer science program (and Ithaca College students through Professor Wanda Dann) to offer hands-on computer training to children, teens and adults in the community.

The lab recently received city funding through a Community Development Block Grant of $45K to purchase equipment and software. Local parents and others in the community are planning to help assemble the lab to a design being produced by a class taught by Professor Alan Hedge of the Department of Design and Environmental Analysis. This whole endeavor is building significant momentum as it draws together people from many parts of the university, local human service agencies, and committed volunteers, and should become a showcase for the benefits of cross-disciplinary interactions that occur as by-products of the evolving FCI.