Cornell Genomics Initiative

Computer Science and the Cornell Genomics Initiative

Since the early 1980s, a succession of technological advances has made it possible to perform large-scale DNA sequencing. New genomic data has been generated at an explosive rate: the volume has doubled every 15 months since about 1984. Extracting biological insight from this raw data is one of the central challenges of biology today.

The ability to relate genomic data to biological function would constitute a major step in understanding the process of life. Such an advance would likely have a significant impact on medicine and agriculture. Recognizing the importance of this challenge, a Genomics Task Force consisting of 40 Cornell faculty was convened last year to plan the course of genomics research at Cornell into the next century. The result of their labors is the Cornell Genomics Initiative, an ambitious plan for a major new interdisciplinary research program involving all the biological sciences, Engineering, and the Cornell Medical College. The areas of concentration will be:

computational genomics and bioinformatics
mammalian genomics
plant genomics
microbial genomics
nanofabrication and bioengineering

The program has been endorsed by the University administration, and major resources have been committed. The Cornell Genomics Initiative is underway.

It is clear why Computer Science is a key participant in this effort. Dealing with the overwhelming flood of genomic information presents a bewildering array of computational challenges. There is a desperate need for tools to retrieve, compare, filter, visualize, and analyze massive quantities of genomic data spread across several sources and stored in different formats. The biological research community alone is not in a position to deal with the enormous technological problems involved in producing these tools; expertise in high-performance scientific computing, data management, information retrieval, software engineering, statistics, stochastic processes, and graphics is required as well.

The plan for computational genomics and bioinformatics involves (among other things) the appointment of two new faculty members in Computer Science. In order to truly bridge the gap between fields, it was considered important to recruit someone whose primary training and research interests were in a biology-related field but who was conversant with computational aspects and would feel comfortable in a computer science department. With the help of colleagues in the biological sciences, CS recently identified a senior computational biochemist, Ron Elber, formerly of the Hebrew University in Jerusalem. Professor Elber will join the Department in January 1999. A search for an additional computational biologist is underway.

In addition, a Laboratory for Computational Genomics and Bioinformatics will be established under the aegis of the Cornell Theory Center. A core group of research personnel will be appointed, whose primary responsibility will be to develop computational tools and provide support under the direction of faculty in the biological sciences and CS. Most of the physical infrastructure for the laboratory is already in place. In addition to the facilities available in the respective academic departments, the Cornell Theory Center will serve as the central hub of computational activity. Housed in Rhodes Hall, the Center is close to the Computer Science Department and the School of Operations Research and contains classrooms, computer labs, and offices for support personnel. It also houses the SP2, a high-performance supercomputer that will be available for computation-intensive applications. To coordinate activities between the Medical College and the Ithaca campus, a high-speed data link will be installed, which will allow the sharing of data between the two campuses and provide a medium for Web-based remote instruction.