Computational


Computational Biology Parallel Processing Cluster

We will establish a 56-node parallel processing Pentium cluster for shared use by computational biology groups throughout the Division of Biological Sciences. Our goal is to evaluate this system for a spectrum of cutting-edge research applications that need mid-level parallel processing support. This will involve migrating applications upward from workstations and downward from the massively parallel 512-node IBM SP2 at the Cornell Theory Center. Comparisons will be made among the three types of platform (workstations, the cluster, and the SP2) and between the Windows NT and Unix operating systems. Research topics include protein structure prediction and determination, bioinformatics, and molecular phylogenetic analysis. The cluster will be located in a dedicated facility in the Biotechnology Building; some units will also serve as high-performance graphics workstations for macromolecular modeling and biophysical imaging. The cluster will participate in the Scalable Distributed Computational Resource (SDCR, described in section 2.4) so that computing cycles can be distributed as efficiently as possible across campus.

Computational biology is one of the most rapidly growing areas of scientific computation. For example, the National Science Foundation has designated computational protein folding a "Grand Challenge" problem. The Human Genome Project and other DNA sequencing projects, including work at Cornell, are generating gigabytes of data and opening up entirely new areas of computer-intensive research. Cornell is a leader in many of these areas, particularly in experimental and theoretical protein structure analysis and in phylogenetic analysis. At present, however, Intel Architecture (IA) machines are seldom used in this important area; most studies are performed on Unix-based workstations or on massively parallel supercomputers such as the 512-node IBM SP2 at the Cornell Theory Center. The rapid improvement in the price/performance ratio of Pentium processors is changing the equation, and the time is ripe for porting applications to the IA/NT platform.

We plan to develop, migrate, and use a wide spectrum of biological applications. The system will function as a high-performance unit with hypercube interconnection, embedded in an extended campus-wide network composed primarily of stand-alone computers that can contribute their idle cycles to enhance productivity. This project will enable us to assess and improve the system's ability to share cycles in both directions in this mixed environment, and to conduct research in the following areas:

Computational protein structure analysis and structure-based drug design: We will develop algorithms for computational protein structure prediction and structure-based drug design.

Computational refinement of protein X-ray crystallography data: The Cornell High Energy Synchrotron Source is a major center for X-ray crystallography and generates voluminous datasets that require computational refinement. Parallelized versions of the XPLOR code (written in FORTRAN) will be ported to the IA cluster to carry out this work.

Computational analysis of multidimensional nuclear magnetic resonance (NMR) protein structure data: NMR data analysis for studying protein structure and dynamics is computationally intensive. We will port several software applications, including some distributed by the National Institutes of Health.

Molecular systematics and evolutionary studies: We will use computational approaches to analyze DNA sequence variation within and between species, implementing both phylogenetic reconstruction software and simulation programs on the cluster. Analyzing nuclear genes, where recombination must be taken into account, makes the larger datasets especially computationally intensive.

Biophysical Imaging: We will use advanced image enhancement techniques to create three-dimensional visual reconstructions from two-dimensional magnetic resonance images, for example, of living spiders in the process of producing silk.

Molecular Neurobiology: Monte Carlo simulations of neurotransmitter release and of communication between neurons now exceed workstation capabilities. We will develop parallelized versions of these models on the cluster.
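The hypercube connectivity planned for the cluster can be illustrated with a short sketch. In a d-dimensional hypercube, each node carries a d-bit label and is linked to the d nodes whose labels differ from its own in exactly one bit, so the minimum number of hops between two nodes is the number of bits in which their labels differ. The Python below is a hypothetical illustration of that textbook definition only; the function names are ours, and the proposal does not say how its 56 nodes would be labeled or wired.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the nodes directly linked to `node` in a complete dim-dimensional hypercube.

    Nodes are labeled 0 .. 2**dim - 1; two nodes are linked exactly when
    their binary labels differ in a single bit (hypothetical labeling scheme).
    """
    if not 0 <= node < 2 ** dim:
        raise ValueError("node label out of range for this dimension")
    # Flipping bit k of the label gives the neighbor along dimension k.
    return [node ^ (1 << k) for k in range(dim)]


def hops(a: int, b: int) -> int:
    """Minimum number of links between nodes a and b in a complete hypercube.

    This equals the Hamming distance between the two labels: each hop
    can correct exactly one differing bit.
    """
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(0, 6))  # neighbors of node 0 in a 6-cube
    print(hops(0, 63))                # opposite corners of a 6-cube
```

For example, `hypercube_neighbors(0, 6)` returns `[1, 2, 4, 8, 16, 32]` and `hops(0, 63)` returns 6, the diameter of a six-dimensional hypercube. A complete six-dimensional hypercube has 64 labels, so 56 nodes would leave eight positions empty; in that case some links listed by `hypercube_neighbors` would not exist, and routes between some pairs of nodes could be longer than `hops` reports.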

Participants

David Shalloway, Greater Philadelphia Professor in Biological Science

Last modified on: 10/05/99