
Previous Status

 

Computational Biology Cluster - 4th Quarter Status Report

The Computational Biology Cluster has grown to include 16 dual 333 MHz and 5 dual 450 MHz Pentium II workstations in addition to the original four 200 MHz dual and 200 MHz quad Pentium II servers. High-speed connectivity between all machines is provided by two 100 Mb/s Intel Fast Ethernet switches. The cluster has been made available to the biological computing community by remotely incorporating several of the dual workstations into the Theory Center Cluster, where a batch scheduler is being developed. Thus far the cluster has been used for analysis of multidimensional nuclear magnetic resonance spectra for determination of protein structure, Monte Carlo simulation of biomembranes, comparative analysis of molecular dynamics algorithms, and large-scale parallel computation of protein and peptide structural hierarchies. The protein structural hierarchy calculations used 40 processors in a master/slave configuration implemented with the Windows NT Parallel Virtual Machine (PVM) message-passing library. These calculations ran continuously for over 48 hours and generated data sets of over 100 GB per run. The data sets are stored on hard disks distributed throughout the cluster and are accessed through a distributed data manager that also uses PVM. These calculations demonstrate the feasibility of large-scale parallel computation on high-performance Pentium-based computers running Windows NT.
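As a rough illustration of the master/slave pattern mentioned above, the following minimal sketch uses Python threads and queues as stand-ins for PVM tasks and messages. It is not the project's code: the function names, the task format, and the toy `score_fragment` workload are all hypothetical, and a real PVM program would spawn processes on remote hosts and exchange packed messages instead.

```python
import queue
import threading


def score_fragment(fragment):
    """Hypothetical stand-in for one unit of structural computation."""
    return sum(ord(ch) for ch in fragment)


def worker(tasks, results):
    """Worker loop: take a task, compute, send the result back to the master."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel from the master: no more work
            break
        task_id, payload = item
        results.put((task_id, score_fragment(payload)))


def run_master(payloads, n_workers=4):
    """Master: start workers, hand out tasks, collect results in task order."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(payloads):
        tasks.put((task_id, payload))
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker
    collected = {}
    for _ in range(len(payloads)):
        task_id, value = results.get()
        collected[task_id] = value
    for t in threads:
        t.join()
    return [collected[i] for i in range(len(payloads))]
```

Because idle workers pull the next task from a shared queue, uneven task lengths balance themselves, which is one common reason to choose this pattern for long runs with many independent work units.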

Creating a Scalable, Distributed Computation Resource Status Report - 2nd Quarter

Computational Biology Parallel Processing Cluster

We are establishing a 56-node parallel processing Pentium cluster for shared use by computational biology groups throughout the Division of Biological Sciences. Our goal is to evaluate this system for a spectrum of cutting-edge research applications needing mid-level parallel processing support. This will involve migration of applications upwards from workstations and downwards from the massively-parallel 512-node IBM SP2 at the Cornell Theory Center. Comparisons will be made between the three types of platforms and between NT and Unix operating systems. Research topics include protein structure prediction and determination, bioinformatics, and molecular phylogenetic analysis. The cluster has been physically established as a dedicated facility in the Biotechnology Building; some units will also be used as high-performance graphics workstations for macromolecular modeling and biophysical imaging. The cluster will be operational as soon as the scheduling software (from the ARMS component of the project) is ready. The cluster will participate in the Scalable Distributed Computational Resource (SDCR – described in section 2.4) to permit the most efficient distributed use of computing cycles across campus.

Computational biology is one of the most rapidly growing areas of scientific computation. For example, the computational protein folding problem has been designated as a "Grand Challenge" problem by the National Science Foundation. Cornell, the Human Genome Project, and other DNA sequencing projects are generating gigabytes of data and opening entirely new areas of computer-intensive research. Cornell is a leader in many of these areas, particularly in experimental and theoretical protein structure analysis and phylogenetic analysis. However, at present Intel Architecture (IA) machines are seldom used in this important area; most studies are performed on Unix-based workstations or on massively parallel supercomputers such as the 512-node IBM SP2 at the Cornell Theory Center. The rapid improvement in the price/performance of Pentium processors is changing the equation, and the time is ripe for porting applications to the IA/NT platform.

We plan to develop, migrate, and use a wide spectrum of biological applications. The system will function as a high-performance (hypercube connectivity) unit embedded in an extended campus-wide network composed primarily of stand-alone computers that can provide excess cycles to enhance productivity. This project will enable us to assess and improve the ability of the system to share cycles (both ways) in this mixed environment and to conduct research including:

Computational protein structure analysis and structure-based drug design: We will develop algorithms for computational protein structure prediction and structure-based drug design.

Computational refinement of protein X-ray crystallography data: The Cornell High Energy Synchrotron Source is a major center for X-ray crystallography and generates voluminous datasets requiring computational refinement. Parallelized versions of the XPLOR code (in FORTRAN) will be ported to the IA cluster to carry out this work.

Computational analysis of multidimensional nuclear magnetic resonance (NMR) protein structure data: NMR data analysis for studying protein structure and dynamics is computationally intensive. We will port several software applications including some distributed from the National Institutes of Health.

Molecular biology systematics and evolutionary studies: We will use computational approaches to analyze DNA sequence variation within and between species. We will implement both phylogenetic reconstruction software and simulation programs on the cluster. The analysis of nuclear genes with associated recombination makes the larger datasets computationally intensive.

Biophysical Imaging: We will use advanced image enhancement techniques to create three-dimensional visual reconstructions from two-dimensional magnetic resonance images, for example of living spiders in the process of producing silk.

Molecular Neurobiology: Monte Carlo simulations of neurotransmitter release and neuron communication are exceeding workstation capabilities. We will develop parallelized modeling on the cluster.

 

Computational Biology Parallel Processing Cluster Status Report - 1st Quarter

Two dual-processor machines have been installed in the groups of Profs. Nixon and Shalloway. Dr. Nixon's group has been developing phylogenetic analysis software that will be implemented on the cluster. Dr. Shalloway's group, which is setting up the cluster, has been testing third-party software that will be needed for cluster operation and has also been developing software for computational analysis of protein structure. Hardware tests have been conducted with Intel Ethernet switches and five other machines that will be part of the cluster. However, establishment of the cluster is waiting on the completion of a working version of the ARMS scheduling software, which is being developed by the Theory Center group. We expect that this will be available by the end of April. We intend to have the cluster running shortly after that and to begin providing service to the biological computing community. Anticipated applications include computational analysis of protein structure, computational refinement of protein X-ray crystallography data, analysis of multidimensional nuclear magnetic resonance protein structure data, evolutionary and phylogenetic analysis of genomic sequence data, biophysical imaging, and simulations of neurotransmitter release. These will be performed by multiple independent research groups within the Division of Biological Sciences.

 


Last modified on: 10/08/99