Computation

Creating a Scalable, Distributed Computation Resource

Original Project Proposal

While new Intel-architecture desktop and server machines can provide substantial computing resources to an individual researcher or small research group, researchers frequently face problems that require far more computation than any desktop or single server can provide. In the past, such computationally intensive problems were typically run on special-purpose, very expensive supercomputers such as the Cray. More recently, Cornell’s leadership role in globally scalable parallel computing has led to the development of a new kind of supercomputer based on commodity processors, the IBM SP. This approach has proved highly effective in solving a wide range of computationally intensive problems.

We believe that it is time to take the next step toward decreasing the cost and increasing the availability of massive, scalable computation power. Our vision is of a Scalable, Distributed Computation Resource (SDCR) that is created, as needed, from heterogeneous server and desktop machines based on the Intel Architecture (IA). To achieve this vision, it is important that IA workstations on researchers’ desks and servers in local research clusters be able to become, dynamically, an integrated part of this massive computing environment.

Cornell University is in an ideal position to research and implement this new computational model. While at Argonne National Laboratory, David Lifka, who is now with the Cornell Theory Center (CTC), developed the EASY scheduling system for parallel processing supercomputers. This system has been integrated with both LoadLeveler and LSF to provide scheduling for the IBM SP, SGI Onyx, and a wide range of other Unix-based systems. Since arriving at Cornell, Lifka has been working on the Advanced Resource Management System (ARMS), which extends resource management and scheduling to distributed systems. A companion project is Ken Birman’s work on research and production systems that provide guarantees such as reliability, high availability, fault tolerance, consistency, security, and real-time responsiveness.

We propose to combine these efforts to create an SDCR composed of the research clusters and desktop systems being donated by Intel through this grant. The machines will be connected via the Cornell campus network using Fast Ethernet and ATM at speeds of 100 Mbps and higher. Using ARMS, researchers will be able to create computational jobs that run on resources ranging from a small number of nodes in a cluster local to their department, unit, college, or division, all the way to the total computational resource of hundreds of IA processors. Later versions of the system will give users interactive access to, and control of, their computational tasks.

To accomplish this we plan to:

Integrate MPI/NT and MPI Java into key programming languages: C, C++, HPF, etc.
Create the ARMS-NT version of the resource management system. An intelligent heterogeneous distributed resource management system and job scheduler is essential to provide deterministic and efficient use of resources. Cornell is developing Java-based resource management agents for Windows NT, allowing users to be part of the distributed system without making a significant software investment.
Investigate application-level performance of NT-based SDCR applications. Building on our strength in the area of performance measurement and evaluation, we will analyze the performance and identify the bottlenecks in cross-cluster SDCR calculations.
Port relevant financial and scientific applications to NT to explore the cost effectiveness of "commodity resources." Examples of applications include financial analysis and securities modeling; X-ray crystallographic studies and molecular structure determination; and molecular- and atomistic-based simulations of material properties.
Create tools to allow general users to interact with and "steer" financial and scientific applications. Such tools let the applications support interactive simulations and visualizations and give users immediate feedback on the progress and direction of their computations.
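
The proposal describes jobs that run on anything from a few nodes in a local cluster up to the whole campus-wide resource. The following is a minimal illustrative sketch of that placement idea, not the ARMS design or its API: the names Cluster and place_job, the cluster names, and the "local first, then largest remote cluster" policy are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical description of one cluster and its currently idle nodes."""
    name: str
    free_nodes: int


def place_job(nodes_needed, local, remote):
    """Choose nodes for a job, preferring the submitter's local cluster.

    Returns a dict mapping cluster name to node count, or None when the
    combined free nodes cannot satisfy the request right now.
    Assumed policy: fill from the local cluster first, then from remote
    clusters in order of most free nodes.
    """
    plan = {}
    remaining = nodes_needed
    candidates = [local] + sorted(remote, key=lambda c: -c.free_nodes)
    for cluster in candidates:
        if remaining == 0:
            break
        take = min(cluster.free_nodes, remaining)
        if take > 0:
            plan[cluster.name] = take
            remaining -= take
    return plan if remaining == 0 else None


# Example with made-up clusters and node counts.
local = Cluster("chemistry", 8)
remote = [Cluster("physics", 16), Cluster("theory-center", 64)]

small = place_job(4, local, remote)     # fits in the local cluster
large = place_job(40, local, remote)    # spills over to a remote cluster
too_big = place_job(100, local, remote) # only 88 nodes are free in total
```

In this sketch a small request stays entirely within the local cluster, a larger one spans clusters, and a request exceeding the total free capacity is not placed. A real resource manager would also have to handle queuing, heterogeneous node speeds, and network cost between clusters, none of which is modeled here.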

Participants

David Lifka, Researcher, Advanced Computing Research Institute

 

 

Last modified on: 10/05/99