Workshops and Tutorials
to be held in conjunction with
ICS'02

Workshops

Workshop on Self-Healing, Adaptive and self-MANaged Systems (SHAMAN)
Fourth Annual Workshop on Java for High Performance Computers (JHPC'02)
Second Workshop on Caching, Coherence and Consistency (WC3 '02)
Performance Optimization via High-Level Languages and Libraries

Tutorials

InfiniBand Architecture: Where is it Headed and What will be the Impact on High Performance Computing?
Saturday 06/22 (Morning)
Performance Analysis and Prediction for Large-Scale Scientific Applications
Saturday 06/22 (Morning)
Embedded wireless networking using Bluetooth and 802.11: state-of-the-art and research challenges
Saturday 06/22 (Afternoon)
Minimally clocked microprocessor design
Sunday 06/23 (Morning)
Energy Management for Server Clusters
Sunday 06/23 (Afternoon)

Workshop on Self-Healing, Adaptive and self-MANaged Systems (SHAMAN)
http://www.cse.psu.edu/~yyzhang/shaman/

held in conjunction with
16th Annual ACM International Conference on Supercomputing
New York City, NY, June 23rd, 2002

Organized by

Anand Sivasubramaniam
Penn State University
(anand@cse.psu.edu)

Mark Squillante
IBM Research
(mss@us.ibm.com)

Yanyong Zhang
Penn State University
(yyzhang@cse.psu.edu)

We are entering a new era in computing, one in which we want to make it easier for users to take advantage of the available computing power and for system administrators to manage computing resources. There is a critical need for systems that can automatically detect performance bottlenecks and dynamically adapt their execution to remove them. Fault tolerance is equally important: the system must automatically identify faults and regulate its own execution so that users and system administrators need not be concerned with such details. A recent IBM announcement reiterates the importance of building such systems, which IBM refers to as "autonomic computing".

This workshop is intended to bring together researchers and industrial affiliates to begin exploring this new and challenging inter-disciplinary topic at all levels of the system architecture within the context of high performance computer systems.

In addition to paper presentations by researchers in this area, we intend to invite industrial speakers to give their perspectives on important research topics, and to organize a panel discussion on where future research is most needed.

Fourth Annual Workshop on Java for High Performance Computers (JHPC'02)

http://www.philippsen.com/events/jhpc.html

This will be the fourth annual workshop on Java for High Performance Computing. The purpose of the workshop is to provide a forum where researchers can report recent developments in the field of high performance computing with Java and discuss work in progress with their colleagues. As in past years, the workshop will consist of presentations by the authors of accepted papers and an open discussion of trends in Java for high performance computing.

As JVM and compiler technologies mature, performance has become less of an impediment to using Java for problems that are computationally, storage-, or I/O-intensive. Among the topics addressed in this workshop are parallel and distributed computing, numerical computing, compilation and optimization techniques, computationally intensive applications, database and other server-oriented applications, embedded systems, frameworks and libraries for high-performance computing, tools and techniques for developing high-performance applications, object-oriented techniques for scientific computing, and virtual machine techniques targeting high performance computing.

Second Workshop on Caching, Coherence and Consistency (WC3 '02)

http://www.cs.rutgers.edu/~wc3/

New York, NY, USA

to be held in conjunction with the

16th Annual ACM International Conference on Supercomputing (ICS 2002)

The workshop aims to bring together researchers from various areas of computer science whose work is related to data caching, coherence, and consistency. Interestingly, these three topics have been present in the research agendas of several independent communities that usually do not meet. Interest in these topics started in the computer architecture community and now pervades parallel and distributed systems research. There have also been significant efforts to address caching, coherence, and consistency using compiler, operating system, or application support. The operating system community has addressed the same topics in the context of file and storage systems for both servers and mobile systems. More recently, interest in these issues has been revived by web technologies, including content and service replication and distribution. This workshop series is the first forum to bring together people from all these areas of research, recognizing that their specific caching, coherence, and consistency issues have common denominators that can lead to fruitful discussions and exchange of ideas. The first edition of this workshop, held in conjunction with ICS'01, was very successful, and the participants particularly appreciated the idea of bringing researchers from all these areas together. This workshop continues the tradition of the workshops on software DSM, which were held with ICS in 1999 and 2000.

Performance Optimization via High-Level Languages and Libraries

http://www.ece.lsu.edu/jxr/ics02workshop.html

The development of high-performance programs for scientific applications is usually very complicated. The effects of algorithm choice on memory access costs, communication overhead, etc. are often very complex. Currently available tools for software development and performance modeling/optimization do not provide adequate support to the developers of high-performance scientific applications. Often, the time needed to develop an efficient parallel program for a computational model is the primary factor limiting the rate of scientific progress. Approaches to the automated synthesis of high-performance programs are therefore very attractive and are the subject of active research at several universities and laboratories.

Breaking down the traditional separation between application development by domain scientists and systems software development by computer scientists, this workshop aims to bring together researchers working on several aspects of this and related problems, such as:

program synthesis to facilitate the development of high-performance programs for specific application domains such as signal processing, computational chemistry, etc.
efficient development of high-performance programs from high-level mathematical languages such as MATLAB.
development of efficient implementations of applications such as FFT for a variety of architectures by exploiting structural properties of the specific application.
automatic optimization of library implementations together with the optimization of programs that use them.
efficient synthesis of recursive linear algebra codes that exploit deep memory hierarchies in current computer systems.

This workshop will be of interest to researchers and graduate students in several areas such as compilation technology, domain-specific languages, library development, problem-solving environments, etc.

Tutorial 1

InfiniBand Architecture: Where is it Headed and What will be the Impact on High Performance Computing?

Dhabaleswar K. Panda, The Ohio State University

Intended Audience:

This tutorial is intended for researchers, scientists, engineers, managers, developers, professors, and students engaged in research, design, and development of next generation high performance computing systems (clusters, servers, and data centers).

Abstract:

The emerging InfiniBand Architecture (IBA) standard is generating a lot of excitement about building next generation high performance computing systems in a radically different manner. This is raising the following common questions among many scientists, engineers, managers, developers, and users associated with High Performance Computing:

What is InfiniBand Architecture?
How is it different from other on-going developments and standardization efforts such as Virtual Interface Architecture (VIA), PCI-X, Gigabit Ethernet, RapidIO, HyperTransport, 3GIO, etc.?
How does it perform compared to other contemporary interconnects (Myrinet, Gigabit Ethernet, and GigaNet)?
What unique features and benefits does IBA bring to designing next generation high performance computing systems?

This tutorial is designed to provide answers to the above questions. We will start with the background behind the origin of the IBA standard. Then we will familiarize attendees with the novel features of IBA (such as elimination of the standard PCI-bus based architecture; provision for multiple transport services and mechanisms to support QoS and protection in the network; uniform treatment of interprocessor communication and I/O; hardware support for remote DMA, atomic, and multicast operations; support for virtual lanes and service levels; and support for low latency communication with Virtual Interface). We will compare and contrast the IBA standard with other on-going developments and standards. We will show how the IBA standard enables next generation computing systems to be designed to deliver not only high performance but also RAS (Reliability, Availability, and Serviceability). Open research challenges in designing the communication and I/O subsystems of next generation HPC systems with IBA will be outlined. Challenges in developing efficient programming model layers (Message Passing Interface (MPI), Distributed Shared Memory (DSM), and Get/Put) on top of IBA-based communication subsystems will be discussed. Performance numbers obtained on clusters with first generation InfiniBand products, and their comparison with other contemporary interconnects (Myrinet, Gigabit Ethernet, and GigaNet), will be presented. The tutorial will conclude with an overview of on-going IBA-related research projects, IBA products, and the expected market time frame for IBA products.

Outline:

Introduction
What is InfiniBand Architecture (IBA)?
Overview of IBA
Details of IBA Architecture
How is it Different from Other Technologies?
Unique Features and Benefits to High Performance Computing
Performance Numbers obtained for Clusters with IBA and their Comparisons with Other Interconnects
Overview of On-going IBA Research
Overview of IBA Products and Time frame
Conclusions

Presenter's bio:

Dhabaleswar K. Panda is a Professor of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high performance computing, user-level communication protocols, interprocessor communication and synchronization, network-based computing, and Quality of Service. He has published over 100 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group have been conducting extensive research on VIA and InfiniBand. His group has collaborated with IBM T.J. Watson in designing a high performance VIA implementation for the IBM Netfinity cluster system, and with Intel on designing a comprehensive micro-benchmark suite to evaluate VIA/IBA implementations. His group is currently collaborating with Sandia National Laboratories and Mellanox (a leading company producing IBA products) on designing next generation High Performance Computing systems with InfiniBand.

Dr. Panda has served on the Program Committees and Organizing Committees of several parallel processing and high performance computing conferences and on the editorial boards of several parallel processing journals. He was General Co-Chair of the 2001 International Conference on Parallel Processing; Program Co-Chair of the 1999 International Conference on Parallel Processing and of the 1997 and 1998 Workshops on Communication and Architectural Support for Network-Based Parallel Computing (CANPC); Program Co-Chair of the Int'l Workshop on Communication Architecture for Clusters (CAC '01); an Associate Editor of the IEEE Transactions on Parallel and Distributed Systems; Co-Guest-Editor of two special issue volumes of the Journal of Parallel and Distributed Computing on "Workstation Clusters and Network-based Computing"; an IEEE Distinguished Visitor Speaker; and an IEEE Chapters Tutorials Program Speaker. Currently, he is serving as a Program Co-Chair of the International Workshop on Communication Architecture for Clusters (CAC '02). Dr. Panda is a recipient of the NSF Faculty Early CAREER Development Award, the Lumley Research Award (1997 and 2001) at the Ohio State University, and an Ameritech Faculty Fellow Award. Dr. Panda is listed as a distinguished scientist in "Who's Who in America" and in "American Men & Women of Science".

Tutorial 2

Performance Analysis and Prediction for Large-Scale Scientific Applications

http://public.lanl.gov/hjw/TUT

Adolfy Hoisie and Harvey Wasserman
Los Alamos National Laboratory

Intended audience:

The target audience is a mixture of computer scientists, computational scientists, and code developers interested in performance analysis of parallel architectures and "real-life" applications. The tutorial will also be useful to those trying to define needs for future-generation, high-end computing systems, from either the buyer's or the designer's point of view.

Abstract:

This tutorial presents a methodical, simplified approach to performance analysis and modeling of large-scale, parallel, scientific applications. The heart of the tutorial covers analytical modeling of application scalability using several real case studies. The case studies demonstrate how performance modeling can be used to estimate performance that can be expected from a future computer system, diagnose system performance 'glitches' in comparison with true application performance during system installation, accurately identify performance bottlenecks in existing systems, provide a tuning "roadmap" to application developers, and enable "point-design" studies for computer architects designing new systems.

The tutorial will not emphasize any particular machine or performance rankings; rather, it will address the performance of RISC processors and of widely used parallel systems such as the SGI Origin 2000, IBM SP2/3, Compaq HPC systems, clusters, and Cray T3E.

Presenters' bios:

The authors are members of an internationally recognized team of performance evaluation experts and have nearly thirty years of combined experience in application benchmarking, optimization, and performance modeling. They have given numerous tutorials and invited and contributed lectures on performance at major conferences and at various universities and other institutions. One is a Gordon Bell prize winner and co-author of a new SIAM monograph on performance. See http://www.c3.lanl.gov/par_arch/

Tutorial 3

Embedded wireless networking using Bluetooth and 802.11:
state-of-the-art and research challenges

Pravin Bhagwat (IIT Kanpur & Winlab, Rutgers University)

Intended Audience:

This tutorial is intended for researchers and practitioners who want to track new developments in short-range wireless communication but do not have the time or patience to read all of the specifications. Computer professionals who want to develop a better understanding of technology trends and identify new market opportunities in wireless networking will also benefit from this tutorial. A basic understanding of layered network architecture is expected. No background in analog radio, signal processing, or wireless communication is required.

Abstract:

The promise of untethered computing in the workplace is becoming a reality. IEEE 802.11b, the 11Mbps wireless LAN standard, has finally arrived, and early market response has been positive. As the WLAN market takes off, Bluetooth, another emerging standard for short-range wireless networking, is also gathering force. Several vendors have demonstrated Bluetooth products, including cordless headsets, PCMCIA cards, and LAN access points. Both standards are competing for the same airwaves, but are they also chasing the same market? Will Bluetooth and 802.11b complement each other, or will one technology eventually displace the other?

This tutorial will explain the key design aspects of 802.11 and Bluetooth standards and illustrate how technology innovation and market forces are shaping their evolution.

Outline:

Review of basic concepts (RF, signal processing) and technology trends (low cost, low power, small form factor)
Overview of Bluetooth 1.1 specifications
Overview of 802.11b specifications
Cost, form factor, power consumption, and co-existence of the two technologies
Future directions and open issues.

Presenter's bio:

Pravin Bhagwat is an entrepreneur and a well-known researcher in the area of wireless and mobile networking. Currently, he is directing a large-scale 802.11 deployment project in India and is also a visiting professor in the computer science department at IIT Kanpur. He was the principal architect at ReefEdge, Inc., a wireless networking infrastructure and software company based in NJ. He played an active role in the standardization of the Bluetooth PAN profile and served as the chair of the Internet Engineering Task Force BOF on IP over Bluetooth. Prior to working for ReefEdge, he worked as a technology consultant in the Networking Research group at AT&T Labs-Research, and as a member of the research staff at the IBM Thomas J. Watson Research Center. He is the chief architect of BlueSky, an indoor wireless networking system for palmtop computers, and the co-inventor of TCP splicing, a technique for building fast application-layer proxies. He actively serves on program committees of networking conferences and has published numerous technical papers and patents in the area of mobile computing and wireless communication. He received his Ph.D. in computer science from the University of Maryland, College Park. He also holds an adjunct faculty appointment at Winlab, Rutgers University.

Tutorial 4

Minimally clocked microprocessor design

Diana Marculescu (CMU), David Albonesi (Rochester), Pradip Bose (IBM)

Intended Audience:

This tutorial is intended to provide industry and university-based computer architects and processor designers with an overview of minimally clocked systems and the impact of such a design style on processor performance and power consumption.

Abstract:

This tutorial addresses the problem of minimally clocked processor design. Minimally clocked, or Globally Asynchronous Locally Synchronous (GALS), systems are an intermediate design style between synchronous and fully asynchronous systems. GALS systems contain several independent synchronous blocks that operate with their own local clocks and communicate asynchronously with each other. The main feature of these systems is the absence of a global timing reference and the use of several distinct local clocks (or clock domains), possibly running at different frequencies. For high-end processor cores, global clock distribution issues are perhaps the strongest motivation for studying GALS systems: with each technology shrink, the clock distribution network of a large chip grows rapidly in complexity and requires substantial design effort, power, and die area.

Unlike fully synchronous processors, minimally clocked processors allow fine-grain control of local clock speeds and voltages, providing additional power savings across a wide variety of applications and workloads.

Outline:

Trends and issues in clock distribution (power consumption, clock skew, cost for deskewing circuits)
How much asynchrony do we want? (success and failure stories in fully asynchronous processor design)
Motivation for minimally clocked/GALS processors
Issues in GALS/minimally clocked machines (synchronization issues, deadlock prevention, possible inter-clocking domain communication schemes)
GALS processors
- power/performance evaluation and workload characterization
- potential for fine grain speed/voltage scaling
Case study - LPX, an IPCMOS based processor (IBM)
Ahead - where would a GALS/minimally clocked design style be useful?

Presenters' bios:

Diana Marculescu is an Assistant Professor of ECE at Carnegie Mellon University. She received her Ph.D. in Computer Engineering from the University of Southern California in 1998 and her M.S. in Computer Science from the "Politehnica" University of Bucharest in 1991. After spending two years at the University of Maryland, Dr. Marculescu joined Carnegie Mellon University, where she currently leads the Energy Aware Computing (EnyAC) group, focusing on techniques and tools for enabling synergistic hardware/software power management and on novel paradigms for energy-delay efficient computing. Diana Marculescu is a recipient of a National Science Foundation CAREER Award (2000-2004) and a member of the organizing committee of the ACM/IEEE International Symposium on Low Power Electronics and Design. She also serves on the technical program committees of several conferences, including the IEEE/ACM International Conference on Computer-Aided Design and the IEEE Design, Automation and Test in Europe Conference. Her research interests are in energy aware computing, VLSI, computer architecture, and CAD for power modeling and estimation.

David H. Albonesi is an Associate Professor of Electrical and Computer Engineering at the University of Rochester and Director of the Advanced Computer Architecture Laboratory. He received his B.S.E.E. from the University of Massachusetts Amherst in 1982, his M.S.E.E. from Syracuse University in 1986, and his Ph.D. in Electrical and Computer Engineering from the University of Massachusetts Amherst in 1996. Prior to receiving his Ph.D., he held technical and management leadership positions for 10 years at IBM Corporation (1982-86) and Prime Computer, Incorporated (1986-1992). The primary focus of his industry work was on the design, implementation, and debugging of low-latency, high-bandwidth memory hierarchies for high performance processors, the development of shared memory multiprocessor systems, and the development and application of architectural evaluation, design implementation, and hardware emulation tools. For this work, he received three corporate excellence awards and four U.S. patents. At Rochester, he leads the Complexity-Adaptive Processing (CAP) project and is also conducting research in understanding and improving dynamic branch prediction, multithreaded architectures, and VLIW architectures for voice and video applications. Dr. Albonesi has received a National Science Foundation CAREER Award and an IBM Faculty Partnership Award. He co-founded the Workshop on Complexity-Effective Design that was initially held at the 27th International Symposium on Computer Architecture, was held last year at ISCA-28, and will be held again this year at ISCA-29.

Pradip Bose received his B.Tech. degree in Electronics and Electrical Communication Engineering from the Indian Institute of Technology, Kharagpur, India, in 1977, and his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois, Urbana-Champaign, in 1981 and 1983 respectively. Since May 1983, Dr. Bose has been a Research Staff Member at the IBM T. J. Watson Research Center, Yorktown Heights, NY. During this time, he has conducted research projects that led to well-known IBM products such as RS/6000 and POWER3. From 1989 to 1990, as part of his assignment as a Visiting Associate Professor at the Indian Statistical Institute (ISI), Calcutta, India, he led the UNDP (United Nations Development Program) funded program to establish a Center for Advanced Research on Fifth Generation Computer Systems at ISI. His current research interests include high performance, low power computer architectures and their performance evaluation, verification, and testing. Dr. Bose has over 60 refereed publications and is the author of a book from MIT Press (to appear in late 2002). He is active in many conference committees and is a senior member of the IEEE; in 2001-2002, he was Program Chair of the IEEE Int'l. Symp. on Performance Analysis of Systems and Software (ISPASS), and he is a member of the program committees of MICRO-35 and HPCA-9. His most recent conference tutorials include offerings (with other co-speakers) at ISCA-2001, HPCA-2001, and Sigmetrics-2001.

Tutorial 5

Energy Management for Server Clusters

Ram Rajamony, IBM Austin Research Laboratory

Intended Audience:

This tutorial is intended for researchers, professionals, and students engaged in designing, developing, and using energy-efficient compute and storage clusters.

Abstract:

Power consumption is rapidly becoming a key design issue for servers deployed in large data centers and web hosting facilities. In fact, a significant fraction of the operating cost of these centers is due to power consumption and cooling. Computing nodes in these densely packed systems also often overheat, leading to intermittent failures. These problems are likely to worsen as newer server-class processors offer higher levels of performance at the expense of increased power consumption.

Energy conservation techniques have traditionally focused on single-node systems, be they portable and mobile computers, or single-node servers. However, most data center users employ clusters of servers for scalability, reliability, and cost considerations. Consequently, energy management techniques in this environment must take a holistic approach. This tutorial will present an in-depth look at techniques for energy management in server clusters, composed of both compute and storage nodes.

While we will provide a brief introduction to single-node energy management techniques, the bulk of the tutorial will focus on mechanisms and policies for energy management in clusters. These mechanisms and policies will be discussed with an emphasis on practicality. The tutorial will include case studies where we will examine how to put together clusters that meet a performance and energy budget, along with workloads for evaluating clusters, and metrics for measuring energy/performance tradeoffs. The tutorial will conclude with an overview of research activities in cluster energy management in industry, research labs, and academia.

Presenter's bio:

Ram Rajamony is a researcher at the Low-Power Computing Research Center at IBM's Austin Research Laboratory. His research interests are in energy-efficient computing, high-performance computing, networking, and operating systems. He has published papers in venues such as ISCA, HPCA, PPoPP, and PACT. He won the Best Student Paper award at SIGMETRICS in 1998. Dr. Rajamony holds one patent and has many more pending at the USPTO. He served on the Texas Advanced Technology Program committee in 1999 and the SAN-2001 program committee. Dr. Rajamony received his PhD from Rice University in 1998.

Webmaster: ics02@tc.cornell.edu
