Computational Science and Engineering:

How to Educate the Next Generation

 

John Guckenheimer

 

Computers have had a profound impact upon science and engineering since their invention in the middle of the twentieth century. That impact continues to grow with the capacity of computers, communications networks and information repositories accessible via the web. Therefore, all disciplines need to incorporate computation into the education of their young scientists and engineers. This “white paper” is an exploration of computational science education framed in the context of Cornell University. Its purpose is to stimulate the development of effective computational science and engineering academic programs at Cornell. The focus will be on the three largest undergraduate colleges (Agriculture, Arts and Sciences, and Engineering) and on graduate education, which is organized through the Graduate School into Fields of Study. Throughout the document, we include engineering among the sciences that are part of computational science.

 

Computational Science

 

Computation has taken a place with experiment and theory as a mode of doing science. More powerful computers, better software and electronic data repositories all broaden the access to computational science. Individuals no longer need to be expert programmers, mathematicians or computer scientists to engage in computational science. However, they do need to become skilled and intelligent “users” to make effective use of computational tools. Scientific research seeks to extend the frontiers of knowledge, so computational science entails the development of new models, new ways of analyzing models and data, and new ways of interpreting the output from computations. The capability of the computational tools themselves also becomes increasingly important as an enabler of science at the deepest levels. In many research areas, it is impossible to write down equations or laws from which predictions can be derived analytically. A few examples illustrate this point.

Scientific progress depends on our ability to construct computer models of the processes we study and on the effectiveness of computational methods in extracting information from these models. Evaluation of the science requires an understanding of modeling approximations and computational accuracy, and comparison of computational output with empirical data.

 

An essential part of computational science is extension of the technology it employs. This technology includes the mathematical foundations of algorithms, the programming languages in which models are implemented and the operating systems of the computers. Parallel computers with complex memory hierarchies dominate high performance computing, and multi-core chips are now bringing parallel computation to the desktop. Achieving computational efficiency on these machines is a technical challenge that influences the science that will be accomplished. The creation of high quality software for the solution of numerical problems remains a difficult challenge. Everywhere we seek to push the frontiers of scientific computation, we discover new limitations of current methods. Thus computational science requires a spectrum of activities that includes algorithmic research, software development, modeling, data collection and curation of data repositories in addition to computing itself. This set of activities is not aligned with the traditional departmental organization of universities and is hardly represented in the curricula that we offer students. Essential aspects of computational science need established academic homes if they are to become an important part of the educational system. Computational science within the disciplines requires both new facilities and new research organizations that go beyond individual faculty members working with a few students and postdoctoral fellows. While individual investigator groups remain fundamental units for conducting scientific research, large organizations are needed to create the high quality federated data archives that are rapidly becoming essential scientific resources.

 

Large supercomputer centers were established twenty years ago to provide resources for computational science when the cost of computers was thousands of times higher than it is today. The technical staff of these centers includes a substantial group of professional computational scientists who provide services to “users” but do not engage in scientific research themselves. Centers continue to provide direct support for the most demanding applications, but much of the work that once required a supercomputer and technical help in using it is easily done at the desktop today. High performance computing centers are adapting to the remarkable power of desktop computing and the communications capability of networks. The 2003 NSF report Revolutionizing Science and Engineering through Cyber-infrastructure (http://www.communitytechnology.org/nsf_ci_report/) presents a vision in which centers assume increasing responsibility for services that support collaborations across entire research areas in addition to services to individual users. The development of cyberinfrastructure requires new types of organizations within scientific communities, creation of new research areas that are oriented toward technology, shifts in disciplinary boundaries and new modes of interaction among disciplines. The payoff for this change and added complexity is a qualitative change in our ability to study scientific problems of pressing social and economic importance.

 

Our educational system has responded slowly to the demands of computational science. The June 2005 report Computational Science: Ensuring America’s Competitiveness (http://www.nitrd.gov/pitac/) of the President’s Information Technology Advisory Committee (PITAC) calls for new structures, programs and institutional incentives to train computational scientists. No discipline has taken primary responsibility for computational science education, and much of the evolution of computational science technologies has taken place outside the mainstream of our academic programs. Apart from the centers described above, universities have been largely users rather than developers of computational science. The focus has been on application of existing methods to solve scientific problems more than on expansion of the suite of available methods. Moreover, mainstream educational programs offer few opportunities to develop expertise in computational science methodologies. This situation is a danger to US economic competitiveness. The importance of computational science technology to industry is hard to overstate. Computation gives companies advantages in how they design and build their products. Computational capability is a critical factor in determining winners and losers in the global economy and in national health and security as well as in the scientific marketplace of ideas. We need to act on the recommendations of the PITAC report to create mainstream programs in our universities that take responsibility for the development of the cross-cutting “core” of computational science and for education of students in computational science.

 

We have done little to adapt the instruction that science students receive to the growing use of computation in the conduct of science. One can argue about how much change has taken place, but the time has come for us to reexamine thoroughly the courses and curricula that we teach. The basic goals of educating scientists to think analytically and critically have not changed, but computers have changed fundamental aspects of how we solve problems. Simulation of complex models and computational methods for solving mathematical problems enable us to transcend the limits of what could be done “by hand” when theories that predate computers were formulated. Much of the science and mathematics that we teach today would have been formulated differently if computers had been available at the time of its discovery. Implementing appropriate changes throughout the science curriculum is a process that has only begun. Moreover, most of the classrooms we teach in were also designed before the age of computers. Computational science is best learned by doing, and we should establish environments where active learning takes place in a supervised fashion.

 

Thus, there are two goals that Cornell University should adopt with regard to computational science education:

 

 

Educating the Next Generation

 

There are several principles that can help guide the implementation of academic computational science programs.

 

Academic programs that adhere to these principles require organizations that are complementary to traditional departments. The Faculty of Computing and Information Science is an innovative structure for the support of these academic programs. The Graduate School at Cornell with its Fields also provides a framework for establishing computational science programs at the graduate level. Three current interdisciplinary Graduate Fields support computational science directly. The Graduate Field of Applied Mathematics was established in 1963 and has served for decades as a home for students whose primary focus is numerical analysis and scientific computation. Since the requirements of this Field do not mesh fully with the background and needs of students engaged in computational science in science and engineering disciplines, two new fields have been established, a Graduate Field of Computational Biology and a minor Graduate Field of Computational Science. The establishment of new fields is only the first step in ensuring that there will be curricula that meet the needs of students in computational science.

 

There have been two previous studies of computational science academic programs at Cornell. In the early 1990s, the Theory Center prepared materials that surveyed computing research and faculty in different disciplines and highlighted recommendations for students interested in pursuing computational science. This survey concluded that existing fields were adequate to meet student demand and proposed only greater use of existing minors as a mechanism for cross-disciplinary training. After the Faculty of Computing and Information Science was established, Charlie van Loan headed a Study Group in 2000-01 to impart a sense of unity to what Cornell does in computer science and engineering with coherent educational programs that serve both undergraduates and graduates. This Study Group surveyed courses in computational science at Cornell and programs at other universities. This document draws upon these efforts, in particular as benchmarks of the faculty engaged in computational science and the curricula available to students over at least the past fifteen years.

 

The evolution of academic curricula happens on a time scale of decades and is slow compared to the changes that have taken place in computational science over the past twenty-five years. Incorporating computational science components into courses within established disciplines is challenging. At one level, we need to determine how computation will be included within the “foundation” courses in mathematics, basic science and programming that we now require. This can take the form of either new courses that replace or add to existing ones or revisions of existing courses. Basic topics can be more effectively addressed at this level than in disciplinary courses whose focus is computational science methods, models and analysis for that discipline. Pedagogically, there is also a need for common mathematics and computing courses to highlight the use of abstract concepts and general methods in other disciplines. Interdisciplinary coordination of curriculum revisions is required to develop a shared vision for both undergraduate and graduate computational science curricula. Incorporating practice into computational science instruction resembles teaching laboratory courses in that it requires specialized teaching environments. Providing space for these facilities, maintaining equipment and developing instructional materials require sustained institutional support that goes beyond the resources required for typical lecture courses.

 

This document can only make a cursory assessment of the state of computation in our curricula for science and engineering students. It is easier to identify courses whose primary content involves computation than to find smaller computational components of other courses. The 2000-01 study of van Loan lists three levels of courses. Basic programming courses are taught primarily by the computer science department, with smaller 100 level courses in biological and environmental engineering and in earth and atmospheric sciences addressed to students in those majors. The computer science courses include one and two credit introductory courses in Unix tools and programming in the languages C and C++ as well as more extensive courses that introduce programming concepts, algorithms and data structures. These courses are taught in two “flavors” with varying emphasis upon scientific computation. Both introduce Matlab, a commonly used programming environment in computational science that is not specifically oriented toward large scale computing. The study lists nine more advanced undergraduate courses, one of which is no longer in the catalog of courses and two of which have changed numbers from 200 level to 300 level courses. Six of these courses, CEE 241, AEP 438, BTRY 421, COMS 322, COMS 421 and MATH 425, are numerical analysis and methods courses that teach general algorithms that are the foundation of most numerical computing. COMS 321 is a computational biology course, MAE 470 is a course on finite element calculations for mechanical and engineering design and PHYS 480/680 has evolved into a more specialized course that emphasizes computation of material properties. A few new courses in computational biology have been added to the curriculum, notably in bioinformatics and dynamic models in biology. The six general courses listed above overlap, but none builds upon another. There are no undergraduate course sequences in numerical analysis.

 

At the graduate level, Computer Science 621 (Matrix Computation), 622 (Numerical Optimization and Nonlinear Equations) and 624 (Numerical Solution of Differential Equations) form a core sequence in numerical methods for scientific computing. Currently, 621 is taught yearly in the fall, and 622 and 624 are taught in alternate years in the spring. Ten years ago, the scientific and parallel computing group within the Computer Science Department had four faculty, but Trefethen and Coleman have left Cornell and have not been replaced by faculty whose core expertise is in numerical analysis. The CIS study group identified an additional ten courses in advanced scientific computing in the 2000-01 Course Catalog, most in more specialized areas than the courses listed above. Two of these ten courses are no longer in the catalog and three of the remaining eight will not be offered in the 2005-06 academic year. The statistics with regard to “data gathering and display” are more dismal. The study group listed six graduate level courses in this area, of which three are no longer in the catalog and only one is being offered in the 2005-06 academic year. The group identified four courses in modeling and simulation, one of which is no longer in the catalog. Since 2000-01, three or four new courses have been added to the ones that existed then, but the net change is negative. These data suggest that our academic programs in computational science are hardly robust.

 

The curriculum described above is inadequate to sustain a distinctive program in computational science. Analysis of faculty appointments over the decade 1995-2005 reinforces the conclusion that computational science has suffered from a lack of concerted attention at Cornell during the past decade. The Cornell curriculum needs to be enhanced to provide a strong foundation for thriving computational science programs. This raises questions about how Cornell can best use its existing faculty to support computational science programs and where additional faculty are needed to sustain strong programs.

 

Almost without exception, interdisciplinary organizations are used to support computational science programs across the country. The field structure of the Graduate School and the Faculty of Computing and Information Science provide an excellent organizational structure for interdisciplinary programs, assuming adequate resources are allocated to the programs. The Faculty of CIS is an academic organization whose resources can augment those of colleges and departments for computational science programs. It can serve as the “home” for these programs, ensuring that the core areas of computational science remain strong at Cornell and coordinating programs across “stakeholder units.” Close cooperation among these units will be required to develop and maintain strong computational science programs and might not happen spontaneously.

 

As a starting point, we propose the following actions: