Steps Toward a World Wide Digital Library of Formal Algorithmic Knowledge

Robert Constable, Stuart Allen, Mark Bickford, James Caldwell, Jason Hickey, Christoph Kreitz.

Technical Report, Cornell University, Ithaca, NY, October 2003.


This book describes the results and plans for future work of the Cornell/ Cal Tech/Wyoming Multidisciplinary University Research Initiative (MURI) project entitled:

Building Interactive Digital Libraries of Formal Algorithmic Knowledge.

The book addresses these topics: the objectives of our research, the scientific and technical significance of the project, our technical approach and accomplishments, scenarios for use of such digital libraries, and the relevance of our work to DoD and Navy missions.

This MURI project was created as part of a Department of Defense (DoD) initiative to increase the reliability and adaptability of software. The topic is important because unreliable and insecure software poses a significant national security risk. The nation's critical software infrastructure is open to terrorist attacks, which could be coordinated with those against the population and the physical infrastructure on which they depend; simultaneous, well-planned attacks of this kind could have a devastating impact.

The liabilities of unreliable software have been recognized since the ``software crisis'' of the 1970's. The cost to the economy, and the cognitive disruptions caused by poor software, remain serious problems, now more significant because of the ubiquitous nature of software. Society's response since the 70's has included: an elaborate testing methodology, now well established but limited in effectiveness; a call for improved education, now heeded and also of limited effectiveness; a call for better software engineering and more science to support it, also heeded; and finally, modest exploration of advanced development and checking tools such as model-checkers, program analyzers, and interactive theorem provers.

In some cases, these advanced tools have become established in industry, especially in hardware design and in the front ends of compilers. Gradually the outlines of a ``verification technology'' are emerging as a result of steady DoD, NASA, and NSF funding; and there are deep results from computer science and mathematics that give it direction and justification. Now that the stakes are higher, the need to explore advanced technologies is more urgent. It is clear that the United States must lead in such a technology, and the more advanced it is the better protection it will afford against terrorists. In some ways, the protection of cyberspace is similar to the exploration of outer space; it requires very advanced science and technology, and it offers enormous strategic advantages.

The authors of this book have worked on verification tools and the science behind them for many years. We have known for some time that a key piece of technology is missing -- namely a large machine accessible information resource. The DoD also recognized this gap and our MURI project addresses the problem directly.

The important role of knowledge, and the information resources from which it arises, is clear from the behavior of the best system designers and programmers. They bring extensive knowledge to bear in their reasoning about designs and code. Large teams share knowledge, and reason in a coordinated yet distributed manner. However, the tools used to help them reason and check their inferences do not have access to significant amounts of knowledge. For instance, the interactive theorem provers that are designed to assist in reasoning have access to only a few thousand facts, a stark contrast to the computing cycles they could (but seldom) use. They should have access to hundreds of thousands of facts. The few people who drive the provers are the main source of knowledge, and they feel compelled to formalize much of it before they are comfortable using it.

The time is right to begin creating a massive information resource. First, the time is right because we need it, but additionally, the past work of multiple groups world wide has accumulated about 50,000 formal definitions and formally proved theorems. Much of the material is about data structures, algorithms, and computer systems. In addition, the Web and the emergence of information science have given us new tools for this hard task. Currently the knowledge is fragmented; no one system or one verification group has access to more than a small fraction of it. It is a terrible waste of time and huge cost for each group to try to recreate every result for themselves. For many reasons, it is not practical to limit the choices of formalism to apply. Instead, we believe that we must facilitate logically sound combinations of results, and we must be able to account for the correctness of these hybrid results. Our goal is to make it possible for people and machines to share this knowledge. Achieving this goal is a hard problem with a large payoff --- worthy of MURI funding.

The problem is hard because there are no standard parts, and we don't know how to make them. Indeed it may be that for many aspects of reasoning no unique standards will ever become established. In addition, the mechanisms for efficient machine processing of knowledge make accounting for evidence and truth difficult. It should be noted that there might be advantages to having mutliple independent logics and systems involved in checking critical arguments.

Our approach is to enable different systems to share results from distinct theories and logics, and to account for these differences precisely. We also recognize the need to connect the formal knowledge to intuitive human knowledge in transparent ways. This connection introduces mechanisms that are flexible enough to allow sharing, yet precise enough to guarantee correctness. The connection to intuition at all levels of abstraction is a safeguard, as well as an opportunity to use the formal knowledge to support education and software engineering, and thus testing as well.

This book explains our technical approach and research results from the past 27 months, and it lays out our plans for the next 30 months.

Back to overview of papers

Bibtex Entry

@TechReport{tr:Constable+03a, author = "Robert Constable and Stuart Allen and Mark Bickford and James Caldwell and Jason Hickey and Christoph Kreitz", title = "Steps Toward a World Wide Digital Library of Formal Algorithmic Knowledge", institution = "Cornell University. Department of Computer Science", year = "2003" }