Friday, April 20, 2007
12:45 pm
315 Upson Hall

Computer Science
Colloquium
Spring 2007

Ted Kremenek
Stanford University
 

From Uncertainty to Bugs: Inferring Bugs in Systems With
Static Analysis and Probabilistic Graphical Models

Software often contains many bugs.  Fortunately, in the last half decade we have seen a surge of work on automated bug-finding tools, such as those based on static program analysis, that are adroit in finding serious errors in real, multi-million line code bases.  Unfortunately, these tools are usually designed to look for bugs violating generic invariants that may manifest in many systems ( e.g. do not dereference NULL), and have no knowledge of the invariants and rules, or specification, that is specific to a given system.  The sad consequence is that these tools will miss many important bugs that they could otherwise find if they only had more knowledge of said specification.

This talk concerns the following question: how do we go about automatically finding more system-specific bugs with our bug-finding tools with as little human assistance as possible?  I will present a generic framework, based on probabilistic graphical models and static program analysis, for inferring specifications directly from systems code.  Unlike previous approaches, the technique provides a generic way of binding together the arbitrary sources of information useful for inferring specifications in a structured, non-ad hoc way.  This includes information gleamed from both analyzing the behavior of a program as well as non-numeric insights such as suggestive naming conventions, program structure, and the whole cadre of information that lies dormant in code that in practice we as coders exploit to reverse-engineer the semantics of code that is foreign to us.

By tightly binding such information together, we can infer specifications that can require an order of magnitude less data than similar approaches, and using these specifications we have found several serious bugs in code bases, such as the OS kernels for Linux and Mac OS X, that would have otherwise remained undetected.  Moreover, we illustrate how inferring specifications can be used as safety nets for our bug-finding tools, as the inferred specifications can be used not only to catch bugs in the programs we are analyzing, but also bugs in our bug-finding tools.

Ted is in the midst of completing his PhD in Computer Science at Stanford under the aegis of Dawson Engler.  His research interests intersect in the areas of program analysis, machine learning, and systems, with a current focus on techniques for automatically finding serious bugs in large and complex systems.