Integrating Formalism in Software Engineering

Perry Alexander

Department of Electrical & Computer Engineering and Computer Science
The University of Cincinnati
PO Box 210030, Cincinnati, OH 45221
513.556.4762 (Voice)
513.556.7326 (FAX)
perry.alexander@uc.edu

Abstract

This paper describes an approach and rational for using logic and formal methods in software engineering education. Formal methods and logic provide a mathematical basis for modeling software analogous to the role of continuous mathematics in traditional engineering disciplines. Traditional software engineering techniques provide means for modeling software development processes and structuring specifications. Neither formal methods nor traditional approaches subsume the other, but are complimentary in software engineering education and practice. The course described here is a part of the standard Computer Engineering curriculum at the University of Cincinnati. It has been taught on several occasions each resulting in very positive outcomes.

1.0 Introduction

The current state-of-the-practice in software engineering education is based on qualitative methods dealing primarily with process-oriented models. Software system models traditionally taught in structured (dataflow diagrams and data dictionaries) and object oriented approaches (object diagrams, behavioral and function diagrams) provide little predictive capability. The resulting models cannot be verified with respect to functional requirements or constraints, an activity common among most traditional engineering disciplines. Only structural characteristics such as type checking, interface verification and consistency are checked. In such courses, students are taught good ``design hygiene'' and are provided with little or no means for quantitatively representing and analyzing designs.

At the University of Cincinnati, a radical change in the approach used to educate software engineers is being attempted. Formal methods, specifically logic and Z [1], are being introduced in the required undergraduate course to supplement traditional material. The objectives of the course are to teach the students: (a) the role and importance of modeling; (b) mathematical tools for building and evaluating models of software components; (c) traditional software engineering methodologies; and (d) techniques for combining formal techniques with traditional software engineering methodologies.

Students are taught logic and Z with applications in the development of software component specifications. A laboratory is used to provide the students with education in more traditional topics (object-oriented analysis/design) and a pragmatic team project. The project forces the students to combine the formal and qualitative approaches using Z for program components and object-oriented techniques to structure the specification.

This paper describes the structure of the software engineering course and laboratory. First, motivation for the material is presented and the general structure of the course is outlined. Specific topics including modeling, logic and proofs, requirements and specification, and verification are discussed along with the general structure of the laboratory.

2.0 Course Structure

The heart of both the laboratory and lecture course is teaching the students: (1) what a model is; (2) how to construct models; and (3) modelings role in design. Modeling is motivated by appealing to experience in traditional engineering coursework where formalisms play an instrumental role. Students are asked to describe a circuit to perform a simple analog or digital transformation and are then asked to explain why they believe the design to be correct. Typically students cite Kirchoff's laws or Boolean Algebra to mathematically model their design. The students are then asked to do the same for a simple software system. They quickly realize they have no mathematical tools to predict the behavior of their design.

This simple experiment is intended to demonstrate to students the role that modeling plays in traditional engineering. Although trivial, the experiment motivates the need for mathematics and application to design activities. The students also realize they have been using ``formal methods'' in all of their engineering studies except computing.

When the students understand why formal modeling is needed, logic is introduced as the formal system for software. The students are taught a traditional state-based modeling paradigm where each component is described in terms of its affect on a system state. Each state change is described using an axiomatic approach in the canonical fashion. Z is introduced as a structuring mechanism applying the formalism to software component specification. Z specifications are then used to model software components at different points in the software lifecycle. Modeling software requirements and specifications is emphasized most heavily. Proofs of correctness are then used to demonstrate how one model can be shown to satisfy another. In the same way continuous mathematics and boolean algebra predicts the behavior of the example circuits, logic predicts the behavior of example specification.

2.1 Models

Modeling is presented by teaching traditional software lifecycle models and the role software product models play in their application. Several lifecycle models including waterfall, prototype and spiral, are presented. In each case, the students are shown that each lifecycle process describes a process where: (1) a more concrete model is synthesized from an abstract model; (2) the concrete model is analyzed to determine its function; and (3) the resulting behavior is evaluated with respect to the abstract model. A satisfaction relationship exists between the abstract and concrete models that defines when a concrete model is correct with respect to the abstract model.

After understanding the role models play, formal modeling is motivated by the desire to verify the satisfaction relationship. In other engineering disciplines, the satisfaction relationship can be formally verified using continuous mathematics. Without a formal basis, making comparisons cannot be done with any confidence. Thus, a formal model for software systems is needed.

2.2 Logic and Proofs

Students entering the software engineering course do not deeply understand logic nor the process of proof. This problem is complicated by their exclusive reliance on imperative computation models to describe programs. All students have had extensive exposure to C or C++ but almost no exposure to functional or declarative programming styles.

The first 2-3 weeks of the course are spent teaching basic typed predicate calculus, relations and functions, set theory and sequence theory. Students learn natural deduction techniques and how to construct traditional Hilbert style proofs. Basic proof techniques are emphasized as well as the structure of a proof activity. Special emphasis is placed on models and proofs. Modeling software components is different than the traditional engineering modeling students are accustomed to. Traditional engineers select and use models from a canonical set defined by their disciplines. Software engineers must develop models from structures at much lower abstraction levels. When analyzing traditional physical systems, one selects from a collection of several canonical models. Although new models may be continuously developing, the development process is undertaken by a relative minority of people involved in the discipline. Thus, when students use a model in traditional engineering design, problems typically result from improper use. Software modeling differs in that a model must be developed for each new system with few canonical building blocks available. The net effect is that problems manipulating models may be caused by improper use, or by errors in the model. The latter difficulty creates significant problems for many students.

The art of proving is often mysterious and difficult for students. In addition to traditional difficulties, two problems are of particular note. First, students have difficulty determining what should be proven and when proof should be performed. Given a simple model and the question ``is this correct?'', most students have difficulty stating even informally what correctness is. Second, notations for effectively recording proofs offer little insight into their development process. Most students can follow a proof presented to them, but many have difficulty synthesizing new proofs. For this reason, much time is spent in class and study sessions doing proofs rather than reading them.

2.3 Requirements and Specification

During and after the introduction of logic and proof systems, Z is presented as an application of logic to program specification and and verification. Requirements are presented as an abstract model defining ``what'' a component should do with as little reference to ``how'' as possible. Requirements indicate what the component should do without reference to implementation details. More detailed design specifications are presented as a concrete model defining an incremental step towards ``how.'' When both are expressed formally, students realize that specifications can be verified with respect to requirements. Combined with evidence of the increased cost of finding errors late in the design process, the need for formal methods becomes evident.

Students are taught to write specifications based on defining system state and how components change that state. Software component requirements and design specifications are modeled using the state-based component description model central to Z. In the Z model, one schema is used to represent the contents and invariant characteristics of system state. Schemas defining an operation specify a relation that accepts inputs and the current state and produces outputs and a next state.

When defining state, the students are taught to write minimal specifications using uninterpreted types (given sets) wherever possible. Using the canonical Z style, a state schema is defined to represent state elements. Then, state invariants are defined to describe what must always be true about states.

When defining an operation, the students are taught to write specifications for pre- and post-conditions. Using the canonical Z style, a schema is defined that: (a) references the state schema; (b) defines the interface for the operation; (c) defines preconditions over state and inputs; and (d) defines postconditions over inputs, outputs and state.

Logical inference is used to define what pre- and post-conditions indicate:

forall x:D, exists z:R . I(x) => O(x,z) (1)
Intuitively, when the precondition is true in the current state, the component is obligated to make the postcondition true in the next state. Students are taught that preconditions reference parameters and state variables in the current state only and the postconditions describe a relationship between values in the current state and next state.

Students are taught to specify only what the component must do with as little reference to how as possible. The students inclination is to write a complete, operational description. Frequent problems arise when students attempt to execute specifications as a means for understanding them. Constant reinforcement is required to help them specify declaratively. The difference between equality and assignment, incomplete specifications, and referencing an object in the current state vs. the next state all cause difficulty for some students.

Decomposition into schemas that can be combined using the schema calculus operations is also encouraged. Separating exception handling and cases into separate schemas simplifies the representation and some verification activities. Generic schemas also help reinforce abstraction and reusability.

2.4 Verification

At this point, the students are asked to think back into the lifecycle models discussed earlier and relate this process to model building tasks. Given two models representing requirements and specifications, what obligations should be proven to show satisfaction. Verifying invariants, the existence of the initial state and some correctness verification is attempted. Far more time is spent teaching the students how to determine what to prove than on the proofs themselves. For even simple problems, proofs can be extremely long and time consuming.

For pedagogy purposes, correctness is defined using a plug-in match [2] criteria as defined in Equation 2. If the high level specification is defined by Ip(x) and Op(x,z) and the low level specification by Is(x) and Os(x,z), s satisfies p if:

forall x:D, exists z:R (Ip(x) => Is(x)) & (Os(x,z) => Op(x,z)) (2)
Certainly other satisfaction criteria exist and may be more appropriate. However, plug in match is easily motivated and is grasped quickly by the students.

3.0 Laboratory Structure

In the Laboratory, traditional object oriented analysis is presented along with topics not covered in the lecture course. Students are taught basic OMT modeling techniques and use the Cadre Teamwork software tool. In addition, they are taught testing and implementation techniques, configuration management, and how to write proper design documentation.

The capstone laboratory effort combines the Z specification techniques learned in lecture with the OMT methodologies. Students use OMT methods to structure their specifications for a large, group project. Z is then used to formally describe the contents and behavior of objects and methods. Following their design activity, the students are required to partially verify their specifications and implement their design.

4.0 Successes and Difficulties

Undergraduate software engineering has been taught 4 times using the course structure described here. Prior to adopting the formal approach, a traditional software engineering course based on Pressman [3] was taught with an accompanying laboratory. Although the sample space is small, some anecdotal observations can be made.

4.1 Successes

The course has proven successful in several ways. It achieves most of its major goals and has had positive influences on other parts of our curriculum. Formal methods have begun to play a more prominent role in undergraduate and graduate courses following the software engineering course. Many new formal methods projects have been started in the department by those not traditionally teaching or using formal methods. Finally, teaching evaluations indicate that students are enjoying the material.

Students leave the undergraduate course with a basic knowledge of software engineering concepts and formal methods. Software engineering is not subsumed by formal methods, thus it is critical that formal methods not completely dominate the course material. Using the lecture and laboratory in combination, students obtain useful, integrated knowledge of both subject areas. Most of the desired goals are achieved to a large degree with the possible exception of verification.

Few students complete all verification required in the capstone project. Initially, this was viewed as problematic and reduction to a simpler project was considered. Many students commented that although they did not complete the proofs, they discovered errors in their specifications by attempting them. It is believed that the utility of formalism is strongly reinforced when students discover merits of formal specification on their own. Thus, the project's magnitude has not been reduced.

Courses later in the curriculum have adopted logical representations. Specifically, students in operating systems and compiler construction continue to use Z notations to describe their specifications. Faculty teaching those courses have sat in on the software engineering course to familiarize themselves with Z and the logic used. Most gratifying is that students are choosing to use Z and logic based techniques on their own without being forced.

The number of formal methods based research activities in the department has expanded substantially. Faculty researchers seem less hesitant to include formal methods in their research with the addition of the undergraduate and graduate courses. Formal representations are also appearing in more thesis and dissertations. Many graduate students take the undergraduate course. Even students not advised by formal methods researchers have introduced Z and Larch specifications in their graduate work.

Finally, teaching evaluations have improved for teachers involved in this activity. This result is surprising given the move from informal discussion to formal mathematics. One possible explanation is that the engineering students prefer the greater rigor. One student commented on the traditional course that ``it's more like an English or History course than an engineering course.'' More than one student expressed frustration with the qualitative nature of the old course.

4.2 Difficulties

Several difficulties still remain. Abstraction and declarative representation remain difficult to teach and motivate. Both students and faculty question the pragmatics of using formal methods. It remains difficult to relate formal methods activities to traditional software engineering activities. Social problems concerning formal methods adoption remain. Verification is not adequately covered due to student background and lack of time. Finally, tools and materials for teaching formal methods, specifically formal methods and software engineering, do not exist.

The most difficult subjects to motivate are abstraction and declarative notations. Students must constantly be reminded that they are not programming but specifying. Incomplete specifications are particularly bothersome to students. Also, the distinction between assignment and equality is a particularly difficult concept that must constantly be reinforced. Students are virtually trapped by their lack of exposure to programming paradigms other than imperative techniques.

Motivating the formalisms is an important and difficult activity. The University of Cincinnati has a mandatory cooperative education program. Thus, students have some knowledge of pragmatic engineering situations. They must be convinced that although most will not use these techniques in their co-op work, they are still useful and will augment their problem solving capabilities. The effort is becoming much easier with the addition of co-op employers who ask specifically for formal methods skills. Such inquiries convince students that jobs requiring such skills will be available to them.

A serious semantic gap exists between formal methods and software engineering terminology. There is no standard means for mapping software engineering tasks and models to formal methods tasks and models. Software engineering terminology generally exists at a much higher level of abstraction. Formal methods terminology is usually derived from the mathematical system used. The instructor must work to bring the terminology together.

Lack of materials and texts makes course preparation and organization difficult. A number of excellent traditional software engineering texts are available [3,4}. However, most present formal methods as a side issue and typically in an obfuscated manner. This is in part due to the semantic gap previously mentioned. Some good formal methods texts are beginning to become available [5,6,7]. However, none deal with the software engineering process. It becomes necessary to use two texts as well as developing vast support materials.

Traditional software engineering has entrenched itself in many US institutions. When changing from the qualitative approach, several non-technical issues were raised including: (1) difficulty of the material; (2) ABET approval; and (3) learning pragmatic skills. First, the material has not proven too difficult for students to master. This argument is almost pointless because of the mathematical rigor required in other engineering coursework. ABET approval is a pending issue as we are not scheduled for a review for two years. Looking at ABET requirements indicates that this should not be a problem. The most difficult argument to counter is that students do not see enough pragmatic skills. By this, other faculty mean use of version control and configuration management, document format and contents, coverage of CMM and other process related issues, and coverage of government standards such as 2167A. It has successfully been argued that such topics are managerial issues, represent techniques useful only in the short term, and/or represent techniques too specific to given development processes.

Not enough time is available in the ten week quarter to adequately cover verification. Because students enter the course with virtually no logic background, it is necessary to start with basic principles. Verification is the topic that is sacrificed due to lack of time. However, it is felt that giving the students sufficient background to begin thinking declaratively and formally is an important first step. Verification and refinement techniques are also presented in future classes. Additionally, the mathematics department is being asked to increase the rigor in discrete mathematics courses and formalism is being adopted earlier in the computer engineering curriculum.

Finally, tools and associated support documentation are quite lacking. Even the best public domain tools are largely inappropriate for undergraduate use. Documentation difficulties exist as well as system robustness. Most students (and some faculty) do not believe that a technique is industrial strength until reliable tool support is available. In following verification classes we have also discovered that students who would never willingly complete a proof by hand enjoy the process when supported by a tool.

5.0 Conclusions

This work briefly describes the software engineering lecture and lab course sequence based on formal methods. The general contents and structure of the course were presented with some details of material covered and motivation behind it. Successes and difficulties were also presented and discussed.

The software engineering course at the University of Cincinnati demonstrates that it is possible to teach software engineering using a formal basis. Furthermore, this can be done without sacrificing traditional software engineering topics. Evidence indicates that students recognize the importance of formal modeling and can manage the technical difficulties associated with logic and proof.

Bibliography

  1. Spivey, J. M., The Z Notation: A reference manual, Prentice Hall, New York, NY, 1992. Second edition.
  2. Moormann Zaremski, A and Wing, J., "Specification Matching of Software Components" in Proceedings of the 3rd ACM SIGSOFT Symposium on the Foundations of Software Engineering, October, 1995.
  3. Pressman, R., Software Engineering: A Practicioner's Approach, McGraw Hill, New York, NY, 1992. Third Edition.
  4. Sommerville, I., Software Engineering, Addison-Wesley, Reading, MA, 1989. Third Edition.
  5. Diller, A., Z:An Introduction to Formal Methods, John Wiley & Sons, Chichester, England, 1994. Second Edition.
  6. Jones, C. B., Systematic Software Development using VDM, Prentice Hall, New York, NY", 1990. Second Edition.
  7. Woodcock, J. and Davies, J., Using Z, Prentice Hall, New York, NY, 1996.