Integrating Formalism in Software Engineering
Perry Alexander
Department of Electrical & Computer Engineering and Computer Science
The University of Cincinnati
PO Box 210030, Cincinnati, OH 45221-0030
513.556.4762 (Voice)
513.556.7326 (FAX)
perry.alexander@uc.edu
Abstract
This paper describes an approach and rational for using logic and
formal methods in software engineering education. Formal methods and
logic provide a mathematical basis for modeling software analogous to
the role of continuous mathematics in traditional engineering
disciplines. Traditional software engineering techniques provide
means for modeling software development processes and structuring
specifications. Neither formal methods nor traditional approaches
subsume the other, but are complimentary in software engineering
education and practice. The course described here is a part of the
standard Computer Engineering curriculum at the University of
Cincinnati. It has been taught on several occasions each resulting in
very positive outcomes.
1.0 Introduction
The current state-of-the-practice in software engineering education is
based on qualitative methods dealing primarily with process-oriented
models. Software system models traditionally taught in structured
(dataflow diagrams and data dictionaries) and object oriented
approaches (object diagrams, behavioral and function diagrams) provide
little predictive capability. The resulting models cannot be verified
with respect to functional requirements or constraints, an activity
common among most traditional engineering disciplines. Only
structural characteristics such as type checking, interface
verification and consistency are checked. In such courses, students
are taught good ``design hygiene'' and are provided with little or no
means for quantitatively representing and analyzing designs.
At the University of Cincinnati, a radical change in the approach used
to educate software engineers is being attempted. Formal methods,
specifically logic and Z [1], are being introduced in the required
undergraduate course to supplement traditional material. The
objectives of the course are to teach the students: (a) the role and
importance of modeling; (b) mathematical tools for building and
evaluating models of software components; (c) traditional software
engineering methodologies; and (d) techniques for combining formal
techniques with traditional software engineering methodologies.
Students are taught logic and Z with applications in the development
of software component specifications. A laboratory is used to provide
the students with education in more traditional topics
(object-oriented analysis/design) and a pragmatic team project. The
project forces the students to combine the formal and qualitative
approaches using Z for program components and object-oriented
techniques to structure the specification.
This paper describes the structure of the software engineering course
and laboratory. First, motivation for the material is presented and
the general structure of the course is outlined. Specific topics
including modeling, logic and proofs, requirements and specification,
and verification are discussed along with the general structure of the
laboratory.
2.0 Course Structure
The heart of both the laboratory and lecture course is teaching the
students: (1) what a model is; (2) how to construct models; and (3)
modelings role in design. Modeling is motivated by appealing to
experience in traditional engineering coursework where formalisms play
an instrumental role. Students are asked to describe a circuit to
perform a simple analog or digital transformation and are then asked
to explain why they believe the design to be correct. Typically
students cite Kirchoff's laws or Boolean Algebra to mathematically
model their design. The students are then asked to do the same for a
simple software system. They quickly realize they have no
mathematical tools to predict the behavior of their design.
This simple experiment is intended to demonstrate to students the role
that modeling plays in traditional engineering. Although trivial, the
experiment motivates the need for mathematics and application to
design activities. The students also realize they have been using
``formal methods'' in all of their engineering studies except
computing.
When the students understand why formal modeling is needed, logic is
introduced as the formal system for software. The students are taught
a traditional state-based modeling paradigm where each component is
described in terms of its affect on a system state. Each state change
is described using an axiomatic approach in the canonical fashion. Z
is introduced as a structuring mechanism applying the formalism to
software component specification. Z specifications are then used to
model software components at different points in the software
lifecycle. Modeling software requirements and specifications is
emphasized most heavily. Proofs of correctness are then used to
demonstrate how one model can be shown to satisfy another. In the
same way continuous mathematics and boolean algebra predicts the
behavior of the example circuits, logic predicts the behavior of
example specification.
2.1 Models
Modeling is presented by teaching traditional software lifecycle
models and the role software product models play in their application.
Several lifecycle models including waterfall, prototype and spiral,
are presented. In each case, the students are shown that each
lifecycle process describes a process where: (1) a more concrete model
is synthesized from an abstract model; (2) the concrete model is
analyzed to determine its function; and (3) the resulting behavior is
evaluated with respect to the abstract model. A satisfaction
relationship exists between the abstract and concrete models that
defines when a concrete model is correct with respect to the abstract
model.
After understanding the role models play, formal modeling is motivated
by the desire to verify the satisfaction relationship. In other
engineering disciplines, the satisfaction relationship can be formally
verified using continuous mathematics. Without a formal basis, making
comparisons cannot be done with any confidence. Thus, a formal model
for software systems is needed.
2.2 Logic and Proofs
Students entering the software engineering course do not deeply
understand logic nor the process of proof. This problem is
complicated by their exclusive reliance on imperative computation
models to describe programs. All students have had extensive exposure
to C or C++ but almost no exposure to functional or declarative
programming styles.
The first 2-3 weeks of the course are spent teaching basic typed
predicate calculus, relations and functions, set theory and sequence
theory. Students learn natural deduction techniques and how to
construct traditional Hilbert style proofs. Basic proof techniques
are emphasized as well as the structure of a proof activity.
Special emphasis is placed on models and proofs. Modeling software
components is different than the traditional engineering modeling
students are accustomed to. Traditional engineers select and use
models from a canonical set defined by their disciplines. Software
engineers must develop models from structures at much lower
abstraction levels. When analyzing traditional physical systems, one
selects from a collection of several canonical models. Although new
models may be continuously developing, the development process is
undertaken by a relative minority of people involved in the
discipline. Thus, when students use a model in traditional
engineering design, problems typically result from improper use.
Software modeling differs in that a model must be developed for each
new system with few canonical building blocks available. The net
effect is that problems manipulating models may be caused by improper
use, or by errors in the model. The latter difficulty creates
significant problems for many students.
The art of proving is often mysterious and difficult for students. In
addition to traditional difficulties, two problems are of particular
note. First, students have difficulty determining what should be
proven and when proof should be performed. Given a simple model and
the question ``is this correct?'', most students have difficulty
stating even informally what correctness is. Second, notations for
effectively recording proofs offer little insight into their
development process. Most students can follow a proof presented to
them, but many have difficulty synthesizing new proofs. For this
reason, much time is spent in class and study sessions doing proofs
rather than reading them.
2.3 Requirements and Specification
During and after the introduction of logic and proof systems, Z is
presented as an application of logic to program specification and and
verification. Requirements are presented as an abstract model
defining ``what'' a component should do with as little reference to
``how'' as possible. Requirements indicate what the component should
do without reference to implementation details. More detailed design
specifications are presented as a concrete model defining an
incremental step towards ``how.'' When both are expressed formally,
students realize that specifications can be verified with respect to
requirements. Combined with evidence of the increased cost of finding
errors late in the design process, the need for formal methods becomes
evident.
Students are taught to write specifications based on defining system
state and how components change that state. Software component
requirements and design specifications are modeled using the
state-based component description model central to Z. In the Z model,
one schema is used to represent the contents and invariant
characteristics of system state. Schemas defining an operation
specify a relation that accepts inputs and the current state and
produces outputs and a next state.
When defining state, the students are taught to write minimal
specifications using uninterpreted types (given sets) wherever
possible. Using the canonical Z style, a state schema is defined to
represent state elements. Then, state invariants are defined to
describe what must always be true about states.
When defining an operation, the students are taught to write
specifications for pre- and post-conditions. Using the canonical Z
style, a schema is defined that: (a) references the state schema; (b)
defines the interface for the operation; (c) defines preconditions
over state and inputs; and (d) defines postconditions over inputs,
outputs and state.
Logical inference is used to define what pre- and post-conditions
indicate:
forall x:D, exists z:R . I(x) => O(x,z) (1)
Intuitively, when the precondition is true in the current state, the
component is obligated to make the postcondition true in the next
state. Students are taught that preconditions reference parameters
and state variables in the current state only and the postconditions
describe a relationship between values in the current state and next
state.
Students are taught to specify only what the component must do with as
little reference to how as possible. The students inclination is to
write a complete, operational description. Frequent problems arise
when students attempt to execute specifications as a means for
understanding them. Constant reinforcement is required to help them
specify declaratively. The difference between equality and
assignment, incomplete specifications, and referencing an object in
the current state vs. the next state all cause difficulty for some
students.
Decomposition into schemas that can be combined using the schema
calculus operations is also encouraged. Separating exception handling
and cases into separate schemas simplifies the representation and some
verification activities. Generic schemas also help reinforce
abstraction and reusability.
2.4 Verification
At this point, the students are asked to think back into the lifecycle
models discussed earlier and relate this process to model building
tasks. Given two models representing requirements and specifications,
what obligations should be proven to show satisfaction. Verifying
invariants, the existence of the initial state and some correctness
verification is attempted. Far more time is spent teaching the
students how to determine what to prove than on the proofs themselves.
For even simple problems, proofs can be extremely long and time
consuming.
For pedagogy purposes, correctness is defined using a plug-in match
[2] criteria as defined in Equation 2. If the high level
specification is defined by Ip(x) and Op(x,z) and the low level
specification by Is(x) and Os(x,z), s satisfies p if:
forall x:D, exists z:R (Ip(x) => Is(x)) & (Os(x,z) => Op(x,z)) (2)
Certainly other satisfaction criteria exist and may be more
appropriate. However, plug in match is easily motivated and is
grasped quickly by the students.
3.0 Laboratory Structure
In the Laboratory, traditional object oriented analysis is presented
along with topics not covered in the lecture course. Students are
taught basic OMT modeling techniques and use the Cadre Teamwork
software tool. In addition, they are taught testing and
implementation techniques, configuration management, and how to write
proper design documentation.
The capstone laboratory effort combines the Z specification techniques
learned in lecture with the OMT methodologies. Students use OMT
methods to structure their specifications for a large, group project.
Z is then used to formally describe the contents and behavior of
objects and methods. Following their design activity, the students
are required to partially verify their specifications and implement
their design.
4.0 Successes and Difficulties
Undergraduate software engineering has been taught 4 times using the
course structure described here. Prior to adopting the formal
approach, a traditional software engineering course based on Pressman
[3] was taught with an accompanying laboratory. Although the sample
space is small, some anecdotal observations can be made.
4.1 Successes
The course has proven successful in several ways. It achieves most of
its major goals and has had positive influences on other parts of our
curriculum. Formal methods have begun to play a more prominent role
in undergraduate and graduate courses following the software
engineering course. Many new formal methods projects have been
started in the department by those not traditionally teaching or using
formal methods. Finally, teaching evaluations indicate that students
are enjoying the material.
Students leave the undergraduate course with a basic knowledge of
software engineering concepts and formal methods. Software
engineering is not subsumed by formal methods, thus it is critical
that formal methods not completely dominate the course material.
Using the lecture and laboratory in combination, students obtain
useful, integrated knowledge of both subject areas. Most of the
desired goals are achieved to a large degree with the possible
exception of verification.
Few students complete all verification required in the capstone
project. Initially, this was viewed as problematic and reduction to a
simpler project was considered. Many students commented that although
they did not complete the proofs, they discovered errors in their
specifications by attempting them. It is believed that the utility of
formalism is strongly reinforced when students discover merits of
formal specification on their own. Thus, the project's magnitude has
not been reduced.
Courses later in the curriculum have adopted logical representations.
Specifically, students in operating systems and compiler construction
continue to use Z notations to describe their specifications. Faculty
teaching those courses have sat in on the software engineering course
to familiarize themselves with Z and the logic used. Most gratifying
is that students are choosing to use Z and logic based techniques on
their own without being forced.
The number of formal methods based research activities in the
department has expanded substantially. Faculty researchers seem less
hesitant to include formal methods in their research with the addition
of the undergraduate and graduate courses. Formal representations are
also appearing in more thesis and dissertations. Many graduate
students take the undergraduate course. Even students not advised by
formal methods researchers have introduced Z and Larch specifications
in their graduate work.
Finally, teaching evaluations have improved for teachers involved in
this activity. This result is surprising given the move from informal
discussion to formal mathematics. One possible explanation is that
the engineering students prefer the greater rigor. One student
commented on the traditional course that ``it's more like an English
or History course than an engineering course.'' More than one student
expressed frustration with the qualitative nature of the old course.
4.2 Difficulties
Several difficulties still remain. Abstraction and declarative
representation remain difficult to teach and motivate. Both students
and faculty question the pragmatics of using formal methods. It
remains difficult to relate formal methods activities to traditional
software engineering activities. Social problems concerning formal
methods adoption remain. Verification is not adequately covered due
to student background and lack of time. Finally, tools and materials
for teaching formal methods, specifically formal methods and software
engineering, do not exist.
The most difficult subjects to motivate are abstraction and
declarative notations. Students must constantly be reminded that they
are not programming but specifying. Incomplete specifications are
particularly bothersome to students. Also, the distinction between
assignment and equality is a particularly difficult concept that must
constantly be reinforced. Students are virtually trapped by their
lack of exposure to programming paradigms other than imperative
techniques.
Motivating the formalisms is an important and difficult activity. The
University of Cincinnati has a mandatory cooperative education
program. Thus, students have some knowledge of pragmatic engineering
situations. They must be convinced that although most will not use
these techniques in their co-op work, they are still useful and will
augment their problem solving capabilities. The effort is becoming
much easier with the addition of co-op employers who ask specifically
for formal methods skills. Such inquiries convince students that jobs
requiring such skills will be available to them.
A serious semantic gap exists between formal methods and software
engineering terminology. There is no standard means for mapping
software engineering tasks and models to formal methods tasks and
models. Software engineering terminology generally exists at a much
higher level of abstraction. Formal methods terminology is usually
derived from the mathematical system used. The instructor must work
to bring the terminology together.
Lack of materials and texts makes course preparation and organization
difficult. A number of excellent traditional software engineering
texts are available [3,4}. However, most present formal methods as a
side issue and typically in an obfuscated manner. This is in part due
to the semantic gap previously mentioned. Some good formal methods
texts are beginning to become available [5,6,7]. However, none deal
with the software engineering process. It becomes necessary to use
two texts as well as developing vast support materials.
Traditional software engineering has entrenched itself in many US
institutions. When changing from the qualitative approach, several
non-technical issues were raised including: (1) difficulty of the
material; (2) ABET approval; and (3) learning pragmatic skills.
First, the material has not proven too difficult for students to
master. This argument is almost pointless because of the mathematical
rigor required in other engineering coursework. ABET approval is a
pending issue as we are not scheduled for a review for two years.
Looking at ABET requirements indicates that this should not be a
problem. The most difficult argument to counter is that students do
not see enough pragmatic skills. By this, other faculty mean use of
version control and configuration management, document format and
contents, coverage of CMM and other process related issues, and
coverage of government standards such as 2167A. It has successfully
been argued that such topics are managerial issues, represent
techniques useful only in the short term, and/or represent techniques
too specific to given development processes.
Not enough time is available in the ten week quarter to adequately
cover verification. Because students enter the course with virtually
no logic background, it is necessary to start with basic principles.
Verification is the topic that is sacrificed due to lack of time.
However, it is felt that giving the students sufficient background to
begin thinking declaratively and formally is an important first step.
Verification and refinement techniques are also presented in future
classes. Additionally, the mathematics department is being asked to
increase the rigor in discrete mathematics courses and formalism is
being adopted earlier in the computer engineering curriculum.
Finally, tools and associated support documentation are quite lacking.
Even the best public domain tools are largely inappropriate for
undergraduate use. Documentation difficulties exist as well as system
robustness. Most students (and some faculty) do not believe that a
technique is industrial strength until reliable tool support is
available. In following verification classes we have also discovered
that students who would never willingly complete a proof by hand enjoy
the process when supported by a tool.
5.0 Conclusions
This work briefly describes the software engineering lecture and lab
course sequence based on formal methods. The general contents and
structure of the course were presented with some details of material
covered and motivation behind it. Successes and difficulties were
also presented and discussed.
The software engineering course at the University of Cincinnati
demonstrates that it is possible to teach software engineering using a
formal basis. Furthermore, this can be done without sacrificing
traditional software engineering topics. Evidence indicates that
students recognize the importance of formal modeling and can manage
the technical difficulties associated with logic and proof.
Bibliography
1. Spivey, J. M., The Z Notation: A reference manual, Prentice Hall,
New York, NY, 1992. Second edition.
2. Moormann Zaremski, A and Wing, J., "Specification Matching of
Software Components" in Proceedings of the 3rd ACM SIGSOFT Symposium
on the Foundations of Software Engineering, October, 1995.
3. Pressman, R., Software Engineering: A Practicioner's Approach,
McGraw Hill, New York, NY, 1992. Third Edition.
4. Sommerville, I., Software Engineering, Addison-Wesley, Reading, MA,
1989. Third Edition.
5. Diller, A., Z:An Introduction to Formal Methods, John Wiley & Sons,
Chichester, England, 1994. Second Edition.
6. Jones, C. B., Systematic Software Development using VDM, Prentice
Hall, New York, NY", 1990. Second Edition.
7. Woodcock, J. and Davies, J., Using Z, Prentice Hall, New York, NY,
1996.