We said before that a *trustworthy* system is accompanied by evidence for
claims made about its security. *Assurance* is a name for that evidence:
reasons to believe that the system really does offer the security that
its builders claim. *Evaluation* is the process of establishing that
assurance. Software testing labs that evaluate a system can *certify*
that the system meets certain criteria.
## Certification criteria
**The Orange Book.**
In 1983 the US Department of Defense (DoD) issued a document
titled *Trusted Computer System Evaluation Criteria* (TCSEC). It is
colloquially known as the Orange Book, because that was the color of
its cover. The evaluation criteria laid out in it were used circa
1985–2000 for US government systems. The criteria were grouped
into *classes*, and each class included the classes that came before,
thus becoming progressively more stringent. The lowest-level classes
mandated the use of basic security mechanisms such as authentication
and access control. The highest level required formal verification
of the system design; only a handful of systems ever achieved that
certification. The Orange Book succeeded in raising awareness of
security concerns. It arguably failed in providing evaluation criteria
that are useful today: the classes it defined were "one size fits all"
and not necessarily applicable to particular applications. Moreover,
evaluation was an expensive process that delayed development and deployment.

**The Common Criteria.**
The *Common Criteria* evolved out of evaluation criteria in Europe, Canada,
and the US, including the Orange Book. It addresses the "one size fits all"
problem of the Orange Book by introducing customizable security goals and
requirements. *Protection profiles* (PPs) state the goals and requirements
for categories of applications (e.g., instant messaging). There is a
huge catalog of pre-written components that can be included in a PP.
*Security targets* (STs) customize a PP to a particular system, and provide
evidence that the system satisfies the PP. A system and its ST can be
submitted for evaluation to a lab that specializes in Common Criteria
evaluation. The lab then evaluates how strong a case the evidence in the
ST makes that the system satisfies the PP, and awards the system an
*Evaluation Assurance Level* (EAL). The EAL is not a measure of how secure
the system is, but rather a measure of how strong the evidence is that the
system achieves its stated security goals. Certification at higher EALs
still requires a significant investment of time and money.
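
The relationships among PPs, STs, and EALs can be pictured as a small data model. The sketch below is a hypothetical illustration, not a format defined by the Common Criteria: the class and field names are invented, the "ExampleChat" product is made up, and the component IDs (FIA_UAU.2, FIA_AFL.1) are just examples drawn from the Part 2 catalog. A mechanical check like `missing_evidence` can only tell whether *some* evidence exists for each requirement; judging how strong that evidence is, and hence which EAL to award, remains the lab's job.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class EAL(IntEnum):
    """Common Criteria Evaluation Assurance Levels, weakest to strongest.

    A higher level means stronger evidence, not a more secure system.
    """
    EAL1 = 1  # functionally tested
    EAL2 = 2  # structurally tested
    EAL3 = 3  # methodically tested and checked
    EAL4 = 4  # methodically designed, tested, and reviewed
    EAL5 = 5  # semiformally designed and tested
    EAL6 = 6  # semiformally verified design and tested
    EAL7 = 7  # formally verified design and tested


@dataclass
class ProtectionProfile:
    """Hypothetical model: security requirements for a category of applications."""
    category: str
    requirements: set[str]  # IDs of components from the CC catalog


@dataclass
class SecurityTarget:
    """Hypothetical model: a PP customized to one system, plus its evidence."""
    system: str
    profile: ProtectionProfile
    evidence: dict[str, str] = field(default_factory=dict)  # requirement ID -> evidence

    def missing_evidence(self) -> set[str]:
        """PP requirements for which this ST offers no evidence at all."""
        return self.profile.requirements - set(self.evidence)


# Hypothetical example: an instant-messaging PP and one product's ST.
im_pp = ProtectionProfile("instant messaging", {"FIA_UAU.2", "FIA_AFL.1"})
st = SecurityTarget("ExampleChat", im_pp,
                    {"FIA_UAU.2": "design document, authentication chapter"})
print(st.missing_evidence())  # {'FIA_AFL.1'}
```

Using an `IntEnum` makes the levels ordered, so a requirement such as "at least EAL4" is a simple comparison like `level >= EAL.EAL4`.
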
## Verification and testing
Verification uses the program code and its specifications to argue,
either formally or informally, that the program satisfies its
specification and therefore works correctly on all possible inputs. The
value of verification is that if carried out thoroughly, it produces
convincing evidence that the program really does what it is supposed to
do on all possible inputs. There are research tools available to help
with program verification, often based on automated theorem provers, as
well as research languages that are designed for program verification.
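
As a small illustration of the informal style of argument, the hypothetical function below is annotated with its specification (a postcondition) and a loop invariant. The proof lives in the comments: the invariant holds before the loop, each iteration preserves it, and the loop terminates because `len(xs) - i` decreases, so on exit the postcondition follows for *every* input list. Note that the `assert` statements are checked only at run time, on whatever inputs happen to be executed, so running them is really testing; a verification tool would instead try to prove them once for all inputs.

```python
def sum_list(xs: list[int]) -> int:
    """Return the sum of the elements of xs.

    Specification (postcondition): the result equals sum(xs).
    """
    total = 0
    i = 0
    # Loop invariant: 0 <= i <= len(xs) and total == sum(xs[:i]).
    # Initially i == 0 and total == 0 == sum([]), so it holds on entry.
    while i < len(xs):
        assert 0 <= i <= len(xs) and total == sum(xs[:i])
        total += xs[i]  # now total == sum(xs[:i + 1])
        i += 1          # restores the invariant for the new value of i
    # Termination: len(xs) - i decreases on every iteration and stays >= 0.
    # On exit i == len(xs), so the invariant gives total == sum(xs).
    assert total == sum(xs)
    return total
```
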

Testing involves actually executing the program on sample inputs to see
whether its behavior is as expected. By comparing the actual results
of the program with the expected results, we find out whether the
program really works on the particular inputs we try it on. One weakness
of testing is that unless we try all possible inputs to the program, we
can't be sure that it works on all of them. This contrasts with
verification, in which we attempt to prove that the program always
works. If carried out carefully, however, testing can give us useful
assurance that the program works, at lower cost than formal or even
informal verification.

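
The following sketch illustrates the weakness of testing on sample inputs. The function under test, `is_leap_year`, is hypothetical and deliberately buggy; Python's standard `calendar.isleap` supplies the expected results. Every sample input passes, yet the function is wrong for century years such as 1900. Trying all inputs exposes the bug only because this domain (years 1 through 2400) is tiny; for most real programs the input space is far too large to enumerate.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Hypothetical implementation under test, with a deliberate bug."""
    return year % 4 == 0  # bug: ignores the century rules


# Testing: compare actual results with expected results on sample inputs.
samples = {2019: False, 2020: True, 2023: False, 2024: True}
sample_failures = [y for y, expected in samples.items()
                   if is_leap_year(y) != expected]
print("failures on samples:", sample_failures)  # failures on samples: []

# Trying every input is feasible here only because the domain is small.
all_failures = [y for y in range(1, 2401)
                if is_leap_year(y) != calendar.isleap(y)]
print("first failures:", all_failures[:3])  # first failures: [100, 200, 300]
```
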
The above statements about verification and testing apply equally well
to both program correctness and security. From the security
perspective, there are two additional forms of testing that are commonly
employed. With *penetration testing*, experts attempt to attack a
system. This might be under lab conditions, or against an actual
deployment of the system. In the latter case, the system users might or
might not be made aware of the attempt. With *fuzz testing*, testers
feed randomly generated inputs to the system to see what happens. If
the inputs can crash the system, then there is likely some vulnerability
that could be exploited. Generating inputs purely randomly is useful up
to a point; after that, randomly tweaking actual inputs, perhaps
captured from previous executions of the system, reveals additional
vulnerabilities.

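
A minimal fuzzing sketch follows. The target `parse_record` and its record format are hypothetical: a two-byte header `RC`, a length byte `n`, then `n` payload bytes. Malformed input is supposed to be rejected with `ValueError`; as an assumption of this sketch, any *other* uncaught exception counts as a crash (in a C program, the same bug would be an out-of-bounds read). Purely random inputs almost never get past the header check, whereas randomly tweaking captured inputs (here, two hypothetical seeds) quickly reaches the buggy length handling. Real fuzzers do much more, for example using coverage feedback to guide mutation.

```python
import random
from typing import Callable


def parse_record(data: bytes) -> bytes:
    """Hypothetical target: parse b"RC", a length byte n, then n payload bytes.

    Malformed input is supposed to be rejected with ValueError.
    Bug: n is trusted instead of being checked against len(data).
    """
    if data[:2] != b"RC" or len(data) < 3:
        raise ValueError("not a record")  # graceful rejection, not a crash
    n = data[2]
    # Reads past the end of data when n is too large (IndexError).
    return bytes(data[3 + i] for i in range(n))


def random_input(rng: random.Random, max_len: int = 20) -> bytes:
    """Purely random generation: random length, random bytes."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))


def mutate(rng: random.Random, seed: bytes) -> bytes:
    """Randomly tweak a real input: flip a bit, insert a byte, or delete one."""
    data = bytearray(seed)
    op = rng.randrange(3)
    if op == 0 and data:
        data[rng.randrange(len(data))] ^= 1 << rng.randrange(8)
    elif op == 1:
        data.insert(rng.randrange(len(data) + 1), rng.randrange(256))
    elif data:
        del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(target: Callable[[bytes], object],
         gen: Callable[[], bytes], trials: int) -> list[bytes]:
    """Run target on generated inputs and collect those that crash it."""
    crashes = []
    for _ in range(trials):
        data = gen()
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of malformed input
        except Exception:
            crashes.append(data)  # anything else is treated as a crash
    return crashes


rng = random.Random(0)  # fixed seed so that runs are reproducible
seeds = [b"RC\x03abc", b"RC\x05hello"]  # hypothetical captured inputs
random_crashes = fuzz(parse_record, lambda: random_input(rng), 1000)
mutation_crashes = fuzz(parse_record, lambda: mutate(rng, rng.choice(seeds)), 1000)
print("random:", len(random_crashes), "mutation:", len(mutation_crashes))
```

Separating the expected `ValueError` from other exceptions matters: without it, every rejected random input would be reported as a "crash" and the real bug would be buried in noise.
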

## Exercises

* Windows NT achieved a TCSEC C2 certification. What does C2 certification
require? What did Windows NT have to implement to achieve that certification?
What changes had to be made to the default configuration of Windows
NT to actually operate in that C2-certified mode?
* Read pp. 96–7 of the [*Common Criteria Part 2*][cc2], i.e., the
pages discussing FIA_UAU.1 through FIA_UAU.7. These are templates for
security requirements related to authentication of users. Instantiate
these templates, such that the instantiations accurately describe a
system of your choice (e.g., your bank's Internet banking system, or CMS).
How could you gain assurance that these requirements are actually
satisfied by the system?
* Critique each of the following as a basis for assurance:
    - Running a system for a year on the Internet and observing no
      successful attacks on it during that time.
    - Paying a skilled penetration testing team to attempt to attack
      a deployed system.
    - Formally verifying the design of the Trusted Computing Base (TCB)
      of a system.
* [FindBugs][fb] is a tool for finding bugs in Java applications,
including some security vulnerabilities. Examine the [bug descriptions][bugs]
and find the bugs in the "Security" category. Explain why the four bugs
related to databases (two about passwords, two about SQL)
constitute likely vulnerabilities.