Information Flow

Lecturer: Michael Clarkson

Lecture notes by Tom Roeder and Michael Clarkson



We have discussed various tools (cryptographic protocols, authentication logic, etc.) for ensuring secrecy, making authorization decisions, and accomplishing other security-oriented tasks.  However, these tools are just pieces of a larger puzzle: they only suffice to provide security for one component of a system, whereas we would like to enforce policies on entire systems. So far, we lack a mechanism to enforce these end-to-end policies.

For example, suppose that Alice visits a web site that plants a Trojan horse in her browser, which attempts to copy her secret data up to a public web site. Alice would like the system to enforce a policy that data she labels secret must not become public. Of course, this is harder than it seems. You might imagine a naive solution: monitor the outgoing network link from the browser, and disallow transmission of bytes that look like the secret data. However, we can't tell a priori what is secret and what isn't, in part because data can be encoded. For instance, the Trojan horse might encrypt the data, so that the bytes no longer look like Alice's data.  Or, the Trojan horse might use a different network connection than the browser's, perhaps by sending email.

The problem becomes even worse, since data is just bits, and there are many devious methods for transferring bits.  For example, a sender (perhaps a Trojan horse) can create a file in a temporary directory, and a receiver can attempt to access this file at regular intervals. At each interval, if the attacker wants to send '1', he locks the file, and if he wants to send '0', he doesn't. The receiver can tell by trying to access the file whether or not it is locked, so we have succeeded in transmitting a single bit, and the process can be repeated to transfer data at a potentially high rate. A similar method uses system load to transfer information. The attacker uses many cycles to send a one, and uses few cycles to send a zero. This is a noisy medium for transmission, since other programs are also using the CPU, but data transfer in the presence of noise is a well-studied problem in the field of information theory. The moral is that data can be transferred by abusing mechanisms in ways that designers never intended.
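The file-locking channel can be illustrated with a small in-memory simulation. This is only a sketch: the class SharedFile and the functions send_bit, receive_bit, and transmit are hypothetical names, no real file locking is performed, and the sender and receiver are assumed to be perfectly synchronized on each interval.

```python
class SharedFile:
    """Stands in for a temporary file whose lock state both parties can probe."""

    def __init__(self):
        self.locked = False


def send_bit(shared, bit):
    # Sender: lock the file to signal 1, leave it unlocked to signal 0.
    shared.locked = (bit == 1)


def receive_bit(shared):
    # Receiver: an access attempt fails exactly when the file is locked.
    return 1 if shared.locked else 0


def transmit(message: bytes) -> bytes:
    shared = SharedFile()
    received_bits = []
    for byte in message:
        for i in range(7, -1, -1):                     # most significant bit first
            send_bit(shared, (byte >> i) & 1)          # sender acts in this interval
            received_bits.append(receive_bit(shared))  # receiver probes the file
    # Reassemble the received bits into bytes.
    out = bytearray()
    for k in range(0, len(received_bits), 8):
        value = 0
        for bit in received_bits[k:k + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


print(transmit(b"secret"))  # b'secret'
```

No data is ever written to the shared file; only its lock state carries information, which is why monitoring the file's contents would not detect the transfer.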

All these examples use channels to transmit information. A covert channel is one in which data is encoded into a form, such as timing, not intended for information transfer.

Noninterference

Toward our goal of enforcing end-to-end security, we examine a class of policies called information flow policies. These policies restrict how information flows through a system, that is, they control the release and propagation of information. A familiar kind of information flow policy is a confidentiality policy, which specifies what information can be disclosed to whom. The goal of such a policy is to prevent information leakage, or the improper disclosure of information. Alice's policy in the example above is a confidentiality policy.

In the rest of this lecture, we consider one particular confidentiality policy, known as noninterference (NI). This policy was originally formulated in terms of groups of users in a shared system, such as Unix or a database.  Noninterference requires that the actions of high-level users have no effect on (i.e., should not interfere with) what low-level users can observe. We will adapt this policy to a programming language.

Suppose that all variables in our programming language are labeled with either H or L, which correspond to high secrecy and low secrecy, respectively. We write xH and yL to indicate that the variables have the given security label. The variable h is, by convention, a high secrecy variable; similarly, l is low secrecy. Also, we write label(x) to denote the label of x. The label of a variable is fixed and cannot change, either in the program text or during execution.

We want to enforce the policy that information does not flow from high to low variables. Immediately, we notice that we can determine the security of two simple programs.

h = l;   // secure  
l = h;   // insecure

The first program is secure because information flows from a low secrecy variable to a high secrecy variable, which does not violate our policy.  But the second program is insecure because it does violate the policy: information flows from a high variable to a low variable. In this program, if h were initially equal to 0, then upon termination of the program, l would equal 0.  But if we changed h to initially equal 1, then l would also equal 1.  So this program violates noninterference because high-level information has an observable effect on low-level information.

More generally, we can represent the execution of the program as a box with low and high inputs (Lin and Hin) and low and high outputs (Lout and Hout). The possible information flows through the program are arrows from inputs to outputs: Lin to Lout, Lin to Hout, Hin to Hout, and Hin to Lout. Only one of these flows is disallowed, and that is the flow from Hin to Lout.

We define noninterference as follows: changes in H inputs do not cause changes in L outputs. Suppose that a program satisfies noninterference. We can then consider what the possible outputs would be if Lin remained constant and Hin changed. Since the only difference between the two executions of the program is the initial values of high variables, the output values of low variables cannot differ.  Hout, however, can change, since noninterference does not put any restrictions on the output of high variables.
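The definition suggests a simple experiment: hold Lin fixed, vary Hin, and compare Lout. The sketch below does this by brute force over a small finite domain. It assumes programs are modeled as Python functions from (Lin, Hin) to (Lout, Hout) and that they always terminate; the names satisfies_noninterference, h_gets_l, and l_gets_h are hypothetical. Passing such a test over a finite domain is evidence of noninterference, not a proof.

```python
def satisfies_noninterference(program, low_inputs, high_inputs):
    """Return True if, for every fixed low input, the low output is the
    same for all high inputs in the tested domain."""
    for l_in in low_inputs:
        low_outputs = {program(l_in, h_in)[0] for h_in in high_inputs}
        if len(low_outputs) > 1:
            return False   # changing only Hin changed Lout
    return True


def h_gets_l(l, h):
    h = l                  # h = l;
    return l, h            # (Lout, Hout)


def l_gets_h(l, h):
    l = h                  # l = h;
    return l, h            # (Lout, Hout)


domain = range(4)
print(satisfies_noninterference(h_gets_l, domain, domain))  # True
print(satisfies_noninterference(l_gets_h, domain, domain))  # False
```

Note that h_gets_l passes even though Hout varies with the inputs, since the check compares only low outputs.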

We offer two additional examples of explicit flow, i.e., flow resulting from assignment statements.

The second example raises an interesting question: what if the attacker could observe the value of x during execution of the program? Then we would not be able to claim that this program satisfies noninterference, because a change in H input would be observable. The problem of protecting a process's memory is orthogonal to the problem of enforcing noninterference, so we will make the assumption that the values of L variables are only observable immediately before execution begins, and immediately after termination if the program terminates. The values of H variables are never observable by an attacker. 

Information flow can also result from control flow; we call this implicit flow.  Consider:

if (h) { 
  l = 1; 
} else { 
  l = 0; 
}         

This program leaks whether h is nonzero, even though there is no direct assignment from h to l.
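A single conditional reveals only one bit, but repeating the pattern can copy an entire secret. The following sketch assumes h is a nonnegative integer that fits in a fixed number of bits; the function name copy_via_implicit_flow and the parameter bits are hypothetical.

```python
def copy_via_implicit_flow(h: int, bits: int = 8) -> int:
    """Copy h into l using only branches on h; h never appears on the
    right-hand side of an assignment to l."""
    l = 0
    for i in range(bits):
        # The guard depends on h, but the assignment to l mentions only
        # l, the loop index, and constants.
        if (h >> i) & 1:
            l = l | (1 << i)
    return l


print(copy_via_implicit_flow(0xA7) == 0xA7)  # True
```

An enforcement mechanism that tracked only assignments would therefore miss a complete leak of h.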

Enforcing Noninterference

We want to enforce noninterference while still allowing interesting interactions between low and high variables in our programs. If we were willing to be more draconian, we could consider separating programs into two halves, one which references only high variables, and one which references only low. It is clear, however, that such separated programs do not allow many useful computations, such as computing functions of both high and low data, or secret logs of low data. There are two standard ways to enforce properties of programs:

  1. At runtime, using execution monitors, virtual machines, or interpreters
  2. At compile time, using static analysis, type systems, or compiler transformations

Let us consider whether these can be used to enforce noninterference.

Runtime enforcement

An interpreter reads the instructions in a program and simulates the execution of each instruction. Consider what an interpreter should do for each of the following statements:

Static enforcement

We now show how to build a set of syntax-directed inference rules that can statically enforce noninterference. By "syntax-directed", we mean that the syntax of the program dictates which rule can be applied to a formula; these kinds of rules are easy to implement as an analysis in a compiler.  The key insight we use is that the program counter can be represented by a pseudo-variable, written pc, which tells us where in the program execution has reached. Further, the program counter is given a security label, which tells us how much information could be gained by knowing the value of the program counter. We use the notation S sif pc = C to mean "S is secure if the pc label is C". This means that the program S satisfies noninterference when executed in a context where the pc label is exactly C. So S sif pc = L means "S is secure", assuming programs begin execution with pc = L.

Assignments
  1. xH = yH; sif pc = L
  2. tL = uL; sif pc = L
  3. xH = tL; sif pc = L
  4. tL = xH; sif pc = L  <--- false

According to the arguments we made when discussing runtime enforcement, the first three formulas are true and the fourth is false.  Suppose we change the pc label to H.

  5. xH = yH; sif pc = H
  6. tL = uL; sif pc = H  <--- false
  7. xH = tL; sif pc = H
  8. tL = xH; sif pc = H  <--- false

Now assignment (6) becomes insecure, because an assignment to an L variable when the pc is H may violate noninterference. (Recall the example of implicit flow.) We can construct an inference rule based on these examples:

label(y) ≤ label(x)        C ≤ label(x)
---------------------------------------
x = y; sif pc = C

Thus if label(x) = L, then the pc label C must be L, which corresponds to our observation that we cannot assign to low variables in a high context. Similarly, if the pc label is H, then label(x) must be H.

Assignment statements allowing only two variables are restrictive, so we generalize the assignment rule to any right-hand-side expression E.  An expression is any mathematical function constructed using program variables and built-in operators, such as 3*y+z.  Define the label of expression E, written label(E), as the maximum label of any variable appearing in E, where the maximum is taken with respect to the ordering L ≤ H.  The label of any constant in an expression is defined to be L.  We can now construct an inference rule for assignments.

label(E) ≤ label(x)        C ≤ label(x)
---------------------------------------
x = E; sif pc = C

What does this rule say about the assignment tL = xH * 0;?  Even though the assignment does not violate noninterference, the inference rule says that it is insecure.  Thus, our analysis is conservative.  If we can derive that a program is secure, then the program satisfies noninterference; but if we cannot, then we don't know whether the program satisfies noninterference.  The problem is that our rules are too weak to recognize all noninterfering programs.  This may seem unacceptable, but you are, in fact, accustomed to such conservative analyses from normal programming languages. For example, the code

int x;
if (false)
  x = "h";

will not compile due to a type error, even though the error will never occur in any run of the program.   
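Returning to the assignment rule, it can be sketched as a two-line check. The code below is a minimal illustration, not a full analysis: it assumes expressions are given as strings, that every identifier in an expression is a program variable listed in a labels dictionary, and that the names expr_label and assignment_secure are hypothetical.

```python
import re

ORDER = {"L": 0, "H": 1}  # encodes the ordering L <= H


def expr_label(expr: str, labels: dict) -> str:
    """Label of an expression: the maximum label of the variables in it.
    Constants contribute L, so an expression with no variables is L."""
    names = re.findall(r"[A-Za-z_]\w*", expr)
    return max((labels[n] for n in names), key=ORDER.get, default="L")


def assignment_secure(target: str, expr: str, pc: str, labels: dict) -> bool:
    """x = E is derivable as secure when the label of E and the pc label
    are both at most the label of x."""
    lx = ORDER[labels[target]]
    return ORDER[expr_label(expr, labels)] <= lx and ORDER[pc] <= lx


labels = {"x": "H", "y": "H", "t": "L", "u": "L"}

print(assignment_secure("x", "t", "L", labels))      # True:  xH = tL
print(assignment_secure("t", "x", "L", labels))      # False: tL = xH
print(assignment_secure("t", "u", "H", labels))      # False: tL = uL under pc = H
print(assignment_secure("t", "x * 0", "L", labels))  # False: rejected conservatively

# The rejected assignment tL = xH * 0 is nevertheless noninterfering:
# every value of x produces the same value for t.
print({x * 0 for x in range(10)} == {0})             # True
```

The check looks only at which variables appear in the expression, never at what the expression computes, which is exactly why it cannot accept tL = xH * 0.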

If Statements

Observe that S: if xH = 1 then yL = 5; else yL = 6; violates noninterference (just as the implicit flow example did), so we should not be able to derive S sif pc = L. Consider another if statement:

if (xL + zH == 5) then yH = zH; else xL = 2;

The attacker here knows xL, and so can learn zH by observing xL. He might also learn yH. If the guard had been xL + wL = 5, however, the program would have been secure. 

If statements are secure if their subprograms are secure in the context of the label of the program counter when it reaches the subprogram. Inside a branch, the pc label must account for both the context in which the if statement executes and the guard, so we take the maximum of the current pc label and the label of the guard. This leads to the following rule.

C' = max(label(B), C)        S1 sif pc = C'        S2 sif pc = C'
-----------------------------------------------------------------
if (B) S1; else S2; sif pc = C

Then, given our if statement S above, we can check to see whether S sif pc = L. We calculate that C' = H, so we need to show that yL = 5; sif pc = H and yL = 6; sif pc = H. To prove these two statements, we can try to use our inference rule for assignment statements.  Both assignments, however, fail to be derivable.  Thus we conclude that S is insecure.

Sequences

We would now like to write an inference rule that lets us decide when S1; S2 is secure. All that matters in this case is that the individual statements in the sequence are secure. We obtain the following rule.

S1 sif pc = C        S2 sif pc = C
----------------------------------
S1; S2 sif pc = C

We might wonder whether the label on the pc must remain the same when checking both S1 and S2. In particular, is it possible for the label to change after executing S1?  In our language, the answer is no: after execution of an entire statement, the label returns to what it was before the statement. This is true because the label on the pc represents the level of information that is revealed by learning that execution has reached the program point represented by the pc.  Since none of the statements in our language can produce runtime errors or exceptions, there can be nothing learned from the fact that S1 completed execution normally (with one caveat, discussed below).

Loops

Loops are almost identical to if statements.

C' = max(label(B), C)        S sif pc = C'
------------------------------------------
while (B) S; sif pc = C
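The rules for assignments, if statements, sequences, and loops can be combined into a small recursive checker. The sketch below is one possible encoding, not a definitive implementation: it assumes statements are represented as nested tuples, that expressions and guards are abstracted to the list of variables they mention, and that the names join and secure are hypothetical.

```python
ORDER = {"L": 0, "H": 1}  # encodes the ordering L <= H


def join(*labels):
    """Maximum of the given labels under L <= H (L if none are given)."""
    return max(labels, key=ORDER.get, default="L")


def secure(stmt, pc, env):
    """Return True if 'stmt sif pc = <pc>' is derivable from the rules."""
    kind = stmt[0]
    if kind == "assign":                      # ("assign", x, vars_of_E)
        _, target, expr_vars = stmt
        lx = ORDER[env[target]]
        le = ORDER[join(*(env[v] for v in expr_vars))]
        return le <= lx and ORDER[pc] <= lx
    if kind == "if":                          # ("if", vars_of_B, S1, S2)
        _, guard_vars, s1, s2 = stmt
        pc2 = join(pc, *(env[v] for v in guard_vars))
        return secure(s1, pc2, env) and secure(s2, pc2, env)
    if kind == "while":                       # ("while", vars_of_B, S)
        _, guard_vars, body = stmt
        pc2 = join(pc, *(env[v] for v in guard_vars))
        return secure(body, pc2, env)
    if kind == "seq":                         # ("seq", S1, S2)
        _, s1, s2 = stmt
        return secure(s1, pc, env) and secure(s2, pc, env)
    raise ValueError(f"unknown statement kind: {kind}")


env = {"x": "H", "y": "L", "z": "H"}

# if (xH == 1) yL = 5; else yL = 6;
leaky_if = ("if", ["x"], ("assign", "y", []), ("assign", "y", []))
# while (yL ...) zH = zH + xH;
high_loop = ("while", ["y"], ("assign", "z", ["z", "x"]))

print(secure(leaky_if, "L", env))                       # False
print(secure(high_loop, "L", env))                      # True
print(secure(("seq", high_loop, leaky_if), "L", env))   # False
```

Because each statement form matches exactly one case, the checker never has to search for a derivation, which is the practical benefit of syntax-directed rules.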
 

There is an interesting quirk lurking in this rule. Consider the following code:

yL = 0;
while (xH == 0) { }
yL = 1;

Our rule says that this program is secure.  However, if this program has not terminated after a long time, then an attacker can infer that x is likely to be 0. The problem with the rule we gave is that it is termination-insensitive, that is, it ignores information flows through termination channels. Termination is a covert channel, similar to a timing channel. Whether or not this matters depends on an aspect of the execution model which, until now, we have left undefined: whether it is synchronous or asynchronous. In a synchronous model, observers have access to a clock, and also have some bound on the time which the program will require to terminate. This allows an observer to detect whether the execution has exceeded that bound, and thus entered a nonterminating state such as an infinite loop. In an asynchronous model, we assume that observers do not have access to a clock. Without a clock, an observer cannot detect nontermination. We conclude that in the asynchronous model our rule enforces noninterference, but that in the synchronous model our rule is incorrect.
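The difference between the two models can be illustrated by running the looping program under a step budget. This is a sketch under stated assumptions: the budget max_steps stands in for the observer's clock and time bound, and the names run_with_clock and synchronous_observer are hypothetical.

```python
def run_with_clock(x_high: int, max_steps: int):
    """Run  yL = 0; while (xH == 0) { }  yL = 1;  for at most max_steps
    loop iterations. Returns the final yL, or None if the bound is exceeded."""
    y_low = 0
    steps = 0
    while x_high == 0:
        steps += 1
        if steps > max_steps:
            return None      # time bound exceeded: no termination observed
    y_low = 1
    return y_low


def synchronous_observer(x_high: int, max_steps: int = 1000) -> str:
    # With a clock and a time bound, "did not finish in time" is an observation.
    if run_with_clock(x_high, max_steps) is None:
        return "timeout"
    return "terminated"


print(synchronous_observer(0))   # timeout
print(synchronous_observer(7))   # terminated

# Every terminating run produces the same low output, so an observer who
# sees only final values of L variables learns nothing about x from them.
print({run_with_clock(x, 1000) for x in (1, 2, 3)} == {1})  # True
```

The synchronous observer distinguishes x = 0 from every other value, while the values of L variables after termination are identical across all terminating runs.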