Quantitative Information Flow

Lecturer: Michael Clarkson

Lecture notes by Michael Clarkson


This lecture is based on:

Michael R. Clarkson, Andrew C. Myers, and Fred B. Schneider.  Belief in Information Flow.  In Proceedings of the 18th IEEE Computer Security Foundations Workshop, pages 31--47, Aix-en-Provence, France, June 2005.


In the last lecture, we discussed an information flow policy called noninterference.  Our discussion of it led to a rather conservative set of static analysis rules: any low output that is at all dependent on a high secrecy variable is regarded as giving away information about that high variable.  Yet there are functions that give away very little information about some of their inputs.  For example, encryption reveals almost no information at all, even though it violates noninterference.  Thus, we seek a quantitative measure of information flow that characterizes the amount of information that is leaked from a given program, rather than whether information leaks.

As an example program, consider the password checker (PWC):

if (p == g) a = 1; else a = 0;

Variable p is a stored password, g is a guessed password supplied by the attacker using the program, and a is an authentication flag.  Clearly, p must be a high confidentiality variable, so let p = H.  Similarly, a = L.  The label of g is trickier:  it should be L because the attacker must supply g as input, but it should be H to protect the value of g from other users of the system.  The problem is that our set of labels {L,H} is not rich enough to describe this kind of per-user security policy.  We define g = L for now; a richer label model is used in Jif, and you might want to read about that.

In PWC, p interferes with a, so we will be unable to prove it secure using the static analysis we gave in the last lecture.  Yet PWC seems to be secure under certain assumptions about the space of passwords, the way in which users choose passwords, etc., because an attacker will learn very little information about p.  So in this lecture we develop a new analysis that captures these intuitions.

Uncertainty

What is information?  Information is the enemy of uncertainty:  by acquiring information, we reduce uncertainty.  Imagine your uncertainty while playing cards. At the beginning of a game, no cards have been exposed, and (hopefully) you don't know where any of the cards are in the deck. Thus you are maximally uncertain about the order of the cards. As soon as you are dealt a card, your uncertainty is reduced: now you know one of the cards and have gained information about the order.  The study of information is properly the subject of information theory, and you can consult any introductory text to learn more.  

We represent an attacker's uncertainty about programs using probability distributions on program states, which are assignments of values to some or all of the variables in a program.  For example, if there were only three possible passwords {A,B,C}, then we could define three program states for PWC:

s1:  p = A
s2:  p = B
s3:  p = C

An example probability distribution over these states is Pr(s1) = 0.98, Pr(s2) = 0.01, and Pr(s3) = 0.01.  Suppose we name this distribution d.  Another way to describe d is as a table:

p  Pr
A 0.98
B 0.01
C 0.01

We can treat probability distributions as functions and write d(s1) = 0.98, d(s2) = 0.01, and d(s3) = 0.01.

Recall that any probability distribution d over a set of states S = {s1, ..., sn} must obey two properties.  

  1. For all i, 0 ≤ d(si) ≤ 1
  2. Σi d(si) = 1
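
As an aside, here is a minimal Python sketch (not from the original notes; the name is_distribution is ours) that represents the distribution d above as a dictionary from values of p to probabilities and checks both properties:

# The distribution d from the table above; states are identified by the value of p.
d = {'A': 0.98, 'B': 0.01, 'C': 0.01}

def is_distribution(d, tol=1e-9):
    """Check the two properties: every probability lies in [0, 1],
    and the probabilities sum to 1 (up to floating-point tolerance)."""
    in_range = all(0.0 <= pr <= 1.0 for pr in d.values())
    sums_to_one = abs(sum(d.values()) - 1.0) <= tol
    return in_range and sums_to_one

print(d['A'])              # 0.98, i.e., d(s1)
print(is_distribution(d))  # True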

Beliefs

Uncertainty is always relative to who is uncertain:  an attacker may be uncertain about what the stored password is, but the system executing PWC knows the password.  Define the attacker's uncertainty about high inputs as his belief.  Our fundamental premise is that we can quantify information flow by measuring the change, from before execution of a program to after execution, in the accuracy of an attacker's belief.  So we need to characterize how programs cause attackers to revise their beliefs.  Define an experiment as a protocol for interaction between an attacker and the system, resulting in the attacker revising his belief.  The experiment protocol and its rationale are as follows.

  1. The attacker chooses a prebelief b.
  2. The attacker chooses the low input Lin, and the system chooses the high input Hin.
  3. The attacker constructs a prediction d about the low and high outputs of program S, using his prebelief and knowledge of Lin.
  4. The system executes S, yielding the low output as an observation o for the attacker.
  5. The attacker calculates postbelief b' as d | o, restricted to high variables.

Experiment protocol

In step 1, the attacker chooses a prebelief, which is an initial belief about the high input.  In step 2, the input to S is chosen.  We conservatively allow the attacker to choose all of the low inputs, and require the system (perhaps acting on behalf of other users) to choose the high inputs.  In step 3, the attacker constructs a distribution d representing the probability he ascribes to all possible final program states.  In doing so, he takes into account his knowledge of the program text and low input.  Since he does not know the high input chosen by the system, he substitutes his prebelief; this is why the prediction is a probability distribution.  In step 4, the system runs the program, producing low and high outputs.  The attacker is allowed to observe only low variables, so his observation o, which is a program state, consists only of the low variables.  In step 5, the attacker calculates his postbelief, which is his belief after observing the output of the program.  Postbelief b' is the result of probabilistically conditioning prediction d on observed output o, defined as:

(d | o)(s) = d(s and o) / d(o)

Of course, if d(o) = 0, d|o is undefined.  Another way to think about conditioning is as an algorithm.  First, the attacker writes down prediction d, then crosses out all the rows corresponding to states that contradict observation o.  Then, the attacker adds up the probabilities in the remaining rows; call this sum N.  Finally, the attacker normalizes the distribution by dividing each row by N.  The result is d | o.  This is a distribution over program states including both high and low variables; the attacker restricts the distribution to just high variables to determine his postbelief b'.
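
The conditioning-and-restriction algorithm just described can be written out directly.  The following Python sketch is ours, not from the notes; it assumes program states are represented as tuples of (variable, value) pairs.  It crosses out the states that contradict the observation, renormalizes by the remaining mass N, and restricts the result to the high variables:

def condition(d, o):
    """Compute d | o: drop states that contradict observation o, then renormalize."""
    consistent = {s: pr for s, pr in d.items()
                  if all(dict(s)[var] == val for var, val in o.items())}
    n = sum(consistent.values())          # the normalizing constant N
    if n == 0:
        raise ValueError("d | o is undefined when d(o) = 0")
    return {s: pr / n for s, pr in consistent.items()}

def restrict(d, high_vars):
    """Project a distribution on full states down to the high variables only."""
    result = {}
    for s, pr in d.items():
        key = tuple((var, val) for var, val in s if var in high_vars)
        result[key] = result.get(key, 0.0) + pr
    return result

# Tiny demo: a prediction over states (p, a) and an observation a = 0.
d = {(('p', 'A'), ('a', 1)): 0.5, (('p', 'B'), ('a', 0)): 0.5}
print(restrict(condition(d, {'a': 0}), {'p'}))   # {(('p', 'B'),): 1.0}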

Example 1.  The attacker conducts an experiment on PWC, as follows.

  1. The attacker chooses prebelief b:

    p  Pr
    A 1/3
    B 1/3
    C 1/3

     

  2. The attacker guesses that the password is A by choosing A for low input g.  The system chooses C for high input p.

  3. The attacker predicts the output as a distribution d:

    g p a  Pr
    A A 0 0
    A A 1 1/3
    A B 0 1/3
    A B 1 0
    A C 0 1/3
    A C 1 0
    B or C  A,B, or C 0 or 1 0

    The attacker ascribes zero probability to any program state where g is not equal to A (the last line of the table) because he set g to A on input and PWC does not modify g.   He ascribes zero probability to state (A,A,0) because this state is impossible given the program text: if g equals p, the program must set a to 1, not 0.  Similarly, (A,B,0) and (A,C,0) are impossible states.  This leaves states (A,A,1), (A,B,0), and (A,C,0).  In the first of these, the attacker has guessed correctly; in the latter two, incorrectly.  He ascribes probability 1/3 to (A,A,1) because, according to his prebelief b, there is a 1/3 probability that p=A.  Similarly, he ascribes 1/3 probability to each of the other two states.

  4. The system executes PWC, producing a = 0 as observation o.

  5. The attacker determines postbelief b'.  First, he calculates d | o.  According to the algorithm given above, he removes all probability from d that is inconsistent with o, that is, any probability assigned to a state where a = 1.

    g p a  Pr
    A A 0 0
    A A 1 1/3 → 0   (crossed out: a = 1 contradicts o)
    A B 0 1/3
    A B 1 0
    A C 0 1/3
    A C 1 0
    B or C  A,B, or C 0 or 1 0

    He then normalizes the distribution.  The sum of remaining probability is N = 2/3.  Dividing each row by N, he obtains (all rows with 0 probability are omitted):

    g p a  Pr
    A B 0 1/2
    A C 0 1/2

    Then, restricting to high variables only, he obtains postbelief b':

    p Pr 
    A 0
    B 1/2
    C 1/2
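
To make these five steps concrete, here is a small self-contained Python sketch of Example 1 (the representation of states as tuples (g, p, a) is ours).  It builds the prediction d from the prebelief and the program text, conditions on the observation a = 0, and restricts to p:

from fractions import Fraction

# Step 1: uniform prebelief over the password p.
prebelief = {'A': Fraction(1, 3), 'B': Fraction(1, 3), 'C': Fraction(1, 3)}

# Steps 2-3: the attacker guesses g = 'A' and predicts the output of PWC.
# Each state is (g, p, a); PWC sets a = 1 exactly when g == p.
guess = 'A'
prediction = {(guess, p, 1 if guess == p else 0): pr
              for p, pr in prebelief.items()}

# Step 4: the system runs PWC with p = 'C', so the attacker observes a = 0.
observation = 0

# Step 5: condition on the observation, then restrict to the high variable p.
consistent = {s: pr for s, pr in prediction.items() if s[2] == observation}
n = sum(consistent.values())
postbelief = {}
for (g, p, a), pr in consistent.items():
    postbelief[p] = postbelief.get(p, Fraction(0)) + pr / n

print(postbelief)   # {'B': Fraction(1, 2), 'C': Fraction(1, 2)}; p = 'A' implicitly gets 0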

Example 2.  The attacker conducts a very similar experiment on PWC.  Step 1 remains the same as Example 1, but in step 2 the attacker chooses g=C, i.e., he guesses correctly.  The attacker's prediction d will therefore be:

g p a  Pr
C A 0 1/3
C A 1 0
C B 0 1/3
C B 1 0
C C 0 0
C C 1 1/3
A or B A,B, or C 0 or 1 0

His observation in this experiment will be that a = 1.  After conditioning and restricting he obtains postbelief b':

p Pr 
A 0
B 0
C 1
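
The same calculation can be sketched in Python for Example 2 (again, the state representation is ours): conditioning the prediction above on a = 1 leaves only the state (C, C, 1), so after normalizing and restricting, all probability lands on p = C.

from fractions import Fraction

# Prediction d for Example 2: states are (g, p, a), with guess g = 'C'.
prediction = {('C', 'A', 0): Fraction(1, 3),
              ('C', 'B', 0): Fraction(1, 3),
              ('C', 'C', 1): Fraction(1, 3)}

# Observation: a = 1.  Only ('C', 'C', 1) survives conditioning.
consistent = {s: pr for s, pr in prediction.items() if s[2] == 1}
n = sum(consistent.values())
postbelief = {p: pr / n for (g, p, a), pr in consistent.items()}
print(postbelief)   # {'C': Fraction(1, 1)}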

Accuracy

The accuracy of a belief b is the distance from b to reality, measured by some distance function D.  Reality is the program state r representing the high input chosen by the system in step 2 of the experiment protocol.  So we denote the error (i.e., inaccuracy) in the attacker's prebelief as D(b, r).  Similarly, the error in the attacker's postbelief is denoted D(b', r).  The difference between these two quantities is the improvement, due to the experiment, in the accuracy of the attacker's belief.  So we define the amount of information flow Q due to an experiment as:

Q = D(b, r) - D(b', r)

Distance function D could be defined in many ways; here we explore one possibility:

D(b, r) = -log2 b(r)

With this definition, the formula for Q simplifies to:

Q = log2 (b'(r) / b(r))

Among the reasons that this is a good definition for D: it measures information in the familiar unit of bits, and, as Example 3 below illustrates, it makes information flow additive across a sequence of experiments.
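
As a quick numerical check of the simplification (a sketch; the names D and flow are ours), the two forms agree on the prebelief and postbelief from Example 1:

from math import log2

def D(b, r):
    """Error of belief b with respect to reality r: -log2 b(r)."""
    return -log2(b[r])

def flow(b, b_post, r):
    """Amount of information flow Q = D(b, r) - D(b_post, r)."""
    return D(b, r) - D(b_post, r)

b  = {'A': 1/3, 'B': 1/3, 'C': 1/3}    # prebelief from Example 1
bp = {'A': 0.0, 'B': 1/2, 'C': 1/2}    # postbelief from Example 1
r  = 'C'                               # reality: the system chose p = C
print(flow(b, bp, r))                  # ~0.585 bits
print(log2(bp[r] / b[r]))              # the same value, via log2(b'(r) / b(r))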

Example 1, continued.  How much information did the attacker learn in the experiment?  We calculate:

Q = log2 (b'(r) / b(r))
= log2 ((1/2) / (1/3))
= log2 (3/2)
≈ 0.6

The attacker learns 0.6 bits when he guesses incorrectly.

Example 2, continued. Again, we calculate the amount of information flow:

Q = log2 (b'(r) / b(r))
= log2 (1 / (1/3))
= log2 3
≈ 1.6

The attacker learns 1.6 bits when he guesses correctly.

Example 3.  Suppose that we continue Example 1.  The attacker's postbelief in that experiment was:

p Pr
A 0
B 1/2
C 1/2

Now, let the attacker conduct a new experiment on PWC, using the postbelief from the previous experiment as his new prebelief.  Following the experiment protocol (in step 2, assume the system still chooses p=C and the attacker guesses B or C; either of those guesses leads to the same postbelief, whereas guessing A, which he already believes impossible, would reveal nothing), we determine that the attacker's postbelief in the new experiment is:

p  Pr
A 0
B 0
C 1

How much information flows in the new experiment?  We calculate:

Q = log2 (b'(r) / b(r))
= log2 (1 / (1/2))
= log2 2
= 1

So the total amount of information flow across both experiments is 0.6 + 1 = 1.6 bits (exactly, log2(3/2) + log2 2 = log2 3), which is the same amount of information that the attacker learned in Example 2 when he guessed correctly.  This illustrates that information flow is cumulative across a sequence of experiments.
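
The cumulative total can be checked exactly, since log2(3/2) + log2 2 = log2 3.  A short sketch:

from math import isclose, log2

q1 = log2((1/2) / (1/3))    # Example 1: ~0.585 bits
q2 = log2(1 / (1/2))        # the new experiment: exactly 1 bit
q_total = log2(1 / (1/3))   # Example 2, done in a single experiment: ~1.585 bits
print(isclose(q1 + q2, q_total))   # True: flow accumulates across experiments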