Intro to probability

We defined probability spaces, proved a few simple results, and introduced proof by induction.

Probability space

A probability space is a set S and a function P: 2^S→R satisfying the following properties:

for all E ⊆ S, P(E) ≥ 0
P(S) = 1
If E₁ and E₂ are disjoint (that is, if E₁ ∩ E₂ = ∅), then P(E₁ ∪ E₂) = P(E₁) + P(E₂).

S is called the sample space. The elements of S are called outcomes. The subsets of S are called events (so that an event is a collection of outcomes). If E is an event, P(E) is called the probability of E. The three properties listed above are called Kolmogorov's Axioms.
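
As a concrete illustration, here is a minimal Python sketch that checks the three axioms by brute force on a small finite example. The fair six-sided die, the uniform function P, and the helper all_events are assumptions made for this sketch; they are not part of the definition above.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: the sample space of a fair six-sided die.
S = frozenset(range(1, 7))

def P(E):
    """Probability of the event E (a subset of S) under the uniform measure."""
    return Fraction(len(E), len(S))

def all_events(sample_space):
    """Return every subset of the sample space, i.e. every event."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(S)

# Axiom 1: every event has non-negative probability.
axiom1 = all(P(E) >= 0 for E in events)
# Axiom 2: the whole sample space has probability 1.
axiom2 = P(S) == 1
# Axiom 3: probabilities add over every disjoint pair of events.
axiom3 = all(
    P(E1 | E2) == P(E1) + P(E2)
    for E1 in events
    for E2 in events
    if not (E1 & E2)
)

print(axiom1, axiom2, axiom3)  # True True True
```

Exact fractions are used instead of floating-point numbers so that the equality in axiom 3 can be tested without rounding error.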

Simple facts about probability spaces

In this section, we assume that (S, P) is a probability space.

Claim: the probability that "nothing happens" is 0. That is, P(∅) = 0.

Proof sketch: Write S = S ∪ ∅. Apply axiom 3 (don't forget to check for disjointness!).

Claim: if E is an event, then the probability that E "doesn't happen" (that is, P(S \ E)) is 1 − P(E).

Proof sketch: Write S = E ∪ (S \ E). Apply axiom 3.
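
Both facts can be checked numerically on a small example. The sketch below assumes a hypothetical three-outcome sample space with non-uniform weights; the names weights, E, and complement are illustrative only.

```python
from fractions import Fraction

# Hypothetical weights for a biased three-outcome experiment; they sum to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
S = frozenset(weights)

def P(E):
    """Probability of E: the sum of the weights of its outcomes."""
    return sum((weights[x] for x in E), Fraction(0))

E = frozenset({"a", "c"})
complement = S - E  # the event that E "doesn't happen"

print(P(frozenset()))            # 0
print(P(complement), 1 - P(E))   # 1/3 1/3
```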

Induction

Consider the following claims and proofs:

Claim 2: If E₁ and E₂ are mutually disjoint, then
P(E₁ ∪ E₂) = P(E₁) + P(E₂)

Claim 3: If E₁, E₂, and E₃ are mutually disjoint, then
P(E₁ ∪ E₂ ∪ E₃) = P(E₁) + P(E₂) + P(E₃)

Claim 4: If E₁, E₂, E₃, and E₄ are mutually disjoint, then
P(E₁ ∪ E₂ ∪ E₃ ∪ E₄) = P(E₁) + P(E₂) + P(E₃) + P(E₄)

Claim 5: If E₁, ..., E₅ are mutually disjoint, then
P(E₁ ∪ ⋯ ∪ E₅) = P(E₁) + ⋯ + P(E₅)

Proof 2: This is just axiom 3.

Proof 3: We can add parentheses to see that E₁ ∪ E₂ ∪ E₃ = (E₁ ∪ E₂) ∪ E₃. Since E₃ is disjoint from E₁ and E₂, it must be disjoint from their union. Thus we can apply axiom 3 to conclude
P(E₁ ∪ E₂ ∪ E₃) = P(E₁ ∪ E₂) + P(E₃)
By claim 2, P(E₁ ∪ E₂) = P(E₁) + P(E₂), so we see that the right hand side is just P(E₁) + P(E₂) + P(E₃), as required.

Proof 4: We can add parentheses to see that E₁ ∪ E₂ ∪ E₃ ∪ E₄ = (E₁ ∪ E₂ ∪ E₃) ∪ E₄. Since E₄ is disjoint from E₁, E₂, and E₃, it must be disjoint from their union. Thus we can apply axiom 3 to conclude
P(E₁ ∪ E₂ ∪ E₃ ∪ E₄) = P(E₁ ∪ E₂ ∪ E₃) + P(E₄)
By claim 3, P(E₁ ∪ E₂ ∪ E₃) = P(E₁) + P(E₂) + P(E₃), so we see that the right hand side is just P(E₁) + P(E₂) + P(E₃) + P(E₄), as required.

Proof 5: We can add parentheses to see that E₁ ∪ ⋯ ∪ E₅ = (E₁ ∪ ⋯ ∪ E₄) ∪ E₅. Since E₅ is disjoint from each of E₁ ... E₄, it must be disjoint from their union. Thus we can apply axiom 3 to conclude
P(E₁ ∪ ⋯ ∪ E₅) = P(E₁ ∪ ⋯ ∪ E₄) + P(E₅)
By claim 4, P(E₁ ∪ ⋯ ∪ E₄) = P(E₁) + ⋯ + P(E₄), so we see that the right hand side is just P(E₁) + ⋯ + P(E₅), as required.

You have probably noticed that the proofs of claims 3, 4, and 5 are almost identical. You have probably concluded that for any n, you could copy and modify one of these proofs to produce a proof of claim n. The proof of claim n would rely on claim n − 1, but that's not a problem because by the time you get around to proving claim n you will already have proven claim n − 1.

In fact, your proof of claim n might look like this:

Claim n: If E₁, ..., E_n are mutually disjoint, then
P(E₁ ∪ ⋯ ∪ E_n) = P(E₁) + ⋯ + P(E_n)

Proof n: Assume that claim n − 1 holds. We can add parentheses to see that E₁ ∪ ⋯ ∪ E_n = (E₁ ∪ ⋯ ∪ E_{n−1}) ∪ E_n. Since E_n is disjoint from each of E₁, ..., E_{n−1}, it must be disjoint from their union. Thus we can apply axiom 3 to conclude
P(E₁ ∪ ⋯ ∪ E_n) = P(E₁ ∪ ⋯ ∪ E_{n−1}) + P(E_n)
By claim n − 1, P(E₁ ∪ ⋯ ∪ E_{n−1}) = P(E₁) + ⋯ + P(E_{n−1}), so we see that the right hand side is just P(E₁) + ⋯ + P(E_n), as required.

This is the core idea behind the technique of proof by induction. The principle of induction says that if you want to prove a statement of the form "for all n ≥ n₀, P(n) holds" (where P(n) denotes a statement about the integer n, not a probability), then you need only prove two things:

P(n₀) holds, and
for an arbitrary n > n₀, if P(n − 1) is assumed, then P(n) holds.

The first of these two proofs is often referred to as the base case; the second of the two proofs is often called the inductive step. The assumption that P(n − 1) holds is often called the inductive hypothesis.

Here is a complete proof by induction of the example fact discussed above.

Claim: For any n ≥ 2, if E₁, ..., E_n are all mutually disjoint, then P(E₁ ∪ ⋯ ∪ E_n) = P(E₁) + ⋯ + P(E_n).

Proof: We will prove this claim by induction on n. In the base case (when n = 2), this statement is the same as axiom 3, and is thus true since (S, P) is a probability space.

For the inductive step, choose an arbitrary n > 2 and assume the inductive hypothesis: whenever E₁, ..., E_{n−1} are mutually disjoint, P(E₁ ∪ ⋯ ∪ E_{n−1}) = P(E₁) + ⋯ + P(E_{n−1}).

We wish to show that, for the chosen n, if E₁, ..., E_n are mutually disjoint, then P(E₁ ∪ ⋯ ∪ E_n) = P(E₁) + ⋯ + P(E_n).

We can add parentheses to see that E₁ ∪ ⋯ ∪ E_n = (E₁ ∪ ⋯ ∪ E_{n−1}) ∪ E_n. Since E_n is disjoint from each of E₁, ..., E_{n−1}, it must be disjoint from their union. Thus we can apply axiom 3 to conclude
P(E₁ ∪ ⋯ ∪ E_n) = P(E₁ ∪ ⋯ ∪ E_{n−1}) + P(E_n).
By the inductive hypothesis,
P(E₁ ∪ ⋯ ∪ E_{n−1}) = P(E₁) + ⋯ + P(E_{n−1}),
so we see that the right hand side is just P(E₁) + ⋯ + P(E_n), as required.
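
The shape of this proof is the same as the shape of a recursive function, which may make the technique easier to remember. The Python sketch below is only an illustration under stated assumptions: the fair die, the uniform function P, and the helper union_probability are hypothetical names, and the code assumes (without checking) that the events passed in are mutually disjoint.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die with the uniform measure.
S = frozenset(range(1, 7))

def P(E):
    """Probability of the event E under the uniform measure on S."""
    return Fraction(len(E), len(S))

def union_probability(events):
    """Compute P(E_1 ∪ ... ∪ E_n) for n >= 2 mutually disjoint events.

    The recursion mirrors the induction: the case n = 2 is the base case,
    and every larger n is reduced to the case n - 1.
    """
    if len(events) < 2:
        raise ValueError("need at least two events")
    if len(events) == 2:
        # Base case: this is exactly axiom 3.
        return P(events[0]) + P(events[1])
    # Inductive step: split off the last event and reuse the n - 1 case.
    return union_probability(events[:-1]) + P(events[-1])

events = [frozenset({1}), frozenset({2, 3}), frozenset({4}), frozenset({6})]
total = union_probability(events)
direct = P(frozenset().union(*events))

print(total, direct)  # 5/6 5/6
```

The recursive call plays the role of the inductive hypothesis: the function trusts that the answer for n − 1 events is correct and only does the work of adding the last event.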