Lecture 12: probability

Aside: proving facts about sets

Occasionally I assert properties of sets. For example, in the previous lecture I asserted that \(A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A)\), while in today's lecture I asserted that if \(A \subseteq S\), then \(A \cup (S \setminus A) = S\).

On your homework, you may assert these kinds of properties without proof as long as:

  1. They are clearly stated.
  2. They are true.
  3. They do not trivialize the problem.

For example, if you are asked to prove that \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\), it is not enough to say "this is obvious". If the same identity comes up as a step in some other proof, however, it is fine to assert it as obvious.

To show you how such a proof would go, I gave the following example:

Claim: If \(E \subseteq S\) then \(E \cup (S \setminus E) = S\).

Proof: We must show that if \(x\) is in the left hand side, then it is in the right hand side, and vice-versa; this is what it means for two sets to be equal.

First, choose an arbitrary \(x \in E \cup (S \setminus E)\). By definition of \(\cup\), either \(x \in E\) or \(x \in S \setminus E\). In the former case, since \(E \subseteq S\), we see that \(x \in S\), while in the latter case, \(x \in S\) because \(S \setminus E\) is the set of elements of \(S\) that don't appear in \(E\). In either case, \(x \in S\), completing the proof in this direction.

For the other direction, assume \(x \in S\). Then either \(x \in E\) or \(x \notin E\). In the former case, \(x \in E \cup (S \setminus E)\) by definition of \(\cup\). In the latter case, by definition of \(\setminus\), we see that \(x \in S \setminus E\), so that \(x \in E \cup (S \setminus E)\), again by definition of \(\cup\). In either case, \(x\) is in \(E \cup (S \setminus E)\), completing the proof in this direction.

Here is another example:

Claim: \(E \cap (S \setminus E) = \emptyset\).

Proof: by contradiction. Suppose \(E \cap (S \setminus E) \neq \emptyset\). Then there exists some \(x \in E \cap (S \setminus E)\). By definition of \(\cap\), we see \(x \in E\) and \(x \in S \setminus E\). By definition of \(\setminus\), we see that \(x \notin E\), but this contradicts the fact that \(x \in E\).
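A brute-force check is no substitute for the proofs above, but it is a quick way to catch a false claim before you try to prove it. The sketch below tests both claims for every subset \(E\) of one small, arbitrarily chosen set \(S\); the helper name `subsets` is my own and not part of the lecture.

```python
from itertools import combinations

def subsets(s):
    """Return every subset of s as a frozenset (i.e. the power set of s)."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

# An arbitrary small example set; any finite set would do.
S = frozenset({1, 2, 3, 4})

# Claim 1: E ∪ (S \ E) = S for every E ⊆ S.
union_claim_holds = all(E | (S - E) == S for E in subsets(S))

# Claim 2: E ∩ (S \ E) = ∅ for every E ⊆ S.
intersection_claim_holds = all(E & (S - E) == frozenset() for E in subsets(S))

print(union_claim_holds, intersection_claim_holds)
```

Passing this check for one set \(S\) only shows the claims are not obviously false; the element-wise arguments are what establish them for all sets.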

Definitions

I will use \(2^S\) and \(Pow(S)\) interchangeably to refer to the power set of \(S\); I prefer \(2^S\) (it's shorter) but did not want to use it before we proved that \(|2^S| = 2^{|S|}\). Recall that the power set of \(S\) is the set of all subsets of \(S\).

A probability space is a set \(S\) (called the sample space) paired with a function \(Pr : 2^S \to \mathbb{R}\), satisfying:

  1. for all \(E \subseteq S\), \(Pr(E) \geq 0\).
  2. \(Pr(S) = 1\).
  3. If \(E_1\) and \(E_2\) are disjoint, then \(Pr(E_1 \cup E_2) = Pr(E_1) + Pr(E_2)\).

\(Pr\) is called the probability function or probability measure.

The elements of \(S\) are called outcomes; the subsets of \(S\) are called events. Thus the probability measure assigns a (non-negative) real number to every event.
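For a small finite sample space, the three rules can be checked mechanically by enumerating every event. The following is a minimal sketch, assuming the probability function is built by summing per-outcome weights; the biased-coin weights, `is_probability_space`, and `bad_Pr` are illustrative names and values of my own.

```python
from fractions import Fraction
from itertools import combinations

def subsets(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_probability_space(S, Pr):
    """Check the three rules for a finite sample space S by brute force."""
    events = subsets(S)
    nonnegative = all(Pr(E) >= 0 for E in events)              # rule 1
    total_is_one = Pr(frozenset(S)) == 1                       # rule 2
    additive = all(Pr(E1 | E2) == Pr(E1) + Pr(E2)              # rule 3
                   for E1 in events for E2 in events
                   if not (E1 & E2))                           # disjoint pairs only
    return nonnegative and total_is_one and additive

# Hypothetical biased coin: heads twice as likely as tails.
weights = {"H": Fraction(2, 3), "T": Fraction(1, 3)}
S = frozenset(weights)

def Pr(E):
    """Probability of an event = sum of the weights of its outcomes."""
    return sum((weights[x] for x in E), Fraction(0))

def bad_Pr(E):
    """Not a probability measure: bad_Pr(S) = 2/3, so rule 2 fails."""
    return Fraction(len(E), 3)

print(is_probability_space(S, Pr))      # True
print(is_probability_space(S, bad_Pr))  # False
```

Exact fractions are used instead of floats so that the equality tests in rules 2 and 3 are not disturbed by rounding.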

Important: The probability of \(E\) is not, in general, \(|E|/|S|\). That formula holds in some probability spaces (those in which every outcome is equally likely) but not in all of them. Assuming that \(Pr(E) = |E|/|S|\) will lead to incorrect answers for most problems.

Examples

To model the throw of a single six-sided die, we could choose the sample space \(S = \{1, 2, \dots, 6\}\). If we wanted to assume that all outcomes were equally likely, we could define \(Pr(E) = |E|/6\), but this is only one possible definition; we could certainly model a die with different likelihoods for different sides, which would give a different function.
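To make the difference concrete, here is a small sketch comparing the equally-likely model with one possible loaded die on the same sample space. The loaded weights are made up for illustration (6 comes up half the time, each other face one time in ten); they are an assumption, not something from the lecture.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def pr_fair(E):
    """Equally likely outcomes: Pr(E) = |E| / 6."""
    return Fraction(len(E), 6)

# Hypothetical loaded die; the weights sum to 1.
loaded = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
          4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}

def pr_loaded(E):
    """Pr(E) = sum of the weights of the outcomes in E."""
    return sum((loaded[x] for x in E), Fraction(0))

evens = {2, 4, 6}
print(pr_fair(evens))    # 1/2
print(pr_loaded(evens))  # 7/10
```

Both functions are defined on the same sample space and both give \(Pr(S) = 1\), yet they disagree on the event "the roll is even"; only the first one equals \(|E|/|S|\).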

There are many ways to model a throw of two dice. One possible sample space is

\[S_1 = \{1, 2, 3, \dots, 12\}\]

Another possible sample space is \(S_2 = N \times N\) where \(N = \{1,2,\dots,6\}\).
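The two sample spaces lead to quite different probability functions. The sketch below assumes two fair dice, so that every pair in \(S_2\) is equally likely, and it assumes that an outcome in \(S_1\) records the sum of the two dice (an interpretation I am supplying for illustration). The names `pr_S2` and `pr_S1` are my own.

```python
from fractions import Fraction
from itertools import product

N = range(1, 7)
S2 = set(product(N, N))      # all 36 ordered pairs (first die, second die)
S1 = set(range(1, 13))       # {1, 2, ..., 12}

def pr_S2(E):
    """Assumes fair dice: each pair equally likely, so Pr(E) = |E| / 36."""
    return Fraction(len(E), len(S2))

def pr_S1(E):
    """Probability on S1 obtained by summing the pair: an event E of sums
    corresponds to the set of pairs in S2 whose total lies in E."""
    return pr_S2({(a, b) for (a, b) in S2 if a + b in E})

print(pr_S1({7}))   # 1/6
print(pr_S1({2}))   # 1/36
print(pr_S1({1}))   # 0
print(pr_S1(S1))    # 1
```

Under these assumptions the function on \(S_2\) is simply \(|E|/36\), while the function on \(S_1\) is not \(|E|/12\): the outcome 7 is six times as likely as the outcome 2, and the outcome 1 has probability 0.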

There are a few things that determine a good choice:

Another example: suppose we wanted to perform an experiment by selecting a student from the room uniformly at random and sampling their height. Possible sample spaces include:

Again, these are all perfectly reasonable ways to model the experiment (they will of course have different probability functions). However, some of them make it easier to write down the probability function.

Properties of probability spaces

Everything else that we know about probability is derived from the definition. Here are some examples:

Notation: if there is a sample space that is clear from context, I will write \(\bar{E}\) (read "\(E\) complement") for \(S \setminus E\).

Claims about probability all assume that \(S\) and \(Pr\) form a probability space; I will not explicitly write this down.

Claim: \(Pr(E) + Pr(\bar{E}) = 1\) (alternatively, \(Pr(\bar{E}) = 1 - Pr(E)\)).

Proof: As shown in the aside above, \(E\) and \(\bar{E}\) are disjoint, so \[\begin{aligned} Pr(E) + Pr(\bar{E}) &= Pr(E \cup \bar{E}) && \text{by rule 3} \\ &= Pr(S) && \text{since $E \cup \bar{E} = S$} \\ &= 1 && \text{by rule 2} \end{aligned}\]

Claim: For all \(E\), \(Pr(E) \leq 1\).

Proof: For the sake of contradiction, suppose there were some \(E\) with \(Pr(E) > 1\). By rule 1, we know \(Pr(\bar{E}) \geq 0\). Adding these inequalities together, we see that \(Pr(E) + Pr(\bar{E}) > 1 + 0 = 1\). But by the previous claim, we know that \(Pr(E) + Pr(\bar{E}) = 1\); this is a contradiction.
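Both derived claims can be spot-checked on a concrete space by running through every event. This is only a sanity check on one example, sketched under the assumption of a made-up loaded die (6 has weight 1/2, each other face 1/10); the proofs above are what cover every probability space.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical loaded die; the weights are illustrative and sum to 1.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
S = frozenset(weights)

def Pr(E):
    """Pr(E) = sum of the weights of the outcomes in E."""
    return sum((weights[x] for x in E), Fraction(0))

# Every event, i.e. every subset of S (2^6 = 64 of them).
events = [frozenset(c)
          for r in range(len(S) + 1)
          for c in combinations(sorted(S), r)]

# Pr(E) + Pr(complement of E) = 1 for every event E.
complement_rule = all(Pr(E) + Pr(S - E) == 1 for E in events)

# Pr(E) <= 1 for every event E.
bounded_by_one = all(Pr(E) <= 1 for E in events)

print(complement_rule, bounded_by_one)
```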