- Reading: Cameron 3.1–3.3
- Examples of Bayes' rule and the law of total probability
- Definitions: random variable, expectation

**Claim:** (law of total probability) If \(A_1, \dots, A_n\) partition the sample space \(S\) (that is, if \(A_i \cap A_j = \emptyset\) for \(i \neq j\) and \(S = \cup_i A_i\)), then

\[Pr(B) = \sum_{i} Pr(B|A_i)Pr(A_i)\]

**Proof sketch:** Write \(B = \cup_{i} (B \cap A_i)\). Since the \(A_i\) are disjoint, so are the sets \(B \cap A_i\), so the third axiom gives \(Pr(B) = \sum_{i} Pr(B \cap A_i)\). Then apply the definition of \(Pr(B | A_i)\), which gives \(Pr(B \cap A_i) = Pr(B | A_i)Pr(A_i)\).
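As a sanity check, the claim can be verified numerically. The sample space, partition, and event below are made up for illustration:

```python
from fractions import Fraction as F

# Hypothetical sample space of six equally likely outcomes.
S = {1, 2, 3, 4, 5, 6}
Pr = {k: F(1, 6) for k in S}

# A partition of S into three blocks, and an arbitrary event B.
A = [{1, 2}, {3, 4}, {5, 6}]
B = {2, 3, 5}

def prob(event):
    return sum(Pr[k] for k in event)

def cond(b, a):
    # Pr(B | A) = Pr(B ∩ A) / Pr(A)
    return prob(b & a) / prob(a)

lhs = prob(B)
rhs = sum(cond(B, Ai) * prob(Ai) for Ai in A)
assert lhs == rhs == F(1, 2)
```

Exact fractions avoid floating-point round-off, so the two sides compare equal exactly.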

Suppose we are given a test for a condition. Let \(A\) be the event that a patient has the condition, and let \(B\) be the event that the test comes back positive.

The probability that a patient has the condition is \(Pr(A) = 1/10000\). The test has a false positive rate of \(Pr(B | \bar{A}) = 1/100\) (a false positive is when the test says "yes" despite the fact that the patient does not have the disease), and a false negative rate of \(Pr(\bar{B} | A) = 5/100\).

Suppose a patient tests positive. What is the probability that they have the disease? In other words, what is \(Pr(A|B)\)?

Bayes' rule tells us \(Pr(A|B) = \frac{Pr(B|A)Pr(A)}{Pr(B)}\). We can find \(Pr(B|A)\) using the fact from last lecture: \(Pr(B|A) = 1 - Pr(\bar{B}|A) = 95/100\). \(Pr(A)\) is given. We can use the law of total probability to find \(Pr(B)\): \(Pr(B) = Pr(B|A)Pr(A) + Pr(B|\bar{A})Pr(\bar{A})\).

Plugging everything in, we have

\[ \begin{aligned} Pr(A|B) &= \frac{Pr(B|A)Pr(A)}{Pr(B|A)Pr(A) + Pr(B|\bar{A})Pr(\bar{A})} \\ &= \frac{(95/100)(1/10000)}{(95/100)(1/10000) + (1/100)(9999/10000)} \\ &= \frac{95}{95+9999} \approx 1/100 \\ \end{aligned} \]

This is a surprising result: we take a test that fails \(\lt 5\)% of the time, and it says we have the disease, yet we have only about a 1% chance of having the disease.

However, note that our chances have grown from \(0.0001\) to \(0.01\), so we did learn quite a bit from the test.
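The same computation can be carried out mechanically; this sketch just plugs the numbers from the problem into Bayes' rule and the law of total probability:

```python
from fractions import Fraction as F

# Given quantities from the problem statement.
p_A = F(1, 10000)           # Pr(A): prior probability of having the condition
p_B_given_notA = F(1, 100)  # false positive rate, Pr(B | ~A)
p_notB_given_A = F(5, 100)  # false negative rate, Pr(~B | A)

p_B_given_A = 1 - p_notB_given_A  # Pr(B | A) = 95/100

# Law of total probability: Pr(B) = Pr(B|A)Pr(A) + Pr(B|~A)Pr(~A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' rule: Pr(A|B) = Pr(B|A)Pr(A) / Pr(B)
p_A_given_B = p_B_given_A * p_A / p_B
assert p_A_given_B == F(95, 10094)
print(float(p_A_given_B))  # 95/10094 ≈ 0.0094, about 1%
```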

**Definition:** A (\(\mathbb{R}\)-valued) random variable \(X\) is a function \(X : S \to \mathbb{R}\).

**Definition:** The expected value of \(X\), written \(E(X)\), is given by \[E(X) ::= \sum_{k \in S} X(k)Pr(\{k\})\]

**Definition:** Given a random variable \(X\) and a real number \(x\), the poorly-named event \((X = x)\) is defined by \((X = x) ::= \{k \in S \mid X(k) = x\}\).

This definition is useful because it allows us to ask "what is the probability that \(X = x\)?"

**Claim:** (alternate definition of \(E(X)\)) \[E(X) = \sum_{x \in \mathbb{R}} x\cdot Pr(X=x)\]

**Proof sketch:** this is just grouping together the terms in the original definition for the outcomes with the same \(X\) value.

**Note:** You may be concerned about "\(\sum_{x \in \mathbb{R}}\)". In discrete examples, \(Pr(X = x) = 0\) almost everywhere, so this sum reduces to a finite or at least countable sum. In non-discrete examples, this summation can be replaced by an integral. Measure theory is a branch of mathematics that puts this distinction on firmer theoretical footing by replacing both the summation and the integral with the so-called "Lebesgue integral". In this course, we will simply use "\(\sum\)" with the understanding that it becomes an integral when the random variable is continuous.

**Example:** Suppose I roll a fair 6-sided die. On an even roll, I win $10. On an odd roll, I lose however much money is shown. We can model the experiment (rolling a die) using the sample space \(S = \{1,2,3,4,5,6\}\) and an equiprobable measure. The result of the experiment is given by the random variable \(X : S \to \mathbb{R}\) given by \(X(1) ::= -1\), \(X(2) ::= 10\), \(X(3) ::= -3\), \(X(4) ::= 10\), \(X(5) ::= -5\), and \(X(6) ::= 10\).

According to the definition,

\[ \begin{aligned} E(X) &= (1/6)X(1) + (1/6)X(2) + (1/6)X(3) + (1/6)X(4) + (1/6)X(5) + (1/6)X(6) \\ &= (1/6)(-1) + (1/6)(10) + (1/6)(-3) + (1/6)(10) + (1/6)(-5) + (1/6)(10) \\ &= 21/6 = 7/2 \\ \end{aligned} \]

According to the alternate definition, \(E(X)\) is given by

\[ \begin{aligned} E(X) &= (-1)Pr(X = -1) + (-3)Pr(X = -3) + (-5)Pr(X = -5) + 10Pr(X = 10) \\ &= (-1)(1/6) + (-3)(1/6) + (-5)(1/6) + (10)(1/6 + 1/6 + 1/6) \\ &= 21/6 = 7/2 \end{aligned} \]
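The two definitions can be checked against each other directly. This sketch models the die example with a dictionary for \(X\) and computes \(E(X)\) both ways:

```python
from fractions import Fraction as F

# The die example: equiprobable measure on S, and X as a lookup table.
S = {1, 2, 3, 4, 5, 6}
Pr = {k: F(1, 6) for k in S}
X = {1: -1, 2: 10, 3: -3, 4: 10, 5: -5, 6: 10}

# Original definition: sum over outcomes k in S.
E1 = sum(X[k] * Pr[k] for k in S)

# Alternate definition: sum over values x, grouping outcomes with X(k) = x.
values = set(X.values())
E2 = sum(x * sum(Pr[k] for k in S if X[k] == x) for x in values)

assert E1 == E2 == F(7, 2)
```

The grouping in `E2` is exactly the regrouping of terms that the proof sketch describes.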

**Definition:** The **probability mass function (PMF)** of \(X\) is the function \(PMF_X : \mathbb{R} \to \mathbb{R}\) given by \(PMF_X(x) = Pr(X = x)\).
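For the die example above, the PMF can be sketched as a function that sums the probabilities of the outcomes in the event \((X = x)\):

```python
from fractions import Fraction as F

# The die example again: equiprobable measure, X as a lookup table.
S = {1, 2, 3, 4, 5, 6}
Pr = {k: F(1, 6) for k in S}
X = {1: -1, 2: 10, 3: -3, 4: 10, 5: -5, 6: 10}

def pmf_X(x):
    # PMF_X(x) = Pr(X = x) = Pr({k in S : X(k) = x})
    return sum(Pr[k] for k in S if X[k] == x)

assert pmf_X(10) == F(1, 2)   # three of six outcomes map to 10
assert pmf_X(-1) == F(1, 6)
assert pmf_X(7) == 0          # values X never takes have probability 0
```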