Reading: MCS 19.2-19.5

- independent RVs
- expectation, linearity of expectation, variance

- review exercises:
  - prove any of the claims in these notes
  - constants are independent of everything
  - no non-constant random variable is independent from itself
  - \(E(X - E(X)) = 0\)
  - variance of the sum of independent random variables is the sum of the variances

Last lecture we gave two definitions of expectation.

**Definition 1:** \(E(X) := \sum_{s ∈ S} X(s)Pr(\{s\})\)

**Definition 2:** \(E(X) := \sum_{x ∈ ℝ} x Pr(X = x)\)

**Claim:** These two definitions are equivalent.

**Proof:** We can group together terms in the first sum having the same value of \(X\):

\[\sum_{s ∈ S} X(s) Pr(\{s\}) = \sum_{x ∈ ℝ} \sum_{s \mid X(s) = x} X(s)Pr(\{s\})\]

Within each inner sum \(X(s) = x\), so we can factor out \(x\). We then apply the third Kolmogorov axiom, using the fact that the events \(\{s\}\) with \(X(s) = x\) partition the event \((X = x)\):

\[\cdots = \sum_{x ∈ ℝ} x \sum_{s \mid X(s) = x} Pr(\{s\}) = \sum_{x ∈ ℝ} xPr(X = x)\]
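As a quick sanity check of the claim, the following sketch computes the expectation both ways on a small example. The sample space (two fair coin flips), the variable `X` (number of heads), and the names `S`, `Pr`, `dist`, `e1`, and `e2` are all assumptions chosen for illustration; they do not come from the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical sample space: two fair coin flips, each outcome equally likely.
S = ["hh", "ht", "th", "tt"]
Pr = {s: Fraction(1, 4) for s in S}


def X(s):
    """Random variable: the number of heads in outcome s."""
    return s.count("h")


# Definition 1: sum over outcomes s of X(s) * Pr({s}).
e1 = sum(X(s) * Pr[s] for s in S)

# Definition 2: first group outcomes by the value of X to obtain Pr(X = x),
# then sum x * Pr(X = x) over the values x that X actually takes.
dist = defaultdict(Fraction)  # Fraction() is 0, so missing values start at 0
for s in S:
    dist[X(s)] += Pr[s]
e2 = sum(x * p for x, p in dist.items())

print(e1, e2)
```

The loop that builds `dist` is exactly the "grouping together terms" step of the proof: every outcome contributes its probability to the bucket for its value of \(X\).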

Note that \(E\) by itself is a function; it takes in random variables and gives back numbers. So the domain of \(E\) is the set of all functions with domain \(S\) and codomain \(ℝ\).

**Notation:** In general, the set of functions with domain \(A\) and codomain \(B\) is written \([A → B]\).

Therefore, \(E : [S → ℝ] → ℝ\).

**Claim:** If \(X\), \(Y\) are RVs, then \(E(X+Y) = E(X) + E(Y)\).

**Proof:** We compute:

\[ \begin{aligned} E(X + Y) &= \sum_{s} (X+Y)(s)Pr(\{s\}) && \text{by definition of $E$} \\ &= \sum_{s} (X(s) + Y(s))Pr(\{s\}) && \text{by definition of $X+Y$} \\ &= \sum_{s} X(s)Pr(\{s\}) + \sum_{s} Y(s)Pr(\{s\}) && \text{algebra} \\ &= E(X) + E(Y) && \text{by definition of $E$} \\ \end{aligned} \]

**Fact:** If \(C\) is a constant RV with value \(c\) (that is, \(C(s) = c\) for all \(s\)) then \(E(CX) = cE(X)\).

**Fact:** If \(C\) is a constant RV with value \(c\), then \(E(C) = c\).

**Proofs:** left as review exercises.

**Note:** We usually don't make the distinction between the number \(c\) and the random variable \(C\); so the above are often written \(E(cX) = cE(X)\) and \(E(c) = c\).

**Note:** The facts that \(E(X + Y) = E(X)+E(Y)\) and \(E(cX) = cE(X)\) are summarized by saying that "expectation is linear".
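Linearity can be checked numerically on a small example. The sketch below assumes a sample space of two fair dice, with \(X\) the first die and \(Y\) the larger of the two; these choices, and the helper name `E`, are illustrative assumptions. Note that \(X\) and \(Y\) are deliberately not independent, since the proof of linearity never uses independence.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered pairs of two fair six-sided dice.
S = list(product(range(1, 7), repeat=2))
Pr = {s: Fraction(1, 36) for s in S}


def E(rv):
    """Expectation by Definition 1: sum of rv(s) * Pr({s}) over all outcomes."""
    return sum(rv(s) * Pr[s] for s in S)


X = lambda s: s[0]    # value of the first die
Y = lambda s: max(s)  # larger of the two dice (depends on X)
c = 3                 # an arbitrary constant

# E(X + Y) versus E(X) + E(Y)
lhs_sum = E(lambda s: X(s) + Y(s))
rhs_sum = E(X) + E(Y)

# E(cX) versus cE(X)
lhs_scale = E(lambda s: c * X(s))
rhs_scale = c * E(X)

# E(C) for the constant random variable C(s) = c
e_const = E(lambda s: c)

print(lhs_sum == rhs_sum, lhs_scale == rhs_scale, e_const == c)
```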

It is not generally the case that \(E(XY) = E(X)E(Y)\). For example, imagine a single fair coin flip, and let \(X\) be the indicator variable for the flip being heads. That is, \(S = \{h,t\}\), \(X(h) = 1\), and \(X(t) = 0\).

We see \(E(X) = 1/2\). Moreover, \(X\cdot X = X\), because \((X \cdot X)(h) = X(h)X(h) = 1\) and \((X \cdot X)(t) = X(t)X(t) = 0\).

Thus \(E(X\cdot X) = E(X) = 1/2\) but \(E(X)E(X) = 1/4\).

However, we have the following:

**Definition:** Two random variables \(X\) and \(Y\) are **independent** if the events \(X = x\) and \(Y = y\) are independent for all \(x\) and \(y\).

**Claim:** If \(X\) and \(Y\) are independent, then \(E(XY) = E(X)E(Y)\).

**Proof:** Well,

\[ \begin{aligned} E(X)E(Y) &= \left(\sum_{x} xPr(X = x)\right)\left(\sum_{y} yPr(Y = y)\right) \\ &= \sum_{x,y} xyPr(X=x)Pr(Y=y) \\ &= \sum_{x,y} xyPr(X=x \cap Y=y) && \text{since $X$ and $Y$ are independent} \\ &= \sum_{z} \sum_{x,y \text{ with } xy=z} xyPr(X = x \cap Y = y) && \text{grouping terms} \\ &= \sum_{z} z\sum_{x,y \text{ with } xy=z} Pr(X = x \cap Y = y) \\ \end{aligned} \]

Now, the union of the events \((X = x) \cap (Y = y)\) over all \(x\) and \(y\) with \(xy = z\) is just the event \(XY = z\). Moreover, these events are disjoint, so by the third Kolmogorov axiom we have

\[\sum_{x,y \text{ with } xy=z} Pr(X = x \cap Y = y) = Pr(XY = z)\]

Plugging this in gives

\[E(X)E(Y) = \cdots = \sum_z zPr(XY = z) = E(XY)\]

by definition.
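The definition of independence and the product rule can both be checked by brute force on a finite sample space. The sketch below assumes two fair dice and uses hypothetical helper names (`E`, `prob`, `independent`); it tests one independent pair (the two dice) and one dependent pair (the first die with itself).

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: ordered pairs of two fair six-sided dice.
S = list(product(range(1, 7), repeat=2))
Pr = {s: Fraction(1, 36) for s in S}


def E(rv):
    """Expectation: sum of rv(s) * Pr({s}) over all outcomes."""
    return sum(rv(s) * Pr[s] for s in S)


def prob(event):
    """Probability of the set of outcomes s for which event(s) is true."""
    return sum(Pr[s] for s in S if event(s))


def independent(X, Y):
    """Check Pr(X = x and Y = y) == Pr(X = x) * Pr(Y = y) for all x, y.

    Only values that X and Y actually take are checked; for any other
    value both sides of the equation are 0.
    """
    xs = {X(s) for s in S}
    ys = {Y(s) for s in S}
    return all(
        prob(lambda s: X(s) == x and Y(s) == y)
        == prob(lambda s: X(s) == x) * prob(lambda s: Y(s) == y)
        for x in xs
        for y in ys
    )


first = lambda s: s[0]   # value of the first die
second = lambda s: s[1]  # value of the second die

# Independent pair: the product rule holds.
indep_ok = independent(first, second)
prod_indep = E(lambda s: first(s) * second(s))

# Dependent pair (a non-constant RV with itself): the product rule fails.
self_ok = independent(first, first)
prod_self = E(lambda s: first(s) * first(s))

print(indep_ok, prod_indep == E(first) * E(second))
print(self_ok, prod_self == E(first) * E(first))
```

The second pair mirrors the coin-flip counterexample: a non-constant random variable is never independent of itself, and \(E(X \cdot X)\) differs from \(E(X)E(X)\).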

Variance is a measure of how spread out a distribution is. You might ask "how far are the samples from the mean, on average?" This suggests finding the expectation of the random variable \(X - E(X)\) (this is the RV describing the signed deviation from the expected value). Unfortunately, \(E(X - E(X)) = 0\) (exercise), because the positive and negative deviations cancel. We could imagine taking the absolute value of the deviation, but squaring it instead turns out to have nicer properties. This gives the definition of variance:

**Definition:** For a random variable \(X\), \(Var(X) = E\left((X - E(X))^2\right)\).

If \(X\) is measured in a unit (such as inches) then the variance is measured in units squared (e.g. inches squared). Thus, it is often more useful to work with the square root of the variance, which is called the **standard deviation**:

**Definition:** The **standard deviation** of \(X\) is \(\sqrt{Var(X)}\).

The following formula for the variance is often easier to compute in practice:

**Claim:** \(Var(X) = E(X^2) - (E(X))^2\).

**Proof:** Note that random variables satisfy the normal rules of arithmetic, because operations on them are evaluated pointwise. For example, we can show \(X(Y+Z) = XY + XZ\) as follows: \[[X(Y+Z)](s) = (X(s))((Y+Z)(s)) = X(s)(Y(s) + Z(s)) = X(s)Y(s) + X(s)Z(s) = [XY + XZ](s)\]

Using this, the proof of the claim is just algebra:

\[ \begin{aligned} Var(X) &= E((X - E(X))^2) \\ &= E(X^2 - 2XE(X) + E(X)^2) \\ &= E(X^2) - 2E(XE(X)) + E(E(X)^2) && \text{by linearity of expectation} \\ &= E(X^2) - 2E(X)^2 + E(X)^2 && \text{because $E(X)$ and $E(X)^2$ are constants} \\ &= E(X^2) - E(X)^2 \end{aligned} \]
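To see the two variance formulas agree on a concrete case, the sketch below assumes a single fair six-sided die with \(X\) the value rolled. The sample space and the names `E`, `mu`, `var_def`, `var_alt`, `mean_dev`, and `std` are illustrative assumptions, not part of the notes.

```python
import math
from fractions import Fraction

# Hypothetical sample space: one fair six-sided die.
S = range(1, 7)
Pr = {s: Fraction(1, 6) for s in S}


def E(rv):
    """Expectation: sum of rv(s) * Pr({s}) over all outcomes."""
    return sum(rv(s) * Pr[s] for s in S)


X = lambda s: s  # the value rolled
mu = E(X)        # E(X), a constant

# The expected signed deviation is zero, which is why it is squared.
mean_dev = E(lambda s: X(s) - mu)

# Definition: Var(X) = E((X - E(X))^2)
var_def = E(lambda s: (X(s) - mu) ** 2)

# Alternative formula: Var(X) = E(X^2) - (E(X))^2
var_alt = E(lambda s: X(s) ** 2) - mu ** 2

# Standard deviation: square root of the variance (a float, not exact).
std = math.sqrt(var_def)

print(mean_dev, var_def, var_alt, std)
```

Using `Fraction` keeps the expectations exact, so the two variance values can be compared with `==` rather than with a floating-point tolerance.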