Up to 5 members per group. Please include NetIDs and names on the front page.

In this problem, we take a more rigorous look at the curse of dimensionality.

We aim to demonstrate two crucial claims, this time for normally distributed data:

- As data dimensionality increases, the expected distance between data points increases.
- As data dimensionality increases, the spread of pairwise distances becomes negligible relative to the distances themselves, so all point pairs become nearly equidistant.

To do this, we will treat the squared Euclidean distance between two arbitrary points in a Gaussian point cloud as a single random variable, and prove certain properties of its expectation and variance. More formally, let:

$$D = ||\vec{x}-\vec{z}||^2 = \sum_{\alpha=1}^d (x_\alpha-z_\alpha)^2$$

where $d$ is the dimensionality of the data, and $x_\alpha, z_\alpha \sim N(0,1)$ are i.i.d. for all $\alpha$.
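Before deriving anything, it can help to see $D$ empirically. The following is a minimal Monte Carlo sketch (assuming NumPy is available; the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_D(d, n_pairs=100_000):
    """Draw n_pairs realizations of D = ||x - z||^2 for x, z ~ N(0, I_d)."""
    x = rng.standard_normal((n_pairs, d))
    z = rng.standard_normal((n_pairs, d))
    return np.sum((x - z) ** 2, axis=1)

D = sample_D(d=10)
print(D.mean(), D.var())  # empirical mean and variance of D
```

Comparing these empirical values against your closed-form answers below is a useful sanity check.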

It will help to keep in mind the definition of variance: $\mathbb{V}[X]=\mathbb{E}[(X-\mathbb{E}[X])^2]=\mathbb{E}[X^2]-\mathbb{E}[X]^2$.

(a) First, calculate the expectation and variance of $(x_\alpha-z_\alpha)$ for an arbitrary $\alpha$, given that $x_\alpha, z_\alpha \sim N(0,1)$. Is the distribution of $(x_\alpha-z_\alpha)$ Gaussian?
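You can check your answer empirically; a small sketch (assuming NumPy, with an arbitrary seed and sample size):

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirically estimate the mean and variance of (x_alpha - z_alpha)
# for x_alpha, z_alpha ~ N(0, 1) drawn independently.
diff = rng.standard_normal(1_000_000) - rng.standard_normal(1_000_000)
print(f"mean = {diff.mean():.3f}, variance = {diff.var():.3f}")
```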

(b) Now, using (a), calculate the expectation of $D$ and simplify it to a function of $d$:

$$\mu_D^{\,} = \mathbb{E}[D] = \mathbb{E}\left[\sum_{\alpha=1}^d (x_\alpha-z_\alpha)^2\right]$$

(c) Using (a) and (b), calculate the variance of $D$ and simplify it to a function of $d$:

$$\sigma_D^2 = \mathbb{V}[D] = \mathbb{V}\left[\sum_{\alpha=1}^d (x_\alpha-z_\alpha)^2\right]$$

You may use the fact that $\mathbb{E}[X^4]=3\sigma^4$ if $X \sim N(0,\sigma^2)$. (Hint: recall that $x_\alpha$ and $z_\alpha$ are i.i.d. for all $\alpha$.)
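Once you have closed forms for $\mu_D^{\,}$ and $\sigma_D^2$ as functions of $d$, you can check them against simulation. A minimal sketch (assuming NumPy; sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Compare the empirical mean and variance of D across several dimensions
# against your closed-form answers from (b) and (c).
for d in (1, 10, 100):
    x = rng.standard_normal((200_000, d))
    z = rng.standard_normal((200_000, d))
    D = np.sum((x - z) ** 2, axis=1)
    print(f"d={d:4d}  E[D] = {D.mean():8.2f}  V[D] = {D.var():9.2f}")
```

Both quantities should scale linearly in $d$.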

We now know the expectation $\mu_D^{\,}$ and variance $\sigma_D^2$ for our squared-distance random variable $D$.

Chebyshev's Inequality states that for any random variable $D$ with mean $\mu_D^{\,}$ and standard deviation $\sigma_D^{\,}$, and any $k > 0$,

$$P(|D - \mu_D^{\,}| \ge k\sigma_D^{\,}) \le 1/k^2$$

In particular, taking $k = 2$:

$$P(|D - \mu_D^{\,}| \ge 2\sigma_D^{\,}) \le 0.25$$

Thus, at least $75\%$ of squared distances lie within $2\sigma_D^{\,}$ of $\mu_D^{\,}$, so we can expect most point pairs in a Gaussian point cloud to have squared Euclidean distances within $4\sigma_D^{\,}$ of each other.

(d) Using (b) and (c), show that

$$\frac{4\sigma_D^{\,}}{\mu_D^{\,}}\rightarrow 0$$

as the dimensionality $d\rightarrow \infty$.
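This concentration can also be observed numerically. The sketch below (assuming NumPy; dimensions, sample size, and seed are arbitrary choices) tracks the relative spread $4\sigma_D^{\,}/\mu_D^{\,}$ as $d$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

# Track 4*sigma_D / mu_D as d grows: the spread of pairwise squared
# distances should shrink relative to their mean.
ratios = []
for d in (10, 100, 10_000):
    x = rng.standard_normal((1_000, d))
    z = rng.standard_normal((1_000, d))
    D = np.sum((x - z) ** 2, axis=1)
    ratios.append(4 * D.std() / D.mean())
    print(f"d={d:6d}  4*sigma_D / mu_D = {ratios[-1]:.4f}")
```

The ratio shrinking toward zero is exactly the "all points become nearly equidistant" phenomenon the problem asks you to prove.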

If we assume instead that the perceptron might take more than one pass over the data set to converge, does this maximum difference increase?

How does this relate to your answer for (a)? Why do you think this is?