Up to 5 members in each group. Please include NetID and names on front page.

Recall that the expectation and variance of a continuous random variable $X$ are given by $$\mathbb{E}[X] = \int_{-\infty}^{\infty} x p(x) dx$$ $$\text{Var}(X) = \mathbb{E}[(X-\mathbb{E}[X])^2]$$ where $p(x)$ is the probability density function. Let $X$ and $Y$ be independent continuous random variables and let $a \in \mathbb{R}$. Using the above definitions, derive the expectation and variance of the following random variables in terms of $\mathbb{E}[X]$, $\mathbb{E}[Y]$, $\text{Var}(X)$ and $\text{Var}(Y)$:

(a) $X + a$

(b) $aX$

(c) $X + Y$

In this problem, you are going to look at a small dataset to understand various properties of *k*-NN better. Suppose there is a set of points on a two-dimensional plane from two different classes. Below are the coordinates of all points.

Points in class *Red*: (0, 1), (2, 3), (4, 4)

Points in class *Blue*: (2, 0), (5, 2), (6, 3)

(a) Draw the *k*-nearest-neighbor decision boundary for *k
* = 1. Remember that the decision boundary is defined as the line where the classification of a test point changes. Use the standard Euclidean distance between points to determine the nearest neighbors. Start by plotting the points as a two-dimensional graph. Please use the corresponding colors for points of each class (e.g, *blue and red*).

(b) If the *y*-coordinate of each point was multiplied by 5, what would happen to the *k* = 1 boundary (Draw another picture)? Explain in at most two sentences how this effect might cause problems when working with real data.

(c) The k-NN decision boundary for k = 3 is shown as below. Suppose now we have a test point at (1, 2). How would it be classied under 3-NN? Given that you can modify the 3-NN decision boundary by adding points to the training set in the diagram, what is the minimum number of points that you need to add to change the classication at (1, 2)? Show also where you need to add these points.

(d) What is the testing complexity for one instance, e.g., how long does it take for kNN to classify one point? Assume your data has dimensionality *d*, you have *n* training examples and use Euclidean distance. Assume also that you use a quick select implementation which gives you the k smallest elements of a list of length *m* in O(*m*).

In *k*-nearest-neighbor classification, the distance metric used plays a crucial role in the model's accuracy. It is often useful to consider distance metrics aside from the Euclidean distance. One class of distance metrics used in machine learning is the so-called *Mahalanobis distance*.

A function $d : \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$ is called a *distance metric* if for all $x,y,z \in \mathbb{R}^d$, it satisfies the following axioms:

(i) (non-negativity) $d(x,y) \geq 0$

(ii) (identity of indiscernibles) $d(x,y) = 0 \Leftrightarrow x = y$

(iii) (symmetry) $d(x,y) = d(y,x)$

(iv) (triangle inequality) $d(x,z) \leq d(x,y) + d(y,z)$

Let $x,y \in \mathbb{R}^d$ and let $M \in \mathbb{R}^{d \times d}$ be a positive definite matrix and define $$d_M(x,y) = \sqrt{(x-y)^\top M (x-y)}.$$

(a) Show that $d_M$ is a distance metric. (Hint: A positive definite matrix has only positive eigenvalues. Use the eigendecomposition of $M$ to show that $M = L^\top L$ for some suitable $L$.)

(b) Consider $d=2$ and $$M = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}.$$ Sketch the unit circle defined by $d_M$. You may (but not required to) use any plotting software available (e.g. Matlab).

(c) Is $d_M$ still a distance metric if $M$ is not positive definite?

(d) Is $d_M$ still a distance metric if $M$ is positive semi-definite but not positive definite?

For parts (c) and (d), either prove that $d_M$ is a distance metric or specify at least one of the axioms of a distance metric that $d_M$ does not satisfy.