Gradient Descent Underfitting/Overfitting Demo

Here we'll consider a simple example of linear regression with polynomial features, because it's the easiest to visualize. We'll use the following generative model:

$$ x \sim \operatorname{Unif}[-1,1] \hspace{2em} \phi_k(x) = x^{k-1} \hspace{2em} y \mid x \sim \mathcal{N}(0.8 \cdot \tanh(2x), \sigma^2) $$

where $\phi: [-1,1] \rightarrow \mathbb{R}^d$ is our feature map with components $\phi_k$ for $k = 1, \dots, d$, and $\mathcal{N}(\mu, \sigma^2)$ denotes the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. We'll start by setting $\sigma = 0.3$ and $d = 12$, with a training set of size $10$ and a validation set of size $30$.
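To make the generative model concrete, here is a minimal NumPy sketch of how such a dataset could be sampled. The function names (`features`, `sample_dataset`) and the fixed seed are illustrative assumptions, not the code behind the demo.

```python
import numpy as np

def features(x, d=12):
    # phi_k(x) = x^(k-1) for k = 1..d, i.e. columns 1, x, x^2, ..., x^(d-1).
    return np.vander(x, N=d, increasing=True)

def sample_dataset(n, sigma=0.3, rng=None):
    # x ~ Unif[-1, 1], y | x ~ N(0.8 * tanh(2x), sigma^2).
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 0.8 * np.tanh(2.0 * x) + sigma * rng.normal(size=n)
    return x, y

# Seed chosen only for reproducibility (an assumption; the demo resamples on refresh).
rng = np.random.default_rng(0)
x_train, y_train = sample_dataset(10, rng=rng)  # training set of size 10
x_val, y_val = sample_dataset(30, rng=rng)      # validation set of size 30

Phi_train = features(x_train)  # shape (10, 12)
Phi_val = features(x_val)      # shape (30, 12)
```

Note that the training set has fewer points than features ($10 < 12$), so a linear predictor on these features can generally fit the training data exactly.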

This diagram visualizes what is going on. On the left, we plot the dataset (training set in blue, validation set in red) along with the predictor $\hat y = h(x)$ in green after $t$ steps of gradient descent on the linear regression objective. On the right, we plot the training loss, validation loss, and expected loss over the source distribution against the number of gradient steps $t$ on the horizontal axis. Every time you refresh the page, you get a new dataset. You can also click and drag the points to change the dataset.
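The three curves in the right-hand plot can be reproduced offline with a short sketch like the one below. Several choices here are assumptions rather than details taken from the demo: the loss is taken to be mean squared error, the weights start at zero, the step size is set to the inverse of the largest Hessian eigenvalue (which guarantees the training loss never increases), and the expected loss is approximated by a midpoint rule on a grid. All function names are hypothetical.

```python
import numpy as np

SIGMA = 0.3  # noise standard deviation, from the text
D = 12       # number of polynomial features, from the text

def features(x, d=D):
    # phi_k(x) = x^(k-1), k = 1..d
    return np.vander(x, N=d, increasing=True)

def true_mean(x):
    return 0.8 * np.tanh(2.0 * x)

def sample(n, rng):
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, true_mean(x) + SIGMA * rng.normal(size=n)

def mse(w, x, y):
    # Assumed loss: mean squared error of h(x) = w . phi(x).
    return float(np.mean((features(x) @ w - y) ** 2))

def expected_loss(w, m=4000):
    # E[(h(x) - y)^2] = E[(h(x) - 0.8 tanh(2x))^2] + sigma^2 for x ~ Unif[-1, 1];
    # the first term is approximated by a midpoint rule on m grid cells.
    grid = np.linspace(-1.0, 1.0, m, endpoint=False) + 1.0 / m
    return float(np.mean((features(grid) @ w - true_mean(grid)) ** 2) + SIGMA ** 2)

def run_gradient_descent(x_tr, y_tr, x_va, y_va, steps=500):
    Phi = features(x_tr)
    n = len(x_tr)
    hessian = (2.0 / n) * Phi.T @ Phi
    # Assumed step size: 1 / (largest Hessian eigenvalue).
    lr = 1.0 / np.linalg.eigvalsh(hessian)[-1]
    w = np.zeros(Phi.shape[1])  # assumed initialization
    history = []  # rows of (t, train loss, validation loss, expected loss)
    for t in range(steps + 1):
        history.append((t, mse(w, x_tr, y_tr), mse(w, x_va, y_va), expected_loss(w)))
        if t == steps:
            break
        grad = (2.0 / n) * Phi.T @ (Phi @ w - y_tr)
        w = w - lr * grad
    return w, history

rng = np.random.default_rng(0)  # seed is arbitrary
x_tr, y_tr = sample(10, rng)
x_va, y_va = sample(30, rng)
w, history = run_gradient_descent(x_tr, y_tr, x_va, y_va)
```

Plotting the last three columns of `history` against the first gives curves comparable to those described above. The expected loss can never drop below $\sigma^2 = 0.09$, the irreducible noise level.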
