Linear Regression

In this lecture we will learn about Linear Regression.

Assumptions

Data Assumption: $y_i \in \mathbb{R}$
Model Assumption: $y_i = \mathbf{w}^\top\mathbf{x}_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$

$$y_i \mid \mathbf{x}_i \sim N(\mathbf{w}^\top\mathbf{x}_i, \sigma^2) \;\Rightarrow\; P(y_i \mid \mathbf{x}_i, \mathbf{w}) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(\mathbf{x}_i^\top\mathbf{w} - y_i)^2}{2\sigma^2}}$$
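To make the generative assumption concrete, here is a minimal NumPy sketch that samples a dataset satisfying it; the values of n, d, and sigma (and the names w_true, X, eps) are illustrative choices, not from the lecture.

```python
import numpy as np

# Sketch: sample a dataset under the model assumption above.
# n, d, and sigma are arbitrary illustrative values.
rng = np.random.default_rng(0)
n, d, sigma = 100, 3, 0.5

w_true = rng.standard_normal(d)        # the unknown weight vector w
X = rng.standard_normal((n, d))        # row i is the input x_i
eps = rng.normal(0.0, sigma, size=n)   # epsilon_i ~ N(0, sigma^2)
y = X @ w_true + eps                   # y_i = w^T x_i + epsilon_i
```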

Estimating with MLE

$$\begin{aligned}
\mathbf{w} &= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \log P(y_i \mid \mathbf{x}_i, \mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \left[ \log\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \log\!\left(e^{-\frac{(\mathbf{x}_i^\top\mathbf{w} - y_i)^2}{2\sigma^2}}\right) \right] \\
&= \operatorname*{argmax}_{\mathbf{w}} -\frac{1}{2\sigma^2} \sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2 \\
&= \operatorname*{argmin}_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2
\end{aligned}$$

The loss is thus $\ell(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2$, also known as the squared loss or Ordinary Least Squares (OLS). OLS can be minimized with gradient descent, Newton's method, or in closed form.
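For the gradient-descent route, the gradient of the squared loss is $\nabla \ell(\mathbf{w}) = \frac{2}{n}\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)\,\mathbf{x}_i$; below is a minimal sketch, with an arbitrary step size and iteration count chosen purely for illustration.

```python
import numpy as np

def ols_gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize l(w) = (1/n) * sum_i (x_i^T w - y_i)^2 by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * (X.T @ (X @ w - y))  # gradient of the squared loss
        w -= lr * grad
    return w
```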

Closed Form: $\mathbf{w} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$, where the rows of $\mathbf{X}$ are the inputs $\mathbf{x}_i^\top$ and $\mathbf{y} = (y_1, \dots, y_n)^\top$.
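In code, the closed form is a single linear solve; as a sketch (assuming $\mathbf{X}^\top\mathbf{X}$ is invertible), solving the normal equations directly is generally preferred over forming the matrix inverse.

```python
import numpy as np

def ols_closed_form(X, y):
    """Solve the normal equations X^T X w = X^T y (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

Applied to a dataset sampled as above, ols_closed_form(X, y) should recover w_true up to the noise level.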

Estimating with MAP

Additional Model Assumption: $P(\mathbf{w}) = \frac{1}{\sqrt{2\pi\tau^2}}\, e^{-\frac{\mathbf{w}^\top\mathbf{w}}{2\tau^2}}$
$$\begin{aligned}
\mathbf{w} &= \operatorname*{argmax}_{\mathbf{w}} P(\mathbf{w} \mid y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n) \\
&= \operatorname*{argmax}_{\mathbf{w}} \frac{P(y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n \mid \mathbf{w})\, P(\mathbf{w})}{P(y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n)} \\
&= \operatorname*{argmax}_{\mathbf{w}} P(y_1, \dots, y_n \mid \mathbf{x}_1, \dots, \mathbf{x}_n, \mathbf{w})\, P(\mathbf{x}_1, \dots, \mathbf{x}_n \mid \mathbf{w})\, P(\mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \prod_{i=1}^n P(y_i \mid \mathbf{x}_i, \mathbf{w})\, P(\mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \log P(y_i \mid \mathbf{x}_i, \mathbf{w}) + \log P(\mathbf{w}) \\
&= \operatorname*{argmin}_{\mathbf{w}} \frac{1}{2\sigma^2} \sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2 + \frac{1}{2\tau^2} \mathbf{w}^\top\mathbf{w} \\
&= \operatorname*{argmin}_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2 + \lambda \|\mathbf{w}\|_2^2, \qquad \text{where } \lambda = \frac{\sigma^2}{n\tau^2}
\end{aligned}$$

(The term $P(\mathbf{x}_1, \dots, \mathbf{x}_n \mid \mathbf{w})$ drops out of the maximization because the inputs do not depend on $\mathbf{w}$.)

This formulation is known as Ridge Regression. It has a closed-form solution: $\mathbf{w} = (\mathbf{X}^\top\mathbf{X} + \lambda^2\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$
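A matching sketch for the ridge solution, with lam standing in for the lecture's $\lambda$; note that the added $\lambda^2\mathbf{I}$ term makes the system solvable even when $\mathbf{X}^\top\mathbf{X}$ is singular.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam^2 * I) w = X^T y, the ridge closed form above."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam**2 * np.eye(d), X.T @ y)
```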

Summary

Ordinary Least Squares: $\min_{\mathbf{w}} \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2$; squared loss, no regularization; closed form $\mathbf{w} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$.
Ridge Regression: $\min_{\mathbf{w}} \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i^\top\mathbf{w} - y_i)^2 + \lambda\|\mathbf{w}\|_2^2$; squared loss with $\ell_2$-regularization; closed form $\mathbf{w} = (\mathbf{X}^\top\mathbf{X} + \lambda^2\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$.