Linear Regression
In this lecture we will learn about Linear Regression.
Assumptions
Data Assumption: $y_i \in \mathbb{R}$
Model Assumption: $y_i = \mathbf{w}^\top \mathbf{x}_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$
$$\Rightarrow y_i | \mathbf{x}_i \sim N(\mathbf{w}^\top \mathbf{x}_i, \sigma^2) \Rightarrow P(y_i | \mathbf{x}_i, \mathbf{w}) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(\mathbf{x}_i^\top \mathbf{w} - y_i)^2}{2\sigma^2}}$$
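As a concrete illustration of this model assumption, the sketch below (Python/NumPy) draws data from $y_i = \mathbf{w}^\top \mathbf{x}_i + \epsilon_i$ with Gaussian noise; the dimensions, noise level, and weight vector are hypothetical choices made only for illustration, and the inputs are stored as the columns of a matrix X.

```python
import numpy as np

# Hypothetical simulation of the model assumption.
d, n, sigma = 3, 500, 0.1
rng = np.random.default_rng(0)

w_true = rng.standard_normal(d)          # "true" weight vector (illustrative)
X = rng.standard_normal((d, n))          # inputs x_i stored as the columns of X
eps = rng.normal(0.0, sigma, size=n)     # noise eps_i ~ N(0, sigma^2)
y = w_true @ X + eps                     # y_i = w^T x_i + eps_i
```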
Estimating with MLE
$$\begin{aligned}
\mathbf{w} &= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \log P(y_i | \mathbf{x}_i, \mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \left[ \log\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \log\left(e^{-\frac{(\mathbf{x}_i^\top \mathbf{w} - y_i)^2}{2\sigma^2}}\right) \right] \\
&= \operatorname*{argmax}_{\mathbf{w}} \; -\frac{1}{2\sigma^2} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2 \\
&= \operatorname*{argmin}_{\mathbf{w}} \; \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2
\end{aligned}$$
The loss is thus $\ell(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2$, also known as the squared loss or Ordinary Least Squares (OLS). OLS can be optimized with gradient descent, Newton's method, or in closed form.
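For instance, a minimal gradient descent sketch on this loss (NumPy; the step size and iteration count are arbitrary illustrative choices, and X, y follow the column-wise convention of the simulation above):

```python
import numpy as np

def ols_gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize (1/n) * sum_i (x_i^T w - y_i)^2 by gradient descent.

    X is d x n with the x_i as columns; y is a length-n label vector.
    lr and steps are hypothetical choices.
    """
    d, n = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * X @ (X.T @ w - y)   # gradient of the average squared loss
        w = w - lr * grad
    return w
```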
Closed Form: $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top)^{-1} \mathbf{X} \mathbf{y}^\top$, where $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_n] \in \mathbb{R}^{d \times n}$ holds the inputs as columns and $\mathbf{y} = [y_1, \dots, y_n]$ is the row vector of labels.
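A sketch of this closed form under the same column-wise convention for X (NumPy; the linear system is solved directly rather than forming the inverse explicitly):

```python
import numpy as np

def ols_closed_form(X, y):
    """Closed-form OLS: solve (X X^T) w = X y^T for w.

    X is d x n with the x_i as columns; y is a length-n label vector.
    """
    return np.linalg.solve(X @ X.T, X @ y)
```

On well-conditioned data such as the simulation above, this and the gradient descent sketch should both recover a vector close to w_true.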
Estimating with MAP
Additional Model Assumption: $P(\mathbf{w}) = \frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{\mathbf{w}^\top \mathbf{w}}{2\tau^2}}$
$$\begin{aligned}
\mathbf{w} &= \operatorname*{argmax}_{\mathbf{w}} P(\mathbf{w} | y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n) \\
&= \operatorname*{argmax}_{\mathbf{w}} \frac{P(y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n | \mathbf{w}) P(\mathbf{w})}{P(y_1, \mathbf{x}_1, \dots, y_n, \mathbf{x}_n)} \\
&= \operatorname*{argmax}_{\mathbf{w}} P(y_1, \dots, y_n | \mathbf{x}_1, \dots, \mathbf{x}_n, \mathbf{w}) P(\mathbf{x}_1, \dots, \mathbf{x}_n | \mathbf{w}) P(\mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \prod_{i=1}^n P(y_i | \mathbf{x}_i, \mathbf{w}) P(\mathbf{w}) \\
&= \operatorname*{argmax}_{\mathbf{w}} \sum_{i=1}^n \log P(y_i | \mathbf{x}_i, \mathbf{w}) + \log P(\mathbf{w}) \\
&= \operatorname*{argmin}_{\mathbf{w}} \frac{1}{2\sigma^2} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2 + \frac{1}{2\tau^2} \mathbf{w}^\top \mathbf{w} \\
&= \operatorname*{argmin}_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2 + \lambda \|\mathbf{w}\|_2^2, \quad \text{where } \lambda = \frac{\sigma^2}{n\tau^2}.
\end{aligned}$$
(The denominator and $P(\mathbf{x}_1, \dots, \mathbf{x}_n | \mathbf{w})$ can be dropped because neither depends on $\mathbf{w}$.)
This formulation is known as Ridge Regression. It has the closed form solution $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top + \lambda \mathbf{I})^{-1} \mathbf{X} \mathbf{y}^\top$ (with the constant factor of $n$ absorbed into $\lambda$).
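A corresponding sketch of the ridge closed form (NumPy; the regularization strength lam is a hypothetical choice, and X, y follow the same column-wise convention as before):

```python
import numpy as np

def ridge_closed_form(X, y, lam=0.1):
    """Closed-form ridge regression: solve (X X^T + lam * I) w = X y^T for w.

    X is d x n with the x_i as columns; y is a length-n label vector.
    lam is a hypothetical regularization strength.
    """
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)
```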
Summary
Ordinary Least Squares:
- $\min_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2$.
- Squared loss.
- No regularization.
- Closed form: $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top)^{-1} \mathbf{X} \mathbf{y}^\top$.
Ridge Regression:
- $\min_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^n (\mathbf{x}_i^\top \mathbf{w} - y_i)^2 + \lambda \|\mathbf{w}\|_2^2$.
- Squared loss.
- $\ell_2$-regularization.
- Closed form: $\mathbf{w} = (\mathbf{X}\mathbf{X}^\top + \lambda \mathbf{I})^{-1} \mathbf{X} \mathbf{y}^\top$.