
Lecture 3: The Perceptron

Assumptions

  1. Binary classification (i.e. $y_i \in \{-1, +1\}$)
  2. Data is linearly separable

Classifier

$$h(\vec{x}_i) = \textrm{sign}(\vec{w} \cdot \vec{x}_i + b)$$
$b$ is the bias term (without the bias term, the hyperplane that $\vec{w}$ defines would always have to go through the origin). Dealing with $b$ can be a pain, so we 'absorb' it into the feature vector $\vec{w}$ by adding one additional constant dimension. Under this convention,
$$\vec{x}_i \text{ becomes } \begin{bmatrix} \vec{x}_i \\ 1 \end{bmatrix}, \qquad \vec{w} \text{ becomes } \begin{bmatrix} \vec{w} \\ b \end{bmatrix}.$$
We can verify that
$$\begin{bmatrix} \vec{x}_i \\ 1 \end{bmatrix} \cdot \begin{bmatrix} \vec{w} \\ b \end{bmatrix} = \vec{w} \cdot \vec{x}_i + b.$$
Using this, we can simplify the above formulation of $h(\vec{x}_i)$ to
$$h(\vec{x}_i) = \textrm{sign}(\vec{w} \cdot \vec{x}_i).$$

Observation: Note that
$$y_i (\vec{w} \cdot \vec{x}_i) > 0 \iff \vec{x}_i \ \text{is classified correctly,}$$
where 'classified correctly' means that $\vec{x}_i$ is on the correct side of the hyperplane defined by $\vec{w}$. Also, note that the left side depends on $y_i \in \{-1, +1\}$ (it wouldn't work if, for example, $y_i \in \{0, +1\}$).
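As a quick numerical sanity check (a minimal numpy sketch, not part of the original notes; the variable names and values here are made up), we can confirm that the augmented vectors produce exactly the same score as $\vec{w} \cdot \vec{x}_i + b$:

```python
import numpy as np

# Hypothetical 3-dimensional point, weight vector, and bias (illustrative values only).
x = np.array([0.5, -1.2, 2.0])
w = np.array([1.0, 0.3, -0.7])
b = 0.4

# Absorb the bias: append a constant 1 to x and append b to w.
x_aug = np.append(x, 1.0)
w_aug = np.append(w, b)

# The two scores agree, so sign(w_aug . x_aug) == sign(w . x + b).
assert np.isclose(w_aug @ x_aug, w @ x + b)
```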

Perceptron Algorithm

Now that we know what $\vec{w}$ is supposed to do (define a hyperplane that separates the data), let's look at how we can find such a $\vec{w}$.

Perceptron Algorithm
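
In code, the Perceptron update rule can be written as follows. This is a minimal numpy sketch of the training loop, assuming the bias has already been absorbed into $\vec{w}$ as described above; the function name, the `max_epochs` cap, and the data layout are my own choices, not from the lecture:

```python
import numpy as np

def perceptron_train(X, y, max_epochs=1000):
    """Perceptron training loop (illustrative sketch).

    X : (n, d) array; assumes the constant-1 dimension is already appended,
        so the bias b is absorbed into w.
    y : (n,) array of labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)                          # start with the zero vector
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (w @ X[i]) <= 0:       # x_i is misclassified (or lies on the hyperplane)
                w = w + y[i] * X[i]          # update: w <- w + y_i * x_i
                mistakes += 1
        if mistakes == 0:                    # a full pass with no mistakes: done
            return w
    return w
```

If the data are linearly separable, the loop terminates with a $\vec{w}$ for which $y_i (\vec{w} \cdot \vec{x}_i) > 0$ for every training point; the convergence result below bounds how many updates this can take.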


Geometric Intuition


Quiz #1: Can you draw a visualization of a Perceptron update?
Quiz #2: How often can a Perceptron misclassify a point $\vec{x}$ repeatedly?

Perceptron Convergence

Suppose that $\exists \vec{w}^*$ such that $y_i (\vec{w}^* \cdot \vec{x}_i) > 0$ $\ \forall (\vec{x}_i, y_i) \in D$.

Now, suppose that we rescale each data point and $\vec{w}^*$ such that
$$||\vec{w}^*|| = 1 \quad \text{and} \quad ||\vec{x}_i|| \le 1 \ \ \forall \vec{x}_i \in D.$$
The margin $\gamma$ of the hyperplane is defined as
$$\gamma = \min_{(\vec{x}_i, y_i) \in D} |\vec{x}_i \cdot \vec{w}^*|.$$
We can visualize this as follows:


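To make the rescaling and the definition of $\gamma$ concrete, here is a small numpy sketch (an illustrative helper, not from the lecture; the function and argument names are my own) that normalizes $\vec{w}^*$ and the data and then computes the margin:

```python
import numpy as np

def margin(X, y, w_star):
    """Compute gamma under the rescaling described above (illustrative sketch).

    X      : (n, d) data matrix.
    y      : (n,) labels in {-1, +1}.
    w_star : any vector that separates the data, i.e. y_i * (w_star . x_i) > 0 for all i.
    """
    w_star = w_star / np.linalg.norm(w_star)       # rescale so that ||w*|| = 1
    X = X / np.max(np.linalg.norm(X, axis=1))      # rescale so that ||x_i|| <= 1 for all i
    assert np.all(y * (X @ w_star) > 0), "w_star does not separate the data"
    return np.min(np.abs(X @ w_star))              # gamma = min_i |x_i . w*|
```

For instance, a dataset with margin $\gamma = 0.1$ under this rescaling is guaranteed at most $1/\gamma^2 = 100$ mistakes by the theorem below.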
Theorem: If all of the above holds, then the Perceptron algorithm makes at most $1/\gamma^2$ mistakes.

Proof:
Keeping what we defined above, consider the effect of an update ($\vec{w}$ becomes $\vec{w} + y\vec{x}$) on the two terms $\vec{w} \cdot \vec{w}^*$ and $\vec{w} \cdot \vec{w}$. We will use two facts: