
1 Perceptrons and Linear Classifiers William Cohen 2-4-2008

2 Announcement: no office hours for William this Friday 2/8

3 Dave Touretzky’s Gallery of CSS Descramblers

4 Linear Classifiers
Let's simplify life by assuming:
– Every instance is a vector of real numbers, x = (x_1, …, x_n). (Notation: boldface x is a vector.)
– There are only two classes, y = (+1) and y = (-1).
A linear classifier is a vector w of the same dimension as x that is used to make this prediction: y = sign(w · x).
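A minimal sketch of that prediction rule in Python (the weights and the instance below are made-up values, just to show the computation):

    import numpy as np

    def predict(w, x):
        # Linear classifier: the prediction is the sign of w . x.
        return 1 if np.dot(w, x) >= 0 else -1

    # Hypothetical weight vector and instance.
    w = np.array([0.5, -1.2, 0.3])
    x = np.array([1.0, 0.0, 2.0])
    print(predict(w, x))   # w . x = 1.1 > 0, so this prints 1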

5 Visually, x · w is the distance you get if you "project x onto w". The line perpendicular to w divides the vectors classified as positive from the vectors classified as negative. In 3d: line → plane. In 4d: plane → hyperplane. … [Figure: instances x1 and x2 projected onto w and -w.]

6 [Figure: illustrations of w, -w, and the separating hyperplane, from Wolfram MathWorld, Mediaboost.com, and Geocities.com/bharatvarsha1947.]

7 Notice that the separating hyperplane goes through the origin… if we don't want this we can preprocess our examples (one standard way is sketched below):
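The slide's preprocessing step itself is not in the transcript; a standard trick, which may or may not be the one shown, is to append a constant feature to every instance so that one extra weight acts as a bias and the hyperplane no longer has to pass through the origin:

    import numpy as np

    def add_bias_feature(X):
        # Append a constant 1 to every instance; the weight learned for
        # this extra coordinate plays the role of a bias/threshold.
        X = np.asarray(X, dtype=float)
        return np.hstack([X, np.ones((X.shape[0], 1))])

    # Two hypothetical 2-d instances become 3-d instances.
    print(add_bias_feature([[2.0, -1.0], [0.5, 3.0]]))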

8 What have we given up? [Figure: a decision tree with tests such as Outlook = overcast and Humidity = normal, with a leaf labeled +1.]

9 What have we given up? Not much!
– Practically, it's a little harder to understand a particular example (or classifier).
– Practically, it's a little harder to debug.
You can still express the same information, and you can analyze things mathematically much more easily.

10 Naïve Bayes as a Linear Classifier Consider Naïve Bayes with two classes (+1, -1) and binary features (0,1).

11 Naïve Bayes as a Linear Classifier

12 “log odds”

13 Naïve Bayes as a Linear Classifier [equation slide, written in terms of the per-feature probabilities p_i and q_i]

14

15 Summary:
– NB is a linear classifier.
– The weights w_i have a closed form which is fairly simple, expressed in log-odds.
(Proceedings of ECML-98, 10th European Conference on Machine Learning)
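As a concrete sketch of that closed form: writing p_i = P(x_i = 1 | y = +1) and q_i = P(x_i = 1 | y = -1) (the usual convention; the slides' own notation is not transcribed), the Naïve Bayes decision is sign(w · x + b) with log-odds weights, as in this Python snippet:

    import math

    def nb_as_linear_classifier(p, q, prior_pos, prior_neg):
        # p[i] = P(x_i = 1 | y = +1), q[i] = P(x_i = 1 | y = -1).
        # Returns (w, b) such that sign(w . x + b) matches the NB decision.
        w = [math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))
             for p_i, q_i in zip(p, q)]
        b = math.log(prior_pos / prior_neg) + sum(
            math.log((1 - p_i) / (1 - q_i)) for p_i, q_i in zip(p, q))
        return w, b

    # Hypothetical parameters for two binary features and equal priors.
    w, b = nb_as_linear_classifier([0.8, 0.3], [0.2, 0.6], 0.5, 0.5)
    x = [1, 0]
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(1 if score >= 0 else -1)   # prints 1 for this made-up example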

16 An Even Older Linear Classifier
1957: The perceptron algorithm (Rosenblatt).
– WP: "A handsome bachelor, he drove a classic MGA sports car and was often seen with his cat named Tobermory. He enjoyed mixing with undergraduates, and for several years taught an interdisciplinary undergraduate honors course entitled "Theory of Brain Mechanisms" that drew students equally from Cornell's Engineering and Liberal Arts colleges… this course was a melange of ideas.. experimental brain surgery on epileptic patients while conscious, experiments on.. the visual cortex of cats,... analog and digital electronic circuits that modeled various details of neuronal behavior (i.e. the perceptron itself, as a machine)."
– Built on the work of Hebb (1949); also developed by Widrow-Hoff (1960).
1960: Perceptron Mark 1 Computer – a hardware implementation.

17 [Images: Bell Labs TM 59-1142-11; Datamation, 1961; the April 1, 1984 Special Edition of CACM.]

18 An Even Older Linear Classifier (continued)
1969: Minsky & Papert's book shows perceptrons are limited to linearly separable data; Rosenblatt dies in a boating accident.
1970s: learning methods for two-layer neural networks.
Mid-late 1980s (Littlestone & Warmuth): mistake-bounded learning and analysis of the Winnow method; early-mid 1990s: analyses of perceptron/Widrow-Hoff.

19 Experimental evaluations of the perceptron vs. Widrow-Hoff and Experts (Winnow-like methods) appeared in SIGIR-1996 (Lewis, Schapire, Callan, Papka) and in (Cohen & Singer). Freund & Schapire (1998-1999) showed that the "kernel trick" and averaging/voting worked.

20 The voted perceptron
For each instance x_i:
– Compute the prediction ŷ_i = sign(v_k · x_i).
– If it is a mistake (ŷ_i ≠ y_i): v_{k+1} = v_k + y_i x_i.
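A minimal sketch of that update loop in Python (the data and names are illustrative; the voted perceptron also keeps every intermediate v_k together with m_k, the number of examples it survived, which is what the later prediction step uses):

    import numpy as np

    def train_voted_perceptron(X, y, epochs=1):
        # Perceptron updates, remembering each weight vector v_k and m_k,
        # the number of examples it classified correctly before a mistake.
        v = np.zeros(X.shape[1])
        survivors = []                       # list of (v_k, m_k) pairs
        m = 0
        for _ in range(epochs):
            for x_i, y_i in zip(X, y):
                y_hat = 1 if np.dot(v, x_i) >= 0 else -1
                if y_hat != y_i:             # mistake
                    survivors.append((v.copy(), m))
                    v = v + y_i * x_i        # v_{k+1} = v_k + y_i * x_i
                    m = 0
                else:
                    m += 1
        survivors.append((v.copy(), m))
        return survivors

    # Tiny made-up linearly separable data set, for illustration only.
    X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])
    vks = train_voted_perceptron(X, y, epochs=2)
    print(len(vks), vks[-1][0])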

21 [Figure: the target u (and -u), drawn with a margin of 2γ around the separator.] (1) A target u. (2) The guess v_1 after one positive example +x_1.

22 (3a) The guess v_2 after two positive examples: v_2 = v_1 + x_2. (3b) The guess v_2 after one positive and one negative example: v_2 = v_1 - x_2.
I want to show two things:
1. The v's get closer and closer to u: v·u increases with each mistake.
2. The v's do not get too large: v·v grows slowly.

23 (Same figure as slide 22, now annotated to show that the component of the mistaken example along u, and hence the increase in v·u, is > γ.)

24 (Figure repeated from slide 22.)
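Those two claims are the heart of the standard perceptron mistake-bound argument (Block/Novikoff); a sketch of it, assuming ||u|| = 1 and writing R for the largest example norm (my notation, not necessarily the slides'), is:

    On a mistake on $(x_i, y_i)$:
        $v_{k+1} \cdot u = v_k \cdot u + y_i\,(x_i \cdot u) \ge v_k \cdot u + \gamma$
        $\|v_{k+1}\|^2 = \|v_k\|^2 + 2\,y_i\,(v_k \cdot x_i) + \|x_i\|^2 \le \|v_k\|^2 + R^2$
    (every example has margin at least $\gamma$, and $y_i\,(v_k \cdot x_i) \le 0$ on a mistake).
    After $k$ mistakes, starting from $v_1 = 0$:
        $k\gamma \le v_{k+1} \cdot u \le \|v_{k+1}\|\,\|u\| = \|v_{k+1}\| \le R\sqrt{k}$,
    so $k \le (R/\gamma)^2$.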

25

26 On-line to batch learning
1. Pick a v_k at random according to m_k/m, the fraction of examples it was used for.
2. Predict using the v_k you just picked.
3. (Actually, use some sort of deterministic approximation to this.)
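A sketch of this in Python, reusing the list of (v_k, m_k) pairs produced by the training sketch above. The "deterministic approximations" shown here are the usual voted and averaged predictions; the slide does not say which approximation it has in mind, so take these as illustrative:

    import random
    import numpy as np

    def predict_random(survivors, x):
        # On-line to batch: sample one v_k with probability m_k / m.
        weights = [m for _, m in survivors]
        v, _ = random.choices(survivors, weights=weights, k=1)[0]
        return 1 if np.dot(v, x) >= 0 else -1

    def predict_voted(survivors, x):
        # Deterministic approximation: each v_k casts m_k votes.
        total = sum(m * (1 if np.dot(v, x) >= 0 else -1)
                    for v, m in survivors)
        return 1 if total >= 0 else -1

    def predict_averaged(survivors, x):
        # Cheaper approximation: predict with the m_k-weighted average of the v_k.
        avg = sum(m * v for v, m in survivors)
        return 1 if np.dot(avg, x) >= 0 else -1

    # e.g. predict_voted(vks, np.array([1.5, 1.0])), with vks from the
    # train_voted_perceptron sketch above.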

27 The voted perceptron

28 Some more comments
Perceptrons are like support vector machines (SVMs):
1. SVMs search for something that looks like u: i.e., a vector w where ||w|| is small and the margin for every example is large.
2. You can use "the kernel trick" with perceptrons: replace x·w with (x·w + 1)^d.
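A sketch of the kernel trick applied to a plain (unvoted) perceptron, using the polynomial kernel (x·z + 1)^d from the slide. The dual bookkeeping with per-example mistake counts is the standard kernelized formulation; the names and toy data are mine:

    import numpy as np

    def poly_kernel(x, z, d=2):
        # Polynomial kernel: (x . z + 1)^d replaces the plain dot product.
        return (np.dot(x, z) + 1.0) ** d

    def kernel_perceptron(X, y, epochs=10, d=2):
        # Since w is always a sum of y_j * x_j terms from past mistakes,
        # the score w . x can be computed purely through kernel evaluations.
        alpha = np.zeros(len(X))             # mistake counts per example
        for _ in range(epochs):
            for i, x_i in enumerate(X):
                score = sum(alpha[j] * y[j] * poly_kernel(X[j], x_i, d)
                            for j in range(len(X)))
                if np.sign(score) != y[i]:   # sign(0) counts as a mistake
                    alpha[i] += 1
        return alpha

    # XOR-like toy data: not linearly separable, but separable with d = 2.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([-1, 1, 1, -1])
    print(kernel_perceptron(X, y))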

29 Experimental Results

30 Task: classifying hand-written digits for the post office

31 More Experimental Results (Linear kernel, one pass over the data)

