Download presentation

Presentation is loading. Please wait.

Published byLayne Porritt Modified over 2 years ago

1
Introduction to Machine Learning Fall 2013 Perceptron (6) Prof. Koby Crammer Department of Electrical Engineering Technion 1

2
2 Online Learning Tyrannosaurus rex

3
3 Online Learning Triceratops

4
4 Online Learning Tyrannosaurus rex Velocireptor

5
5 Formal Setting – Binary Classification Instances –Images, Sentences Labels –Parse tree, Names Prediction rule –Linear predictions rules Loss –No. of mistakes

6
6 Online Framework Initialize Classifier Algorithm works in rounds On round the online algorithm : – Receives an input instance – Outputs a prediction – Receives a feedback label – Computes loss – Updates the prediction rule Goal : –Suffer small cumulative loss

7
7 Why Online Learning? Fast Memory efficient - process one example at a time Simple to implement Formal guarantees – Mistake bounds Online to Batch conversions No statistical assumptions Adaptive Not as good as a well designed batch algorithms

8
8 Update Rules Online algorithms are based on an update rule which defines from (and possibly other information) Linear Classifiers : find from based on the input Some Update Rules : –P–Perceptron (Rosenblat) –A–ALMA (Gentile) –R–ROMMA (Li & Long) –N–NORMA (Kivinen et. al) –M–MIRA (Crammer & Singer) –E–EG (Littlestown and Warmuth) –B–Bregman Based (Warmuth) –N–Numerous Online Convex Programming Algorithms

9
9 Today The Perceptron Algorithm : –Agmon 1954; –Rosenblatt , –Block 1962, Novikoff 1962, –Minsky & Papert 1969, –Freund & Schapire 1999, –Blum & Dunagan 2002

10
10 The Perceptron Algorithm If No-Mistake –Do nothing If Mistake –Update Margin after update : Prediction :

11
11 Geometrical Interpretation

12
12 For any competitor prediction function We bound the loss suffered by the algorithm with the loss suffered by Relative Loss Bound Cumulative Loss Suffered by the Algorithm Sequence of Prediction Functions Cumulative Loss of Competitor

13
13 For any competitor prediction function We bound the loss suffered by the algorithm with the loss suffered by Relative Loss Bound Inequality Possibly Large Gap Regret Extra Loss Competitiveness Ratio

14
14 For any competitor prediction function We bound the loss suffered by the algorithm with the loss suffered by Relative Loss Bound Grows With T Constant Grows With T

15
15 For any competitor prediction function We bound the loss suffered by the algorithm with the loss suffered by Relative Loss Bound Best Prediction Function in hindsight for the data sequence

16
16 Remarks If the input is inseparable, then the problem of finding a separating hyperplane which attains less then M errors is NP-hard (Open hemisphere) Obtaining a zero-one loss bound with a unit competitiveness ratio is as hard as finding a constant approximating error for the Open Hemesphere problem. Bound of the number of mistakes the perceptron makes with the hinge loss of any compitetor

17
17 Definitions Any Competitor The parameters vector can be chosen using the input data The parameterized hinge loss of on True hinge loss 1-norm of hinge loss

18
18 Geometrical Assumption All examples are bounded in a ball of radius R

19
19 Perceptron’s Mistake Bound Bounds : If the sample is separable then

20
20 Proof - Intuition Two views : –The angle between and decreases with –The following sum is fixed as we make more mistakes, our solution is better [FS99, SS05]

21
21 Proof Define the potential : Bound it’s cumulative sum from above and below [C04]

22
22 Proof Bound from above : Telescopic Sum Non-Negative Zero Vector

23
23 Proof Bound From Below : –No error on t th round –Error on t th round

24
24 Proof We bound each term :

25
25 Proof Bound From Below : –No error on t th round –Error on t th round Cumulative bound :

26
26 Proof Putting both bounds together : We use first degree of freedom (and scale) : Bound :

27
27 Proof General Bound : Choose : Simple Bound : Objective of SVM

28
28 Proof Better bound : optimize the value of

29
29 Remarks Bound does not depend on dimension of the feature vector The bound holds for all sequences. It is not tight for most real world data But, there exists a setting for which it is tight – worst

30
30 Three Bounds

31
31 Separable Case Assume there exists such that for all examples Then all bounds are equivalent Perceptron makes finite number of mistakes until convergence (not necessarily to )

32
32 Separable Case – Other Quantities Use 1 st (parameterization) degree of freedom Scale the such that Define The bound becomes

33
33 Separable Case - Illustration

34
34 separable Case – Illustration The Perceptron will make more mistakes Finding a separating hyperplance is more difficult

35
35 Inseparable Case Difficult problem implies a large value of In this case the Perceptron will make a large number of mistakes

36
36 Perceptron Algorithm Extremely easy to implement Relative loss bounds for separable and inseparable cases. Minimal assumptions (not iid) Easy to convert to a well-performing batch algorithm (under iid assumptions) Quantities in bound are not compatible : no. of mistakes vs. hinge-loss. Margin of examples is ignored by update Same update for separable case and inseparable case.

37
Concluding Remarks Batch vs. Online –Two phases: Training and then Test –Single continues process Statistical Assumption –Distribution over examples –All sequences Conversions –Online -> Batch –Batch -> Online 37

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google