1 Online Learning Rong Jin

2 Batch Learning Given a collection of training examples D, learn a classification model from D. What if training examples are received one at a time?

3 Online Learning For t = 1, 2, …, T: receive an instance x_t; predict its class label ŷ_t; receive the true class label y_t; incur loss ℓ(w_t; (x_t, y_t)); update the classification model.
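The protocol on this slide can be sketched as a generic loop; a minimal sketch, assuming a hypothetical `model` object with `predict`, `loss`, and `update` methods (names are illustrative, not from the slides):

```python
# Generic online learning protocol (a sketch; `model` and its methods
# are illustrative names, not part of the original slides).
def online_learn(stream, model):
    """Run the online protocol: predict, observe label, suffer loss, update."""
    total_loss = 0.0
    for x_t, y_t in stream:            # examples arrive one at a time
        y_hat = model.predict(x_t)     # 1. predict a class label
        loss = model.loss(x_t, y_t)    # 2. true label revealed, loss incurred
        total_loss += loss
        model.update(x_t, y_t)         # 3. update the classification model
    return total_loss
```

Any of the concrete algorithms on the later slides (Perceptron, Hedge) instantiates the three steps inside this loop.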

4 Objective Minimize the total loss Σ_{t=1}^{T} ℓ(w_t; (x_t, y_t)). Loss functions: zero-one loss ℓ(w; (x, y)) = 1[y ≠ sign(w·x)]; hinge loss ℓ(w; (x, y)) = max(0, 1 − y(w·x)).

5 Loss Functions [Figure: zero-one loss and hinge loss plotted against the margin y(w·x); the hinge loss upper-bounds the zero-one loss and reaches zero at margin 1.]

6 Linear Classifiers Restrict our discussion to linear classifiers. Prediction: ŷ = sign(w·x). Confidence: |w·x|.

7 Separable Set

8 Inseparable Sets

9 Why Online Learning? Fast. Memory-efficient: processes one example at a time. Simple to implement. Formal guarantees: regret/mistake bounds, online-to-batch conversions, no statistical assumptions. Adaptive. But not as accurate as a well-designed batch algorithm.

10 Update Rules Online algorithms are based on an update rule that defines w_{t+1} from w_t (and possibly other information). Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t). Some update rules: Perceptron (Rosenblatt); ALMA (Gentile); ROMMA (Li & Long); NORMA (Kivinen et al.); MIRA (Crammer & Singer); EG (Littlestone & Warmuth); Bregman-based (Warmuth).

11 Perceptron Initialize w_1 = 0. For t = 1, 2, …, T: receive an instance x_t; predict its class label ŷ_t = sign(w_t·x_t); receive the true class label y_t; if ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t.
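The Perceptron update on this slide fits in a few lines; a minimal sketch:

```python
import numpy as np

def perceptron(examples, dim):
    """Perceptron (Rosenblatt): additive update on each mistake.

    examples: iterable of (x, y) with x array-like and y in {-1, +1}.
    Returns the final weight vector and the number of mistakes made.
    """
    w = np.zeros(dim)                          # initialize w_1 = 0
    mistakes = 0
    for x, y in examples:
        y_hat = 1 if np.dot(w, x) > 0 else -1  # predict sign(w . x)
        if y_hat != y:                         # mistake: w_{t+1} = w_t + y x
            w = w + y * np.asarray(x, dtype=float)
            mistakes += 1
    return w, mistakes
```

Note the update is conservative: w changes only on rounds where the prediction is wrong.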

12 12 Geometrical Interpretation

13 Mistake Bound: Separable Case Assume the data set D is linearly separable with margin γ, i.e., there exists a unit vector u with y_t (u·x_t) ≥ γ for all t. Assume ‖x_t‖ ≤ R for all t. Then the maximum number of mistakes made by the Perceptron algorithm is bounded by R²/γ².
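The bound on this slide can be checked empirically; a small sanity check on synthetic separable data (the separator u, margin 0.2, and data range are illustrative choices, not from the slides):

```python
import numpy as np

# Empirical sanity check of the bound M <= (R / gamma)^2 on a separable set.
rng = np.random.default_rng(0)
u = np.array([1.0, 0.0])                  # true unit separator (illustrative)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
X = X[np.abs(X @ u) >= 0.2]               # enforce a margin of at least 0.2
y = np.sign(X @ u)

R = np.max(np.linalg.norm(X, axis=1))     # radius of the data
gamma = np.min(y * (X @ u))               # realized margin w.r.t. u
bound = (R / gamma) ** 2

w, mistakes = np.zeros(2), 0
for _ in range(100):                      # cycle through the data to convergence
    for x_t, y_t in zip(X, y):
        if y_t * (w @ x_t) <= 0:          # mistake (non-positive margin)
            w = w + y_t * x_t
            mistakes += 1
```

Since bound ≤ 50 here and we run 100 passes, some pass must be mistake-free, so the final w separates the data and the total mistake count respects the bound.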

14 Mistake Bound: Separable Case (proof) After M mistakes: u·w ≥ Mγ, since each update adds y_t (u·x_t) ≥ γ; and ‖w‖² ≤ M R², since each update adds 2 y_t (w·x_t) + ‖x_t‖² ≤ R² on a mistake. Combining with u·w ≤ ‖w‖ gives Mγ ≤ √(M R²), hence M ≤ R²/γ².

15 Mistake Bound: Inseparable Case Let u be the best linear classifier. We measure our progress by the potential ‖w_t − u‖². Consider a round t on which we make a mistake on (x_t, y_t).

16 Mistake Bound: Inseparable Case Result 1:

17 Mistake Bound: Inseparable Case Result 2:

18 Perceptron with Projection Initialize w_1 = 0. For t = 1, 2, …, T: receive an instance x_t; predict its class label ŷ_t = sign(w_t·x_t); receive the true class label y_t; if ŷ_t ≠ y_t then update w_{t+1} = w_t + y_t x_t and project the result back onto the constraint set (e.g., the unit ball).
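The projection step can be made concrete; a minimal sketch assuming the constraint set is the unit L2 ball (the slide's exact projection set was lost in extraction, so this choice is an assumption):

```python
import numpy as np

def project_to_unit_ball(w):
    """Project w onto the unit L2 ball: leave it alone if ||w|| <= 1,
    otherwise rescale it to unit length (assumed constraint set)."""
    norm = np.linalg.norm(w)
    return w if norm <= 1.0 else w / norm
```

After each mistake update w ← w + y x, applying this projection keeps the iterate bounded, which is what the projected variant's analysis relies on.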

19 Remarks The mistake bound is measured for a sequence of classifiers. The bound does not depend on the dimension of the feature vector. The bound holds for all sequences (no i.i.d. assumption). It is not tight for most real-world data, but it cannot be improved in general.

20 Perceptron Initialize w_1 = 0. For t = 1, 2, …, T: receive an instance x_t; predict its class label ŷ_t = sign(w_t·x_t); receive the true class label y_t; if ŷ_t ≠ y_t then w_{t+1} = w_t + y_t x_t. The Perceptron is conservative: it updates the classifier only when it misclassifies.

21 Aggressive Perceptron Initialize w_1 = 0. For t = 1, 2, …, T: receive an instance x_t; predict its class label ŷ_t = sign(w_t·x_t); receive the true class label y_t; if the margin y_t (w_t·x_t) is too small, update w_{t+1} = w_t + y_t x_t even when the prediction is correct.
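The contrast with the conservative Perceptron is the update condition; a minimal sketch, where the margin threshold of 1.0 is an illustrative default (the slide's exact threshold was lost in extraction) and threshold 0 recovers the conservative algorithm:

```python
import numpy as np

def aggressive_perceptron(examples, dim, margin=1.0):
    """Perceptron variant that also updates on correct but low-margin rounds.

    margin: update whenever y * <w, x> <= margin. With margin = 0 this is
    the conservative Perceptron (updates only on mistakes).
    """
    w = np.zeros(dim)
    updates = 0
    for x, y in examples:
        x = np.asarray(x, dtype=float)
        if y * np.dot(w, x) <= margin:   # mistake OR margin violation
            w = w + y * x                # same additive update as before
            updates += 1
    return w, updates
```

On the same data the aggressive variant makes at least as many updates as the conservative one, trading extra work for larger final margins.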

22 Regret Bound

23 Learning a Classifier The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T. But at the end of the day, which classifier should be used? The last one? One chosen by cross-validation?

24 Learning with Expert Advice Learning to combine the predictions from multiple experts. An ensemble of d experts: h_1, …, h_d. Combination weights: w = (w_1, …, w_d) with w_i ≥ 0 and Σ_i w_i = 1. Combined classifier: h(x) = sign(Σ_{i=1}^{d} w_i h_i(x)).

25 Hedge Simple case: there exists one expert who can perfectly classify all the training examples. What is your learning strategy? Difficult case: what if we don't have such a perfect expert?

26 Hedge Algorithm [Figure: worked example with expert predictions +1, −1, +1, +1.]

27 Hedge Algorithm Initialize w_1 = (1/d, …, 1/d). For t = 1, 2, …, T: receive a training example (x_t, y_t); predict ŷ_t = sign(Σ_i w_{t,i} h_i(x_t)); if ŷ_t ≠ y_t then for i = 1, 2, …, d: if h_i(x_t) ≠ y_t then w_{t+1,i} = β w_{t,i} with β ∈ (0, 1); renormalize the weights.
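The multiplicative-weights idea on this slide can be sketched as follows; this variant penalizes wrong experts on every round rather than only on combined mistakes, a common simplification (the exact update condition on the slide was partly lost in extraction):

```python
import numpy as np

def hedge(expert_preds, labels, beta=0.5):
    """Hedge / weighted-majority sketch: multiplicatively penalize wrong experts.

    expert_preds: (T, d) array; expert_preds[t, i] is expert i's prediction
    at round t, in {-1, +1}. labels: length-T array of true labels.
    Returns the final weights and the combined classifier's mistake count.
    """
    T, d = expert_preds.shape
    w = np.ones(d) / d                          # uniform initial weights
    mistakes = 0
    for t in range(T):
        vote = np.dot(w, expert_preds[t])
        y_hat = 1 if vote >= 0 else -1          # weighted-majority prediction
        if y_hat != labels[t]:
            mistakes += 1
        wrong = expert_preds[t] != labels[t]    # which experts erred this round
        w[wrong] *= beta                        # shrink their weights by beta
        w /= w.sum()                            # renormalize to the simplex
    return w, mistakes
```

With a perfect expert present (the "simple case" above), that expert's weight grows toward 1 and the combined vote stops making mistakes.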

28 Mistake Bound

29 Mistake Bound Measure the progress: lower bound

30 Mistake Bound Upper bound

31 Mistake Bound Upper bound

32 Mistake Bound

