Online Learning
Rong Jin
Batch Learning
Given a collection of training examples D, learn a classification model from D.
What if the training examples are received one at a time?
Online Learning
For t = 1, 2, …, T:
– Receive an instance x_t
– Predict its class label ŷ_t
– Receive the true class label y_t
– Incur the loss ℓ(ŷ_t, y_t)
– Update the classification model
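A minimal sketch of this protocol in Python, assuming labels in {−1, +1} and zero-one loss; the names `stream`, `predict`, and `update` are illustrative placeholders, not from the slides:

```python
def online_learn(stream, predict, update, w):
    """Generic online protocol: predict, observe the label, incur loss, update."""
    mistakes = 0
    for x_t, y_t in stream:          # examples arrive one at a time
        y_hat = predict(w, x_t)      # commit to a prediction first
        if y_hat != y_t:             # incur zero-one loss for this round
            mistakes += 1
        w = update(w, x_t, y_t)      # adapt using the revealed true label
    return w, mistakes
```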
Objective
Minimize the total loss \sum_{t=1}^T ℓ(ŷ_t, y_t).
Loss functions:
– Zero-one loss: ℓ(ŷ, y) = 1 if ŷ ≠ y, and 0 otherwise.
– Hinge loss: ℓ(f(x), y) = max(0, 1 − y f(x)).
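The two losses translate directly to code; this assumes labels in {−1, +1} and a real-valued score f(x):

```python
def zero_one_loss(y_hat, y):
    # 1 when the predicted label disagrees with the true label, else 0
    return 0.0 if y_hat == y else 1.0

def hinge_loss(score, y):
    # positive whenever the margin y * score falls below 1;
    # a convex upper bound on the zero-one loss
    return max(0.0, 1.0 - y * score)
```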
Loss Functions
[figure: the zero-one loss and the hinge loss plotted against the margin y f(x); the hinge loss upper-bounds the zero-one loss and vanishes once the margin reaches 1]
Linear Classifiers
We restrict our discussion to linear classifiers.
– Prediction: ŷ = sign(w · x)
– Confidence: |w · x|
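A direct rendering of these two quantities, taking the confidence to be the unsigned score (the slide's exact normalization is not recoverable):

```python
import numpy as np

def predict(w, x):
    # linear classifier: the sign of the inner product w . x
    return 1 if np.dot(w, x) >= 0 else -1

def confidence(w, x):
    # unsigned score; a larger value means x lies farther from the boundary
    return abs(np.dot(w, x))
```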
Separable Set
[figure: a two-class data set that a linear boundary separates perfectly]
Inseparable Sets
[figure: two-class data sets for which no linear boundary classifies every example correctly]
Why Online Learning?
– Fast
– Memory efficient: processes one example at a time
– Simple to implement
– Formal guarantees: regret/mistake bounds
– Online-to-batch conversions
– No statistical (i.i.d.) assumptions
– Adaptive
– But typically not as accurate as a well-designed batch algorithm
Update Rules
Online algorithms are based on an update rule that defines w_{t+1} from w_t (and possibly other information).
Linear classifiers: find w_{t+1} from w_t based on the input (x_t, y_t).
Some update rules:
– Perceptron (Rosenblatt)
– ALMA (Gentile)
– ROMMA (Li & Long)
– NORMA (Kivinen et al.)
– MIRA (Crammer & Singer)
– EG (Littlestone and Warmuth)
– Bregman-based (Warmuth)
Perceptron
Initialize w_1 = 0.
For t = 1, 2, …, T:
– Receive an instance x_t
– Predict its class label ŷ_t = sign(w_t · x_t)
– Receive the true class label y_t
– If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t; otherwise w_{t+1} = w_t
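A runnable sketch of this algorithm; breaking the tie w_t · x_t = 0 in favor of +1 is an assumption:

```python
import numpy as np

def perceptron(stream, dim):
    """Classic Perceptron: update only when a mistake is made."""
    w = np.zeros(dim)                      # initialize w_1 = 0
    mistakes = 0
    for x_t, y_t in stream:                # x_t: np.ndarray, y_t in {-1, +1}
        y_hat = 1 if np.dot(w, x_t) >= 0 else -1
        if y_hat != y_t:                   # mistake: move w toward y_t * x_t
            w = w + y_t * x_t
            mistakes += 1
    return w, mistakes
```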
Geometrical Interpretation
[figure: on a mistake, the update w_{t+1} = w_t + y_t x_t rotates the decision boundary toward the misclassified example]
Mistake Bound: Separable Case
Assume the data set D is linearly separable with margin γ: there exists u with ||u|| = 1 such that y_t (u · x_t) ≥ γ for all t.
Assume ||x_t|| ≤ R for all t.
Then the maximum number of mistakes made by the Perceptron algorithm is bounded by (R/γ)².
Mistake Bound: Separable Case (proof idea)
Track two quantities over the M mistake rounds: the progress w_t · u grows by at least γ per mistake, while ||w_t||² grows by at most R² per mistake.
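The standard argument, written out; M is the number of mistake rounds, and every mistake round satisfies y_t (w_t · x_t) ≤ 0:

```latex
% Progress: each mistake adds at least \gamma to the projection onto u
w_{t+1} \cdot u = (w_t + y_t x_t) \cdot u \ge w_t \cdot u + \gamma
\quad\Rightarrow\quad w_{T+1} \cdot u \ge M \gamma

% Norm growth: on a mistake, y_t (w_t \cdot x_t) \le 0
\|w_{t+1}\|^2 = \|w_t\|^2 + 2 y_t (w_t \cdot x_t) + \|x_t\|^2
            \le \|w_t\|^2 + R^2
\quad\Rightarrow\quad \|w_{T+1}\|^2 \le M R^2

% Combine via Cauchy--Schwarz, using \|u\| = 1:
M \gamma \le w_{T+1} \cdot u \le \|w_{T+1}\| \le R \sqrt{M}
\quad\Rightarrow\quad M \le \left( \frac{R}{\gamma} \right)^2
```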
Mistake Bound: Inseparable Case
Let u (with ||u|| = 1) be the best linear classifier in hindsight, and let ℓ_t = max(0, γ − y_t (u · x_t)) be its hinge loss at round t.
We measure our progress by w_t · u.
Consider a round t on which we make a mistake on (x_t, y_t): the progress grows by at least γ − ℓ_t.
Mistake Bound: Inseparable Case
Result 1 (one standard form of the bound): M ≤ ((R + D)/γ)², where D² = \sum_{t=1}^T ℓ_t².
Mistake Bound: Inseparable Case
Result 2 (one standard form of the bound): M ≤ (R/γ + \sqrt{L/γ})², where L = \sum_{t=1}^T ℓ_t.
Perceptron with Projection
Initialize w_1 = 0.
For t = 1, 2, …, T:
– Receive an instance x_t
– Predict its class label ŷ_t = sign(w_t · x_t)
– Receive the true class label y_t
– If ŷ_t ≠ y_t, then w_{t+1} = w_t + y_t x_t, followed by projecting w_{t+1} back onto a ball of bounded norm
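A sketch of one plausible version, assuming the projection step is a Euclidean projection onto the ball {w : ||w|| ≤ B}; the radius B and the projection form are assumptions, since the slide's formula is not recoverable:

```python
import numpy as np

def perceptron_projected(stream, dim, B=1.0):
    """Perceptron whose weights are projected back onto {w : ||w|| <= B}
    after every update (B is an assumed hyperparameter)."""
    w = np.zeros(dim)
    for x_t, y_t in stream:
        y_hat = 1 if np.dot(w, x_t) >= 0 else -1
        if y_hat != y_t:
            w = w + y_t * x_t
            norm = np.linalg.norm(w)
            if norm > B:                  # Euclidean projection onto the ball
                w = w * (B / norm)
    return w
```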
Remarks
– The mistake bound is measured for a sequence of classifiers w_1, …, w_T, not a single fixed classifier.
– The bound does not depend on the dimension of the feature vector.
– The bound holds for all sequences (no i.i.d. assumption).
– It is not tight for most real-world data, but it cannot be improved in general.
Perceptron (revisited)
Same algorithm as before: initialize w_1 = 0; at each round predict ŷ_t = sign(w_t · x_t), and if ŷ_t ≠ y_t set w_{t+1} = w_t + y_t x_t.
Conservative: it updates the classifier only when it misclassifies an example.
Aggressive Perceptron
Initialize w_1 = 0.
For t = 1, 2, …, T:
– Receive an instance x_t
– Predict its class label ŷ_t = sign(w_t · x_t)
– Receive the true class label y_t
– If the margin y_t (w_t · x_t) is below a threshold (i.e., the hinge loss is positive), then w_{t+1} = w_t + y_t x_t
Aggressive: it may update even on correctly classified examples whose margin is too small.
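A sketch assuming the aggressive trigger is a margin below 1 (equivalently, a positive hinge loss); the slide's exact threshold is not recoverable:

```python
import numpy as np

def aggressive_perceptron(stream, dim, margin=1.0):
    """Updates whenever y_t * (w . x_t) <= margin, i.e. even on correctly
    classified examples whose margin is too small."""
    w = np.zeros(dim)
    for x_t, y_t in stream:
        if y_t * np.dot(w, x_t) <= margin:   # mistake OR small margin
            w = w + y_t * x_t
    return w
```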
Regret Bound
The regret after T rounds compares the learner's cumulative loss to that of the best fixed classifier in hindsight:
Regret_T = \sum_{t=1}^T ℓ(w_t; x_t, y_t) − min_w \sum_{t=1}^T ℓ(w; x_t, y_t).
Learning a Classifier
The evaluation (mistake bound or regret bound) concerns a sequence of classifiers w_1, …, w_T.
But at the end of the day, which classifier should be used? The last one? One chosen by cross-validation? A sketch of one common choice, averaging, follows.
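One common online-to-batch conversion, shown here on the Perceptron, is to return the average of the per-round weight vectors rather than the last one; this is a standard recipe, not necessarily the slides' answer:

```python
import numpy as np

def averaged_perceptron(stream, dim):
    """Train online as usual, but output the average of every round's
    weight vector (the averaged-Perceptron online-to-batch conversion)."""
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    rounds = 0
    for x_t, y_t in stream:
        if y_t * np.dot(w, x_t) <= 0:     # Perceptron mistake condition
            w = w + y_t * x_t
        w_sum += w                        # accumulate after each round
        rounds += 1
    return w_sum / max(rounds, 1)         # the averaged classifier
```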
Learning with Expert Advice
Learning to combine the predictions from multiple experts.
An ensemble of d experts: h_1, …, h_d, each predicting a label in {−1, +1}.
Combination weights: w = (w_1, …, w_d).
Combined classifier: h(x) = sign(\sum_{i=1}^d w_i h_i(x)).
Hedge
Simple case: there exists one expert that classifies all the training examples perfectly. What is your learning strategy?
Difficult case: what if we don't have such a perfect expert?
Hedge Algorithm
[figure: a worked example of Hedge combining experts' +1/−1 predictions]
Hedge Algorithm
Initialize the weights w_i = 1/d for i = 1, …, d, and pick β ∈ (0, 1).
For t = 1, 2, …, T:
– Receive a training example (x_t, y_t)
– Prediction: ŷ_t = sign(\sum_{i=1}^d w_i h_i(x_t))
– If ŷ_t ≠ y_t, then for i = 1, 2, …, d: if h_i(x_t) ≠ y_t, then w_i ← β w_i
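A runnable sketch; the choice β = 0.5 and the renormalization step are conveniences (the prediction is invariant to weight scaling), not from the slides:

```python
import numpy as np

def hedge(stream, experts, beta=0.5):
    """Conservative weighted-majority / Hedge: when the combined prediction
    is wrong, downweight every expert that voted incorrectly by beta."""
    d = len(experts)
    w = np.ones(d) / d                    # uniform initial weights
    for x_t, y_t in stream:
        votes = np.array([h(x_t) for h in experts])   # each in {-1, +1}
        y_hat = 1 if np.dot(w, votes) >= 0 else -1    # weighted majority
        if y_hat != y_t:                  # update only on a mistake
            w[votes != y_t] *= beta       # penalize the wrong experts
            w /= w.sum()                  # renormalize for stability
    return w
```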
Mistake Bound
Measure the progress via the total weight W_t = \sum_{i=1}^d w_i(t).
Lower bound: W_{T+1} is at least the weight of the best expert, which shrinks by a factor β at most once per mistake that expert makes.
Upper bound: every mistake of the combined classifier shrinks the total weight by a constant factor, since wrong experts held at least half the weight.
Combining the two bounds yields a mistake bound for Hedge in terms of the number of mistakes of the best expert, as derived below.
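A standard version of this derivation (the weighted-majority bound); m_i is the number of mistakes of expert i, M the number of mistakes of the combined classifier, and the weights are left unnormalized with w_i(1) = 1/d:

```latex
% Total weight at round t:
W_t = \sum_{i=1}^{d} w_i(t), \qquad W_1 = 1

% Lower bound: expert i is downweighted by \beta at most once per mistake
% it makes, so
W_{T+1} \ge w_i(T+1) \ge \tfrac{1}{d}\, \beta^{m_i}

% Upper bound: on each of the M mistakes, experts holding at least half the
% total weight were wrong, so
W_{t+1} \le \tfrac{1+\beta}{2}\, W_t
\quad\Rightarrow\quad
W_{T+1} \le \left( \tfrac{1+\beta}{2} \right)^{M}

% Combining the two bounds and taking logarithms:
\tfrac{1}{d}\, \beta^{m_i} \le \left( \tfrac{1+\beta}{2} \right)^{M}
\quad\Rightarrow\quad
M \le \frac{\ln d + m_i \ln(1/\beta)}{\ln\!\left( \frac{2}{1+\beta} \right)}
```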