
1 On-Line Algorithms in Machine Learning By: WALEED ABDULWAHAB YAHYA AL-GOBI, MUHAMMAD BURHAN HAFEZ, KIM HYEONGCHEOL, HE RUIDAN, SHANG XINDI

2 Overview 1. Introduction: online learning vs. offline learning 2. Predicting from Expert Advice  Weighted Majority Algorithm: Simple Version  Weighted Majority Algorithm: Randomized Version 3. Mistake Bound Model  Learning a Concept Class C  Learning Monotone Disjunctions  Simple Algorithm  Winnow Algorithm  Learning Decision List 4. Conclusion 5. Q & A 2

3 Intro to Machine Learning  Offline Learning  Online Learning 1 WALEED ABDULWAHAB YAHYA AL-GOBI

4 Machine Learning | Definition  "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" --- [Mitchell, 1997]  A more concrete example: Task T: predicting traffic patterns at a busy intersection. Experience E: historical (past) traffic patterns. Performance measure P: accuracy of predicting future traffic patterns.  Learned model (i.e. target function): y = h(x)

5 Machine Learning | Offline Learning vs Online Learning  Offline Learning:  Learning phase: the learning algorithm is trained on a pre-defined set of training examples to create a hypothesis h(x).  Testing phase: the hypothesis is used to draw accurate conclusions for new data.  Example: MRI brain image classification: training images and training labels are fed to the learning algorithm, which outputs a learned model h(x) mapping image features to labels.

6 Machine Learning | Offline Learning vs Online Learning  Online Learning  In contrast to offline learning, which finds the predictor h(x) from the entire training set at once.  Online learning is a common technique in areas of ML where it is computationally infeasible to train on the entire dataset at once.  Online learning is a method of ML in which data becomes available in sequential order and is used to update our predictor h(x) at each step.

7 Machine Learning | Offline Learning vs Online Learning  Examples of Online Learning  Stock price prediction: the data is generated as a function of time, so online learning can dynamically adapt to new patterns in the incoming data.  Spam filtering: the data is generated based on the output of the learning algorithm (the spam detector), so online learning can dynamically adapt to new patterns to minimize our losses.

8 Machine Learning | Offline Learning vs Online Learning  Online learning example: stock price prediction. At each time step, new training examples arrive, the learning algorithm makes a prediction from the data features, receives the true price, and updates the hypothesis h(x) before the next step.

9 Machine Learning | Offline Learning vs Online Learning

Offline Learning: two-phase learning; the entire dataset is given at once; learn the dataset to construct the target function h(x), then predict on incoming new data; the learning phase is separate from the testing phase.

Online Learning: multi-phase learning; one example is given at a time; predict, receive the correct answer, and update the target function h(x) at each step of learning; the learning phase is combined with the testing phase.

10 Predicting from Expert Advice  Basic Flow  An Example 2 WALEED ABDULWAHAB YAHYA AL-GOBI

11 Predicting from Expert Advice | Basic Flow  Combining expert advice: the algorithm receives a prediction from each expert, makes its own prediction, and is then told the correct answer.  Assumption: each prediction ∈ {0, 1}.

12 Predicting from Expert Advice | An Example  Task: predicting whether it will rain today.  Input: the advice of n experts, each ∈ {1 (yes), 0 (no)}.  Output: 1 or 0.  Goal: make the least number of mistakes.

Date          Expert 1  Expert 2  Expert 3  Truth
21 Jan 2013       1         0         1       1
22 Jan 2013       0         1         0       1
23 Jan 2013       1         0         1       1
24 Jan 2013       0         1         1       1
25 Jan 2013       1         0         1       1

13 The Weighted Majority Algorithm  Simple Version  Randomized Version 3 WALEED ABDULWAHAB YAHYA AL-GOBI

14 The Weighted Majority Algorithm (simple version): 1. Initialize the weights w1, …, wn of all experts to 1. 2. Given a set of predictions {x1, …, xn} by the experts, output the answer with the larger total weight (output 1 if the weight of experts predicting 1 is at least the weight of experts predicting 0, and 0 otherwise). 3. Receive the correct answer and multiply the weight of each mistaken expert by ½. Go to 2.
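A minimal sketch of these three steps in Python, assuming 0/1 advice and labels; the names WeightedMajority, predict and update are illustrative, not from the slides.

```python
class WeightedMajority:
    def __init__(self, n_experts):
        self.w = [1.0] * n_experts              # step 1: all weights start at 1

    def predict(self, advice):
        # step 2: compare the total weight voting for 1 vs. voting for 0
        w1 = sum(w for w, x in zip(self.w, advice) if x == 1)
        w0 = sum(w for w, x in zip(self.w, advice) if x == 0)
        return 1 if w1 >= w0 else 0

    def update(self, advice, truth):
        # step 3: halve the weight of every expert that was wrong
        self.w = [w / 2 if x != truth else w for w, x in zip(self.w, advice)]

# The rain example from the slides (3 experts, the truth is always 1):
wm = WeightedMajority(3)
for advice, truth in [((1, 0, 1), 1), ((0, 1, 0), 1), ((1, 0, 1), 1),
                      ((0, 1, 1), 1), ((1, 0, 1), 1)]:
    print(wm.predict(advice), truth)            # only the second prediction is wrong
    wm.update(advice, truth)
```

Running it reproduces the trace on the next slide: one mistake on 22 Jan, correct predictions on the other days.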

15 The Weighted Majority Algorithm

Date          Advice        Weights               ∑w_i (x_i=0)  ∑w_i (x_i=1)  Prediction  Correct answer
              x1  x2  x3    w1     w2     w3
21 Jan 2013    1   0   1    1      1      1           1.00          2.00           1             1
22 Jan 2013    0   1   0    1      0.50   1           2.00          0.50           0             1
23 Jan 2013    1   0   1    0.50   0.50   0.50        0.50          1.00           1             1
24 Jan 2013    0   1   1    0.50   0.25   0.50        0.50          0.75           1             1
25 Jan 2013    1   0   1    0.25   0.25   0.50        0.25          0.75           1             1

16 The Weighted Majority Algorithm  Proof:  Let M := # of mistakes made by the Weighted Majority algorithm, and W := total weight of all experts (initially W = n).  On a mistaken prediction, at least ½W was on the incorrect answer, and in step 3 that weight is halved, so the total weight is reduced by at least ¼ (= ½W × ½). Hence W ≤ n(¾)^M.  Assuming the best expert made m mistakes, W ≥ (½)^m.  So (½)^m ≤ n(¾)^M, which gives M ≤ 2.41(m + lg n).

17 MUHAMMAD BURHAN HAFEZ Randomized Weighted Majority Algorithm (RWMA)  Simple Version  Randomized Version 4

18 The Randomized Weighted Majority Algorithm (RWMA)  M_WMA ≤ 2.41(m + lg n).  Suppose n = 10, m = 20, and we run 100 prediction trials: the bound allows M_WMA ≈ 56! Can we do better?

19 The Randomized Weighted Majority Algorithm (RWMA)  Two modifications: 1. View the weights as probabilities. 2. Replace "multiply by ½" with "multiply by β".

20 The Randomized Weighted Majority Algorithm (RWMA)  The algorithm: 1. Initialize the weights w1, …, wn of all experts to 1. 2. Given a set of predictions {x1, …, xn} by the experts, output x_i with probability w_i / W. 3. Receive the correct answer ℓ and penalize each mistaken expert by multiplying its weight by β. Go to 2.
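A minimal Python sketch of these steps, assuming 0/1 predictions and labels; the class and method names are illustrative, not from the slides.

```python
import random

class RandomizedWeightedMajority:
    def __init__(self, n_experts, beta=0.5):
        self.w = [1.0] * n_experts              # 1. initialize all weights to 1
        self.beta = beta

    def predict(self, advice):
        # 2. follow expert i with probability w_i / W
        W = sum(self.w)
        r = random.uniform(0, W)
        for w_i, x_i in zip(self.w, advice):
            r -= w_i
            if r <= 0:
                return x_i
        return advice[-1]

    def update(self, advice, truth):
        # 3. multiply each mistaken expert's weight by beta
        self.w = [w * self.beta if x != truth else w
                  for w, x in zip(self.w, advice)]

# Usage: rwma = RandomizedWeightedMajority(6); rwma.predict((1, 1, 0, 0, 0, 0))
```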

21 The Randomized Weighted Majority Algorithm (RWMA)  RWMA in action (β = ½):

Experts     E1   E2   E3   E4   E5   E6   Prediction  Correct answer
Weights      1    1    1    1    1    1
Advice       1    1    0    0    0    0        0             1
Weights      1    1    ½    ½    ½    ½
Advice       0    1    1    1    1    0        1             0
Weights      1    ½    ¼    ¼    ¼    ½

22 The Randomized Weighted Majority Algorithm (RWMA)  Mistake bound:  On the i-th trial, define F_i to be the fraction of the total weight that is on the wrong answer; the algorithm errs with probability F_i, and the total weight is multiplied by (1 - (1 - β)F_i).  Say we have seen t examples. Let M be our expected # of mistakes so far, so M = ∑_{i=1}^{t} F_i.

23 The Randomized Weighted Majority Algorithm (RWMA)  After t trials the total weight is W = n ∏_{i=1}^{t} (1 - (1 - β)F_i), while the best expert (with m mistakes) guarantees W ≥ β^m.  Taking logs and using ln(1 - x) ≤ -x gives m ln β ≤ ln n - (1 - β)∑_i F_i = ln n - (1 - β)M.  Rearranging yields the mistake bound M ≤ (m ln(1/β) + ln n) / (1 - β).

24 The Randomized Weighted Majority Algorithm (RWMA)  The relation between β and M:

β      M
¼      1.85 m + 1.33 ln n
½      1.39 m + 2 ln n
¾      1.15 m + 4 ln n

When β = ½: the simple algorithm gives M ≤ 2.41(m + lg n), while RWMA gives M ≤ 1.39 m + 2 ln n.
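A quick numeric check of the coefficients in this table, assuming the bound M ≤ (m ln(1/β) + ln n) / (1 - β) from the preceding slides:

```python
from math import log

# coefficients of m and ln n in the RWMA mistake bound for several beta values
for beta in (0.25, 0.5, 0.75):
    m_coef = log(1 / beta) / (1 - beta)   # coefficient of m
    n_coef = 1 / (1 - beta)               # coefficient of ln n
    print(f"beta={beta}: M <= {m_coef:.2f} m + {n_coef:.2f} ln n")
# beta=0.25: M <= 1.85 m + 1.33 ln n
# beta=0.5: M <= 1.39 m + 2.00 ln n
# beta=0.75: M <= 1.15 m + 4.00 ln n
```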

25 The Randomized Weighted Majority Algorithm (RWMA)  Other advantages of RWMA: 1. Consider the case where only 51% of the experts were mistaken. WMA directly follows this slim majority and makes a wrong prediction; with RWMA there is still roughly a 50/50 chance of predicting correctly. 2. Consider the case where predictions are strategies that cannot easily be combined. Since the strategies are generally all different, WMA cannot combine the experts who proposed the same strategy; RWMA can be applied directly, because it does not depend on summing the weights of experts who gave the same strategy, but only on the individual weights of the experts.

26 Learning a Concept Class in Mistake Bound Model  A Concept Class  Mistake Bound Model  Definition of learning a class in Mistake Bound Model 5 KIM HYEONGCHEOL

27 Quick Review  What we covered so far…  Input: Yes/No advice from the "experts" (e.g. weather experts asked "Will it rain tomorrow?").  Output: the algorithm makes its own Yes/No prediction to the same question.  Experts are penalized according to the correctness of their advice.  Simple algorithm & better randomized algorithm.

28 Learning a Concept Class C in the Mistake Bound Model  Questions:  What is a concept class C?  What is the Mistake Bound Model?  What do we mean by learning a concept class in the Mistake Bound Model?

29 A Concept Class C  A concept class C is a set of boolean functions over the input variables, for example:  * Disjunction: a ∨ b  * Conjunction: a ∧ b

30 Mistake Bound Model  On-line learning iteration: the algorithm receives an unlabeled example, predicts its label, and is then given the true label; the algorithm is penalized for each incorrect prediction.  Mistake bound: the number of mistakes made by the algorithm is bounded by M (ideally, we hope M is as small as possible).

31 Learning a Concept Class in Mistake Bound Model  Assumption: the examples are labeled by some unknown target concept c taken from a known concept class C (e.g. the class of disjunctions).

32 Learning a Concept Class in Mistake Bound Model  Condition: for any target concept c ∈ C and any sequence of examples, the total number of mistakes made by the algorithm is bounded by a polynomial in n and size(c).

33 Learning a Concept Class in Mistake Bound Model  If the algorithm satisfies this assumption and condition, we say that it learns class C in the mistake bound learning model.  In particular, if the number of mistakes made is only poly(size(c)) · polylog(n), the algorithm is robust to the presence of many additional irrelevant variables: it is called attribute-efficient.

34 Examples of Learning  Some examples of learning classes in the Mistake Bound Model:  Monotone disjunctions  Simple algorithm  The Winnow algorithm  Decision list

35 Learning Monotone Disjunctions  Simple Algorithm  Winnow Algorithm 6 KIM HYEONGCHEOL

36 Learning Monotone Disjunctions | Problem Definition  The target concept is a monotone disjunction, i.e. an OR of a subset of the variables with no negations, e.g. c(x) = x2 ∨ x3.  Each example x ∈ {0, 1}^n is labeled positive if c(x) = 1 and negative otherwise.

37 Simple Algorithm  Start with the hypothesis h(x) = x1 ∨ x2 ∨ … ∨ xn and predict with h on each example.  On a mistake on a negative example, remove from h every variable that is set to 1 in that example.  Mistakes on positive examples never happen, because the target's variables are never removed from h.  Each mistake removes at least one variable, so the algorithm makes at most n mistakes.
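A minimal Python sketch of this simple algorithm, assuming examples are 0/1 tuples; learn_disjunction is an illustrative name, not from the slides.

```python
def learn_disjunction(examples):
    """examples: list of (x, label) pairs with x a 0/1 tuple and label in {0, 1}."""
    n = len(examples[0][0])
    hypothesis = set(range(n))                  # h(x) = x1 v x2 v ... v xn
    mistakes = 0
    for x, label in examples:
        prediction = 1 if any(x[i] for i in hypothesis) else 0
        if prediction == 1 and label == 0:
            # mistake on a negative example: drop every variable set to 1 in x
            hypothesis -= {i for i in range(n) if x[i] == 1}
            mistakes += 1
        # mistakes on positive examples cannot happen: the target's variables
        # are never removed, so h still covers every positive example
    return hypothesis, mistakes

# Example matching the slides: target c(x) = x2 v x3 (indices 1 and 2), n = 6.
```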

38 Simple Algorithm | An example  When the target concept is c(x) = x2 ∨ x3 and n = 6: red marks a mistake on a negative example (the hypothesis h is then pruned), green marks a correct prediction.

39  39 Simple Algorithm | An example

40 Learning Monotone Disjunctions  Simple Algorithm  Winnow Algorithm 6 HE RUIDAN

41 Learning the Class of Disjunctions | Winnow Algorithm  The simple algorithm learns the class of disjunctions with the number of mistakes bounded by n.  The Winnow algorithm makes fewer mistakes.

42 Winnow Algorithm | Basic Concept  Each input is a vector x = (x1, x2, …, xn) with xi ∈ {0, 1}.  Assume the target function is the disjunction of r relevant variables, i.e. c(x) = x_{t1} ∨ x_{t2} ∨ … ∨ x_{tr}.  The Winnow algorithm maintains a linear separator.

43 Winnow Algorithm | Work Flow  Initialize: weights w1 = w2 = … = wn = 1.  Iterate:  Receive an example vector x = (x1, x2, …, xn).  Predict: output 1 if ∑_i w_i x_i ≥ n, output 0 otherwise.  Get the true label.  Update if a mistake was made:  Predicted negative on a positive example: for each x_i = 1, w_i = 2·w_i.  Predicted positive on a negative example: for each x_i = 1, w_i = w_i / 2.
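A minimal Python sketch of this work flow, using the threshold n (predict 1 when ∑_i w_i x_i ≥ n); the class and method names are illustrative, not from the slides.

```python
class Winnow:
    def __init__(self, n):
        self.n = n
        self.w = [1.0] * n                           # w_1 = ... = w_n = 1

    def predict(self, x):
        return 1 if sum(w * xi for w, xi in zip(self.w, x)) >= self.n else 0

    def update(self, x, truth):
        prediction = self.predict(x)
        if prediction == 0 and truth == 1:           # mistake on a positive example
            self.w = [w * 2 if xi == 1 else w for w, xi in zip(self.w, x)]
        elif prediction == 1 and truth == 0:         # mistake on a negative example
            self.w = [w / 2 if xi == 1 else w for w, xi in zip(self.w, x)]
```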

44 Winnow Algorithm | Mistake Bound  Theorem: the Winnow algorithm learns the class of disjunctions in the Mistake Bound model, making at most 2 + 3r(1 + lg n) mistakes when the target concept is a disjunction of r variables.  Attribute-efficient: the # of mistakes is only poly(r) · polylog(n).  Particularly good when the number of relevant variables r is much smaller than the total number of variables n.

45 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples (output 0 while the true label is 1).  v: # of mistakes made on negative examples (output 1 while the true label is 0).  Proof 1: u ≤ r(1 + lg n).  Proof 2: v < 2(u + 1).  Therefore the total # of mistakes is u + v < 3u + 2, which is bounded by 2 + 3r(1 + lg n).

46 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples.  v: # of mistakes made on negative examples.  Proof 1: u ≤ r(1 + lg n).  Any mistake made on a positive example must double at least one of the weights of the variables in the target function.

47 Winnow Algorithm | Proof of Mistake Bound  Any mistake made on a positive example must double the weight of at least one target variable:  For such an example X, h(X) = negative while c(X) = positive.  c(X) = positive means at least one target variable is set to 1 in X.  According to the algorithm, when the hypothesis predicts a positive example as negative, the weights of all variables set to 1 in the example are doubled, so the weight of at least one target variable is doubled.

48 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples.  v: # of mistakes made on negative examples.  Proof 1: u ≤ r(1 + lg n).  Any mistake made on a positive example must double the weight of at least one target variable.  The weights of target variables are never halved.

49 Winnow Algorithm | Proof of Mistake Bound  The weights of target variables are never halved:  According to the algorithm, weights are halved only when h(X) = positive while c(X) = negative, and only for variables set to 1 in X.  c(X) = negative means no target variable is set to 1 in X, so no target variable's weight is halved.

50 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples.  v: # of mistakes made on negative examples.  Proof 1: u ≤ r(1 + lg n).  Any mistake made on a positive example must double the weight of at least one target variable.  The weights of target variables are never halved.  Each target variable's weight can be doubled at most 1 + lg n times.

51 Winnow Algorithm | Proof of Mistake Bound  Each target variable's weight can be doubled at most 1 + lg n times:  A target variable's weight can only be doubled, never halved.  Once the weight of a target variable is at least n, the hypothesis always predicts positive whenever that variable is 1.  Weights are doubled only when the hypothesis predicts negative on a positive example, so if the hypothesis always predicts positive on those examples, no further doubling occurs.  Therefore a target variable's weight cannot be doubled once it is at least n; starting from 1, it can be doubled at most 1 + lg n times.

52 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples.  v: # of mistakes made on negative examples.  Proof 1: u ≤ r(1 + lg n).  Any mistake made on a positive example doubles the weight of at least one target variable.  A target variable's weight is never halved, since whenever a target variable is 1 the example cannot be negative.  Each target variable's weight can be doubled at most 1 + lg n times, since only weights below n can be doubled.  Therefore u ≤ r(1 + lg n), since there are r variables in the target function.

53 Winnow Algorithm | Proof of Mistake Bound  u: # of mistakes made on positive examples.  v: # of mistakes made on negative examples.  Proof 2: v < 2(u + 1).  Initially, the total weight is W = n.  A mistake on a positive example increases W by less than n (the prediction was 0, so the doubled weights summed to less than n).  A mistake on a negative example decreases W by at least n/2 (the prediction was 1, so the halved weights summed to at least n).  Therefore 0 ≤ W < n + un - v(n/2), which gives v < 2(u + 1).  Total # of mistakes = u + v < 3u + 2 ≤ 2 + 3r(1 + lg n).
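An informal sanity check of this bound (not from the slides): run the Winnow sketch from the work-flow slide on a random target disjunction and compare its mistake count with 2 + 3r(1 + lg n). The values n = 100, r = 3 and the 5000-example loop are arbitrary choices.

```python
import random
from math import log2

n, r = 100, 3
relevant = random.sample(range(n), r)            # target c(x) = OR of r variables
learner = Winnow(n)                              # Winnow class from the earlier sketch
mistakes = 0
for _ in range(5000):
    x = tuple(random.randint(0, 1) for _ in range(n))
    truth = 1 if any(x[i] for i in relevant) else 0
    if learner.predict(x) != truth:
        mistakes += 1
    learner.update(x, truth)
print(mistakes, "<=", 2 + 3 * r * (1 + log2(n)))  # mistakes stay below the bound
```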

54 Learning Decision List in Mistake Bound Model  Learning Decision List in Mistake Bound Model 7 SHANG XINDI

55 Decision List  A decision list is an ordered sequence of if-then rules: "if ℓ1 then output b1, else if ℓ2 then output b2, …, else if ℓr then output br, else output b_{r+1}", where each ℓi is a literal and each bi ∈ {0, 1}.  The rule on ℓ1 is level 1, the rule on ℓr is level r, and the default output b_{r+1} is level r+1.

56 Decision List | Example 56

57 Decision List vs Disjunction  A disjunction x_{t1} ∨ … ∨ x_{tr} can be written as a decision list: "if x_{t1} then 1, else if x_{t2} then 1, …, else if x_{tr} then 1, else 0".  So decision lists are at least as expressive as disjunctions.

58 58 Learning Decision List

59 Learning Decision List | Algorithm  (A sketch of the standard level-based learner.)  The hypothesis is itself a decision list in which each level may contain many rules; initially, every possible rule (one for each literal and each output bit, plus the two default rules) is placed in level 1.  To predict on an example, find the first level containing a rule that applies and predict with one such rule.  On a mistake, demote every rule in that level that applied and predicted incorrectly to the next level.
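A minimal Python sketch of this level-based learner, assuming examples are 0/1 tuples. A rule (i, v, b) means "if x[i] == v then predict b", and (None, None, b) is a default rule that always applies; the encoding and the class name are illustrative choices, not from the slides.

```python
from itertools import product

class DecisionListLearner:
    def __init__(self, n):
        rules = list(product(range(n), (0, 1), (0, 1)))
        rules += [(None, None, 0), (None, None, 1)]
        self.level = {rule: 1 for rule in rules}      # every rule starts at level 1

    def _applies(self, rule, x):
        i, v, _ = rule
        return i is None or x[i] == v

    def predict(self, x):
        # use a rule from the first level that contains a rule that applies
        firing = [r for r in self.level if self._applies(r, x)]
        first = min(self.level[r] for r in firing)
        return next(r[2] for r in firing if self.level[r] == first)

    def update(self, x, truth):
        if self.predict(x) == truth:
            return
        # on a mistake, demote every rule in that level that applied and was wrong
        firing = [r for r in self.level if self._applies(r, x)]
        first = min(self.level[r] for r in firing)
        for r in firing:
            if self.level[r] == first and r[2] != truth:
                self.level[r] += 1
```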

60 60 Learning Decision List | Example

61 61 Learning Decision List | Example

62 Learning Decision List | Example  Example inputs: 0011, 0100, 1000, 1100.

63 Learning Decision List | Mistake Bound  Each mistake demotes at least one rule, and a rule at position i of the target list is never demoted below level i, so every example always has an applicable rule within the first r + 1 levels.  Consequently, each of the O(n) rules is demoted at most r + 1 times, giving a mistake bound of O(rn).

64 Summary 1. Introduction: online learning vs. offline learning 2. Predicting from Expert Advice  Weighted Majority Algorithm: Simple Version  Weighted Majority Algorithm: Randomized Version 3. Mistake Bound Model  Learning a Concept Class C  Learning Monotone Disjunctions  Simple Algorithm  Winnow Algorithm  Learning Decision List 4. Demo of online learning 64

65 Learning to Swing-Up and Balance 65

66 Q & A

