Introduction to Machine Learning, Fall 2013: Perceptron (6). Prof. Koby Crammer, Department of Electrical Engineering, Technion.

Presentation transcript:

1 Introduction to Machine Learning, Fall 2013. Perceptron (6). Prof. Koby Crammer, Department of Electrical Engineering, Technion

2 Online Learning Tyrannosaurus rex

3 Online Learning Triceratops

4 Online Learning Tyrannosaurus rex Velociraptor

5 Formal Setting – Binary Classification
– Instances: images, sentences
– Labels: parse tree, names
– Prediction rule: linear prediction rules
– Loss: number of mistakes

6 Online Framework
– Initialize the classifier
– The algorithm works in rounds
– On each round the online algorithm: receives an input instance, outputs a prediction, receives a feedback label, computes the loss, and updates the prediction rule
– Goal: suffer small cumulative loss
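A minimal Python sketch of this protocol (the names online_protocol, predict, loss, and update are illustrative placeholders, not notation from the lecture):

```python
def online_protocol(examples, w_init, predict, loss, update):
    """Generic online-learning loop; `predict`, `loss`, and `update` are
    caller-supplied functions (illustrative names, not lecture notation)."""
    w = w_init                               # initialize the classifier
    cumulative_loss = 0.0
    for x_t, y_t in examples:                # the algorithm works in rounds
        y_hat = predict(w, x_t)              # output a prediction for the instance
        cumulative_loss += loss(y_hat, y_t)  # receive the label, compute the loss
        w = update(w, x_t, y_t)              # update the prediction rule
    return w, cumulative_loss                # goal: small cumulative loss
```

Plugging the Perceptron's prediction and update rule (slide 10) into this loop yields the Perceptron as an instance of the online framework.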

7 Why Online Learning?
– Fast
– Memory efficient: processes one example at a time
– Simple to implement
– Formal guarantees: mistake bounds
– Online-to-batch conversions
– No statistical assumptions
– Adaptive
– Not as good as a well-designed batch algorithm

8 Update Rules
Online algorithms are based on an update rule which defines the next classifier from the current one (and possibly other information).
Linear classifiers: find the new weight vector from the current one, based on the input.
Some update rules:
– Perceptron (Rosenblatt)
– ALMA (Gentile)
– ROMMA (Li & Long)
– NORMA (Kivinen et al.)
– MIRA (Crammer & Singer)
– EG (Littlestone and Warmuth)
– Bregman-based (Warmuth)
– Numerous online convex programming algorithms

9 Today
The Perceptron algorithm:
– Agmon 1954
– Rosenblatt 1958
– Block 1962, Novikoff 1962
– Minsky & Papert 1969
– Freund & Schapire 1999
– Blum & Dunagan 2002

10 The Perceptron Algorithm
– Prediction: the sign of the inner product, sign(w_t · x_t)
– If no mistake: do nothing
– If mistake: update w_{t+1} = w_t + y_t x_t
– Margin after update: y_t (w_{t+1} · x_t) = y_t (w_t · x_t) + ||x_t||^2
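A short sketch of one round of this rule in Python (the helper name perceptron_round is my own):

```python
import numpy as np

def perceptron_round(w, x_t, y_t):
    """One Perceptron round: predict the sign of the inner product and,
    on a mistake, add y_t * x_t to the weight vector."""
    y_hat = 1.0 if float(w @ x_t) > 0 else -1.0   # prediction
    if y_hat != y_t:                               # mistake
        w = w + y_t * x_t                          # Perceptron update
    return w, y_hat
```

The "margin after update" line follows from a direct computation: y_t ((w + y_t x_t) · x_t) = y_t (w · x_t) + ||x_t||^2, so the margin on the mistaken example increases by ||x_t||^2.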

11 Geometrical Interpretation

12 Relative Loss Bound
For any competitor prediction function, we bound the cumulative loss suffered by the algorithm by the cumulative loss suffered by that competitor. (Callouts: cumulative loss suffered by the algorithm; sequence of prediction functions; cumulative loss of the competitor.)

13 Relative Loss Bound
Same statement, annotated: the bound is an inequality; there may be a possibly large gap; the additive term is the regret (extra loss); the multiplicative term is the competitiveness ratio.

14 Relative Loss Bound
Same statement, annotated: the algorithm's cumulative loss grows with T, the extra term is constant, and the competitor's cumulative loss grows with T.

15 Relative Loss Bound
The competitor is the best prediction function in hindsight for the data sequence.
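Written out in the usual notation (a standard generic form, assuming a sequence of predictors w_1, ..., w_T and a competitor u; the slide's exact statement may differ in constants):

```latex
% Relative loss (regret) bound: for any competitor u,
\[
\sum_{t=1}^{T} \ell\big(w_t;(x_t,y_t)\big)
  \;\le\; a \sum_{t=1}^{T} \ell\big(u;(x_t,y_t)\big) \;+\; b
\]
% a is the competitiveness ratio and b the regret (extra loss);
% both cumulative sums grow with T while b stays constant.
```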

16 Remarks
– If the input is inseparable, then the problem of finding a separating hyperplane which attains fewer than M errors is NP-hard (Open Hemisphere).
– Obtaining a zero-one loss bound with a unit competitiveness ratio is as hard as finding a constant-factor approximation for the Open Hemisphere problem.
– Instead, we bound the number of mistakes the Perceptron makes by the hinge loss of any competitor.

17 Definitions
– Any competitor: the competitor's parameter vector can be chosen using the input data.
– The parameterized hinge loss of the competitor on an example.
– True hinge loss.
– 1-norm (cumulative sum) of the hinge loss.
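Assuming the usual notation (competitor u, examples (x_t, y_t), margin parameter gamma), the standard forms of these quantities are:

```latex
% Parameterized hinge loss of u on (x_t, y_t), at margin gamma:
\[ \ell_{\gamma}\big(u;(x_t,y_t)\big) = \max\{0,\; \gamma - y_t (u \cdot x_t)\} \]
% True hinge loss (gamma = 1):
\[ \ell\big(u;(x_t,y_t)\big) = \max\{0,\; 1 - y_t (u \cdot x_t)\} \]
% 1-norm (cumulative) hinge loss over the sequence:
\[ L_1(u) = \sum_{t=1}^{T} \ell\big(u;(x_t,y_t)\big) \]
```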

18 Geometrical Assumption All examples are bounded in a ball of radius R

19 Perceptron's Mistake Bound
Bounds: if the sample is separable, then the number of mistakes is bounded independently of the length of the sequence.
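As a concrete illustration, here is a small self-contained experiment (the data generation and the fixed separator are assumptions of this sketch) that runs the Perceptron over a separable sample and checks the mistake count against the classic (R/gamma)^2 bound for a unit-norm separator with margin gamma:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed unit-norm separator; keep only points with margin at least gamma.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
gamma = 0.2
X = rng.uniform(-1.0, 1.0, size=(5000, 2))
margins = X @ u
keep = np.abs(margins) >= gamma
X, y = X[keep], np.sign(margins[keep])

R = np.max(np.linalg.norm(X, axis=1))        # radius of the data ball
w = np.zeros(2)
mistakes = 0
for x_t, y_t in zip(X, y):                   # a single online pass
    if y_t * (w @ x_t) <= 0:                 # mistake (ties counted as mistakes)
        w = w + y_t * x_t                    # Perceptron update
        mistakes += 1

print(f"mistakes = {mistakes}, bound (R/gamma)^2 = {(R / gamma) ** 2:.1f}")
assert mistakes <= (R / gamma) ** 2
```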

20 Proof – Intuition
Two views:
– The angle between the weight vector and the competitor decreases with every mistake.
– The following sum is fixed: as we make more mistakes, our solution gets better. [FS99, SS05]

21 Proof. Define the potential: bound its cumulative sum from above and below. [C04]

22 Proof. Bound from above: the sum telescopes, the subtracted final term is non-negative, and the initial weight vector is the zero vector.

23 Proof
Bound from below:
– No error on the t-th round
– Error on the t-th round

24 Proof. We bound each term:

25 Proof
Bound from below:
– No error on the t-th round
– Error on the t-th round
Cumulative bound:

26 Proof. Putting both bounds together: we use the first degree of freedom (and scale). Bound:
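The potential argument on the preceding slides can be written out along the lines of the standard potential-based proof; the following is a reconstruction under the usual conventions (w_1 = 0, ||x_t|| <= R, hinge losses ell_t(u), a free scale lambda > 0) and may differ in detail from the slides:

```latex
% Potential on round t:
\[ \Delta_t = \|w_t - \lambda u\|^2 - \|w_{t+1} - \lambda u\|^2 \]
% Bound from above: the sum telescopes, w_1 is the zero vector,
% and the subtracted final term is non-negative:
\[ \sum_{t=1}^{T} \Delta_t = \|w_1 - \lambda u\|^2 - \|w_{T+1} - \lambda u\|^2
   \;\le\; \lambda^2 \|u\|^2 \]
% Bound from below: \Delta_t = 0 on rounds with no error; on an error round
% (y_t (w_t \cdot x_t) \le 0 and w_{t+1} = w_t + y_t x_t, \|x_t\| \le R):
\[ \Delta_t = -2 y_t (w_t \cdot x_t) + 2\lambda\, y_t (u \cdot x_t) - \|x_t\|^2
   \;\ge\; 2\lambda\big(1 - \ell_t(u)\big) - R^2 \]
% Summing over the M error rounds and combining both bounds:
\[ \lambda^2 \|u\|^2 \;\ge\; 2\lambda M - 2\lambda L_1(u) - M R^2 \]
```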

27 Proof. General bound; choosing the free scale parameter gives the simple bound, which is the objective of SVM.

28 Proof. Better bound: optimize the value of the free scale parameter.
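Continuing the same reconstruction (again an assumption about the exact slide contents), rearranging and picking the scale lambda gives the bounds referred to on slides 27 and 28:

```latex
% General bound (divide the combined inequality by 2*lambda):
\[ M \;\le\; \frac{\lambda \|u\|^2}{2} + L_1(u) + \frac{M R^2}{2\lambda} \]
% Choosing \lambda = R^2 gives the simple bound, twice an SVM-style objective:
\[ M \;\le\; \|u\|^2 R^2 + 2 L_1(u)
   \;=\; 2\Big(\tfrac{1}{2}\|u\|^2 R^2 + \textstyle\sum_t \ell_t(u)\Big) \]
% Optimizing over \lambda (\lambda = R\sqrt{M}/\|u\|) gives the better bound:
\[ M \;\le\; L_1(u) + R\|u\|\sqrt{M}
   \;\;\Longrightarrow\;\;
   M \;\le\; \Big(\tfrac{R\|u\|}{2} + \sqrt{\tfrac{R^2\|u\|^2}{4} + L_1(u)}\Big)^{2} \]
```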

29 Remarks
– The bound does not depend on the dimension of the feature vector.
– The bound holds for all sequences. It is not tight for most real-world data, but there exists a (worst-case) setting for which it is tight.

30 Three Bounds

31 Separable Case
– Assume there exists a competitor that attains a positive margin on every example.
– Then all bounds are equivalent.
– The Perceptron makes a finite number of mistakes until convergence (not necessarily to that competitor).

32 Separable Case – Other Quantities
– Use the 1st (parameterization) degree of freedom: scale the competitor so that its smallest margin over the examples equals 1.
– Define the margin of the sample accordingly.
– The bound then becomes a function of the radius R and the margin.
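In the usual notation this specialization reads as follows (a standard reconstruction; the slide's symbols may differ):

```latex
% Scale the competitor so that its smallest margin over the sample is 1:
\[ \min_t \; y_t (u \cdot x_t) = 1 \]
% Define the (geometric) margin of the sample:
\[ \gamma = \frac{1}{\|u\|} \]
% With zero hinge loss the mistake bound becomes the classic separable-case bound:
\[ M \;\le\; \|u\|^2 R^2 \;=\; \Big(\frac{R}{\gamma}\Big)^{2} \]
```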

33 Separable Case - Illustration

34 Separable Case – Illustration. The Perceptron will make more mistakes; finding a separating hyperplane is more difficult.

35 Inseparable Case. A difficult problem implies a large value of the cumulative hinge loss of every competitor; in this case the Perceptron will make a large number of mistakes.

36 Perceptron Algorithm
– Extremely easy to implement
– Relative loss bounds for the separable and inseparable cases
– Minimal assumptions (not i.i.d.)
– Easy to convert to a well-performing batch algorithm (under i.i.d. assumptions)
– Quantities in the bound are not compatible: number of mistakes vs. hinge loss
– The margin of examples is ignored by the update
– Same update for the separable and inseparable cases
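One common way to realize the online-to-batch conversion mentioned above is to average the weight vectors visited during training; the sketch below is one such scheme (the lecture does not commit to this particular conversion, and the function name is my own):

```python
import numpy as np

def averaged_perceptron(X, y, epochs=5):
    """Run the Perceptron for several passes over (X, y) and return the
    average of all intermediate weight vectors (an online-to-batch heuristic)."""
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * (w @ x_t) <= 0:     # mistake (ties counted as mistakes)
                w = w + y_t * x_t        # standard Perceptron update
            w_sum += w                   # accumulate every round for the average
    return w_sum / (epochs * n)          # averaged weight vector

# Example usage with small illustrative data:
# X = np.array([[1.0, 2.0], [-1.5, 0.5], [2.0, -1.0]]); y = np.array([1, -1, 1])
# w_avg = averaged_perceptron(X, y)
```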

37 Concluding Remarks
Batch vs. online:
– Batch: two phases, training and then test
– Online: a single continuous process
Statistical assumptions:
– Batch: a distribution over examples
– Online: all sequences
Conversions:
– Online -> Batch
– Batch -> Online