Machine Learning with Discriminative Methods Lecture 02 – PAC Learning and tail bounds intro CS 790-134 Spring 2015 Alex Berg.

Today’s lecture: PAC learning, and an introduction to tail bounds.

Rectangle learning – the hypothesis H. A hypothesis is any axis-aligned rectangle; points inside the rectangle are classified positive, points outside negative. [Figure: labeled + and – points in the plane, with an axis-aligned rectangle as the hypothesis H.]

Rectangle learning – the realizable case. The actual decision boundary is also an axis-aligned rectangle; this is “the realizable case” (no approximation error). [Figure: labeled + and – points with the hypothesis rectangle H.]

Rectangle learning – the realizable case, continued. [Figure: the same setup, with a new – point falling inside H: a mistake for the hypothesis H.] We measure ERROR by the probability that the hypothesis makes a mistake on a point drawn from the data distribution.

Rectangle learning – a strategy for a learning algorithm. The algorithm outputs as its hypothesis H (the output of the learning algorithm so far) the smallest axis-aligned rectangle consistent with all the data seen so far. [Figure: the tightest rectangle enclosing the + points.] A minimal code sketch of this tightest-fit learner follows below.
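A minimal sketch of the tightest-fit strategy, assuming 2-D points stored in NumPy arrays; the class and variable names here are illustrative, not from the course materials. The learner simply records the bounding box of the positive examples it has seen and predicts positive exactly for points inside it.

```python
import numpy as np

class TightestRectangle:
    """Hypothesis: smallest axis-aligned rectangle containing all positive examples seen so far."""

    def __init__(self):
        self.lo = None  # lower-left corner of the rectangle
        self.hi = None  # upper-right corner of the rectangle

    def fit(self, X, y):
        """X: (n, 2) array of points; y: +1 / -1 labels. Only positives shape the rectangle."""
        pos = X[y == 1]
        if len(pos) == 0:
            return self  # no positives yet: empty rectangle, predict everything negative
        self.lo = pos.min(axis=0)
        self.hi = pos.max(axis=0)
        return self

    def predict(self, X):
        """Return +1 for points inside the rectangle, -1 otherwise."""
        if self.lo is None:
            return -np.ones(len(X), dtype=int)
        inside = np.all((X >= self.lo) & (X <= self.hi), axis=1)
        return np.where(inside, 1, -1)

# Example: learn from a few labeled points, then classify new ones.
X = np.array([[1.0, 1.0], [2.0, 3.0], [0.0, 5.0], [4.0, 4.0]])
y = np.array([1, 1, -1, -1])
h = TightestRectangle().fit(X, y)
print(h.predict(np.array([[1.5, 2.0], [3.5, 3.5]])))  # inside -> +1, outside -> -1
```

In the realizable case this hypothesis never mislabels a positive training point as negative; the only possible mistakes are on regions of the true rectangle that the training sample has not yet covered.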

Rectangle learning – making a mistake. The current hypothesis H (the smallest rectangle consistent with all the data so far) makes a mistake on a new data item. [Figure: a new + point falls just outside H.]

Rectangle learning – making a mistake, continued. The probability of such a mistake is exactly our error measure, so we can bound the error by bounding how likely it is that no training example has yet landed in the region where H and the true rectangle disagree. [Figure: the strip between the hypothesis rectangle and the true rectangle where mistakes occur.]

A very subtle formulation: R = the actual decision boundary, R’ = the rectangle produced by the algorithm so far (after m samples). From the Kearns and Vazirani reading.

[Slide shows an excerpt of the analysis from the Kearns and Vazirani reading.]
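For reference, the bound this analysis leads to; this is my summary of the standard argument in Kearns and Vazirani, so treat it as a sketch rather than a quotation. Partition the high-error region into four strips along the sides of R, each of probability mass ε/4; the hypothesis R’ has error greater than ε only if some strip contains no training point, so a union bound gives

\[
\Pr\big[\mathrm{error}(R') > \varepsilon\big]
  \;\le\; 4\,(1-\varepsilon/4)^{m}
  \;\le\; 4\,e^{-\varepsilon m/4}
  \;\le\; \delta
\quad\text{whenever}\quad
 m \;\ge\; \frac{4}{\varepsilon}\,\ln\frac{4}{\delta}.
\]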

PAC Learning
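A paraphrase of the usual definition, in my wording rather than the slide’s: a concept class \(\mathcal{C}\) is PAC-learnable if there is an algorithm that, for every target \(c \in \mathcal{C}\), every distribution \(D\) over inputs, and every \(\varepsilon, \delta \in (0,1)\), when given \(m\) i.i.d. labeled examples \(S \sim D^{m}\) with \(m\) polynomial in \(1/\varepsilon\) and \(1/\delta\), outputs a hypothesis \(h_S\) that is probably (with probability at least \(1-\delta\)) approximately (to error at most \(\varepsilon\)) correct:

\[
\Pr_{S \sim D^{m}}\big[\,\mathrm{error}_{D}(h_S) \le \varepsilon\,\big] \;\ge\; 1-\delta,
\qquad
\mathrm{error}_{D}(h) = \Pr_{x \sim D}\big[\,h(x) \ne c(x)\,\big].
\]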

Flashback: learning/fitting is a process. Example: estimating the probability that a tossed coin comes up heads. Let X_i denote the i-th coin toss and let the estimator be the empirical frequency of heads based on n tosses. The estimate is good when it is within epsilon of the true probability and bad otherwise; the probability of a bad estimate can be bounded by a quantity inversely proportional to the number of samples (the underlying computation is an example of a tail bound). From the Raginsky notes.
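In symbols, my reconstruction of the notation used in the Raginsky notes: let \(X_1,\dots,X_n\) be independent tosses with \(\Pr[X_i = 1] = p\), and let \(\hat p_n\) be the empirical frequency of heads. The bad event is that \(\hat p_n\) misses \(p\) by more than \(\varepsilon\), and Chebyshev’s inequality (below) bounds its probability by a quantity proportional to \(1/n\):

\[
\hat p_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\Pr\big[\,|\hat p_n - p| > \varepsilon\,\big]
\;\le\; \frac{p(1-p)}{n\,\varepsilon^{2}}
\;\le\; \frac{1}{4\,n\,\varepsilon^{2}}.
\]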

Markov’s Inequality. From Raginsky’s notes.
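The standard statement (the slide itself shows only an image from the notes): for a nonnegative random variable \(X\) and any \(t > 0\),

\[
\Pr[\,X \ge t\,] \;\le\; \frac{\mathbb{E}[X]}{t}.
\]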

Chebyshev’s Inequality. From Raginsky’s notes.
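Likewise the standard statement: applying Markov’s inequality to \((X - \mathbb{E}[X])^{2}\) gives, for any \(t > 0\),

\[
\Pr\big[\,|X - \mathbb{E}[X]| \ge t\,\big] \;\le\; \frac{\mathrm{Var}(X)}{t^{2}}.
\]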

Not quite good enough: Chebyshev’s bound on the bad event decays only like 1/n, whereas we would like it to decay exponentially fast in n. From Raginsky’s notes.
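For contrast, this is the kind of exponential bound the next reading develops (Hoeffding’s form of the Chernoff bound for the coin-toss estimator, stated here as a standard fact rather than taken from the slides):

\[
\Pr\big[\,|\hat p_n - p| \ge \varepsilon\,\big] \;\le\; 2\,e^{-2 n \varepsilon^{2}},
\]

which decays exponentially in \(n\) rather than like \(1/n\).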

For next class: Read the Wikipedia page on the Chernoff bound: http://en.wikipedia.org/wiki/Chernoff_bound Read at least the first part of Raginsky’s introductory notes on tail bounds (pages 1-5): http://maxim.ece.illinois.edu/teaching/fall14/notes/concentration.pdf Come to class with questions! It is fine to have questions, but first spend some time trying to work through the reading and problems. Feel free to post questions to the Sakai discussion board!