Lecture notes for Stat 231: Pattern Recognition and Machine Learning. Stat 231. A.L. Yuille. Fall 2004. Perceptron Rule and Convergence Proof. Capacity of Perceptrons.


1. Lecture notes for Stat 231: Pattern Recognition and Machine Learning. Stat 231. A.L. Yuille. Fall 2004. Perceptron Rule and Convergence Proof. Capacity of Perceptrons. Multi-layer Perceptrons. Read 5.4, Duda, Hart, Stork.

2. Linear Separation. We have $N$ samples $\{(\vec{x}_i, y_i): i = 1, \dots, N\}$, where $\vec{x}_i$ is a feature vector and the label $y_i \in \{+1, -1\}$ gives its class. Can we find a hyperplane in feature space through the origin, with weight (normal) vector $\vec{a}$, that separates the two types of samples, i.e. $\vec{a} \cdot \vec{x}_i > 0$ when $y_i = +1$ and $\vec{a} \cdot \vec{x}_i < 0$ when $y_i = -1$?

3. Linear Separation. For the two-class case, simplify by replacing all samples with $\vec{z}_i = y_i \vec{x}_i$ (i.e. negate the samples of the second class). Then find a plane, with weight vector $\vec{a}$, such that $\vec{a} \cdot \vec{z}_i > 0$ for all $i$. The weight vector is almost never unique. Determine the weight vector that has the biggest margin $m > 0$, where $\vec{a} \cdot \vec{z}_i \geq m$ for all $i$ (Next lecture). Discriminative: no attempt is made to model the probability distributions. Recall that the decision boundary is a hyperplane if the class distributions are Gaussian with identical covariance.
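
As a concrete illustration of the normalization trick above, the following minimal Python sketch (with made-up toy data; the variable names are my own) checks whether a candidate weight vector separates the data once every sample has been multiplied by its label.

    import numpy as np

    # Toy two-class data (illustrative only): rows are feature vectors, labels are +1/-1.
    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.5, -0.5], [-2.0, -2.0]])
    y = np.array([+1, +1, -1, -1])

    # Normalization trick from the slide: z_i = y_i * x_i, so a separating
    # weight vector a must satisfy a . z_i > 0 for every sample.
    Z = y[:, None] * X

    # A candidate weight vector separates the data iff all dot products are positive.
    a = np.array([1.0, 1.0])
    print(np.all(Z @ a > 0))   # True for this toy data set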

4. Perceptron Rule. Assume there is a hyperplane separating the two classes. How can we find it? Single Sample Perceptron Rule: order the normalized samples $\vec{z}_1, \dots, \vec{z}_N$ (with $\vec{z}_i = y_i \vec{x}_i$), set $\vec{a} = 0$, and loop over $j$: if $\vec{z}_j$ is misclassified, i.e. $\vec{a} \cdot \vec{z}_j \leq 0$, set $\vec{a} \leftarrow \vec{a} + \vec{z}_j$. Repeat until all samples are classified correctly.
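
Below is a minimal Python sketch of the single sample Perceptron rule as stated above; the function name, the epoch cap, and the use of NumPy arrays are my own choices, not part of the lecture.

    import numpy as np

    def perceptron_train(X, y, max_epochs=1000):
        """Single sample Perceptron rule (fixed increment).

        X: (N, d) array of feature vectors; y: (N,) array of labels in {+1, -1}.
        Uses the normalized samples z_i = y_i * x_i, so a separating weight
        vector a must satisfy a . z_i > 0 for every i.
        """
        Z = y[:, None] * X
        a = np.zeros(X.shape[1])           # initialize the weight vector at 0
        for _ in range(max_epochs):
            errors = 0
            for z in Z:                    # loop over the samples in a fixed order
                if a @ z <= 0:             # misclassified (or on the boundary)
                    a = a + z              # Perceptron update
                    errors += 1
            if errors == 0:                # all samples classified correctly
                return a
        raise RuntimeError("no separating weight found within max_epochs")

For example, perceptron_train(X, y) on the toy data from the previous sketch returns a weight vector that classifies all four points correctly.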

5. Perceptron Convergence. Novikoff's Theorem: the single sample Perceptron rule will converge to a solution weight, if one exists. Proof. Suppose $\hat{\vec{a}}$ is a separating weight, with margin $m = \min_i \hat{\vec{a}} \cdot \vec{z}_i > 0$, and let $\beta^2 = \max_i \|\vec{z}_i\|^2$. Then, choosing $\alpha = \beta^2/m$, the squared distance $\|\vec{a} - \alpha \hat{\vec{a}}\|^2$ decreases by at least $\beta^2$ for each misclassified sample. Initialize the weight at 0, so the initial squared distance is $\alpha^2 \|\hat{\vec{a}}\|^2$; since the distance cannot become negative, the number of weight changes is less than $\alpha^2 \|\hat{\vec{a}}\|^2 / \beta^2 = \beta^2 \|\hat{\vec{a}}\|^2 / m^2$.

6. Perceptron Convergence. Proof of claim. If $\vec{z}_j$ is misclassified, the update gives $\|\vec{a} + \vec{z}_j - \alpha \hat{\vec{a}}\|^2 = \|\vec{a} - \alpha \hat{\vec{a}}\|^2 + 2(\vec{a} - \alpha \hat{\vec{a}}) \cdot \vec{z}_j + \|\vec{z}_j\|^2$. Using $\vec{a} \cdot \vec{z}_j \leq 0$ (the sample is misclassified), $\hat{\vec{a}} \cdot \vec{z}_j \geq m$, and $\|\vec{z}_j\|^2 \leq \beta^2$, the right-hand side is at most $\|\vec{a} - \alpha \hat{\vec{a}}\|^2 - 2\alpha m + \beta^2 = \|\vec{a} - \alpha \hat{\vec{a}}\|^2 - \beta^2$ when $\alpha = \beta^2/m$.
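
To make the bound concrete, here is a hedged Python sketch on synthetic separable data (the data construction and variable names are my own) that counts the actual number of corrections and compares it with the bound from the proof.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic separable data: the labels are defined by a known weight a_hat,
    # so a_hat is itself a separating weight (illustrative choice only).
    a_hat = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(500, 3))
    X = X[np.abs(X @ a_hat) > 0.5]          # keep a comfortable margin
    y = np.sign(X @ a_hat)
    Z = y[:, None] * X                      # normalized samples z_i = y_i * x_i

    # Quantities from the proof: margin m of a_hat and beta = largest sample norm.
    m = np.min(Z @ a_hat)
    beta = np.max(np.linalg.norm(X, axis=1))
    bound = beta**2 * np.dot(a_hat, a_hat) / m**2

    # Run the single sample rule and count the weight changes.
    a, updates = np.zeros(3), 0
    changed = True
    while changed:
        changed = False
        for z in Z:
            if a @ z <= 0:
                a, updates, changed = a + z, updates + 1, True

    print(updates, "<=", bound)             # the number of corrections never exceeds the bound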

7. Perceptron Capacity. The Perceptron was very influential, and unrealistic claims were made about its abilities in the 1950s and early 1960s. The model is an idealized model of neurons. An entire book describing the limited capacity of Perceptrons was published in 1969 (Minsky and Papert). Some classifications, such as exclusive-or (XOR), cannot be performed by linear separation. But, from Learning Theory, limited capacity is good.
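
The exclusive-or example can be verified directly: summing the constraints for the two positive points and for the two negative points gives contradictory requirements. The short Python sketch below (the bias handling and random search are my own illustration) makes the same point numerically.

    import numpy as np

    # XOR in 2-D with an appended constant feature so the hyperplane can have an offset.
    X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    y = np.array([-1, +1, +1, -1])
    Z = y[:, None] * X      # normalized samples: need a . z_i > 0 for all i

    # Any weight a would need
    #   a . z_2 + a . z_3 =  (a1 + a2 + 2*a3) > 0   (the two positive points)
    #   a . z_1 + a . z_4 = -(a1 + a2 + 2*a3) > 0   (the two negative points)
    # which is impossible. A quick random search illustrates the same thing:
    rng = np.random.default_rng(0)
    print(any(np.all(Z @ rng.normal(size=3) > 0) for _ in range(100_000)))   # False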

8. Generalization and Capacity. The Perceptron is useful precisely because it has finite capacity and so cannot represent all classifications. The amount of training data required to ensure generalization must be larger than the capacity; infinite capacity would require infinite data. A full definition of Perceptron capacity must wait until we introduce the Vapnik-Chervonenkis (VC) dimension, but the following result (Cover) gives the basic idea.

9. Perceptron Capacity. Suppose we have $n$ sample points in a $d$-dimensional feature space. Assume that these points are in general position: no subset of $(d+1)$ points lies in a $(d-1)$-dimensional subspace. Let $f(n,d)$ be the fraction of the $2^n$ dichotomies of the $n$ points which can be expressed by linear separation. It can be shown (D.H.S.) that $f(n,d) = 1$ for $n \leq d+1$, and otherwise $f(n,d) = \frac{2}{2^n} \sum_{i=0}^{d} \binom{n-1}{i}$. There is a critical value $2(d+1)$: $f(n,d) \approx 1$ for $n \ll 2(d+1)$, $f(n,d) \approx 0$ for $n \gg 2(d+1)$, and the transition is rapid for large $d$.
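
A short Python sketch of Cover's counting formula, as given in Duda, Hart and Stork, makes the transition at n = 2(d+1) visible; the choice d = 25 is purely illustrative.

    from math import comb

    def f(n, d):
        """Fraction of the 2**n dichotomies of n points in general position in
        d dimensions that can be expressed by linear separation (Cover's formula)."""
        if n <= d + 1:
            return 1.0
        return 2.0 * sum(comb(n - 1, i) for i in range(d + 1)) / 2.0**n

    d = 25
    for n in (d + 1, 3 * (d + 1) // 2, 2 * (d + 1), 5 * (d + 1) // 2, 4 * (d + 1)):
        print(n, round(f(n, d), 3))
    # The fraction stays near 1 below n = 2(d+1), equals 1/2 exactly at n = 2(d+1),
    # and drops rapidly toward 0 above it; the transition sharpens as d grows.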

10. Capacity and Generalization. The Perceptron capacity is d+1. The probability of finding a separating hyperplane by a chance alignment of the samples decreases rapidly for n > 2(d+1).

11. Multi-Layer Perceptrons. Multilayer Perceptrons were introduced in the 1980s to increase capacity. Motivated by biological arguments (dubious). Key Idea: replace the binary decision rule by a sigmoid function $\sigma(x) = 1/(1 + e^{-x/T})$, which tends to the step function as $T \to 0$. The network has input units with activity $\vec{x}$, hidden units, and output units, with weights connecting the input units to the hidden units, and the hidden units to the output units.
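
A minimal forward-pass sketch in Python, assuming one hidden layer and random weights (the layer sizes and the default temperature are illustrative choices, not from the lecture):

    import numpy as np

    def sigmoid(x, T=1.0):
        """Smooth replacement for the binary decision rule; tends to a step function as T -> 0."""
        return 1.0 / (1.0 + np.exp(-x / T))

    # Illustrative sizes: 3 input units, 4 hidden units, 2 output units.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.5, size=(4, 3))   # weights: input units -> hidden units
    V = rng.normal(scale=0.5, size=(2, 4))   # weights: hidden units -> output units

    def forward(x):
        h = sigmoid(W @ x)      # hidden-unit activities
        return sigmoid(V @ h)   # output-unit activities

    print(forward(np.array([1.0, 0.0, -1.0])))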

12. Multilayer Perceptrons. Multilayer perceptrons can represent any function provided there are a sufficient number of hidden units, but the number of hidden units may be enormous. Also, the ability to represent any function may be bad, because of the generalization/memorization trade-off. Multilayer perceptrons are difficult to analyze; they are like "black boxes", and when they are successful there is often a simpler, more transparent alternative. The neuronal plausibility of multilayer perceptrons is unclear.

13. Multilayer Perceptrons. Train the multilayer perceptron using training data $\{(\vec{x}_i, \vec{t}_i)\}$. Define an error function for each sample, e.g. $E_i = \frac{1}{2} \|\vec{o}(\vec{x}_i) - \vec{t}_i\|^2$, where $\vec{o}(\vec{x}_i)$ is the network output. Minimize the error function for each sample by steepest descent on the weights: the backpropagation algorithm (propagation of errors).
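
Here is a hedged Python sketch of per-sample steepest descent with backpropagation for a one-hidden-layer sigmoid network; the squared-error function, the bias handling, the learning rate, and the XOR training example are my own illustrative choices, not taken from the lecture.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x, W, V):
        """One-hidden-layer sigmoid network; a constant 1 is appended to the input
        and to the hidden layer to supply bias weights."""
        h = sigmoid(W @ np.append(x, 1.0))
        o = sigmoid(V @ np.append(h, 1.0))
        return h, o

    def backprop_step(x, t, W, V, lr=0.5):
        """One steepest-descent update on the per-sample error E = 0.5*||o - t||^2,
        with the error derivatives propagated backwards through the layers."""
        h, o = forward(x, W, V)
        delta_o = (o - t) * o * (1.0 - o)                  # error signal at the outputs
        delta_h = (V[:, :-1].T @ delta_o) * h * (1.0 - h)  # propagated to the hidden units
        V -= lr * np.outer(delta_o, np.append(h, 1.0))     # hidden -> output weights
        W -= lr * np.outer(delta_h, np.append(x, 1.0))     # input -> hidden weights

    # Usage: learn XOR, which a single Perceptron cannot represent.
    rng = np.random.default_rng(1)
    W = rng.normal(size=(4, 3))      # 4 hidden units, 2 inputs + bias
    V = rng.normal(size=(1, 5))      # 1 output unit, 4 hidden units + bias
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(10_000):
        for x, t in data:
            backprop_step(np.array(x, float), np.array([t], float), W, V)
    print([round(float(forward(np.array(x, float), W, V)[1][0]), 2) for x, _ in data])
    # Outputs should approach the XOR targets 0, 1, 1, 0 when training succeeds.

With plain steepest descent the network occasionally stalls in a poor local minimum; in practice one would restart from different random weights or adjust the learning rate.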

Summary. Perceptron and Linear Separability. Perceptron rule and convergence proof. Capacity of Perceptrons. Multi-layer Perceptrons. Next Lecture: Support Vector Machines for Linear Separation.