Classification. Heejune Ahn, SeoulTech. Last updated 2015 May 03.

Outline
- Introduction: purpose, types, and an example
- Classification design: design flow, a simple classifier, linear discriminant functions, Mahalanobis distance
- Bayesian classification
- K-means clustering: unsupervised learning

1. Purpose
- For decision making; a central topic of pattern recognition (in artificial intelligence)
- Model: Images -> Features (patterns, structures) -> Classifier (classification rules) -> Classes
- Automation and human intervention:
  - Task specification: what classes, what features
  - Algorithm to be used
  - Training: tuning the algorithm parameters

2. Supervised vs. unsupervised
- Supervised (classification): trained on labeled examples (provided by humans)
- Unsupervised (clustering): uses the feature data only, relying on the mathematical (statistical) properties of the data set

3. An example: classifying nuts
- Features (circularity, line-fit error) -> Classifier (classification rules) -> pine nuts, lentils, pumpkin seeds

Observations
- What if only a single feature is used?
- What about the singular (outlier) points?
- Classification amounts to drawing decision boundaries in the feature space.

Terminology

4. Design Flow

5. Prototypes & minimum-distance classifier
- Prototype: the mean of the training samples in each class
- Rule: assign a sample to the class whose prototype is nearest (minimum distance).
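
A minimal MATLAB sketch of this nearest-prototype rule; the toy data and variable names (trainX, trainY, x) are illustrative assumptions, not from the lecture:

    % Training data: rows are samples, columns are the two features
    trainX = [0.2 0.1; 0.3 0.2; 0.8 0.9; 0.9 0.8];
    trainY = [1; 1; 2; 2];

    % Prototype = per-class mean of the training samples
    classes = unique(trainY);
    prototypes = zeros(numel(classes), size(trainX, 2));
    for k = 1:numel(classes)
        prototypes(k, :) = mean(trainX(trainY == classes(k), :), 1);
    end

    % Classify a new sample: pick the class with the nearest prototype
    x = [0.25 0.15];
    d = sum((prototypes - x).^2, 2);   % squared Euclidean distances (R2016b+ expansion)
    [~, nearest] = min(d);
    predicted = classes(nearest)       % -> 1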

6. Linear discriminant
- Linear discriminant function: g(x1, x2) = a*x1 + b*x2 + c
- The decision boundary is the line g(x1, x2) = 0; the sign of g decides the class.
- Ex 11.1 & Fig. 11.6
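
A sketch of classifying by the sign of g; the coefficient values below are invented, not those of Ex 11.1:

    % Hypothetical linear discriminant g(x1,x2) = a*x1 + b*x2 + c
    a = 1.0; b = -1.0; c = 0.2;
    g = @(x1, x2) a*x1 + b*x2 + c;

    x = [0.3 0.7];
    if g(x(1), x(2)) > 0   % g = 0 is the boundary line
        predicted = 1;
    else
        predicted = 2;
    end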

8. Mahalanobis distance
- Problem with the minimum-distance classifier: only the mean value is used; the distribution (spread) is not considered.
  e.g. (right figure) std(class 1) << std(class 2)
- Mahalanobis distance takes the variance into account: the larger the variance, the smaller the distance.
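
The Mahalanobis distance from a sample x to a class with mean mu and covariance Sigma is d(x) = sqrt((x - mu) Sigma^(-1) (x - mu)'). A short MATLAB sketch with made-up class statistics:

    % Hypothetical class statistics
    mu = [1.0 2.0];
    Sigma = [4.0 0.0; 0.0 0.25];      % large variance in x1, small in x2

    x = [3.0 2.5];
    v = x - mu;
    d_mahal = sqrt(v / Sigma * v');   % v/Sigma is v*inv(Sigma)
    d_eucl  = norm(v);
    % d_mahal (about 1.41) < d_eucl (about 2.06): the offset along the
    % high-variance x1 axis is discounted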

9. Bayesian classification
- Idea: assign each sample to the "most probable" class, based on a-priori known probabilities.
- Assumption: the priors (class probabilities) are known.
- Bayes' theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x)

10. Bayes decision rule
- Classification rule: assign x to class w_i if P(w_i | x) > P(w_j | x) for all j != i.
- Bayes' theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x)
  - p(x | w_i): class-conditional probability density function
  - P(w_i): prior probability
  - p(x): total probability; it is common to all classes, so it is not used in the classification decision.
- Intuitively: the posterior weighs how well x fits each class against how common the class is.
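
A tiny numeric sketch of the rule; the priors and likelihood values are invented:

    % Two-class problem at one observed x
    prior = [0.7 0.3];            % P(w1), P(w2)
    lik   = [0.4 0.9];            % p(x|w1), p(x|w2)

    post_unnorm = lik .* prior;   % p(x|wi)*P(wi); p(x) cancels in the comparison
    [~, winner] = max(post_unnorm)
    % winner == 1 (0.28 > 0.27): the prior outweighs the higher class-2 likelihood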

Interpretation
- One needs the priors and the class-conditional pdfs, which are often not available.
- MVN (multivariate normal) distribution model: practically quite a good approximation.
- MVN: the N-dimensional normal distribution with density
  p(x) = (2*pi)^(-N/2) |Sigma|^(-1/2) exp(-(1/2)(x - mu)' Sigma^(-1) (x - mu)),
  with mean vector mu and covariance matrix Sigma.
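
In MATLAB (Statistics Toolbox) the MVN density can be evaluated with mvnpdf; the parameter values here are illustrative:

    mu    = [0 0];
    Sigma = [1.0 0.3; 0.3 0.5];         % hypothetical covariance matrix
    p = mvnpdf([0.5 -0.2], mu, Sigma)   % density at one 2-D point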

12. Bayesian classifier for multivariate data
- Take log() of the posterior; log is a monotonically increasing function, so the decision is unchanged.
- With MVN class-conditionals the discriminant becomes
  g_i(x) = ln p(x | w_i) + ln P(w_i)
         = -(1/2)(x - mu_i)' Sigma_i^(-1) (x - mu_i) - (1/2) ln |Sigma_i| + ln P(w_i) + const.

Case 1: identical, independent covariances (Sigma_i = sigma^2 I)
- Linear machine: the decision boundaries are hyperplanes (linear).
- Note: when the priors P(w_i) are equal, this reduces to the minimum-distance criterion.

Case 2: all covariances are the same (Sigma_i = Sigma)
- Matlab: [class, err] = classify(test, training, group [, type, prior])
  - training, test: feature matrices; group: class labels of the training rows
  - type 'diaglinear' (diagonal covariance) gives a naive Bayesian classifier
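
A runnable sketch of the classify call on toy data (the data is made up; classify is the classic Statistics Toolbox interface, later superseded by fitcdiscr):

    % Toy training set: two Gaussian blobs in 2-D
    training = [randn(20,2); randn(20,2) + 3];
    group    = [ones(20,1); 2*ones(20,1)];
    test     = [0.2 0.1; 3.1 2.9];

    % 'linear' pools a single covariance across classes (Case 2 above)
    [class, err] = classify(test, training, group, 'linear');

    % 'diaglinear' assumes a diagonal covariance: naive Bayes
    classNB = classify(test, training, group, 'diaglinear');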

Ex 11.3: classification results with wrong priors vs. correct priors (figure)

13. Ensemble classifier
- Combining multiple classifiers: utilizing diversity, similar to asking multiple experts for a decision.
- AdaBoost
  - Weak classifier: chance (1/2) < accuracy << 1.0
  - Misclassified training data are up-weighted for the next classifier.
- (Figure: a cascade of weak classifiers H_1(x), ..., H_T(x) with sample weights D_1(x), ..., D_T(x), starting from a uniform distribution and combined with coefficients a_t.)

AdaBoost in detail
- Given: training samples (x_1, y_1), ..., (x_m, y_m) with y_i in {-1, +1}
- Initialize weights: D_1(i) = 1/m
- For t = 1, ..., T:
  1. Run WeakLearn, which returns the weak classifier h_t with minimum error eps_t w.r.t. the distribution D_t
  2. Choose a_t = (1/2) ln((1 - eps_t) / eps_t)
  3. Update D_{t+1}(i) = D_t(i) exp(-a_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor chosen so that D_{t+1} is a distribution
- Output the strong classifier: H(x) = sign(sum_{t=1}^{T} a_t h_t(x))
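
A compact MATLAB sketch of this loop, using single-threshold stumps as the weak learner; the stump search and the toy 1-D data are illustrative choices, not from the lecture:

    % Toy 1-D data, labels in {-1,+1}
    X = [1 2 3 4 5 6 7 8]';  y = [1 1 1 -1 -1 -1 1 1]';
    m = numel(y);  T = 10;
    D = ones(m,1) / m;                          % D_1(i) = 1/m
    thr = zeros(T,1); pol = zeros(T,1); a = zeros(T,1);

    stump = @(x, th, p) p * (2*(x > th) - 1);   % weak classifier h(x)

    for t = 1:T
        % WeakLearn: pick the threshold/polarity with minimum weighted error
        best = inf;
        for th = ([min(X)-1; X] + [X; max(X)+1])' / 2
            for p = [-1 1]
                e = sum(D .* (stump(X, th, p) ~= y));
                if e < best, best = e; thr(t) = th; pol(t) = p; end
            end
        end
        a(t) = 0.5 * log((1 - best) / max(best, 1e-10));   % a_t
        h = stump(X, thr(t), pol(t));
        D = D .* exp(-a(t) * y .* h);
        D = D / sum(D);                         % divide by Z_t
    end

    % Strong classifier H(x) = sign(sum_t a_t h_t(x)) on the training points
    Hsum = zeros(m,1);
    for t = 1:T
        Hsum = Hsum + a(t) * stump(X, thr(t), pol(t));
    end
    H = sign(Hsum);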

14. K-means clustering
- K-means: unsupervised classification.
- Groups the data into clusters C_1, ..., C_K so as to minimize J = sum_i sum_{x in C_i} ||x - c_i||^2, where c_i is the centroid of cluster C_i.
- Iterative algorithm (see the sketch below):
  1. (re-)assign each x_j to the cluster of the nearest centroid c_i
  2. (re-)calculate each centroid c_i as the mean of its assigned samples
- Demo
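
A minimal sketch of the two alternating steps, on made-up 2-D data (empty clusters are not handled here; MATLAB's kmeans handles them properly):

    % Toy data: two blobs
    X = [randn(30,2); randn(30,2) + 4];
    k = 2;
    C = X(randperm(size(X,1), k), :);           % initial centroids: random samples

    for iter = 1:100
        % Assignment step: nearest centroid for each sample
        d = zeros(size(X,1), k);
        for i = 1:k
            d(:,i) = sum((X - C(i,:)).^2, 2);   % R2016b+ implicit expansion
        end
        [~, idx] = min(d, [], 2);

        % Update step: recompute each centroid as its cluster mean
        Cold = C;
        for i = 1:k
            C(i,:) = mean(X(idx == i, :), 1);
        end
        if isequal(C, Cold), break; end         % converged
    end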

Issues
- Sensitive to the "initial" centroid values: multiple trials are needed; choose the best (lowest-objective) run.
- 'K' (the number of clusters) must be given: there is a trade-off between K (bigger) and the objective function (smaller), and no optimal algorithm to determine it.
- Nevertheless, it is used in most unsupervised clustering today.

Ex 11.4 & Fig. 11.10
- kmeans function: [classIndexes, centers] = kmeans(data, k, options)
  - k: the number of clusters
  - options (name-value pairs): 'Replicates', 'Display', ...
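
A usage sketch of MATLAB's kmeans with the options mentioned above (the data is made up):

    X = [randn(30,2); randn(30,2) + 4];   % hypothetical 2-D data, two blobs
    % 'Replicates' reruns kmeans with different initial centroids and keeps the
    % run with the lowest total within-cluster distance (the "multiple trials"
    % remedy from the Issues slide)
    [classIndexes, centers] = kmeans(X, 2, 'Replicates', 5, 'Display', 'final');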