Introduction to Pattern Recognition
Jan Flusser, Institute of Information Theory and Automation

Pattern Recognition
Recognition (classification) = assigning a pattern/object to one of pre-defined classes.
Statistical (feature-based) PR: the pattern is described by features (an n-D vector in a metric space).

Pattern Recognition
Syntactic (structural) PR: the pattern is described by its structure. Formal language theory applies (class = language, pattern = word).

Pattern Recognition
- Supervised PR: a training set is available for each class.
- Unsupervised PR (clustering): no training set is available; the number of classes may not be known.

PR system: Training stage
- Definition of the features
- Selection of the training set
- Computing the features of the training set
- Classification rule setup

Desirable properties of the features
- Invariance
- Discriminability
- Robustness
- Efficiency, independence, completeness

Desirable properties of the training set
- It should contain typical representatives of each class, including intra-class variations.
- It should be reliable and large enough.
- It should be selected by domain experts.

Classification rule setup
- Equivalent to a partitioning of the feature space
- Independent of the particular application

PR system: Recognition stage
Image acquisition → preprocessing → object detection → feature computation → classification → class label

An example – Fish classification

The features: Length, width, brightness

2-D feature space

Empirical observation
- For a given training set, we can have several classifiers (several partitionings of the feature space).
- The training samples are not always classified correctly.
- We should avoid overtraining (overfitting) of the classifier.

Formal definition of the classifier
- Each class ω_i is characterized by its discriminant function g_i(x).
- Classification = maximization of g_i(x): assign x to class i iff g_i(x) ≥ g_j(x) for all j ≠ i.
- The discriminant functions define the decision boundaries in the feature space.
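To make the decision rule concrete, here is a minimal Python sketch; the two discriminant functions and the data below are hypothetical placeholders, not the ones from the lecture:

```python
import numpy as np

# Hypothetical discriminant functions for two classes: any real-valued
# functions of the feature vector x can serve as g_i(x).
def g1(x):
    return -np.linalg.norm(x - np.array([0.0, 0.0]))  # "closeness" to class-1 prototype

def g2(x):
    return -np.linalg.norm(x - np.array([3.0, 3.0]))  # "closeness" to class-2 prototype

def classify(x, discriminants):
    # Assign x to the class whose discriminant function is maximal.
    return int(np.argmax([g(x) for g in discriminants]))

x = np.array([0.5, 1.0])
print(classify(x, [g1, g2]))  # -> 0 (class 1)
```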

Minimum distance (NN) classifier
- Discriminant function: g_i(x) = 1 / dist(x, ω_i)
- Various definitions of dist(x, ω_i) are possible
- One-element training set per class → the decision regions are Voronoi polygons

Voronoi polygons

Minimum distance (NN) classifier (cont.)
- The NN classifier may not be linear.
- The NN classifier is sensitive to outliers → k-NN classifier.

k-NN classifier: examine the training points in order of increasing distance from x until k samples belonging to one class are reached.
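A minimal Python sketch of this rule with hypothetical 2-D training data (the standard k-NN majority vote over the k nearest points is a common alternative formulation):

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k):
    """k-NN rule as described above: walk through the training points in
    order of increasing distance from x and stop as soon as some class
    has accumulated k votes."""
    order = np.argsort(np.linalg.norm(train_X - x, axis=1))
    counts = Counter()
    for idx in order:
        counts[train_y[idx]] += 1
        if counts[train_y[idx]] == k:
            return train_y[idx]
    return train_y[order[0]]  # fallback: fewer than k samples of any class

# Hypothetical 2-D training data, two classes
train_X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], float)
train_y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.8, 0.9]), train_X, train_y, k=2))  # -> 0
```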

Linear classifier
The discriminant functions g(x) are linear, so the decision boundaries are hyperplanes.

Bayesian classifier
Assumption: the feature values are random variables. The classifier is statistical, the decision is probabilistic, and it is based on the Bayes rule.

The Bayes rule
P(ω_i | x) = p(x | ω_i) P(ω_i) / p(x)
- P(ω_i | x): a posteriori probability
- p(x | ω_i): class-conditional probability
- P(ω_i): a priori probability
- p(x): total probability

Bayesian classifier
Main idea: maximize the posterior probability P(ω_i | x). Since it is hard to do directly, we rather maximize p(x | ω_i) P(ω_i). In case of equal priors, we maximize only p(x | ω_i).

Equivalent formulation in terms of discriminant functions: g_i(x) = p(x | ω_i) P(ω_i), or any monotonic function of it (e.g. its logarithm).

How to estimate p(x | ω_i) and P(ω_i)?
Estimating p(x | ω_i):
- Parametric estimate (assuming the pdf is of a known form, e.g. Gaussian)
- Non-parametric estimate (the pdf is unknown or too complex)
Estimating P(ω_i):
- From case studies performed before (OCR, speech recognition)
- From the class occurrence in the training set
- Assumption of equal priors

Parametric estimate of a Gaussian pdf: estimate the mean vector and the covariance matrix from the training samples (e.g. by the sample mean and sample covariance, the maximum-likelihood estimates).

d-dimensional Gaussian pdf:
p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ))
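A small numeric sketch of this density in Python (numpy assumed):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """d-dimensional Gaussian density
    p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

# Sanity check against the 1-D standard normal value at 0: 1/sqrt(2*pi)
print(gaussian_pdf(np.array([0.0]), np.array([0.0]), np.eye(1)))  # ≈ 0.3989
```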

The role of the covariance matrix

Two-class Gaussian case in 2D: classification = comparison of two Gaussians

Two-class Gaussian case with equal covariance matrices: linear decision boundary

Equal priors: classification by minimum Mahalanobis distance. If the covariance matrix is diagonal with equal variances, this reduces to the “standard” minimum (Euclidean) distance rule.
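A minimal sketch of the Mahalanobis rule, assuming equal priors and a shared covariance matrix; the means and covariance below are hypothetical:

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma_inv):
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    diff = x - mu
    return diff @ sigma_inv @ diff

# Hypothetical class means and a shared covariance matrix
means = [np.array([0.0, 0.0]), np.array([4.0, 1.0])]
sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
sigma_inv = np.linalg.inv(sigma)

x = np.array([1.0, 0.5])
# Equal priors: pick the class with the minimum Mahalanobis distance.
label = int(np.argmin([mahalanobis_sq(x, m, sigma_inv) for m in means]))
print(label)  # -> 0
```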

Non-equal priors: the decision boundary is still linear.

General Gaussian case in 2D: the decision boundary is a hyperquadric

General Gaussian case in 3D: the decision boundary is a hyperquadric

More classes, Gaussian case in 2D

What to do if the classes are not normally distributed?
- Gaussian mixtures
- Parametric estimation of some other pdf
- Non-parametric estimation

Non-parametric estimation: the Parzen window. The pdf is estimated by placing a window (kernel) function φ of width h around each training sample: p(x) ≈ (1/(n h^d)) Σ_k φ((x − x_k)/h).
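A minimal 1-D Parzen sketch in Python with a Gaussian kernel (the kernel choice and the synthetic data are assumptions for illustration):

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen-window estimate of a 1-D pdf with a Gaussian kernel of width h:
    p(x) ≈ (1/n) * sum_k N(x; x_k, h^2)."""
    kernel = np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kernel.mean()

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=200)     # hypothetical training data
print(parzen_estimate(0.0, samples, h=0.5))  # ≈ 0.4 for standard normal data
```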

The role of the window size
- Small window → overtraining (a spiky estimate)
- Large window → data smoothing
- With enough data (n → ∞) the size does not matter

[Figure: Parzen estimates for growing sample sizes n = 1, 10, 100, ∞]

[Figure: Parzen estimates with a small window vs. a large window]

Applications of the Bayesian classifier in multispectral remote sensing
- Objects = pixels
- Features = pixel values in the spectral bands (from 4 to several hundreds)
- Training set: selected manually by means of thematic maps (GIS) and on-site observation
- Number of classes: typically from 2 to 16

Satellite MS image

Other classification methods in RS
- Context-based classifiers
- Shape and textural features
- Post-classification filtering
- Spectral pixel unmixing

Non-metric classifiers
- Typically for “YES/NO” features
- The feature metric is not explicitly defined
- Decision trees

General decision tree

Binary decision tree: any decision tree can be replaced by an equivalent binary tree.

Real-valued features
- Node decisions are in the form of inequalities.
- Training = setting their parameters.
- Simple (axis-parallel) inequalities → stepwise decision boundary.
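A minimal hand-built sketch of such a tree in Python; the features, thresholds, and class names are hypothetical. The axis-parallel tests are what produce the stepwise boundary shown next:

```python
import numpy as np

# A tiny hand-built binary decision tree over two real-valued features;
# each internal node tests a simple inequality x[feature] <= threshold.
def classify(x):
    if x[0] <= 2.0:          # node 1: inequality on feature 0
        if x[1] <= 1.0:      # node 2: inequality on feature 1
            return "class A"
        return "class B"
    return "class C"

print(classify(np.array([1.5, 0.3])))  # -> class A
print(classify(np.array([3.0, 5.0])))  # -> class C
```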

Stepwise decision boundary

Real-valued features The tree structure and the form of inequalities influence both performance and speed.

Classification performance
How to evaluate the performance of a classifier?
- Evaluation on the training set (optimistic error estimate)
- Evaluation on the test set (pessimistic error estimate)

Classification performance
How to increase the performance?
- Other features
- More features (dangerous: curse of dimensionality!)
- Other (larger, better) training sets
- Another parametric model
- Another classifier
- Combining different classifiers

Combining (fusing) classifiers
C classifiers, each working on its own feature vector x_j. The Bayes rule: maximize the posterior P(ω_i | x_1, …, x_C). There are several possibilities how to do that.

Product rule
Assumption: conditional independence of the x_j. Maximize P(ω_i) ∏_j p(x_j | ω_i).

Max-max and max-min rules
Assumption: equal priors. Assign x to the class maximizing max_j P(ω_i | x_j) (max-max rule) or min_j P(ω_i | x_j) (max-min rule).

Majority vote
The most straightforward fusion method; it can be used with all types of classifiers.
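A minimal Python sketch of majority-vote fusion (the classifier outputs below are hypothetical):

```python
from collections import Counter

def majority_vote(labels):
    """Fuse the decisions of several classifiers by majority vote.
    `labels` holds the class labels the individual classifiers output."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical outputs of three classifiers for one input pattern
print(majority_vote(["cat", "dog", "cat"]))  # -> cat
```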

Unsupervised classification (cluster analysis)
A training set is not available; the number of classes may not be known a priori.

What are clusters?
- Intuitive meaning: compact, well-separated subsets
- Formal definition: any partition of the data into disjoint subsets

What are clusters?

How to compare different clusterings?
The variance measure J should be minimized.
Drawback: only clusterings with the same number of clusters N can be compared, and the global minimum J = 0 is reached in the degenerate case (every point forms its own cluster).
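A minimal sketch of one common form of J, the within-cluster sum of squared distances to the centroids (the exact definition used in the lecture is assumed, not shown here):

```python
import numpy as np

def variance_criterion(clusters):
    """Within-cluster sum of squared distances to the cluster centroids,
    a common form of the variance measure J (assumed here)."""
    J = 0.0
    for pts in clusters:
        centroid = pts.mean(axis=0)
        J += ((pts - centroid) ** 2).sum()
    return J

# Hypothetical clustering of 2-D points into two clusters
c1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
c2 = np.array([[5.0, 5.0], [6.0, 6.0]])
print(variance_criterion([c1, c2]))  # small J = compact clusters
```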

Clustering techniques
- Iterative methods: typically if N is given
- Hierarchical methods: typically if N is unknown
- Other methods: sequential, graph-based, branch & bound, fuzzy, genetic, etc.

Sequential clustering
- N may be unknown
- Very fast but not very good
- Each point is considered only once
- Idea: a new point is either added to an existing cluster or it forms a new cluster; the decision is based on a user-defined distance threshold.

Sequential clustering: drawbacks
- Dependence on the distance threshold
- Dependence on the order of the data points
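A minimal Python sketch of the sequential scheme described above; one common variant is assumed (a point joins the nearest cluster if it is within the threshold of that cluster's centroid), since the slides do not fix these details:

```python
import numpy as np

def sequential_clustering(points, threshold):
    """One-pass sequential clustering: each point joins the nearest existing
    cluster if its distance to that cluster's centroid is below the
    threshold, otherwise it starts a new cluster."""
    centroids, members = [], []
    for x in points:
        if centroids:
            d = [np.linalg.norm(x - c) for c in centroids]
            i = int(np.argmin(d))
            if d[i] < threshold:
                members[i].append(x)
                centroids[i] = np.mean(members[i], axis=0)  # update centroid
                continue
        centroids.append(x.astype(float))
        members.append([x])
    return members

pts = np.array([[0, 0], [0.5, 0], [5, 5], [5.5, 5]], float)
print(len(sequential_clustering(pts, threshold=2.0)))  # -> 2 clusters
```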

Iterative clustering methods
- N-means clustering
- Iterative minimization of J
- ISODATA (Iterative Self-Organizing DATa Analysis)

N-means clustering
1. Select N initial cluster centroids.
2. Classify every point x according to minimum distance.
3. Recalculate the cluster centroids.
4. If the centroids did not change then STOP, else GOTO 2.
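A compact Python sketch of the loop above (random initialization; empty clusters are not handled in this sketch):

```python
import numpy as np

def n_means(points, N, n_iter=100, seed=0):
    """N-means (k-means): assign each point to the nearest centroid,
    recompute centroids, stop when they no longer change."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), N, replace=False)]  # step 1
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centroids[None, :], axis=2), axis=1
        )                                                           # step 2
        new_centroids = np.array(
            [points[labels == i].mean(axis=0) for i in range(N)]
        )                                                           # step 3
        if np.allclose(new_centroids, centroids):                   # step 4
            break
        centroids = new_centroids
    return labels, centroids

pts = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], float)
labels, _ = n_means(pts, N=2)
print(labels)  # e.g. [0 0 1 1] (depends on the random initialization)
```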

N-means clustering: drawbacks
- The result depends on the initialization.
- J is not necessarily minimized (only a local minimum is found).
- The results are sometimes “intuitively wrong”.

N-means clustering: an example
Two features, four points, two clusters (N = 2). Different initializations → different clusterings.
[Figures: two different choices of the initial centroids lead to two different final clusterings]

Iterative minimization of J
1. Start from an initial clustering (e.g. by N-means).
2. For every point x: move x from its current cluster to the cluster for which the decrease of J is maximal.
3. If no data point has moved, then STOP, else GOTO 2.

Example of “wrong” result

ISODATA
- Iterative clustering; N may vary.
- A sophisticated method, part of many statistical software systems.
- Postprocessing after each iteration:
  - clusters with few elements are cancelled
  - clusters with large variance are divided
  - other merging and splitting strategies can be implemented

Hierarchical clustering methods
- Agglomerative clustering
- Divisive clustering

Basic agglomerative clustering
1. Each point = one cluster.
2. Find the two “nearest” or “most similar” clusters and merge them together.
3. Repeat 2 until the stop constraint is reached.

Basic agglomerative clustering
Particular implementations of this method differ from each other by
- the STOP constraints
- the distance/similarity measures used

Simple between-cluster distance measures
- d(A,B) = d(m_1, m_2): distance of the cluster means (centroid linkage)
- d(A,B) = min_{a∈A, b∈B} d(a,b): single linkage
- d(A,B) = max_{a∈A, b∈B} d(a,b): complete linkage
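The three measures in a short Python sketch (numpy assumed; the two clusters are toy data):

```python
import numpy as np

def pairwise(A, B):
    # All Euclidean distances between points of cluster A and cluster B
    return np.linalg.norm(A[:, None] - B[None, :], axis=2)

def d_centroid(A, B):   # distance of the cluster means
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def d_min(A, B):        # single linkage: min over all point pairs
    return pairwise(A, B).min()

def d_max(A, B):        # complete linkage: max over all point pairs
    return pairwise(A, B).max()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 0.0]])
print(d_centroid(A, B), d_min(A, B), d_max(A, B))  # 4.0 3.0 5.0
```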

Other between-cluster distance measures
- d(A,B) = Hausdorff distance H(A,B)
- d(A,B) = J(A∪B) − J(A,B): the increase of the variance measure J caused by merging A and B

Agglomerative clustering – representation by a clustering tree (dendrogram)

How many clusters are there? 2 or 4? Clustering is a very subjective task.

How many clusters are there?
- Difficult to answer, even for humans (“clustering tendency”)
- Hierarchical methods: N can be estimated from the complete dendrogram
- Methods minimizing a cost function: N can be estimated from “knees” in the J-N graph

Life time of the clusters (how long a cluster survives in the dendrogram before it is merged). Here the optimal number of clusters = 4.

Optimal number of clusters

Applications of clustering in image processing
- Segmentation: clustering in color space
- Preliminary classification of multispectral images
- Clustering in parametric space: RANSAC, image registration and matching
- Numerous applications outside the image processing area

References
- R. O. Duda, P. E. Hart, D. G. Stork: Pattern Classification, 2nd ed., Wiley Interscience, 2001
- S. Theodoridis, K. Koutroumbas: Pattern Recognition, 2nd ed., Elsevier, 2003