Digital Camera and Computer Vision Laboratory, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.
Computer Vision Chapter 4: Statistical Pattern Recognition
Yeh Jin Long, Dr. Fuh Chiou Shann

Summary
- Introduction
- Bayesian approach
- Maximin decision rule
- Misidentification and false-alarm error rates
- Nearest neighbor rule
- Construction of decision trees
- Estimation of decision rule error
- Neural networks

Introduction

Introduction (cont.)
- Units: image regions and projected segments
- Each unit has an associated measurement vector, e.g., { circular arc, a hole, ... }
- A decision rule assigns each unit to a class or category optimally

Introduction (cont.)
- Feature selection and extraction techniques (e.g., has a hole vs. has no hole)
- Decision rule construction techniques (e.g., telling the word character O from the number 0)
- Techniques for estimating decision rule error

Simple Pattern Discrimination
Also called the pattern identification process:
- A unit is observed or measured
- A category assignment is made that names or classifies the unit as a type of object
- The category assignment is made only from the observed measurement (pattern)

Simple Pattern Discrimination (cont.)
- a: assigned category, from a set of categories C
- t: true category identification, from C
- d: observed measurement, from a set of measurements D
- (t, a, d): the event of classifying the observed unit
- P(t, a, d): probability of the event (t, a, d)

Economic Gain Matrix
- e(t, a): economic gain/utility with true category t and assigned category a
- A mechanism to evaluate a decision rule
- Simplest case: the identity gain matrix

An Instance

Another Instance
- P(g, g): probability of true good, assigned good; P(g, b): probability of true good, assigned bad; ...
- e(g, g): economic consequence for the event (g, g); ...
- Positive e: profit consequence; negative e: loss consequence

Another Instance (cont.)

Another Instance (cont.)

Another Instance (cont.)
Fraction of good objects manufactured: P(g) = P(g, g) + P(g, b)
Fraction of bad objects manufactured: P(b) = P(b, g) + P(b, b)
Expected profit per object:
E = P(g, g)e(g, g) + P(g, b)e(g, b) + P(b, g)e(b, g) + P(b, b)e(b, b)

Conditional Probability
P(b | g): false-alarm rate
P(g | b): misdetection rate

Conditional Probability (cont.)
Another formula for the expected profit per object:
E = P(g | g)P(g)e(g, g) + P(b | g)P(g)e(g, b) + P(g | b)P(b)e(b, g) + P(b | b)P(b)e(b, b)

Example 4.1
P(g) = 0.95, P(b) = 0.05

Table 4.4: Machine performance
True State    Detected Good    Detected Bad
Good          P(g|g) = 0.8     P(b|g) = 0.2
Bad           P(g|b) = 0.1     P(b|b) = 0.9

Table 4.5: Economic consequence
True State    Detected Good        Detected Bad
Good          e(g, g) = $2000      e(g, b) = -$100
Bad           e(b, g) = -$10,000   e(b, b) = -$100

E = 0.8(0.95)($2000) + 0.2(0.95)(-$100) + 0.1(0.05)(-$10,000) + 0.9(0.05)(-$100) = $1446.50
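
The expected-profit computation is mechanical enough to script. Below is a minimal Python sketch (not from the slides; the dictionary layout and the name expected_profit are ours), plugging in the Table 4.4 and 4.5 numbers:

# Expected profit per object: E = sum over (t, a) of P(a|t) * P(t) * e(t, a).
prior = {"g": 0.95, "b": 0.05}                  # P(t): fraction of good/bad objects

cond = {("g", "g"): 0.8, ("b", "g"): 0.2,       # P(a|t), Table 4.4, keyed (assigned, true)
        ("g", "b"): 0.1, ("b", "b"): 0.9}

gain = {("g", "g"): 2000, ("g", "b"): -100,     # e(t, a), Table 4.5, keyed (true, assigned)
        ("b", "g"): -10000, ("b", "b"): -100}

def expected_profit(prior, cond, gain):
    return sum(cond[(a, t)] * prior[t] * gain[(t, a)]
               for t in prior for a in prior)

print(expected_profit(prior, cond, gain))       # 1446.5, i.e. E = $1446.50

Swapping in the Example 4.2 tables below gives E = $1536.35.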

Example 4.1 (cont.)

Example 4.2
P(g) = 0.95, P(b) = 0.05

Table 4.6: Machine performance
True State    Detected Good    Detected Bad
Good          P(g|g) = 0.85    P(b|g) = 0.15
Bad           P(g|b) = 0.12    P(b|b) = 0.88

Table 4.7: Economic consequence
True State    Detected Good        Detected Bad
Good          e(g, g) = $2000      e(g, b) = -$100
Bad           e(b, g) = -$10,000   e(b, b) = -$100

E = 0.85(0.95)($2000) + 0.15(0.95)(-$100) + 0.12(0.05)(-$10,000) + 0.88(0.05)(-$100) = $1536.35

Example 4.2 (cont.)

Decision Rule Construction
P(t, a): obtained by summing P(t, a, d) over every measurement d:
P(t, a) = Σ_{d∈D} P(t, a, d)
Therefore, the average economic gain is
E[e] = Σ_{t∈C} Σ_{a∈C} e(t, a) P(t, a)

Decision Rule Construction (cont.)

Decision Rule Construction (cont.)
We can use the identity matrix as the economic gain matrix; the expected gain then reduces to the probability of correct assignment:
E[e] = Σ_{t∈C} P(t, t)

Fair Game Assumption
The decision rule uses only the measurement data in making an assignment; nature and the decision rule are not in collusion. In other words,
P(a | t, d) = P(a | d)

Fair Game Assumption (cont.)
From the definition of conditional probability:

Fair Game Assumption (cont.)
P(t, a, d) = P(a | t, d) P(t, d)    // by conditional probability
           = P(a | d) P(t, d)       // by the fair game assumption
By definition, P(a | d) = P(a, d) / P(d), so
P(t, a, d) = P(a, d) P(t, d) / P(d)

Deterministic Decision Rule
We use the notation f(a | d) to completely define a decision rule; f(a | d) gives all the conditional probabilities associated with the decision rule.
A deterministic decision rule is one for which f(a | d) takes only the values 0 and 1.
Decision rules that are not deterministic are called probabilistic/nondeterministic/stochastic.

Expected Value on f(a | d)
Start from the previous formula E[e] = Σ_{t∈C} Σ_{a∈C} e(t, a) P(t, a). Expanding P(t, a) over measurements (by conditional probability) and using P(t, a, d) = f(a | d) P(t, d) (the fair game assumption, p. 23) gives
E[e; f] = Σ_{d∈D} Σ_{a∈C} f(a | d) Σ_{t∈C} e(t, a) P(t, d)

Expected Value on f(a | d) (cont.)

Bayes Decision Rules
A Bayes decision rule maximizes the expected economic gain E[e; f]. It satisfies, whenever f(a | d) > 0,
Σ_{t∈C} e(t, a) P(t, d) ≥ Σ_{t∈C} e(t, a') P(t, d) for every a' ∈ C
Constructing f: for each measurement d, assign a category a that maximizes Σ_{t∈C} e(t, a) P(t, d).

Figure 4.2 Calculation of the Bayes decision rule and calculation of the expected gain, from tables of P(c, d) over measurements d1, d2, d3, the rule f(a | d), and the gain matrix e.
E[e; f] = Σ_{d∈D} Σ_{a∈C} f(a | d) [ Σ_{t∈C} e(t, a) P(t, d) ]
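
The construction in Figure 4.2 reduces, for each measurement d, to picking the category a that maximizes Σ_t e(t, a) P(t, d). A minimal Python sketch of that construction; the P(t, d) values below are hypothetical stand-ins for the Figure 4.2 table, and e is the identity gain matrix:

# Bayes decision rule: for each measurement d, assign a category a that
# maximizes sum_t e(t, a) * P(t, d).  P(t, d) values are hypothetical.
categories = ("c1", "c2")
measurements = ("d1", "d2", "d3")

P = {("c1", "d1"): 0.25, ("c1", "d2"): 0.15, ("c1", "d3"): 0.10,
     ("c2", "d1"): 0.05, ("c2", "d2"): 0.15, ("c2", "d3"): 0.30}

e = {(t, a): 1 if t == a else 0 for t in categories for a in categories}

def bayes_rule(P, e):
    """Deterministic rule d -> a maximizing sum_t e(t, a) P(t, d)."""
    return {d: max(categories,
                   key=lambda a: sum(e[(t, a)] * P[(t, d)] for t in categories))
            for d in measurements}

f = bayes_rule(P, e)
gain = sum(e[(t, f[d])] * P[(t, d)] for t in categories for d in measurements)
print(f, gain)    # the rule and its expected gain E[e; f]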

Bayes Decision Rules (cont.)

(Economic gain matrix e(t, a) over true categories c1, c2 and assigned categories c1, c2.)

Bayes Decision Rules (cont.)

Continuous Measurement
For the same example, try continuous density functions P(d | c1) and P(d | c2) for the measurements, with measurements lying in the closed interval [0, 1]. Prove that they are indeed density functions (each integrates to 1 over [0, 1]).

Continuous Measurement (cont.)
Suppose the prior probabilities P(c1) and P(c2) are given. A Bayes decision rule assigns an observed unit to c1 whenever the expected gain of assigning c1 is at least that of assigning c2, which reduces to a threshold condition on the measurement d.
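
A sketch of a continuous-measurement Bayes rule, using hypothetical densities p(d | c1) = 2(1 - d) and p(d | c2) = 2d on [0, 1]; each integrates to 1, so both are valid densities. They stand in for the slide's densities, which are not shown here:

# Continuous-measurement Bayes rule with HYPOTHETICAL densities on [0, 1]:
# p(d|c1) = 2(1-d), p(d|c2) = 2d; both integrate to 1 over [0, 1].
def p_d_given_c1(d):
    return 2.0 * (1.0 - d)

def p_d_given_c2(d):
    return 2.0 * d

def assign(d, prior_c1=0.5, prior_c2=0.5):
    """Identity gain matrix: assign the category with larger P(c) p(d | c)."""
    return "c1" if prior_c1 * p_d_given_c1(d) >= prior_c2 * p_d_given_c2(d) else "c2"

# With equal priors the rule reduces to a threshold: assign c1 iff d <= 0.5.
print(assign(0.2), assign(0.8))   # c1 c2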

Continuous Measurement (cont.)
E[e; f] = 0.805 > 0.68: the continuous measurement has a larger expected economic gain than the discrete one.

Prior Probability
The Bayes rule: replace P(t, d) with P(d | t) P(t).
The Bayes rule can then be determined by assigning any category a that maximizes Σ_{t∈C} e(t, a) P(d | t) P(t).

Economic Gain Matrix
- Identity matrix: a correct assignment gains 1, an incorrect one gains 0
- A matrix in which every incorrect assignment loses 1
- A more balanced instance

Economic Gain Matrix (cont.)
Suppose e and e' are two different economic gain matrices related by e'(t, a) = k1 e(t, a) + k2, with k1 > 0. According to the construction rule, given a measurement d,
Σ_{t∈C} e'(t, a) P(t, d) = k1 Σ_{t∈C} e(t, a) P(t, d) + k2 P(d)
Because k1 > 0 and the term k2 P(d) does not depend on a, the same category maximizes both sums: we then get the same Bayes decision rule from either gain matrix.
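
A quick numeric check of this invariance; the matrices, the P(t, d) table, and the constants k1 = 3, k2 = 5 below are illustrative, not from the slides:

# e'(t, a) = k1*e(t, a) + k2 (k1 > 0) yields the same Bayes assignments:
# sum_t e'(t,a)P(t,d) = k1 * sum_t e(t,a)P(t,d) + k2*P(d), and the added
# term does not depend on a.
k1, k2 = 3.0, 5.0
e  = {("c1", "c1"): 1.0, ("c1", "c2"): -0.5,
      ("c2", "c1"): -2.0, ("c2", "c2"): 1.5}
e2 = {key: k1 * v + k2 for key, v in e.items()}

P = {("c1", "d1"): 0.4, ("c2", "d1"): 0.1,     # P(t, d), illustrative
     ("c1", "d2"): 0.2, ("c2", "d2"): 0.3}

def best(gain, d):
    return max(("c1", "c2"),
               key=lambda a: sum(gain[(t, a)] * P[(t, d)] for t in ("c1", "c2")))

for d in ("d1", "d2"):
    assert best(e, d) == best(e2, d)           # same Bayes rule under both matrices
print("same rule for every measurement")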

Maximin Decision Rule
Maximizes the average gain under the worst-case prior probability.

Example 4.3

Example 4.3 (cont.)
(Table of P(d | c) over measurements d1, d2, d3 for categories c1 and c2.)

Example 4.3 (cont.)

(Plot of E[e; f] versus P(c1): maximin gain for deterministic rules, maximin gain, and the conditional gains E[e | c1; f], E[e | c2; f].)

Decision rule    E[e | c1; f]    E[e | c2; f]
f1               1               0
f2               .5              .1
f3               .7              .4
f4               .2              .5
f5               .8              .5
f6               .3              .6
f7               .5              .9
f8               0               1
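
Given these conditional gains, the maximin deterministic rule is the one whose worst-case conditional gain is largest; a small sketch using the table values:

# Maximin over deterministic rules: pick the rule f whose worst conditional
# gain min_c E[e | c; f] is largest.  Values are from the table above.
cond_gains = {                    # f: (E[e | c1; f], E[e | c2; f])
    "f1": (1.0, 0.0), "f2": (0.5, 0.1), "f3": (0.7, 0.4), "f4": (0.2, 0.5),
    "f5": (0.8, 0.5), "f6": (0.3, 0.6), "f7": (0.5, 0.9), "f8": (0.0, 1.0),
}

best = max(cond_gains, key=lambda f: min(cond_gains[f]))
print(best, min(cond_gains[best]))   # f5 (f7 ties): worst-case gain 0.5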

Example 4.3 (cont.)

Example 4.3 (cont.)
The lowest Bayes gain is achieved at the prior P(c1) where the two conditional gains balance; that lowest gain equals the maximin gain.

Example 4.3 (cont.)

Example 4.4

P(d | c)    d1     d2
c1          3/4    1/4
c2          1/8    7/8

e      c1     c2
c1     2/3    0
c2     0      20/7

       d1    d2    E[e | c1; f]    E[e | c2; f]
f1     c1    c1    2/3             0
f2     c1    c2    1/2             5/2
f3     c2    c1    1/6             5/14
f4     c2    c2    0               20/7
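
The conditional gains follow mechanically from P(d | c) and e. A short sketch recomputing them with exact fractions, assuming the gain entries reconstructed above:

from fractions import Fraction as F

# E[e | c; f] = sum_d P(d | c) * e(c, f(d)) for each deterministic rule f.
P = {("d1", "c1"): F(3, 4), ("d2", "c1"): F(1, 4),
     ("d1", "c2"): F(1, 8), ("d2", "c2"): F(7, 8)}
e = {("c1", "c1"): F(2, 3), ("c1", "c2"): F(0),
     ("c2", "c1"): F(0),    ("c2", "c2"): F(20, 7)}

rules = {"f1": {"d1": "c1", "d2": "c1"}, "f2": {"d1": "c1", "d2": "c2"},
         "f3": {"d1": "c2", "d2": "c1"}, "f4": {"d1": "c2", "d2": "c2"}}

for name, f in rules.items():
    gains = [sum(P[(d, c)] * e[(c, f[d])] for d in ("d1", "d2"))
             for c in ("c1", "c2")]
    print(name, gains)   # e.g. f3 -> [1/6, 5/14]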

Example 4.4 (cont.)

Example 4.4 (cont.)

Example 4.5

Example 4.5 (cont.)

Example 4.5 (cont.)

Example 4.5 (cont.)
f1 and f4 form the lowest Bayes gain. Find some mixing probability p for combining f1 and f4 that eliminates the dependence on P(c1).

Example 4.5 (cont.)

Decision Rule Error
The misidentification error α_k
The false-identification error β_k
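
In terms of the joint table P(t, a), α_k is the probability that a truly-k unit is assigned elsewhere, and β_k is the probability that a unit assigned to k is not truly k. A sketch under that reading; the joint table below is illustrative:

# Misidentification error a_k: P(assigned != k | true = k)     (Type I)
# False-identification error b_k: P(true != k | assigned = k)  (Type II)
P = {("c1", "c1"): 0.40, ("c1", "c2"): 0.10,    # P(true, assigned), illustrative
     ("c2", "c1"): 0.05, ("c2", "c2"): 0.45}
cats = ("c1", "c2")

def alpha(k):
    true_k = sum(P[(k, a)] for a in cats)
    return sum(P[(k, a)] for a in cats if a != k) / true_k

def beta(k):
    assigned_k = sum(P[(t, k)] for t in cats)
    return sum(P[(t, k)] for t in cats if t != k) / assigned_k

print(alpha("c1"), beta("c1"))   # 0.2 and 0.111...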

An Instance

Reserving Judgment
The decision rule may withhold judgment for some measurements. The decision rule is then characterized by the fraction of the time it withholds judgment and by the error rate on those measurements it does assign. Reserving judgment is an important technique for controlling the error rate.

Reserving Judgment (cont.)
Let α_k be the maximum Type I error we can tolerate for category k, and let β_k be the maximum Type II error we can tolerate for category k. The measurements that will not be rejected constitute the acceptance region.
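
One simple way to realize a reject option, sketched below with illustrative numbers: withhold judgment whenever no category's posterior clears a confidence threshold. This is a simplification, not the slide's exact acceptance-region construction:

# Reject-option sketch: withhold judgment when no category's posterior
# P(c | d) clears a confidence threshold.  Threshold and tables are ours.
posterior = {"d1": {"c1": 0.9, "c2": 0.1},     # P(c | d), illustrative
             "d2": {"c1": 0.55, "c2": 0.45},
             "d3": {"c1": 0.2, "c2": 0.8}}

def decide(d, threshold=0.75):
    best = max(posterior[d], key=posterior[d].get)
    return best if posterior[d][best] >= threshold else "reserve judgment"

for d in ("d1", "d2", "d3"):
    print(d, decide(d))   # d1 -> c1, d2 -> reserve judgment, d3 -> c2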

Nearest Neighbor Rule
Assign pattern x to the category of the closest vector in the training set. The definition of "closest": the training vector x' that minimizes ρ(x, x'), where ρ is a metric on the measurement space. Chief difficulty: the brute-force nearest neighbor algorithm has computational complexity proportional to the number of patterns in the training set.
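
A brute-force sketch of the rule, with Euclidean distance as the metric ρ and illustrative training pairs; the linear scan is what makes the cost proportional to the training-set size:

import math

# Brute-force nearest neighbor: assign x the category of the closest
# training vector under a metric rho (Euclidean here).  O(n) per query.
training = [((0.0, 0.0), "c1"), ((0.2, 0.1), "c1"),   # illustrative data
            ((1.0, 1.0), "c2"), ((0.9, 1.2), "c2")]

def rho(x, y):
    return math.dist(x, y)       # Euclidean metric

def nearest_neighbor(x):
    _, category = min(training, key=lambda pair: rho(x, pair[0]))
    return category

print(nearest_neighbor((0.1, 0.2)))   # c1
print(nearest_neighbor((0.8, 0.9)))   # c2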

Binary Decision Tree Classifier
Assigns a category by a hierarchical decision procedure.

Major Problems
- Choosing the tree structure
- Choosing the features used at each non-terminal node
- Choosing the decision rule at each non-terminal node

Decision Rules at the Non-terminal Node
- Thresholding the measurement component
- Fisher's linear decision rule
- Bayes quadratic decision rule
- Bayes linear decision rule
- Linear decision rule from the first principal component

Error Estimation
An important way to characterize the performance of a decision rule. The training data set must be independent of the testing data set. Hold-out method: a common technique; construct the decision rule with half the data set, and test with the other half.
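
A minimal hold-out sketch; the data and the 1-NN rule standing in for "the decision rule" are illustrative:

import random

# Hold-out method: construct the decision rule on half the data, estimate
# its error on the other, independent half.
data = [((x / 10.0,), "c1") for x in range(5)] + \
       [((x / 10.0,), "c2") for x in range(5, 10)]
random.seed(0)
random.shuffle(data)

train, test = data[: len(data) // 2], data[len(data) // 2 :]

def classify(x):                 # decision rule built from the training half
    _, cat = min(train, key=lambda pair: abs(pair[0][0] - x[0]))
    return cat

errors = sum(classify(x) != t for x, t in test)
print("estimated error rate:", errors / len(test))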

Neural Network
A set of units, each of which takes a linear combination of values from either an input vector or the outputs of other units.

Neural Network (cont.)
Has a training algorithm: responses are observed, reinforcement algorithms are applied, and backpropagation is used to change the weights.
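
A single-unit sketch of the idea: one unit forms a linear combination of its inputs, and gradient descent (backpropagation collapses to a single gradient for one unit) changes the weights. Data and constants are illustrative:

import math, random

# One unit: output = sigmoid(w . x + b).  The squared-error gradient
# updates the weights; data and names are illustrative.
random.seed(0)
w, b, lr = [random.uniform(-1, 1) for _ in range(2)], 0.0, 0.5
data = [((0.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]     # toy input/target pairs

def forward(x):
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

for _ in range(1000):
    for x, target in data:
        out = forward(x)
        delta = (out - target) * out * (1.0 - out)     # d(error)/d(pre-activation)
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        b -= lr * delta

print([round(forward(x), 2) for x, _ in data])         # approaches [0.0, 1.0]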

Summary
- Bayesian approach
- Maximin decision rule
- Misidentification and false-alarm error rates
- Nearest neighbor rule
- Construction of decision trees
- Estimation of decision rule error
- Neural networks