# Pattern Recognition Ku-Yaw Chang

Pattern Recognition

Ku-Yaw Chang, canseco@mail.dyu.edu.tw

Assistant Professor, Department of Computer Science and Information Engineering, Da-Yeh University

Outline

- Introduction
- Features and Classes
- Supervised vs. Unsupervised
- Statistical vs. Structural (Syntactic)
- Statistical Decision Theory

2004/03/02 Pattern Recognition

Supervised vs. Unsupervised

- Supervised learning: using a training set of patterns of known class to classify additional similar samples
- Unsupervised learning: dividing samples into groups or clusters based on measures of similarity, without any prior knowledge of class membership

Supervised vs. Unsupervised

Dividing the class into two groups:

- Supervised learning
  - Male features
  - Female features
- Unsupervised learning
  - Male vs. Female
  - Tall vs. Short
  - With vs. Without glasses
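The contrast between the two settings can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the heights, the labels, and the function names `fit_class_means`, `classify`, and `two_means` are all hypothetical. The supervised part uses a nearest-class-mean rule, and the unsupervised part runs a two-cluster k-means on the same values with the labels withheld.

```python
from statistics import mean

# Hypothetical heights in cm with known labels (illustrative values only).
TRAINING_SET = [(182, "tall"), (178, "tall"), (185, "tall"),
                (160, "short"), (158, "short"), (165, "short")]


def fit_class_means(samples):
    """Supervised: compute one mean per known class from labeled samples."""
    by_class = {}
    for value, label in samples:
        by_class.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_class.items()}


def classify(value, class_means):
    """Assign a new sample to the class whose mean is closest."""
    return min(class_means, key=lambda label: abs(value - class_means[label]))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2).

    Assumes the values are not all identical, so both clusters stay non-empty.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(low_group), mean(high_group)
    return low_group, high_group


class_means = fit_class_means(TRAINING_SET)
print(classify(180, class_means))

unlabeled = [value for value, _ in TRAINING_SET]
print(two_means(unlabeled))
```

The supervised rule needs the labels to build its class means, whereas the clustering step only sees the raw values and returns groups without names.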

Statistical vs. Structural

- Statistical PR: obtains features by manipulating the measurements as purely numerical (or Boolean) variables
- Structural (Syntactic) PR: designs features in some intuitive way that corresponds to human perception of the objects

Statistical vs. Structural

Optical Character Recognition (OCR)

- Statistical PR
- Structural PR

Statistical Decision Theory
An automated classification system

- Classified data sets
- Selected features

Statistical Decision Theory
Hypothetical Basketball Association (HBA)

- apg: average number of points per game
- To predict the winner of the game
  - Based on the difference between the home team’s apg and the visiting team’s apg for previous games
- Training set
  - Scores of previously played games
  - Home team classified as a winner or a loser

Statistical Decision Theory
Given a game to be played, predict whether the home team will win or lose using the feature:

dapg = Home Team apg - Visiting Team apg

Statistical Decision Theory
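| Game | dapg | Home Team | Game | dapg | Home Team |
|---|---|---|---|---|---|
| 1 | 1.3 | Won | 16 | -3.1 | |
| 2 | -2.7 | Lost | 17 | 1.7 | |
| 3 | -0.5 | | 18 | 2.8 | |
| 4 | -3.2 | | 19 | 4.6 | |
| 5 | 2.3 | | 20 | 3.0 | |
| 6 | 5.1 | | 21 | 0.7 | |
| 7 | -5.4 | | 22 | 10.1 | |
| 8 | 8.2 | | 23 | 2.5 | |
| 9 | -10.8 | | 24 | 0.8 | |
| 10 | -0.4 | | 25 | -5.0 | |
| 11 | 10.5 | | 26 | 8.1 | |
| 12 | -1.1 | | 27 | -7.1 | |
| 13 | | | 28 | 2.7 | |
| 14 | -4.2 | | 29 | -10.0 | |
| 15 | -3.4 | | 30 | -6.5 | |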

Statistical Decision Theory
A histogram of dapg
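The histogram itself is not reproduced in this transcript, so the following is a rough sketch of how one could be rebuilt from the dapg values in the table of 30 games. The bin width of 2 and the names `DAPG`, `WIDTH`, and `histogram` are assumptions made for illustration. Game 13 is left out because its dapg entry is missing here, and the bars are not split by outcome because most Won/Lost entries are missing as well.

```python
import math
from collections import Counter

# dapg values transcribed from the table; game 13 is omitted (value missing).
DAPG = [1.3, -2.7, -0.5, -3.2, 2.3, 5.1, -5.4, 8.2, -10.8, -0.4,
        10.5, -1.1, -4.2, -3.4, -3.1, 1.7, 2.8, 4.6, 3.0, 0.7,
        10.1, 2.5, 0.8, -5.0, 8.1, -7.1, 2.7, -10.0, -6.5]

WIDTH = 2  # assumed bin width; the slide does not state one


def histogram(values, width=WIDTH):
    """Count values per bin [left, left + width); keys are the left bin edges."""
    return Counter(math.floor(v / width) * width for v in values)


counts = histogram(DAPG)
for left in range(min(counts), max(counts) + 1, WIDTH):
    # Counter returns 0 for bins that received no values.
    print(f"[{left:>4}, {left + WIDTH:>4}) {'#' * counts[left]}")
```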

Statistical Decision Theory
The classification cannot be performed perfectly using the single feature dapg.

- Probability of membership in each class
- With the smallest expected penalty
- Decision boundary or threshold: the value T for Home Team
  - Won: dapg is greater than T
  - Lost: dapg is less than or equal to T

Statistical Decision Theory
- Home team’s apg = 103.4
- Visiting team’s apg = 102.1
- dapg = 103.4 - 102.1 = 1.3, and 1.3 > T
  - The home team will win the game
- T = 0.8 or -6.5
  - T = 0.8 achieves the minimum error rate
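The worked example on this slide can be written as a short threshold rule. This is a minimal sketch: the function names `dapg` and `predict_home_team` are made up for illustration, and the rule follows the example above, where a dapg greater than T predicts a home win.

```python
def dapg(home_apg: float, visiting_apg: float) -> float:
    """Difference between the home and visiting teams' average points per game."""
    return home_apg - visiting_apg


def predict_home_team(d: float, threshold: float) -> str:
    """Predict 'Won' when dapg exceeds the threshold T, otherwise 'Lost'."""
    return "Won" if d > threshold else "Lost"


# Values from the slide: home apg 103.4, visiting apg 102.1, T = 0.8.
d = round(dapg(103.4, 102.1), 1)  # rounded to hide floating-point noise
print(d, predict_home_team(d, threshold=0.8))  # prints: 1.3 Won
```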

Statistical Decision Theory
Adding another feature to increase the accuracy of classification:

dwp = Home Team wp - Visiting Team wp

where wp denotes the winning percentage.

Statistical Decision Theory
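| Game | dapg | dwp | Home Team | Game | dapg | dwp | Home Team |
|---|---|---|---|---|---|---|---|
| 1 | 1.3 | 25.0 | Won | 16 | -3.1 | 9.4 | |
| 2 | -2.7 | -16.9 | Lost | 17 | 1.7 | 6.8 | |
| 3 | -0.5 | 5.3 | | 18 | 2.8 | 17.0 | |
| 4 | -3.2 | -27.5 | | 19 | 4.6 | 13.3 | |
| 5 | 2.3 | -18.0 | | 20 | 3.0 | -24.0 | |
| 6 | 5.1 | 31.2 | | 21 | 0.7 | -17.8 | |
| 7 | -5.4 | 5.8 | | 22 | 10.1 | 44.6 | |
| 8 | 8.2 | 34.3 | | 23 | 2.5 | -22.4 | |
| 9 | -10.8 | -56.3 | | 24 | 0.8 | 12.3 | |
| 10 | -0.4 | | | 25 | -5.0 | -3.8 | |
| 11 | 10.5 | 16.3 | | 26 | 8.1 | 36.0 | |
| 12 | -1.1 | -17.6 | | 27 | -7.1 | -20.6 | |
| 13 | | 5.7 | | 28 | 2.7 | 23.2 | |
| 14 | -4.2 | 16.0 | | 29 | -10.0 | -46.9 | |
| 15 | -3.4 | | | 30 | -6.5 | 19.7 | |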

Statistical Decision Theory
- Feature vector (dapg, dwp)
- Presented as a scatterplot

[Scatterplot of the 30 games in the (dapg, dwp) feature space; each point is marked W (home team won) or L (home team lost).]

Statistical Decision Theory
- The feature space can be divided into two decision regions by a straight line
  - Linear decision boundary
- If a feature space cannot be perfectly separated by a straight line, a more complex boundary line might be used.
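A linear decision boundary in the (dapg, dwp) plane can be sketched as the sign of a weighted sum. The weights `w` and offset `b` below are placeholder values chosen only for illustration. They are not the boundary used in the slides, and `linear_decision` is a hypothetical name.

```python
def linear_decision(dapg: float, dwp: float, w=(1.0, 0.1), b=0.0) -> str:
    """Classify by which side of the line w[0]*dapg + w[1]*dwp + b = 0 a point lies on.

    The default weights are illustrative placeholders, not fitted values.
    """
    score = w[0] * dapg + w[1] * dwp + b
    return "Won" if score > 0 else "Lost"


# Games 1 and 2 from the table, the two rows whose outcome is listed.
print(linear_decision(1.3, 25.0))    # prints: Won
print(linear_decision(-2.7, -16.9))  # prints: Lost
```

With these placeholder weights the two games whose outcomes are listed in the table fall on the expected sides. A real boundary would be chosen from the full training set.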

Exercise One

The values of a feature x for nine samples from class A are 1, 2, 3, 3, 4, 4, 6, 6, 8. Nine samples from class B have x values of 4, 6, 7, 7, 8, 9, 9, 10, 12. Make a histogram (with an interval width of 1) for each class and find a decision boundary (threshold) that minimizes the total number of misclassifications for this training data set.
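One way to check an answer to this exercise is a brute-force search over candidate thresholds. The sketch below assumes the rule "x at or below the threshold is class A, x above it is class B", since class A sits at the lower values. The names `errors` and `candidates` are illustrative.

```python
CLASS_A = [1, 2, 3, 3, 4, 4, 6, 6, 8]
CLASS_B = [4, 6, 7, 7, 8, 9, 9, 10, 12]


def errors(threshold: float) -> int:
    """Misclassifications for the rule: x <= threshold -> A, x > threshold -> B."""
    a_wrong = sum(x > threshold for x in CLASS_A)
    b_wrong = sum(x <= threshold for x in CLASS_B)
    return a_wrong + b_wrong


# Thresholds halfway between consecutive integers cover every distinct split
# of this integer-valued data and avoid ties at the boundary.
candidates = [t + 0.5 for t in range(0, 13)]
best = min(candidates, key=errors)
print(best, errors(best))
```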

Exercise Two

Can the feature vectors (x, y) = (2,3), (3,5), (4,2), (2,7) from class A be separated from four samples from class B located at (6,2), (5,4), (5,6), (3,7) by a linear decision boundary? If so, give the equation of one such boundary and plot it. If not, find a boundary that separates them as well as possible.
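A candidate boundary for this exercise can be tested by listing the points that land on the wrong side. In this sketch a line is written as a*x + b*y = c, with class A expected on the side where a*x + b*y < c. The function name `misclassified` and the trial line are assumptions for illustration.

```python
CLASS_A = [(2, 3), (3, 5), (4, 2), (2, 7)]
CLASS_B = [(6, 2), (5, 4), (5, 6), (3, 7)]


def misclassified(a: float, b: float, c: float) -> list:
    """Return the points on the wrong side of (or exactly on) the line a*x + b*y = c."""
    wrong_a = [p for p in CLASS_A if a * p[0] + b * p[1] >= c]
    wrong_b = [p for p in CLASS_B if a * p[0] + b * p[1] <= c]
    return wrong_a + wrong_b


# A deliberately simple trial: the vertical line x = 4.5.
print(misclassified(1, 0, 4.5))  # prints: [(3, 7)]
```

An empty list would mean the candidate line separates the two classes perfectly.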
