1
Ch.9 Pattern Recognition
Slides are from J. Kim, KAIST
2
What is Pattern Recognition?
A pattern is an object, process, or event that can be given a name.
Pattern Recognition: the assignment of a physical object or event to one of several prespecified categories (Duda & Hart).
A subfield of Artificial Intelligence.
The bulk of human intelligence is based on pattern recognition: the quintessential example of self-organization.
3
Resources
Professional association: International Association for Pattern Recognition (IAPR)
Textbooks: Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
Journals: IEEE Transactions on Pattern Analysis and Machine Intelligence; Pattern Recognition; Pattern Recognition Letters; Artificial Intelligence and Pattern Recognition; …
Conferences and workshops: International Conference on Pattern Recognition; IEEE Computer Vision and Pattern Recognition; Int'l Conference on Document Analysis and Recognition; Int'l Workshop on Frontiers of Handwriting Recognition
4
Examples of Patterns
5
Related Fields and Application Areas of PR
Related fields: adaptive signal processing, machine learning, artificial neural networks, robotics and vision, cognitive science, mathematical statistics, nonlinear optimization, exploratory data analysis, fuzzy and genetic algorithms, detection and estimation theory, formal languages, structural modeling, biological cybernetics, computational neuroscience, …
Application areas: image processing/segmentation, computer vision, speech recognition, automated target recognition, optical character recognition, seismic analysis, man-machine dialogue, fingerprint identification, industrial inspection, medical diagnosis, ECG signal analysis, data mining, gene sequence analysis, protein structure analysis, remote sensing, aerial reconnaissance
6
Applications of Pattern Recognition
Computer-aided diagnosis: medical imaging, EEG, ECG, X-ray mammography
Image recognition: factory automation, robot navigation, face identification, gesture recognition, automatic target recognition
Speech recognition: speaker identification, speech recognition
Genome sequence analysis
7
Biometrics
Person identification using invariant biometric features.
Static patterns: fingerprint, iris, face, palm print, veins on the back of the hand
Dynamic patterns: signature, voiceprint, typing pattern
Applications: access control, e-commerce authentication
8
Gesture Recognition
Object manipulation in virtual reality
Tele-operations: controlling remote devices by gesture input
TV control by hand motion
Text editing on pen computers
Sign language interpretation
9
Data Mining: Extracting Patterns from Data
Data → information → decisions.
Data sources: demographics, point of sale, ATM, financial statistics, credit information, documents, intelligence reports, medical records, physical examination records.
Extracted information (examples): 80% of buyers of product A also buy product B (CRM); car purchasing power in the US market has declined for six months; the sales growth of product A is twice that of product B; dehydration symptoms indicate risk.
Decisions to support: What advertising strategy? How to display products? What is the optimal budget allocation? How can market share be expanded? How can customer churn be prevented? What treatment should be prescribed?
Korean case study: preventing the use of lost credit cards by learning card-usage patterns.
10
e-Book, Tablet PC, PDA, M-phone
11
Input Devices
PDA with camera or scanner
Pen scanner & computer
Motion-sensing pen
12
Freehand Equation Input
Equation input system, linked with a document editor: equations are entered with an electronic pen and then recognized.
13
Old documents: the 承政院日記 (Seungjeongwon Ilgi, Diaries of the Royal Secretariat)
14
Verification & Correction Interface
15
Mail Sorter
16
Autonomous Land Vehicle (DARPA's Grand Challenge contest)
17
Protein Structure Analysis
18
Types of PR Problems
Classification: assigning an object to a class. Output: a class label. Ex: classifying a product as 'good' or 'bad' in quality control.
Clustering: organizing objects into meaningful groups. Output: a (hierarchical) grouping of objects. Ex: a taxonomy of species.
Regression: predicting a value based on observations. Ex: stock price prediction, forecasting.
Description: representing an object in terms of a series of primitives. Output: a structural or linguistic description. Ex: labeling ECG signals, video indexing, protein structure indexing.
19
Pattern Class: a collection of "similar" (not necessarily identical) objects, with inter-class variability and intra-class variability.
Pattern Class Model: a description of each class/population (e.g., a probability density such as a Gaussian).
20
Classification vs Clustering
Classification (supervised classification, recognition): known categories, e.g., category "A" vs. category "B".
Clustering (unsupervised classification): creation of new categories.
21
Pattern Recognition: Key Objectives
1) Process the sensed data to eliminate noise (data vs. noise).
2) Hypothesize models that describe each class population; then we may recover the process that generated the patterns.
3) Choose the best-fitting model for the given sensed data and assign the class label associated with that model.
22
A Typical Classification Process
Sensor → signal → feature extractor → features → classifier → class membership
23
Example: Salmon or Sea Bass
Sort incoming fish on a conveyor belt into two classes: salmon or sea bass.
Steps: preprocessing (segmentation); feature extraction (measuring features or properties); classification (making the final decision).
24
Sea bass vs Salmon Discrimination
Possible features to be used: length, lightness, width, number and shape of fins, position of the mouth, etc.
25
Salmon vs. Sea Bass (by length)
26
Salmon vs. Sea Bass (by lightness)
Best decision strategy with lightness.
27
Cost of Misclassification
There are two possible classification errors:
(1) classifying a sea bass as a salmon;
(2) classifying a salmon as a sea bass.
Which error is more important? This is generalized as a loss function; we then look for the decision with minimum risk, where risk = expected loss.
Loss function (rows: decision, columns: truth; "-" marks a correct decision):
decision \ truth    Salmon    Sea bass
Salmon                 -         10
Sea bass              20          -
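As a worked sketch of the minimum-risk rule (assuming the loss values as reconstructed in the table above, with zero loss for correct decisions), the conditional risk of each decision is the expected loss under the posterior:
\[
R(\text{salmon} \mid x) = 10\,P(\text{sea bass} \mid x), \qquad
R(\text{sea bass} \mid x) = 20\,P(\text{salmon} \mid x)
\]
We decide salmon whenever \(R(\text{salmon} \mid x) < R(\text{sea bass} \mid x)\), i.e., whenever \(P(\text{salmon} \mid x) > \tfrac{1}{3}\): the costlier error shifts the decision threshold away from the symmetric 1/2.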
28
Classification with more features (by length and lightness)
It is possibly better. Really?
29
How Many Features and Which?
The choice of features determines the success or failure of the classification task.
For a given feature, we may compute the best decision strategy from the (training) data; this is called training, parameter adaptation, or learning (machine learning).
Issues with feature extraction: correlated features do not improve performance; certain features might be difficult to extract; extracting many features might be computationally expensive; the "curse" of dimensionality; …
32
Components of a PR System
(Diagram: sensors and preprocessing → feature extraction → pattern classifier → class assignment, with a teacher and a learning algorithm.)
Sensors and preprocessing.
Feature extraction aims to create discriminative features that are good for classification.
A classifier.
A teacher provides information about the hidden state.
A learning algorithm sets up the PR system from training examples.
33
PR Approaches
Template matching: the pattern to be recognized is matched against a stored template; works only for simple problems.
Statistical PR: based on an underlying statistical model of patterns (features) and pattern classes; uses the class-conditional probability Pr(X|Ci).
Structural PR: pattern classes are represented by formal structures such as grammars, automata, strings, etc.; based on measures of structural similarity; suited not only to classification but also to description; often called syntactic pattern recognition.
Neural networks: the classifier is represented as a network of cells modeling the neurons of the human brain (connectionist approach); knowledge is stored in the connectivity and the strength of the synaptic weights; trainable, requires minimal a priori knowledge, and works for complex decision boundaries.
Statistical-structural analysis: combines structural and statistical analysis using probabilistic frameworks such as Bayesian networks and MRFs.
(Modified from Vojtěch Franc)
34
Template Matching
(Figure: a stored template matched against an input scene.)
35
Deformable Template Matching: Snake
Example: corpus callosum segmentation.
Prototype registration to the low-level segmented image.
Shape training set; prototype and variation learning.
Prototype warping.
37
Classifier The task of classifier is to partition feature space into class-labeled decision regions Borders between decision regions decision boundaries Determining decision region of a feature vector X 37
38
Representation of a classifier
A classifier is typically represented as a set of discriminant functions \(g_i(\mathbf{x}),\ i = 1, \dots, R\), one per class. The classifier assigns a feature vector \(\mathbf{x}\) to the i-th class if
\[
g_i(\mathbf{x}) > g_j(\mathbf{x}) \quad \text{for all } j \neq i .
\]
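A minimal sketch of this representation in code; the two linear discriminant functions below are illustrative placeholders, not taken from the slides:

```python
# Discriminant-function classifier: the class whose g_i(x) is largest wins.
discriminants = [
    lambda x: 2.0 * x[0] + x[1] - 1.0,   # g_0, placeholder linear function
    lambda x: -x[0] + 3.0 * x[1] + 0.5,  # g_1, placeholder linear function
]

def assign_class(x):
    scores = [g(x) for g in discriminants]
    return scores.index(max(scores))     # index i of the winning class

print(assign_class((1.0, 0.0)))  # -> 0 with these placeholder discriminants
```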
39
Classification of Classifiers by Form of Discriminant Function
A posteriori probability P(yi | X): Bayesian classifiers.
Linear function: linear discriminant analysis, support vector machines.
Non-linear function: non-linear discriminant analysis.
Artificial neuron: artificial neural networks.
40
Bayesian Decision Making
A statistical approach: the optimal classifier with minimum error.
Assume that the complete statistical model is known.
Decide from the posterior probabilities, given an observation x:
if P(ω1 | x) > P(ω2 | x), decide state of nature = ω1;
if P(ω1 | x) < P(ω2 | x), decide state of nature = ω2.
41
Searching Decision Boundary
42
Bayes Rule: from the class-conditional density \(P(x \mid \omega_1)\) to the posterior \(P(\omega_1 \mid x)\):
\[
P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}
\]
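A small sketch of Bayesian decision making for the fish example, assuming 1-D Gaussian class-conditional densities for lightness; all numeric values are illustrative, not from the slides:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Class-conditional density p(x | class), modeled as a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative lightness statistics and priors for the two classes.
classes = {
    "salmon":   {"prior": 0.6, "mu": 4.0, "sigma": 1.0},
    "sea bass": {"prior": 0.4, "mu": 7.0, "sigma": 1.2},
}

def decide(x):
    # Posterior is proportional to likelihood * prior; p(x) cancels in the comparison.
    scores = {c: p["prior"] * gaussian_pdf(x, p["mu"], p["sigma"])
              for c, p in classes.items()}
    return max(scores, key=scores.get)

print(decide(5.0))  # -> "salmon" with these illustrative parameters
```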
43
Limitations of Bayesian approach
The statistical model p(x|y) is mostly not known; it must be learned (estimated) from training examples {(x1,y1),…,(xℓ,yℓ)}.
Usually p(x|y) is assumed to take a parametric form, e.g., a multivariate normal distribution.
Non-parametric estimation of p(x|y) requires a large set of training samples.
Non-Bayesian methods may offer equally good performance (??).
44
Discriminative approaches
Assume that the discriminant G(x) is a polynomial function: a quadratic function, or a linear function (Linear Discriminant Analysis, LDA).
Classifier design then reduces to determining the separating hyperplane.
45
LDA Example
Task: jockey vs. basketball player ("hoopster") recognition.
The set of hidden states is the two classes; the feature space consists of height and weight.
From training examples {(x1,y1),…,(xℓ,yℓ)}, a linear classifier is learned that separates the classes by a hyperplane in (height, weight) space.
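A minimal sketch of such a linear classifier; the weight vector and bias below are hand-picked for illustration, not the result of actually running LDA:

```python
# Linear classifier over the feature vector x = (height_cm, weight_kg).
w = (1.0, 0.5)   # illustrative weight vector of the separating hyperplane
b = -230.0       # illustrative bias term

def classify(x):
    """One side of the hyperplane is 'hoopster', the other 'jockey'."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return "hoopster" if score >= 0 else "jockey"

print(classify((200.0, 95.0)))  # tall and heavy -> "hoopster"
print(classify((160.0, 50.0)))  # short and light -> "jockey"
```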
46
Artificial Neural Network
For a given network structure, find the weight set that minimizes the sum-of-squares error J(w) over the training examples {(x1,y1),…,(xℓ,yℓ)}.
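The slide's formula did not survive extraction; a standard form of the sum-of-squares criterion it names is:
\[
J(\mathbf{w}) = \sum_{i=1}^{\ell} \big\| y_i - f(x_i; \mathbf{w}) \big\|^2
\]
where \(f(x; \mathbf{w})\) is the network's output for input x under weights w.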
47
PR Design Cycle
Data collection: probably the most time-intensive component of a project. How many examples are enough?
Feature choice: critical to the success of the PR project; requires basic prior knowledge and engineering sense.
Model choice and design: statistical, neural, or structural; parameter settings.
Training: given a feature set and a 'blank' model, adapt the model to explain the training data; supervised, unsupervised, or reinforcement learning.
Evaluation: how well does the trained model do? Overfitting vs. generalization.
48
Learning for a PR System
(Diagram: sensors and preprocessing → feature extraction → pattern classifier → class assignment, trained by a learning algorithm with a teacher.)
Which features are good for classifying the given classes? → feature analysis.
Can we obtain the required probabilities or decision boundaries? → learning from training data.
49
Learning: a change in the contents and organization of a system's knowledge that enables it to improve its performance on a task (Simon).
Data mining: learning rules from large sets of data.
The availability of large databases allows machine learning to be applied to real problems.
50
Learning Algorithm Categorization (Depending on Available Feedback)
Supervised learning: examples of correct input/output pairs are available.
Unsupervised learning: no hint at all about the correct outputs; clustering or consistent interpretation.
Reinforcement learning: receives no examples in advance, but rewards or punishments at the end.
Transduction (semi-supervised learning): training with both labeled and unlabeled examples.
51
Issues in Learning Algorithms
Prior knowledge: prior knowledge can help in learning, e.g., assumptions on parametric forms and ranges of values.
Incremental learning: update old knowledge whenever a new example arrives. Batch learning: apply the learning algorithm to the entire set of examples.
Analytic approach: find the optimal parameter values by analysis. Iterative adaptation: improve the parameter values from an initial guess.
52
Learning Algorithms: General Ideas
Tweak the parameters so as to optimize a performance criterion. In the course of learning, the parameter vector traces a path that (hopefully) ends at the best parameter vector, as in the sketch below.
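A minimal sketch of iterative parameter adaptation, here as gradient descent on a toy one-parameter criterion (the quadratic J and the step size are illustrative assumptions):

```python
def J(w):
    return (w - 3.0) ** 2        # toy performance criterion, minimized at w = 3

def dJ(w):
    return 2.0 * (w - 3.0)       # its derivative

w = 0.0                          # initial guess
for _ in range(100):
    w -= 0.1 * dJ(w)             # step against the gradient
print(round(w, 4))               # -> ~3.0: the path ends near the best parameter
```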
53
Inductive Learning: given training examples (correct input-output pairs), recover the unknown underlying function from which the training data were generated.
Generalization ability on unseen data is required.
Learning can be seen as learning the representation of a function. Forms of the function: logical sentences, polynomials, sets of weights (neural networks), …
Given the form of the polynomial function (or the structure of the neural network), adjust the parameters (synaptic weights) to minimize the error.
54
Model Complexity
(Figure: two candidate decision boundaries, A and B, for the salmon vs. sea bass data.)
Which is better, A or B?
55
Model Complexity
We can get perfect classification performance on the training data by choosing a sufficiently complex model. Complex models, however, are tuned to the particular training samples rather than to the characteristics of the true model: the issue of generalization.
56
Generalization
The main goal of a pattern classification system is to generalize: to suggest the class or action for objects yet unseen.
Some complex decision boundaries are not good at generalization; some simple boundaries are not good either.
One must look for a tradeoff between performance and simplicity; this is at the core of statistical pattern recognition.
57
Generalization Strategy
How can we improve generalization performance?
More training examples (i.e., better pdf estimates).
Simpler models (i.e., simpler classification boundaries) usually yield better performance: simplify the decision boundary!
58
Overfitting and underfitting
(Figure, from Vojtěch Franc: underfitting, good fit, and overfitting.)
The problem of generalization: a small empirical risk Remp does not imply a small true expected risk R.
59
Curse of Dimensionality
Adding too many features can, paradoxically, lead to a worsening of performance, as the arithmetic below illustrates.
If each input feature is divided into M intervals, the total number of cells is M^d (d = number of features), which grows exponentially with d.
With fixed training data, adding more features worsens generalization, because too many cells leave only a small number of training samples per cell, and few samples per cell means poor generalization.
Training data is never sufficient!
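A quick instance of the count, using an assumed M = 10 divisions per feature:
\[
M = 10:\quad d = 3 \;\Rightarrow\; 10^3 = 1000 \text{ cells}, \qquad d = 10 \;\Rightarrow\; 10^{10} \text{ cells}
\]
so keeping even one training sample per cell would require the training set to grow by seven orders of magnitude when moving from 3 to 10 features.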
60
Curse of Dimensionality
Increasing the number of functions reduces the error: classifier performance on the training data improves.
But when training with a limited amount of data, increasing the number of features reduces generalization ability.
The amount of training data required for adequate generalization grows rapidly with the feature dimension.
For a finite set of training data, finding the optimal set of features is a difficult problem.
61
Optimal Number of Cells (example)
62
Implications of the Curse of Dimensionality for PR System Design
With finite training samples, be cautious about adding features.
Use the features with the highest discriminative power first: feature analysis is mandatory.
63
Empirical risk minimization principle
(From Vojtěch Franc.) The true expected risk R(q) is approximated by the empirical risk with respect to a given labeled training set {(x1,y1),…,(xℓ,yℓ)}:
\[
R_{\mathrm{emp}}(q) = \frac{1}{\ell} \sum_{i=1}^{\ell} W\big(f(x_i; q), y_i\big)
\]
where W is the loss function. Learning based on the empirical risk minimization principle is then defined as
\[
q^{*} = \arg\min_{q} R_{\mathrm{emp}}(q).
\]
Examples of algorithms: Perceptron, back-propagation, etc.
64
Cross-Validation
Validate the learned model on a different set to assess generalization performance, guarding against overfitting.
Partition the training set into two subsets: an estimation subset for learning the parameters, and a validation subset.
Leave-one-out validation: N-1 examples for training, 1 for validation, taking turns; this overcomes a small training set. A sketch follows below.
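A minimal leave-one-out sketch in plain Python; train and classify are hypothetical stand-ins for whatever learning algorithm is being evaluated:

```python
def loo_accuracy(examples, train, classify):
    """examples: list of (x, y) pairs; returns leave-one-out accuracy."""
    correct = 0
    for i, (x, y) in enumerate(examples):
        # Train on the N-1 remaining examples, validate on the held-out one.
        model = train(examples[:i] + examples[i + 1:])
        if classify(model, x) == y:
            correct += 1
    return correct / len(examples)
```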
65
Unsupervised learning
Input: training examples {x1,…,xℓ} without information about the hidden state.
Clustering: the goal is to find clusters of data sharing similar properties.
66
Example of an unsupervised learning algorithm
k-means clustering: the goal is to minimize the sum of squared distances between each example and the center of its assigned cluster,
\[
J = \sum_{i=1}^{\ell} \big\| x_i - m_{y_i} \big\|^2 ,
\]
alternating between a classifier step (assign each example to the nearest cluster center) and a learning step (recompute each center as the mean of its assigned examples).
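A compact sketch of the two alternating steps (random initialization from the data; points are tuples of floats):

```python
import random

def kmeans(points, k, iters=50):
    """Minimal k-means: returns the k cluster centers."""
    centers = random.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        # Classifier step: assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Learning step: move each center to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers
```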
67
Other Issues in Pattern Recognition
68
Difficulty of Class Modeling
69
Context Processing Is Essential for Recognition
70
Context Processing in Character Recognition
71
Local decision is not enough
Global consistency is required: local decisions alone are not enough.
72
Combining Multiple Classifiers
Approaches for improving on the performance of a group of experts: the best single classifier vs. a combination of multiple classifiers. Two heads (experts, classifiers) are better than one.
A classifier's output takes one of three forms: the best (single) class, a ranking of the classes, or a score for each class.
Generating multiple classifiers: correlated classifiers would not help.
Combining multiple classifiers: majority vote, Borda count, decorrelated combination, etc. (see the sketch below).
Combining multiple classifiers has been investigated in pattern recognition over the last decade. Here we consider only classifiers producing single-choice decisions, combined in a parallel arrangement.
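A minimal sketch of the simplest combination rule named above, majority voting over single-choice outputs:

```python
from collections import Counter

def majority_vote(decisions):
    """decisions: one class label per classifier; returns the most common label."""
    label, _ = Counter(decisions).most_common(1)[0]
    return label

print(majority_vote(["salmon", "sea bass", "salmon"]))  # -> "salmon"
```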
73
Evaluating Pattern Recognition Performance
Confusion matrix (rows: recognition result, columns: truth):
result \ truth      A      not A
a                   p        s
not a               r        q
Recognition rate = (p+q)/(p+q+r+s); misrecognition rate = (r+s)/(p+q+r+s).
Miss detection = r/(p+r); false alarm = s/(p+s).
Recall = p/(p+r); precision = p/(p+s).
Also consider the rejection rate (refusing to make a decision) and the throughput rate.
Case A: 20% rejected, with 0.5% error on the results. Case B: 10% rejected, with 1.0% error. Which is better?
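A small sketch computing these measures from hypothetical confusion-matrix counts:

```python
# Illustrative counts: p, q are correct decisions; r misses; s false alarms.
p, q, r, s = 90, 880, 10, 20

total = p + q + r + s
recognition_rate = (p + q) / total
misrecognition_rate = (r + s) / total
miss_detection = r / (p + r)
false_alarm = s / (p + s)
recall = p / (p + r)
precision = p / (p + s)
print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.90, precision=0.82
```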
74
Improving Pattern Recognition Performance: Toward 100%
(Figure: performance as a function of time and effort.)