Design Compact Recognizers of Handwritten Chinese Characters Using Precision Constrained Gaussian Models, Minimum Classification Error Training and Parameter Compression

Presentation transcript:

Design Compact Recognizers of Handwritten Chinese Characters Using Precision Constrained Gaussian Models, Minimum Classification Error Training and Parameter Compression

Yongqiang Wang¹,², Qiang Huo¹
¹ Microsoft Research Asia, Beijing, China
² The University of Hong Kong, Hong Kong, China

ICDAR-2009, Barcelona, Spain, July 26-29, 2009

Outline
– Prior art
– Our approach
– Experiments and results
– Conclusions

Prior art to build a compact online CJK handwritten character recognizer

Feature vector:
– 8-directional features (Bai & Huo, 2005)
  Z.-L. Bai and Q. Huo, "A study on the use of 8-directional features for online handwritten Chinese character recognition," Proc. ICDAR-2005.
– LDA for dimension reduction

Classifier:
– Modified Quadratic Discriminant Function (MQDF, Kimura et al., 1987)
  F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified quadratic discriminant functions and the application to Chinese character recognition," IEEE Trans. on PAMI, vol. 9, 1987.
– Minimum Classification Error (MCE) training for the MQDF-based classifier (Liu et al., 2004)
  C.-L. Liu, H. Sako, and H. Fujisawa, "Discriminative learning quadratic discriminant function for handwriting recognition," IEEE Trans. on Neural Networks, vol. 15, no. 2, 2004.
– Parameter compression using split VQ (Long & Jin, 2008)
  T. Long and L.-W. Jin, "Building compact MQDF classifier for large character set recognition by subspace distribution sharing," Pattern Recognition, vol. 41, no. 9, 2008.

What's wrong with the prior art: a bad accuracy-footprint tradeoff

Our New Approach

Precision Constrained Gaussian Model (PCGM) for handwritten CJK character recognition:
– Borrowed from the automatic speech recognition (ASR) area
– We demonstrated previously that an ML-trained PCGM can achieve a much better accuracy-footprint tradeoff than the conventional ML-trained MQDF (Wang & Huo, 2009)
  Y.-Q. Wang and Q. Huo, "Modeling inverse covariance matrices by expansion of tied basis matrices for online handwritten Chinese character recognition," Pattern Recognition, 2009.

What's new in this study?
– MCE training of the PCGM-based classifier using Quickprop
– Compression of PCGM parameters using scalar quantization and split VQ
– Embedding parameter compression into MCE training
– A comparative study with their MQDF counterparts

Precision Constrained Gaussian Model (PCGM)

For a pattern classification problem with M "character classes":
– Each character class C_j is modeled by a Gaussian PDF whose inverse covariance matrix (a.k.a. precision matrix) is constrained to lie in a subspace spanned by K tied basis matrices:

  P_j = \sum_{k=1}^{K} \lambda_{jk} S_k

– PCGM-based discriminant function (dropping class-independent constants; see the scoring sketch below):

  g_j(x) = \psi_j^\top x - \tfrac{1}{2} x^\top P_j x + \kappa_j,
  where \psi_j = P_j \mu_j and \kappa_j = \tfrac{1}{2} \log |P_j| - \tfrac{1}{2} \mu_j^\top P_j \mu_j

Classifier parameters to be stored:
– Character-dependent (transformed) mean vectors \psi_j
– Character-independent basis matrices {S_k}
– Character-dependent basis coefficients {\lambda_{jk}}
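Because the basis matrices S_k are shared by all classes, the quadratic term factorizes as -\tfrac{1}{2} x^\top P_j x = \sum_k \lambda_{jk} (-\tfrac{1}{2} x^\top S_k x), so the K base quadratic forms can be computed once per input and reused across all M classes. The following is a minimal NumPy sketch of that scoring scheme; the array names and layout (psi, lam, S, kappa) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pcgm_scores(x, psi, lam, S, kappa):
    """Score one feature vector x against all M character classes.

    Assumed (illustrative) shapes:
      psi:   (M, D)    transformed mean vectors psi_j = P_j mu_j
      lam:   (M, K)    basis coefficients lambda_jk
      S:     (K, D, D) shared symmetric basis matrices
      kappa: (M,)      precomputed class constants kappa_j
    """
    # The K quadratic forms -0.5 * x^T S_k x are class-independent ...
    b = -0.5 * np.einsum('kij,i,j->k', S, x, x)    # shape (K,)
    # ... so each class score reduces to two dot products plus a constant.
    return kappa + lam @ b + psi @ x               # shape (M,)

# Usage: the recognized class is the one with the highest score.
# y_hat = int(np.argmax(pcgm_scores(x, psi, lam, S, kappa)))
```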

MCE Training of the PCGM-based Classifier

Discriminant function of the PCGM-based classifier: g_j(x; \Lambda), where \Lambda collects the classifier parameters.

For the i-th training feature vector x_{ji} belonging to the j-th class, a misclassification measure is defined as

  d_{ji}(\Lambda) = -g_j(x_{ji}; \Lambda) + g_r(x_{ji}; \Lambda),  r = \arg\max_{k \neq j} g_k(x_{ji}; \Lambda)

Per-sample loss function (a sigmoid):

  \ell(d) = \frac{1}{1 + e^{-\alpha d}}

The batch-mode Quickprop algorithm is used to minimize the empirical loss function defined on the training data set (see the loss sketch below):

  L(\Lambda) = \frac{1}{N} \sum_{j} \sum_{i} \ell(d_{ji}(\Lambda))

– Takes advantage of "cluster computing"
– In practice, only the mean vectors are updated
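Under these definitions, the per-sample loss can be sketched as below, reusing pcgm_scores from the previous sketch; alpha and the function signature are illustrative, and a real trainer would evaluate this in batch over the training set and feed gradients with respect to the mean vectors to Quickprop.

```python
import numpy as np

def mce_sample_loss(x, label, psi, lam, S, kappa, alpha=1.0):
    """Sigmoid-smoothed MCE loss for one labeled training vector."""
    g = pcgm_scores(x, psi, lam, S, kappa)   # discriminants for all classes
    g_true = g[label]
    g[label] = -np.inf                       # mask the true class ...
    d = -g_true + g.max()                    # ... to find the best rival
    return 1.0 / (1.0 + np.exp(-alpha * d))  # loss in (0, 1)
```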

Parameter Compression (1)

Compress mean vectors using split VQ (see the sketch below):
– Split the D-dimensional (transformed) mean vectors into Q streams
– Sub-vectors in different streams are quantized by VQ with different codebooks

Basis coefficients:
– Compressed using split VQ as well
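A minimal split-VQ sketch is shown below, assuming Q divides D and using k-means (scikit-learn's KMeans, as a stand-in for whatever codebook training the authors used) to build one codebook per stream; each D-dimensional vector is then stored as Q small codeword indices.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_vq(vectors, Q, codebook_bits=8):
    """Quantize (M, D) vectors stream-by-stream; assumes Q divides D."""
    streams = np.split(vectors, Q, axis=1)       # Q arrays of shape (M, D/Q)
    codebooks, indices = [], []
    for sub in streams:
        km = KMeans(n_clusters=2 ** codebook_bits, n_init=4).fit(sub)
        codebooks.append(km.cluster_centers_)    # (2^bits, D/Q) codewords
        indices.append(km.labels_)               # (M,) index per vector
    return codebooks, np.stack(indices, axis=1)  # indices: (M, Q)

def split_vq_decode(codebooks, indices):
    """Rebuild vectors by concatenating the selected codewords per stream."""
    return np.concatenate(
        [cb[idx] for cb, idx in zip(codebooks, indices.T)], axis=1)
```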

Parameter Compression (2)

Compress basis matrices (see the sketch below):
– Basis matrices are symmetric, so only the diagonal and upper-triangular elements need to be stored
– Diagonal elements are not compressed
– Upper-triangular (off-diagonal) elements are compressed using 8-bit scalar quantization
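The storage scheme might look like the sketch below; a uniform quantizer over the range of the off-diagonal values is an assumption here, since the slide does not specify the quantizer design.

```python
import numpy as np

def quantize_sym_matrix(S, bits=8):
    """Keep the diagonal in float; quantize upper off-diagonals to `bits`."""
    iu = np.triu_indices_from(S, k=1)            # upper off-diagonal indices
    vals = S[iu]
    lo, hi = vals.min(), vals.max()              # assumes hi > lo
    step = (hi - lo) / (2 ** bits - 1)
    codes = np.round((vals - lo) / step).astype(np.uint8)
    return np.diag(S).copy(), codes, (lo, step)

def dequantize_sym_matrix(diag, codes, lo_step):
    """Reconstruct the full symmetric matrix from its compressed parts."""
    lo, step = lo_step
    D = diag.size
    S = np.zeros((D, D))
    S[np.triu_indices(D, k=1)] = lo + codes * step
    S = S + S.T                                  # mirror to the lower triangle
    S[np.diag_indices(D)] = diag
    return S
```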

Combining MCE Training and Parameter Compression

Start from ML-trained PCGM models:
– Compress the basis coefficients
– Compress the basis matrices
Then do MCE training to fine-tune the mean vectors using the above compressed parameters, and finally compress the (transformed) mean vectors (the pipeline is sketched below).

A similar strategy is also applied to MCE training and model compression for the MQDF-based classifier!
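Chaining the illustrative helpers from the previous sketches, the overall pipeline could be written as follows; quickprop_mce, the model fields, and the stream counts Q are all hypothetical names and values, not taken from the paper.

```python
def build_compact_recognizer(ml_model, train_set):
    # 1. Compress the character-dependent basis coefficients with split VQ.
    cb_lam, idx_lam = split_vq(ml_model.lam, Q=4)
    # 2. Compress the shared basis matrices with 8-bit scalar quantization.
    S_q = [quantize_sym_matrix(S_k) for S_k in ml_model.S]
    # 3. MCE-train the mean vectors against the *compressed* parameters,
    #    so training can compensate for the quantization error.
    psi = quickprop_mce(ml_model.psi, S_q, (cb_lam, idx_lam), train_set)
    # 4. Finally, compress the fine-tuned (transformed) mean vectors.
    cb_psi, idx_psi = split_vq(psi, Q=8)
    return cb_psi, idx_psi, cb_lam, idx_lam, S_q
```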

Experimental Setup (1)

Task:
– Recognition of 2965 "JIS level-1" handwritten Kanji characters

Data sets:
– Training set: 704,650 samples from the Nakayosi database
– Development set: 229,398 samples from the Nakayosi/Kuchibue databases
– Testing set: 506,848 samples from the Kuchibue database

Feature vector:
– 8-directional raw features extracted from each handwriting sample
– LDA is used to reduce the dimension to 128

Experimental Setup (2)

ML training:
– MQDF: 20 eigenvectors are retained for each character class, i.e., MQDF(20)
– PCGM: number of basis matrices = 32, 64, or 128, i.e., PCGM(32/64/128)

MCE training:
– Run 20 epochs of the Quickprop algorithm to update the mean vectors
– Applied to both MQDF and PCGM

Parameter compression:
– Compress the MQDF/PCGM-based recognizers to less than a 2 MB footprint

Experimental Results

– Given the same footprint, PCGM achieves higher recognition accuracy than MQDF.
– The PCGM-based recognizer can be made very compact, yet still achieves high recognition accuracy.
– PCGM-based recognizers are slower than MQDF-based recognizers:
  – Evaluated on a PC with a 3 GHz Pentium-4 CPU
  – Re-scoring on a short-list of 50 candidates

Classifiers   Accuracy (%)   Footprint (MB)   Latency (ms)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

Conclusions

The PCGM-based approach provides a good solution for designing a compact CJK handwritten character recognizer.

Ongoing and future work:
– Better discriminative training for both MQDF and PCGM to boost recognition accuracy further
– Trying it out for Japanese and Korean

Backup slides

Experimental Results

Comparison of ML-trained PCGM- and MQDF-based classifiers without parameter compression:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

Experimental Results

Before precision matrix compression:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

After precision matrix compression:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

Experimental Results

Before MCE training:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

After MCE training:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

Experimental Results

Before mean compression:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)

After mean compression:

Classifiers   Accuracy (%)   Footprint (MB)
PCGM(32)
PCGM(64)
PCGM(128)
MQDF(20)