A Coarse Classification Scheme Based On Clustering and Distance Thresholds Presented by Tienwei Tsai, July 2005 Mr. Chairman (Madam Chairman), ladies and gentlemen, I am pleased to have you all here today. My name is XXX. I am here today to present a paper titled "XXX".

Outline Introduction Feature Extraction Statistical Coarse Classification Experimental Results Conclusion The outline of this presentation is as follows: First, I'll introduce the character recognition problem, which is our application. Second, I'll describe how to extract features via the Discrete Cosine Transform. Third, I'll introduce our statistical coarse classification method. Next, I'll show the experimental results. Finally, I'll summarize our approach.

1 Introduction Paper documents -> computer codes OCR (Optical Character Recognition) To retrieve information from paper documents, we have to transform these documents into computer codes. To achieve this goal, OCR techniques can be applied. Traditional approaches include neural networks, graph-matching methods and hidden Markov models. For the sake of time, I won't go into details on these approaches.

Two subproblems in the design of classification systems Feature extraction Classification The classification process is usually divided into two stages: Coarse classification process Fine classification process Thus, we may consider the design of classification systems in terms of two subproblems: (1) feature extraction and (2) classification. To alleviate the burden of the classification process, the process is usually divided into two stages: the coarse classification (also called preclassification) process and the fine classification process. To classify an unknown object, the coarse classification is first employed to reduce the large set of candidate objects to a smaller one. Then, the fine classification is applied to identify the class of the object.

The purpose of this paper is to design a general coarse classification scheme: low dependence on domain-specific knowledge To achieve this goal, we need: reliable and general features a general classification method The purpose of this paper is to design a general coarse classification scheme that has low dependence on domain-specific knowledge. To achieve this goal, we need not only reliable and general features in the feature extraction stage, but also a general classification method in the coarse classification stage.

2 Feature Extraction The characteristics of features used for coarse classification: Simple Reliable Low dimension We are motivated to apply the Discrete Cosine Transform (DCT) to extract statistical features. 1. In general, features are combinations of measurements that summarize them in a useful way. 2. In most applications, no general solution to feature design has been found that is better than a hand-crafted one. 3. Generally speaking, reliable and simple features should be extracted for coarse classification. 4. Using low-dimensional, reliable features to perform coarse classification can eliminate a large number of improbable candidates at an early stage and hence reduce the burden of the subsequent fine classification process. 5. Therefore, we are motivated to apply the DCT to extract statistical features.

Discrete Cosine Transform (DCT) It is widely used in compression A close relative of the discrete Fourier transform (DFT) Converting a signal into elementary frequency components Separating the image into parts of differing importance (with respect to the image's visual quality). Energy compaction property In this paper we apply the DCT to character recognition. The DCT has the following characteristics: It is widely used in compression. It is a close relative of the discrete Fourier transform (DFT). More specifically, it is a technique for converting a signal into elementary frequency components. It helps separate the image into parts of differing importance (with respect to the image's visual quality). It has an excellent energy compaction property and requires only real operations in the transformation process.

The DCT coefficients of the character image of “佛”. This figure shows the DCT coefficients of a character image (“佛”) of size 48×48. The number of coefficients is equal to the number of pixels in the original character image. From this figure, we can see that most of the signal energy appears in the upper left corner of the DCT coefficients.

Energy Compaction Property of DCT For most images, much of the signal energy lies at low frequencies. About 90% of the signal energy appears in the upper left 10×10 corner of the 48×48 DCT coefficient array. That is, about 90% of the signal energy appears in 4.34% of the DCT coefficients. From the previous figure, we can observe the energy compaction property of the DCT. For most images, much of the signal energy lies at low frequencies; these appear in the upper left corner of the DCT. The lower right values represent higher frequencies, and are often small enough to be neglected with little visible distortion. In our experiments, we found that about 90% of the signal energy appears in the upper left 10×10 corner of the 48×48 DCT coefficients.
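As a rough illustration of how this fraction can be measured (a sketch added here, not part of the original experiments; it assumes SciPy's dctn and uses a synthetic 48×48 disc in place of a real character bitmap, so the printed percentage will not match the 90% reported for character images):

```python
import numpy as np
from scipy.fft import dctn

# Stand-in for a normalized 48x48 character bitmap; replace with a real image.
x, y = np.meshgrid(np.arange(48), np.arange(48))
image = ((x - 24) ** 2 + (y - 24) ** 2 < 15 ** 2).astype(float)   # a filled disc

coeffs = dctn(image, type=2, norm='ortho')         # 48x48 DCT-II coefficients
total_energy = np.sum(coeffs ** 2)
low_freq_energy = np.sum(coeffs[:10, :10] ** 2)    # upper-left 10x10 block

print(f"energy in 10x10 low-frequency block: {low_freq_energy / total_energy:.1%}")
```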

Discrete Cosine Transform (DCT) The DCT coefficients C(u, v) of an N×N image represented by x(i, j) can be defined as C(u, v) = α(u) α(v) Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} x(i, j) cos[(2i+1)uπ / 2N] cos[(2j+1)vπ / 2N], where α(0) = √(1/N) and α(u) = √(2/N) for u ≠ 0. The DCT can be defined as this equation. In this equation, x denotes the original image before the transformation, and C represents the DCT coefficients after the transformation.
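As a sketch (not the authors' code), the formula above can be transcribed directly into Python; the helper name dct2 and the use of NumPy are illustrative assumptions:

```python
import numpy as np

def dct2(x):
    """Plain 2-D DCT-II of an N x N image, following the formula above (O(N^4))."""
    n = x.shape[0]
    i = np.arange(n)
    # Orthonormal scale factors: alpha(0) = sqrt(1/N), alpha(u) = sqrt(2/N) otherwise.
    alpha = np.full(n, np.sqrt(2.0 / n))
    alpha[0] = np.sqrt(1.0 / n)
    c = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            basis = np.outer(np.cos((2 * i + 1) * u * np.pi / (2 * n)),
                             np.cos((2 * i + 1) * v * np.pi / (2 * n)))
            c[u, v] = alpha[u] * alpha[v] * np.sum(x * basis)
    return c
```

In practice an FFT-based routine such as scipy.fft.dctn (used in the previous sketch) computes the same orthonormal DCT-II coefficients far more efficiently.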

3 Statistical Coarse Classification Two philosophies of classification Statistical: the measurements that describe an object are treated only formally as statistical variables, neglecting their “meaning”. Structural: regards objects as compositions of structural units, usually called primitives. There are two distinct philosophies of classification in general, known as “statistical” [2], [3] and “structural” [4]. In the statistical approach the measurements that describe an object are treated only formally as statistical variables, neglecting their “meaning”. The structural approach, on the other hand, regards objects as compositions of structural units, usually called primitives. Since our main goal is to develop a general coarse classification scheme for vision-oriented applications (such as OCR, face recognition, image retrieval, etc.), the statistical approach is more suitable than the structural approach.

The Classification Problem The ultimate goal of classification is to classify an unknown pattern x into one of M possible classes (c1, c2, …, cM). In the statistical approach, each pattern is represented by a set of D features, viewed as a D-dimensional feature vector. The statistical classification system operates in two modes: Training (learning) Classification (testing) Before going into the details of our approach, I would like to briefly introduce the problem. The ultimate goal of classification is to classify an unknown pattern x into one of M possible classes. In the statistical approach, a pattern is represented by a set of D features, viewed as a D-dimensional feature vector. Basically, the statistical classification system operates in two modes: training (learning) and classification (testing). In the training mode, the feature extraction module finds the appropriate features for representing the input patterns and the classifier is trained to partition the feature space. In the classification mode, the trained classifier assigns the input pattern to one of the pattern classes under consideration based on the measured features.

Techniques to speed up the classification process To speed up the classification process, two techniques can be applied: A. Clustering B. Pruning In order to speed up the classification process, two techniques can be applied in advance of the time-consuming fine classification process. They are clustering and pruning.

A. Clustering Clustering is the process of grouping the data objects into clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters. Two main categories of existing clustering algorithms: hierarchical methods: agglomerative (bottom-up) divisive (top-down). partitioning methods: k-means k-medoids 1. Hierarchical methods are either agglomerative (bottom-up) or divisive (top-down). Both suffer from the fact that once a step (merge or split) is done, it can never be undone. In contrast, given the number k of partitions to be found, a partitioning method tries to find the best k partitions of the n objects. 2. Most of the partitioning-based clustering algorithms adopt one of two popular heuristic methods: (1) the k-means algorithm, where each cluster is represented by the mean value of the objects in the cluster, and (2) the k-medoids algorithm, where each cluster is represented by one of the objects located near the center of the cluster.

Coarse Classification via Clustering In the training mode: the feature vectors of learning samples are clustered first based on a certain criterion. In the classification mode: the distances between a test sample and every cluster are calculated, and the clusters that are nearest to the test sample are chosen as candidate clusters. Then the classes within those candidate clusters are selected as the candidates for the test sample. By clustering, a coarse classification scheme can be developed as follows. In the training mode, the feature vectors of learning samples are clustered first based on a certain criterion. In the classification mode, the distances between a test sample and every cluster are calculated, and the clusters that are nearest to the test sample are chosen as candidate clusters. Then the classes within those candidate clusters are selected as the candidates for the test sample.

B. Pruning In the training mode: The template vector of a class is generated from the training samples of the class. In the classification mode: When a test sample is being classified, all the distances between the feature vector of the test sample and every template vector are calculated. Then the templates whose distance to the test sample is beyond a predefined threshold will be pruned. Therefore, for the test sample to be classified, the size of the candidate list is reduced. 1. The concept underlying pruning is to eliminate the obviously unqualified candidates via a certain criterion. 2. In the training mode, the template vector of a class is generated from the training samples of the class. 3. In the classification mode, when a test sample is being classified, all the distances between the feature vector of the test sample and every template vector are calculated. 4. Then the templates whose distance to the test sample is beyond a predefined threshold will be pruned. 5. Therefore, for the test sample to be classified, the size of the candidate list is reduced.

Coarse classification scheme Our coarse classification is accomplished by four main modules: Pre-processing Feature extraction Clustering Pruning 1. To find the reduced set of candidates for an input pattern, we propose a coarse classification scheme which takes advantage of both clustering and pruning. 2. In our approach the coarse classification is accomplished by four main modules: (1) pre-processing, (2) feature extraction, (3) clustering and (4) pruning.

In the training mode, the training samples are first normalized to a certain size. Then the most significant D features of each training sample are extracted by DCT. Ex. The most important 4 features of a pattern are the 4 DCT coefficients with the lowest frequencies, namely C(0,0), C(0,1), C(1,0) and C(1,1). Then the average D features are obtained for each class. In the training mode, the training samples are first normalized to a certain size. Then the most significant D features of each training sample are extracted by DCT. For example, the most important 4 features of a pattern are the 4 DCT coefficients with the lowest frequencies, namely C(0,0), C(0,1), C(1,0) and C(1,1). After that, the average D features are obtained for each class, assuming there is at least one training sample for each class.
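A minimal sketch of this training step, assuming a hypothetical samples_by_class mapping from class labels to lists of normalized 48×48 bitmaps and using SciPy's DCT (not the authors' code):

```python
import numpy as np
from scipy.fft import dctn

def extract_features(image, d=3):
    """Return the d x d lowest-frequency DCT coefficients as a length d*d feature vector."""
    coeffs = dctn(image, type=2, norm='ortho')
    return coeffs[:d, :d].ravel()        # d=2 would give C(0,0), C(0,1), C(1,0), C(1,1)

def class_templates(samples_by_class, d=3):
    """Average the feature vectors of each class's training samples into one template per class."""
    return {label: np.mean([extract_features(img, d) for img in images], axis=0)
            for label, images in samples_by_class.items()}
```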

k-means algorithm Chooses k templates arbitrarily as the initial centers of the k clusters. The remaining (M-k) templates are assigned to their nearest clusters one by one, according to a certain distance measure. Each cluster center is updated progressively as the average of all patterns within the cluster and can be viewed as a virtual reference pattern. 1. The k-means algorithm first chooses k templates arbitrarily as the initial centers of the k clusters. 2. Then the remaining (M-k) templates are assigned to their nearest clusters one by one, according to a certain distance measure. 3. Each cluster center is updated progressively as the average of all patterns within the cluster and can be viewed as a virtual reference pattern. 4. The distance between a pattern and a cluster is the distance between the pattern and the representative pattern (i.e., center) of the cluster.

Distance Measure Suppose Pi = [pi1, pi2, …, piD] represents the feature vector of pattern Pi. Then the dissimilarity (distance) between patterns Pi and Pj is defined as d(Pi, Pj) = Σ_{k=1}^{D} (pik − pjk)². In our approach, the sum of squared differences is adopted to evaluate the dissimilarity (distance) between two patterns. Suppose Pi = [pi1, pi2, …, piD] represents the feature vector of pattern Pi. Then the dissimilarity (distance) between patterns Pi and Pj is defined as above.
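To make the clustering of class templates concrete, here is a compact k-means sketch using the sum-of-squared-differences distance defined above (an illustration only; the original implementation may differ in initialization and stopping criteria):

```python
import numpy as np

def ssd(p, q):
    """Sum of squared differences between two D-dimensional feature vectors."""
    return float(np.sum((p - q) ** 2))

def kmeans(templates, k, iters=50, seed=0):
    """Cluster the M class templates (an M x D array) into k clusters."""
    rng = np.random.default_rng(seed)
    centers = templates[rng.choice(len(templates), size=k, replace=False)]
    for _ in range(iters):
        # Assign each template to its nearest cluster center.
        labels = np.array([np.argmin([ssd(t, c) for c in centers]) for t in templates])
        # Recompute each center as the average of its assigned templates.
        new_centers = np.array([templates[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```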

In the classification mode, for an input pattern to be recognized, the pattern is first normalized to a certain size. Then the features of the pattern are extracted by DCT. After that, the pattern is matched against the center of each cluster to obtain the distance to each cluster. Finally, the clusters which do not satisfy some distance-based thresholds are excluded from the set of candidate clusters, and the members of the remaining clusters serve as the candidates for the input pattern. In the classification mode, for an input pattern to be recognized, the pattern is first normalized to a certain size. Then the features of the pattern are extracted by DCT. After that, the pattern is matched against the center of each cluster to obtain the distance to each cluster. Finally, the clusters which do not satisfy some distance-based thresholds are excluded from the set of candidate clusters, and the members of the remaining clusters serve as the candidates for the input pattern.
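An illustrative sketch of the classification mode (hypothetical helper and parameter names; here a single absolute distance threshold stands in for the pruning rules described in the next section):

```python
import numpy as np
from scipy.fft import dctn

def coarse_classify(image, cluster_centers, cluster_members, theta_d, d=3):
    """Return the candidate classes for a normalized input pattern.

    cluster_centers: k x D array of cluster centers obtained by k-means in the training mode.
    cluster_members: list of k lists; the class labels assigned to each cluster.
    theta_d:         a pre-determined absolute distance threshold (rule R1 style).
    """
    features = dctn(image, type=2, norm='ortho')[:d, :d].ravel()    # DCT feature extraction
    distances = np.sum((cluster_centers - features) ** 2, axis=1)   # distance to every cluster
    candidates = []
    for j, dist in enumerate(distances):
        if dist <= theta_d:                                         # keep only nearby clusters
            candidates.extend(cluster_members[j])
    return candidates
```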

3.2.1 Cluster pruning rules To observe the ratio of each distance to the maximum distance, the relative distance dm(x, Ci) is defined as dm(x, Ci) = d(x, Ci) / dmax(x), where d(x, Ci) is the distance between the test pattern x and cluster Ci, and dmax(x) = max_j d(x, Cj) denotes the maximum distance between x and any cluster. On classifying a test pattern, the distance between the test pattern x and each cluster Ci is obtained, namely d(x, Ci). To observe the ratio of each distance to the maximum distance, the relative distance dm(x, Ci) is defined as above, where dmax(x) denotes the maximum distance between x and any cluster.

We also define the relative distance da(x, Ci) as da(x, Ci) = d(x, Ci) / davg(x), where davg(x) = (1/k) Σ_j d(x, Cj) denotes the average distance over all clusters. We also define the relative distance da(x, Ci) as above, where davg(x) denotes the average distance over all clusters.

Pruning Rules R1: Pruning via absolute distance threshold qd. The distance d(x,Ci) is compared with a pre-determined distance threshold qd; if d(x,Ci) is larger than qd, then cluster Ci is excluded from the candidate cluster set of x. R2: Pruning via cluster rank threshold qr. The aim is to filter out the farthest clusters by eliminating the clusters whose ranks are larger than qr. To filter out the clusters which are most dissimilar to the pattern, we devise the following four threshold-based pruning rules to eliminate unqualified clusters in the coarse classification process. R1: Pruning via absolute distance threshold qd. In this rule the distance d(x,Ci) is compared with a pre-determined distance threshold qd. If d(x,Ci) is larger than qd, then cluster Ci is excluded from the candidate cluster set of x. R2: Pruning via cluster rank threshold qr. The aim of this rule is to filter out the farthest clusters by eliminating the clusters whose ranks are larger than qr. R3: Pruning via relative distance threshold qm. If dm(x,Ci) is larger than qm, then cluster Ci is excluded from the candidate cluster set of x. R4: Pruning via relative distance threshold qa. If da(x,Ci) is larger than qa, then cluster Ci is excluded from the candidate cluster set of x.

R3: Pruning via relative distance threshold qm. If dm(x,Ci) is larger than qm, then cluster Ci is excluded from the candidate cluster set of x. R4: Pruning via relative distance threshold qa. If da(x,Ci) is larger than qa, then cluster Ci is excluded from the candidate cluster set of x.

Each threshold value is obtained from the statistics of the training samples in the training mode, based on a certain accuracy requirement. Rules R1 and R2 are straightforward, but they cannot measure the similarity between x and each cluster precisely. Rules R3 and R4 take advantage of the discriminating ability of the relative distances. For instance, ‘da(x,Ci) = 0.5’ means the distance between x and cluster Ci is about half of the average distance between x and all clusters. Each threshold value is obtained from the statistics of the training samples in the training mode, based on a certain accuracy requirement. Rules R1 and R2 are straightforward, but they cannot measure the similarity between x and each cluster precisely. In contrast, rules R3 and R4 take advantage of the discriminating ability of the relative distances. For instance, ‘da(x,Ci) = 0.5’ means the distance between x and cluster Ci is about half of the average distance between x and all clusters; in other words, cluster Ci is more similar to x than most of the other clusters.
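The four rules can be summarized in a small sketch (illustrative only; the threshold names theta_d, theta_r, theta_m and theta_a correspond to qd, qr, qm and qa, and would in practice be estimated from training statistics as described above):

```python
import numpy as np

def prune_clusters(distances, theta_d=None, theta_r=None, theta_m=None, theta_a=None):
    """Apply pruning rules R1-R4 to a vector of distances d(x, Ci); return kept cluster indices.

    R1: drop clusters with absolute distance greater than theta_d.
    R2: drop clusters whose distance rank is greater than theta_r.
    R3: drop clusters whose relative distance dm = d / max(d) exceeds theta_m.
    R4: drop clusters whose relative distance da = d / mean(d) exceeds theta_a.
    """
    distances = np.asarray(distances, dtype=float)
    keep = np.ones(len(distances), dtype=bool)
    if theta_d is not None:
        keep &= distances <= theta_d                          # R1
    if theta_r is not None:
        ranks = distances.argsort().argsort() + 1             # rank 1 = nearest cluster
        keep &= ranks <= theta_r                              # R2
    if theta_m is not None:
        keep &= distances / distances.max() <= theta_m        # R3
    if theta_a is not None:
        keep &= distances / distances.mean() <= theta_a       # R4
    return np.flatnonzero(keep)
```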

4 Experimental Results 6000 samples (about 500 categories) are extracted from the Kin-Guan (金剛) scripture. Each character image was transformed into a 48×48 bitmap. 5000 of the 6000 samples are used for training and the others are used for testing. In our experiment, the number of clusters used in k-means clustering is set to 10. The number of DCT coefficients extracted from each sample is set to 9, i.e., C(i, j), where i, j = 0, 1, 2. Now, let's look at the experimental results. In our experiments, there are… In our experiment, the number of clusters used in k-means clustering is set to 10. The number of DCT coefficients extracted from each sample is set to 9, i.e., C(i, j), where i, j = 0, 1, 2.

A higher reduction rate always results in a lower accuracy rate Table 1 shows the average reduction rate and accuracy rate of the test samples under the nearest r cluster(s). We can see that a higher reduction rate always results in a lower accuracy rate.

Rule R3 performs better than the other rules Table 2 shows the average reduction rate and accuracy rate of the test samples under different rules. From the table we can see that rule R3, which corresponds to the relative distance threshold qm, performs better than the other rules.

6 Conclusions This paper presents a coarse classification scheme based on DCT and k-means clustering. The advantages of our approach: The DCT features are simple, reliable and appropriate for coarse classification. The proposed coarse classification scheme is a general approach applicable to most vision-oriented applications. This paper presents a coarse classification scheme based on DCT and k-means clustering. The advantages of our approach include: Through the energy compaction property of the DCT, the extracted features are simple, reliable and appropriate for coarse classification. The proposed coarse classification scheme is a general approach applicable to most vision-oriented applications.

Future works Future work includes the application of another well-known vision-oriented feature extraction method: the wavelet transform. Since features of different types complement one another in classification performance, using features of different types simultaneously could further improve classification accuracy. Future work includes the application of another well-known vision-oriented feature extraction method: the wavelet transform. Since features of different types complement one another in classification performance, using features of different types simultaneously could further improve classification accuracy.