Recognition of Fragmented Characters Using Multiple Feature-Subset Classifiers


1 Recognition of Fragmented Characters Using Multiple Feature-Subset Classifiers. C.H. Chou, C.Y. Guo, and F. Chang, Institute of Information Science, Academia Sinica, Taiwan. International Conference on Document Analysis and Recognition (ICDAR), 2007.

2 Introduction Recognizing fragmented (broken) characters in printed documents of poor printing quality. A complement to ordinary mending techniques. Only intact characters are used as training samples. Multiple features are applied to enhance recognition accuracy. The resultant classifiers can classify both intact and fragmented characters with a high degree of accuracy.

3 Examples from Chinese newspapers published between 1951 and 1961: (a) most severe, (b) less severe, (c) least severe fragmentation.

4 Feature Extraction Binary image: each pixel is represented by 1 (black) or 0 (white). LD (Linear Normalization + Density Feature): invariant to character fragmentation. Pipeline: LN → Reduction. The feature vector consists of 256 components with values in the range [0, 16]. ND (Nonlinear Shape Normalization + Direction Feature): invariant to shape deformation. Pipeline: NSN → Contour → 4 Direction maps → Blurring → Reduction. The feature vector consists of 256 components with values in the range [0, 255].
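To make the LD pipeline concrete, here is a minimal sketch of the density feature. It assumes the linearly normalized image is 64×64 and the density grid is 16×16, so each cell covers 4×4 pixels and counts fall in [0, 16], matching the 256 components above; the concrete image and grid sizes are assumptions, not stated on the slide.

```python
import numpy as np

def ld_feature(img: np.ndarray) -> np.ndarray:
    """Density feature: black-pixel counts over a 16x16 grid.

    `img` is a binary image (1 = black, 0 = white). It is first linearly
    normalized to 64x64 by nearest-neighbor sampling (assumed sizes),
    then each non-overlapping 4x4 cell is summed, yielding 256
    components with values in [0, 16].
    """
    h, w = img.shape
    # Linear normalization: nearest-neighbor resample to 64x64.
    rows = np.arange(64) * h // 64
    cols = np.arange(64) * w // 64
    norm = img[np.ix_(rows, cols)]
    # Reduction: sum black pixels over 4x4 cells of the 64x64 image.
    cells = norm.reshape(16, 4, 16, 4).sum(axis=(1, 3))
    return cells.reshape(256)
```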

5 Random Subspace Method The Random Subspace Method (RSM) randomly selects a number of subspaces from the original feature space and trains a classifier on each subspace. Each set of training samples is derived from the set of feature vectors projected into a subspace. Subspace projection maps an ordinary feature vector to a sub-vector by randomly selecting a small number of its dimensions. The applied subspace dimensions (w) are 32, 64, and 128.
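A minimal sketch of the subspace selection and per-subspace training step, assuming NumPy feature matrices and using scikit-learn's DecisionTreeClassifier as a stand-in CART-style base learner; the slides' actual CART and GCNN implementations are not given, and the data below is random stand-in data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in CART base learner

def random_subspaces(n_features, w, n_classifiers, seed=0):
    """Randomly draw `n_classifiers` subspaces of `w` dimensions each."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_features, size=w, replace=False)
            for _ in range(n_classifiers)]

# Stand-in data: 1000 samples of 256-dim features (e.g., LD or ND).
rng = np.random.default_rng(1)
X = rng.random((1000, 256))
y = rng.integers(0, 10, size=1000)

# One classifier per subspace, each trained on the projected samples.
subspaces = random_subspaces(n_features=256, w=64, n_classifiers=10)
models = [DecisionTreeClassifier().fit(X[:, dims], y) for dims in subspaces]
```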

6 Random Subspace Method

7 Voting
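The transcript preserves only this slide's title, so the exact combination rule is lost; as a hedged illustration, the sketch below combines the subspace classifiers' label predictions by plain majority vote, with ties broken toward the smallest label index (an assumption, not recoverable from the slide).

```python
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """Combine label predictions from multiple classifiers by majority vote.

    `predictions` is an int array of shape (n_classifiers, n_samples).
    Ties go to the smallest label index (assumed tie-breaking rule).
    """
    n_classes = predictions.max() + 1
    voted = np.empty(predictions.shape[1], dtype=int)
    for j in range(predictions.shape[1]):
        voted[j] = np.bincount(predictions[:, j], minlength=n_classes).argmax()
    return voted

# Example: 5 classifiers voting on 3 samples.
preds = np.array([[0, 1, 2],
                  [0, 1, 1],
                  [3, 1, 2],
                  [0, 2, 2],
                  [0, 1, 2]])
print(majority_vote(preds))  # -> [0 1 2]
```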

8 Filter Model of Feature Selection: RSM

9 Wrapper Model of Feature Selection

10 Architecture of the proposed method

11 An Example

12 Classification Methods

13 Experimental results

14 The accuracy of different classification methods. Multiple classifiers outperform single classifiers. The hybrid feature always outperforms both the LD and ND features. GCNNs (generalized condensed nearest neighbor classifiers) achieve higher accuracy than CARTs.

15 Computation time of the two classification methods.

16 The accuracy for three types of test documents. LD outperforms ND on the most severe and less severe data. ND outperforms LD on the least severe data. The hybrid feature achieves better accuracy than either LD or ND alone.

17 CARTs vs. GCNNs The accuracy rates of CARTs and GCNNs with an increasing number of classifiers and different subspace dimensions w. More classifiers yield higher accuracy. GCNNs require fewer classifiers than CARTs to reach saturation accuracy.

18 CARTs vs. GCNNs

19 Conclusion The paper proposes a learning approach that deals with both intact and fragmented characters in archived newspapers. The multiple predictors achieve much higher accuracy rates than single classifiers. The hybrid predictors, which use both types of features, perform better than those that use only a single feature. Predictors generated by the GCNN rule achieve higher accuracy, and require fewer classifiers, than those generated by the CART algorithm.

