Imaged Document Text Retrieval without OCR, IEEE Trans. on PAMI, vol. 24, no. 6, June 2002. Presenter: 周遵儒.

1 Imaged Document Text Retrieval without OCR, IEEE Trans. on PAMI, vol. 24, no. 6, June 2002. Presenter: 周遵儒

2 Outline
• Introduction
• HTD and VTD
• Class of Character Objects
• Similarity Measure of Documents
• Experimental Results
• Conclusions

3 Introduction
• Retrieval of imaged documents
• Processing with OCR vs. without OCR
• Language dependence vs. language independence

4 Procedure
• Image preprocessing
• Feature extraction of character objects
  • Horizontal Traverse Density (HTD)
  • Vertical Traverse Density (VTD)
• Clustering
  • To identify classes of character objects
• Document representation
  • Hash table
  • N-gram
  • To construct indexes for imaged document retrieval

5 Features: HTD and VTD
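
The transcript does not preserve how the slide defines HTD and VTD. The sketch below assumes one plausible reading: the traverse density of a scan line is the number of times it crosses a stroke (that is, the number of separate black runs on that line), with HTD taken over rows and VTD over columns. The function names and the sample bitmap are illustrative, not taken from the paper.

```python
def traverse_count(line):
    """Count the black runs (stroke crossings) along one scan line of 0/1 pixels."""
    crossings = 0
    previous = 0
    for pixel in line:
        # A crossing starts wherever background (0) is followed by ink (1).
        if pixel == 1 and previous == 0:
            crossings += 1
        previous = pixel
    return crossings


def htd(image):
    """Assumed HTD: one crossing count per row (horizontal scan lines)."""
    return [traverse_count(row) for row in image]


def vtd(image):
    """Assumed VTD: one crossing count per column (vertical scan lines)."""
    return [traverse_count(column) for column in zip(*image)]


# A 5x5 bitmap loosely shaped like the letter "H" (1 = ink, 0 = background).
glyph = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

print(htd(glyph))  # [2, 2, 1, 2, 2]
print(vtd(glyph))  # [1, 1, 1, 1, 1]
```

Under this reading the two vectors depend only on stroke layout, not on which character the strokes spell, which is what lets the features be extracted without recognizing the text.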

6 Class of Character Objects
• Unsupervised clustering with HTD and VTD
• Distance measure of character objects

7 Distance Measure of Character Objects
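
The formula on this slide did not survive in the transcript. As a stand-in, the sketch below assumes a simple distance: each density vector is resampled to a fixed length so that objects of different sizes can be compared, and the distance is the mean absolute difference, summed over HTD and VTD. The clustering step is a basic leader (threshold) scheme chosen for brevity. The resampling length, the threshold, and all function names are assumptions, not the paper's method.

```python
def resample(vector, length=8):
    """Stretch or shrink a non-empty vector to a fixed length (nearest neighbour)."""
    n = len(vector)
    return [vector[(i * n) // length] for i in range(length)]


def vector_distance(a, b, length=8):
    """Mean absolute difference between two resampled density vectors."""
    ra, rb = resample(a, length), resample(b, length)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / length


def object_distance(obj_a, obj_b):
    """Each object is an (htd, vtd) pair; add the distances of both features."""
    return (vector_distance(obj_a[0], obj_b[0])
            + vector_distance(obj_a[1], obj_b[1]))


def cluster(objects, threshold=0.5):
    """Leader clustering: join the first class whose representative is close enough."""
    leaders = []  # one representative object per class
    labels = []   # class index assigned to each input object
    for obj in objects:
        for index, leader in enumerate(leaders):
            if object_distance(obj, leader) <= threshold:
                labels.append(index)
                break
        else:
            # No existing class is close enough, so this object starts a new one.
            leaders.append(obj)
            labels.append(len(leaders) - 1)
    return labels


objects = [
    ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1]),        # "H"-like glyph
    ([2, 2, 2, 1, 2, 2], [1, 1, 1, 1, 1, 1]),  # same shape, slightly larger
    ([1, 1, 1, 1, 1], [1, 2, 2, 2, 1]),        # a different shape
]
print(cluster(objects))  # [0, 0, 1]
```

The class labels produced this way carry no linguistic meaning; they only say which character objects look alike, which is all the later indexing step needs.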

8 Examples of Character Objects

9 Similarity Measure of Documents
• N-gram algorithm
• Cosine angle between two documents
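
A minimal sketch of how the n-gram and cosine steps could fit together, assuming each document has already been reduced to a sequence of character-class labels. Every run of n consecutive labels is hashed into a fixed-size count table, and two documents are compared by the cosine of the angle between their count vectors. The n-gram length, the table size, and the use of CRC32 as the hash are illustrative assumptions; the slides do not specify them.

```python
import math
import zlib
from collections import Counter


def ngram_vector(classes, n=3, buckets=1024):
    """Hash each run of n consecutive class labels into a sparse count table."""
    counts = Counter()
    for i in range(len(classes) - n + 1):
        gram = ",".join(str(label) for label in classes[i:i + n])
        # zlib.crc32 is deterministic across runs, unlike Python's built-in hash().
        counts[zlib.crc32(gram.encode("utf-8")) % buckets] += 1
    return counts


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


doc_a = [0, 1, 2, 3, 1, 2, 0, 1, 2]   # class labels of one document
doc_b = [5, 0, 1, 2, 3, 1, 2, 4, 4]   # shares a stretch of labels with doc_a

vec_a = ngram_vector(doc_a)
vec_b = ngram_vector(doc_b)
print(round(cosine(vec_a, vec_a), 6))  # 1.0 for identical documents
print(cosine(vec_a, vec_b))            # between 0 and 1 for partial overlap
```

Hashing keeps the vector size fixed regardless of how many distinct n-grams occur, at the cost of occasional collisions between unrelated n-grams.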

10 Corpus
• UW1 database (600 dpi)

11 Experimental Results
• Corpus I
• E01-E26

12 Experimental Results
• Corpus II

13 Experimental Results

17 Conclusion and Future Work
• A new method for imaged document retrieval without OCR
• Language-independent retrieval
• Improved robustness to different fonts and noisy documents

