IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar.


1 IIIT Hyderabad Document Image Retrieval using Bag of Visual Words Model Ravi Shekhar CVIT, IIIT Hyderabad Advisor : Prof. C.V. Jawahar

2 IIIT Hyderabad Motivation Large number of printed books are digitized

3 IIIT Hyderabad Motivation Large number of printed books are digitized Digital libraries like Universal Digital library (UDL), Digital library of India (DLI) and Google Books etc. Digital Library Database

4 IIIT Hyderabad Motivation Large number of printed books are digitized Digital libraries like Universal Digital library (UDL), Digital library of India (DLI) and Google Books etc. Need to design efficient and effective methodology for content level access Digital Library Database

5 IIIT Hyderabad Process Overview [Diagram labels: Scanning, Documents, Processing, Index, Database, Input Query, Matching, Retrieved Documents] Matching can be done at two levels: “Text” and “Image”

6 IIIT Hyderabad Matching Approaches Recognition Based Approach (Text Level Matching) Optical Character Recognition (OCR) Recognition Free Approach (Image Level Matching) Word Spotting

7 IIIT Hyderabad Recognition Based Approach Optical Character Recognition (OCR) Binarization of Document Segmentation using connected components Line level Word level Character level Character recognition using different features like patch, profile etc Classification using ANN or SVM

8 IIIT Hyderabad Limitations of Recognition Based Approach Cuts

9 IIIT Hyderabad Limitations of Recognition Based Approach Cuts Merges

10 IIIT Hyderabad Limitations of Recognition Based Approach Cuts Merges Variation in Script

11 IIIT Hyderabad Limitations of Recognition Based Approach Cuts Merges Variation in Script Variation in Font and Typesetting

12 IIIT Hyderabad Limitations of Recognition Based Approach Cuts Merges Variation in Script Variation in Font and Typesetting Underline and Over Written

13 IIIT Hyderabad Recognition Free Approach Word Spotting Representation of word image using global (profile) features

14 IIIT Hyderabad Recognition Free Approach Word Spotting Representation of word image using global (profile) features Matching features using different distance measures like L1, L2 etc

15 IIIT Hyderabad Recognition Free Approach Word Spotting Representation of word image using global (profile) features Matching features using different distance measures like L1, L2 etc Comparison of different size word images using Dynamic time warping (DTW)
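The DTW comparison mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming each word image has already been turned into a sequence of per-column feature vectors; the function name `dtw_distance` and the use of an L2 local cost are assumptions for illustration, not details taken from the slides.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    def l2(u, v):
        # Local cost between two feature vectors (assumed: Euclidean distance).
        return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = l2(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Toy sequences of 1-D column features with different lengths.
short = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,)]
different = [(2.0,), (0.0,), (2.0,)]

same_word_cost = dtw_distance(short, stretched)
other_word_cost = dtw_distance(short, different)
```

The quadratic table is what makes DTW costly for large collections: every query/candidate pair needs an n × m computation.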

16 IIIT Hyderabad Why Recognition Free Approach ? Robust OCRs are unavailable for many non-Latin languages These languages have rich heritage and there is a need for content level search Word Spotting based methods are too slow for real time system Most of the existing retrieval methods are memory intensive Scalability is an immediate challenge

17 IIIT Hyderabad Word Image Retrieval using Bag of Visual Words

18 IIIT Hyderabad Bag of Visual Words (BoVW) Bag of Words (BoW) representation is the most popular representation for text retrieval. Efficient BoW based systems like Lucene are publicly available. Bag of Visual Words (BoVW) performs excellently for image and video retrieval. A BoVW based system is flexible, powerful and scalable to billions of images.

19 IIIT Hyderabad BoVW Representation Word Images are represented using Histogram of Visual Words

20 IIIT Hyderabad BoVW Representation Code Book generation Subset of Images is used Clustering is done using Hierarchical K-Means (HKM) HKM is faster than K-Means both in building tree and finding nearest neighbours
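Given a code book, the histogram representation can be sketched as below. This is a minimal illustration under stated assumptions: the code book is a short list of centroids searched linearly (the slides use Hierarchical K-Means instead), descriptors are plain tuples, and the names `nearest_word` and `bovw_histogram` are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest code book entry (hard assignment)."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """L1-normalized histogram of visual words for one word image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy code book with three visual words and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
hist = bovw_histogram(descriptors, codebook)
```

The histogram length equals the code book size regardless of how many descriptors the word image produced, which is what makes the representation fixed-size.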

21 IIIT Hyderabad BoVW based Representation

22 IIIT Hyderabad BoVW based Representation

23 IIIT Hyderabad Histogram of Visual Words BoVW based Representation

24 IIIT Hyderabad BoVW based Representation Cuts

25 IIIT Hyderabad Histogram of Visual Words BoVW based Representation Cuts

26 IIIT Hyderabad BoVW based Representation Merges

27 IIIT Hyderabad Histogram of Visual Words BoVW based Representation Merges

28 IIIT Hyderabad Proposed Architecture

29 IIIT Hyderabad Fixed size representation Advantages of BoVW based Representation

30 IIIT Hyderabad Fixed size representation Advantages of BoVW based Representation Clean

31 IIIT Hyderabad Fixed size representation Robust against degradation Advantages of BoVW based Representation

32 IIIT Hyderabad Fixed size representation Robust against degradation Advantages of BoVW based Representation Cuts Merge Clean

33 IIIT Hyderabad Fixed size representation Robust against degradation Scalable to Billions of Images Advantages of BoVW based Representation

34 IIIT Hyderabad Fixed size representation Robust against degradation Scalable to Billions of Images Language independent Advantages of BoVW based Representation

35 IIIT Hyderabad Lost Geometry Spatial Verification

36 IIIT Hyderabad Lost Geometry Spatial Verification Clean

37 IIIT Hyderabad Lost Geometry Spatial Verification Clean

38 IIIT Hyderabad Lost Geometry Spatial Verification Clean

39 IIIT Hyderabad Lost Geometry Spatial Verification

40 IIIT Hyderabad Lost Geometry Spatial Verification

41 IIIT Hyderabad Lost Geometry Spatial Verification

42 IIIT Hyderabad Re-ranking SIFT based re-ranking Higher the Total Score, better the match

43 IIIT Hyderabad Experimentations Books Used in Experimentations
Language | #Books | #Pages | #Words
Hindi | 4 | 427 | 112677
Malayalam | 6 | 610 | 108767
Telugu | 5 | 742 | 131156
Bangla | 3 | 363 | 124584
Hindi | 32 | 3992 | 1008138

44 IIIT Hyderabad Quantitative Results Performance Statistics
Language | #Images | #Query | mAP | mAP after Re-ranking | mAP after Spatial Verification
Hindi | 112677 | 138 | 0.6808 | 0.7820 | 0.7865
Malayalam | 108767 | 101 | 0.6962 | 0.7991 | 0.8188
Telugu | 131156 | 131 | 0.6483 | 0.7328 | 0.7495
Bangla | 124584 | 125 | 0.7806 | 0.8766 | 0.8947
Hindi | 1008138 | 138 | 0.5895 | 0.7022 | 0.7062

45 IIIT Hyderabad Quantitative Results Performance Statistics
Language | #Images | #Query | Prec@10 | Prec@10 after Re-ranking | Prec@10 after Spatial Verification
Hindi | 112677 | 138 | 0.8437 | 0.8719 | 0.8770
Malayalam | 108767 | 101 | 0.7668 | 0.8328 | 0.8581
Telugu | 131156 | 131 | 0.8507 | 0.8668 | 0.883
Bangla | 124584 | 125 | 0.8498 | 0.9022 | 0.9182
Hindi | 1008138 | 138 | 0.8059 | 0.8509 | 0.8543
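For readers unfamiliar with the two measures reported in these tables, here is a minimal sketch of how mAP and Prec@10 are commonly computed. The slides do not spell out the exact evaluation protocol, so the standard definitions are assumed here; the relevance lists are toy data, not results from the presentation.

```python
def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance, num_relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_relevance, num_relevant), one entry per query."""
    return sum(average_precision(r, n) for r, n in runs) / len(runs)


# Toy ranked list: 1 marks a correct retrieval, 0 an incorrect one.
ranked = [1, 0, 1, 0, 0]
ap = average_precision(ranked, num_relevant=2)
p_at_5 = precision_at_k(ranked, k=5)
m_ap = mean_average_precision([(ranked, 2), ([1, 1], 2)])
```

mAP rewards placing correct results early in the whole list, while Prec@10 only looks at the first ten results, which is why the two columns can move by different amounts after re-ranking.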

46 IIIT Hyderabad Quantitative Results mAP Vs Query Length

47 IIIT Hyderabad Quantitative Results mAP Vs Query Length The more characters in the query, the better the results

48 IIIT Hyderabad Quantitative Results Retrieval Time and Index Size
#Images | Retrieval Time | Index Size
25K | 50 ms | 28 MB
100K | 209 ms | 130 MB
0.5M | 411 ms | 550 MB
1M | 700 ms | 1.2 GB

49 IIIT Hyderabad Qualitative Results (columns: Query, Retrieved Results)

50 IIIT Hyderabad Qualitative Results (columns: Query, Retrieved Results)

51 IIIT Hyderabad Qualitative Results (columns: Query, Retrieved Results)

52 IIIT Hyderabad Qualitative Results (columns: Query, Retrieved Results)

53 IIIT Hyderabad Qualitative Results Sample output for noisy images where commercial OCR fails (columns: Query, Retrieved Results)

54 IIIT Hyderabad Enhancement over Bag of Visual Words based Word Image Retrieval

55 IIIT Hyderabad Query Expansion Observation: top ranked results are correct. Top-k results are used to form a new query, which improves the precision of the retrieved list. Modified average query expansion: instead of giving equal weight to each of the top-k results, a rank based weight (1/2^rank) is given. Improves mAP and Prec@10 by 2%
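The rank-weighted averaging can be sketched as follows. This is an illustration under assumptions: the weight is read as 1/2^rank, the original query is kept in the average with weight 1 (rank 0), and the result is normalized by the sum of weights. None of these details are confirmed by the slides, and `expand_query` is a hypothetical name.

```python
def expand_query(query_hist, top_k_hists):
    """Weighted average of the query and its top-k results (weight 1/2**rank)."""
    # Assumption: the original query participates with weight 1 (rank 0).
    weights = [1.0] + [0.5 ** rank for rank in range(1, len(top_k_hists) + 1)]
    vectors = [query_hist] + list(top_k_hists)
    norm = sum(weights)
    return [
        sum(w * v[i] for w, v in zip(weights, vectors)) / norm
        for i in range(len(query_hist))
    ]


# Toy histograms over a three-word vocabulary.
query = [1.0, 0.0, 0.0]
results = [[0.5, 0.5, 0.0],   # rank 1, weight 1/2
           [0.0, 0.0, 1.0]]   # rank 2, weight 1/4
expanded = expand_query(query, results)
```

Compared with plain averaging, the geometric weights limit the damage when a lower-ranked result is wrong, since its contribution halves with every rank.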

56 IIIT Hyderabad Query Expansion [Diagram labels: Query Image, Query Histogram, Index, Querying, results Rank 1 to Rank 6, Refined Histogram]

57 IIIT Hyderabad Query Expansion [Diagram labels: Query Image, Expanded Query Histogram, Index, Querying, Previous Results (Rank 1 to Rank 6), Modified Results (Rank 1 to Rank 6)]

58 IIIT Hyderabad Text Query Support Originally formulated in a “query by example” setting but users would prefer textual interface for document image collection We propose a novel and simple framework for text query support Used a small subset of data with ground truth covering all possible characters in a particular language Visual words are learnt specific to each character and averaged across its different variations Given a textual query, we synthesize its BoVW histogram Text query results are comparable to word image results
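One way to picture the text query synthesis is sketched below. The slide only says that visual words are learnt per character, averaged across variations, and used to synthesize a histogram for a textual query; how the per-character histograms are combined is not stated, so summing and L1-normalizing them is purely an assumption here. The function names and toy data are hypothetical.

```python
def learn_char_histograms(samples):
    """samples: (character, histogram) pairs taken from annotated data.

    Returns the mean histogram per character, averaging over its variations.
    """
    sums, counts = {}, {}
    for ch, hist in samples:
        acc = sums.setdefault(ch, [0.0] * len(hist))
        for i, h in enumerate(hist):
            acc[i] += h
        counts[ch] = counts.get(ch, 0) + 1
    return {ch: [v / counts[ch] for v in acc] for ch, acc in sums.items()}


def synthesize_query(text, char_hists):
    """Assumed combination rule: sum the character histograms, then normalize."""
    size = len(next(iter(char_hists.values())))
    hist = [0.0] * size
    for ch in text:
        for i, h in enumerate(char_hists[ch]):
            hist[i] += h
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Two variations of "a" and one of "b", over a three-word vocabulary.
samples = [("a", [2.0, 0.0, 0.0]), ("a", [0.0, 2.0, 0.0]), ("b", [0.0, 0.0, 4.0])]
char_hists = learn_char_histograms(samples)
query_hist = synthesize_query("ab", char_hists)
```

A character missing from the annotated subset raises a `KeyError` in this sketch, which mirrors the slide's requirement that the ground-truth subset cover all possible characters of the language.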

59 IIIT Hyderabad Text Query Support Query by example setting Input Query ImageHistogram

60 IIIT Hyderabad Text Query Support Query by example setting Text Queries Support Input Text Query Text Query Histogram

61 IIIT Hyderabad Qualitative Results Sample output for queries using different techniques

62 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

63 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

64 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment

65 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment (a) Input Descriptor

66 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ

67 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty

68 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty (choosing a single VW out of 2 or more plausible candidates)

69 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty (choosing a single VW out of 2 or more plausible candidates)

70 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty; Visual word plausibility

71 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty; Visual word plausibility (assigning a visual word when no suitable candidate exists in the vocabulary)

72 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty; Visual word plausibility (assigning a visual word when no suitable candidate exists in the vocabulary)

73 IIIT Hyderabad Vector Quantization In Vector Quantization (VQ), each feature vector is mapped to a single visual word (VW), i.e., hard assignment Problems with VQ: Visual word uncertainty; Visual word plausibility Solution: Soft Assignment, which maps each feature vector to 2 or more possible VWs

74 IIIT Hyderabad Soft Assignment Map each feature vector to 2 or more possible VWs. Approaches to Soft Assignment: Distance based (equal weight; based on distance in feature space; Gaussian distance). Does not minimize reconstruction error

75 IIIT Hyderabad Soft Assignment Map each feature vector to 2 or more possible VWs. Approaches to Soft Assignment: Distance based (equal weight; based on distance in feature space; Gaussian distance). Does not minimize reconstruction error [Figure label: Input Descriptor]

76 IIIT Hyderabad Soft Assignment Map each feature vector to 2 or more possible VWs. Approaches to Soft Assignment: Distance based (equal weight; based on distance in feature space; Gaussian distance). Does not minimize reconstruction error. Through learning optimal reconstruction
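The Gaussian distance variant of soft assignment can be sketched as below. This is a minimal illustration: the number of neighbours `k`, the bandwidth `sigma`, and the function name `soft_assign` are assumptions chosen for the example, not values given in the slides.

```python
import math


def soft_assign(descriptor, codebook, k=2, sigma=1.0):
    """Spread one descriptor over its k nearest visual words.

    Weights follow a Gaussian of the squared distance and sum to 1.
    """
    nearest = sorted(
        (sum((p - q) ** 2 for p, q in zip(descriptor, word)), idx)
        for idx, word in enumerate(codebook)
    )[:k]
    weights = [(idx, math.exp(-d2 / (2.0 * sigma ** 2))) for d2, idx in nearest]
    norm = sum(w for _, w in weights)
    return {idx: w / norm for idx, w in weights}


# A descriptor lying exactly between visual words 0 and 1.
codebook = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
assignment = soft_assign((0.5, 0.0), codebook, k=2)
```

Hard assignment would have to pick word 0 or word 1 arbitrarily for this descriptor; the soft version splits the vote, which addresses the uncertainty problem but, as the slide notes, does not minimize reconstruction error.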

77 IIIT Hyderabad Locality-constrained Linear Coding (LLC) Similar patch should have similar code Locality of Visual Word is used to describe feature vector

78 IIIT Hyderabad Locality-constrained Linear Coding (LLC) Similar patch should have similar code Locality of Visual Word is used to describe feature vector

79 IIIT Hyderabad Locality-constrained Linear Coding (LLC) Similar patches should have similar codes. Locality of visual words is used to describe the feature vector. LLC Coding Process: (1) Find the K nearest neighbors of x_i, denoted as B. (2) Reconstruct x_i using B. (3) Replace input x_i with the non-zero code obtained from the previous step. [Figure label: Input Descriptor]
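The three coding steps can be sketched in plain Python as follows. This follows the commonly described approximated LLC solution (a small regularized least-squares problem over the K nearest bases, with the code constrained to sum to 1); whether the presented system uses exactly this variant is an assumption, as are the parameter names `k` and `beta` and the helper `solve_linear`.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def llc_code(x, codebook, k=2, beta=1e-4):
    """Code for descriptor x: non-zero only on its k nearest visual words."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    # Step 1: find the k nearest neighbours of x in the code book (B).
    nearest = sorted(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))[:k]
    k = len(nearest)
    # Step 2: reconstruct x from B by solving a regularized local system.
    shifted = [[b - xi for b, xi in zip(codebook[i], x)] for i in nearest]
    cov = [[sum(p * q for p, q in zip(u, v)) for v in shifted] for u in shifted]
    trace = sum(cov[i][i] for i in range(k))
    for i in range(k):
        cov[i][i] += beta * trace + 1e-12  # keeps the system solvable
    w = solve_linear(cov, [1.0] * k)
    total = sum(w)
    # Step 3: place the normalized weights at the positions of the neighbours.
    code = [0.0] * len(codebook)
    for i, wi in zip(nearest, w):
        code[i] = wi / total
    return code


codebook = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
code = llc_code((0.5, 0.0), codebook, k=2)
```

In this toy case the descriptor lies on the segment between the first two visual words, so the code is close to 0.75 and 0.25 on those words (0.75 × 0 + 0.25 × 2 = 0.5) and zero on the distant third word.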

80 IIIT Hyderabad Re-ranking SIFT based re-ranking [1]. Longest common sub-sequence (LCS) based re-ranking [2]: size of the LCS of visual words projected on the x-axis; the larger the size, the better the match. [Figure: visual words V1, V2, V6, V4, V4, V8, V9 plotted on x and y axes] 1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012. 2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.

81 IIIT Hyderabad Re-ranking SIFT based re-ranking [1]. Longest common sub-sequence (LCS) based re-ranking [2]: size of the LCS of visual words projected on the x-axis; the larger the size, the better the match. Linear combination [2]: Final Score = λ * Index_Score + (1 - λ) * Re-ranking_Score, where λ is a weighting parameter. 1. Ravi Shekhar, C. V. Jawahar: Word Image Retrieval Using Bag of Visual Words. DAS 2012. 2. Ismet Zeki Yalniz, R. Manmatha: An Efficient Framework for Searching Text in Noisy Document Images. DAS 2012.
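LCS based re-ranking and the linear combination can be sketched as below. This is a minimal illustration: keypoints are assumed to be (x position, visual word id) pairs, the LCS length is normalized by the query length to obtain a re-ranking score, and λ = 0.5 is an arbitrary choice. These details are assumptions, not values from the slides.

```python
def lcs_length(a, b):
    """Length of the longest common sub-sequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def visual_word_sequence(keypoints):
    """Project keypoints on the x-axis: order visual word ids by x position."""
    return [word for _, word in sorted(keypoints, key=lambda kp: kp[0])]


def final_score(index_score, rerank_score, lam=0.5):
    """Final Score = lam * Index_Score + (1 - lam) * Re-ranking_Score."""
    return lam * index_score + (1.0 - lam) * rerank_score


# Toy keypoints: (x position, visual word id).
query = [(0.5, 1), (1.0, 2), (1.5, 6), (2.0, 4)]
candidate = [(1.1, 6), (0.4, 1), (2.5, 9), (1.9, 4)]

q_seq = visual_word_sequence(query)
c_seq = visual_word_sequence(candidate)
lcs = lcs_length(q_seq, c_seq)
rerank = lcs / len(q_seq)            # assumed normalization
score = final_score(0.6, rerank)
```

Unlike the orderless histogram, the LCS is sensitive to the left-to-right order of visual words, so it restores some of the geometry that BoVW discards.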

82 IIIT Hyderabad Dataset Used Books Used For The Experiments
Book | #Pages | #Words
Telugu-1716 | 120 | 4121
Telugu-1718 | 100 | 21345
English-1601 | 363 | 113008

83 IIIT Hyderabad Quantitative Results LLC Based Statistics (mAP)
Book | BoVW | BoVW + SIFT Re-ranking | BoVW + LCS Re-ranking | LLC | LLC + LCS Re-ranking
Telugu-1716 | 0.8173 | 0.8645 | 0.9036 | 0.91 | 0.95
Telugu-1718 | 0.7834 | 0.8861 | 0.918 | 0.92 | 0.96
English-1601 | 0.8015 | 0.8531 | 0.92 | 0.8765 | 0.9451

84 IIIT Hyderabad Quantitative Results Text Query Based Statistics
Book | Method | mAP
Telugu-1716 | Text Query | 0.8413
Telugu-1718 | Text Query | 0.90
English-1601 | Text Query | 0.87

85 IIIT Hyderabad Patch Based Word Image Retrieval

86 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch

87 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features

88 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature

89 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile

90 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Measures ink distribution of word image

91 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Measures internal shape of image

92 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Measures internal shape of image

93 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile

94 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile Distance from Upper Boundary of word image

95 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile Distance from Upper Boundary of word image

96 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile Lower Word Profile

97 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile Lower Word Profile Distance from Lower Boundary of word image

98 IIIT Hyderabad Patch Based Word Image Retrieval Designed feature based on patch Representation of Patch using Profile Features Profile Feature Projection Profile Ink Transition Upper Word Profile Lower Word Profile Distance from Lower Boundary of word image

99 IIIT Hyderabad Overview of Feature Calculation: Input word image → Calculate 4 profile features (projection profile, ink transition, upper word profile, lower word profile) → Concatenate 4 profile features → Descriptor
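The feature calculation can be sketched as follows for a binarized patch. Several details are assumptions made for this example: the image is a list of rows with 1 for ink and 0 for background, every feature is computed per column, an ink transition is counted whenever adjacent pixels in a column differ, and empty columns get the full height as their upper and lower profile.

```python
def profile_features(image):
    """Concatenate four per-column profile features of a binary image."""
    height, width = len(image), len(image[0])
    projection, transitions, upper, lower = [], [], [], []
    for c in range(width):
        column = [image[r][c] for r in range(height)]
        ink_rows = [r for r, v in enumerate(column) if v]
        # Projection profile: amount of ink in the column.
        projection.append(len(ink_rows))
        # Ink transitions: changes between ink and background going down.
        transitions.append(
            sum(1 for r in range(1, height) if column[r] != column[r - 1])
        )
        # Upper / lower word profile: distance from the boundary to the ink.
        upper.append(ink_rows[0] if ink_rows else height)
        lower.append(height - 1 - ink_rows[-1] if ink_rows else height)
    return projection + transitions + upper + lower


# Toy 4 x 3 binary patch (1 = ink).
image = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]
descriptor = profile_features(image)
```

For a patch of width w the descriptor has 4 × w entries, so fixed-width patches yield fixed-length vectors that can be clustered into visual words.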

100 IIIT Hyderabad Fast Pre-Processing: Input Patch → Corresponding Patch Vector → Lookup Table (visual words V1, V2, V3, ..., Vk). Is the patch vector present? Yes: retrieve the corresponding visual word. No: find the corresponding visual word and update the lookup table.
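The lookup-table flow can be sketched as a small cache in front of the nearest-visual-word search. The class name `CachedQuantizer`, the linear search, and the miss counter are illustrative assumptions; the slides do not describe how the table is implemented.

```python
class CachedQuantizer:
    """Maps patch vectors to visual words, remembering earlier answers."""

    def __init__(self, codebook):
        self.codebook = codebook
        self.table = {}     # patch vector -> visual word index
        self.misses = 0     # number of full searches performed

    def _search(self, vector):
        # "No" branch: find the corresponding visual word the slow way.
        self.misses += 1
        return min(
            range(len(self.codebook)),
            key=lambda k: sum((p - q) ** 2 for p, q in zip(vector, self.codebook[k])),
        )

    def visual_word(self, vector):
        key = tuple(vector)
        if key not in self.table:
            self.table[key] = self._search(key)   # update the lookup table
        return self.table[key]                    # "Yes" branch: retrieve


quantizer = CachedQuantizer([(0, 0), (4, 4)])
words = [quantizer.visual_word(v) for v in [(1, 0), (3, 4), (1, 0), (1, 0)]]
```

Because patch vectors from binarized printed text tend to repeat, a table like this trades a modest amount of memory for skipping the search on every repeated patch.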

101 IIIT Hyderabad Dataset Used
Book | #Pages | #Words
Telugu-1718 | 100 | 21345
English-1601 | 363 | 113008

102 IIIT Hyderabad Quantitative Results Baseline Statistics
Book | Method | mAP
Telugu-1718 | SIFT | 0.7834
Telugu-1718 | Patch | 0.53
Telugu-1718 | Patch Feature | 0.6183
Telugu-1718 | Patch Feature with Overlap | 0.7214

103 IIIT Hyderabad Quantitative Results Enhancement on Baseline Statistics
Enhancement Method | SIFT | Patch Feature
Query Expansion | 0.7920 | 0.75
Spatial Verification | 0.8571 | 0.83
LCS Re-ranking | 0.8798 | 0.8481

104 IIIT Hyderabad Quantitative Results Results with Split Features
Book | SIFT | Patch Feature
Telugu-1718 | 0.94 | 0.954
English-1601 | 0.93 | 0.90

105 IIIT Hyderabad Qualitative Results

106 IIIT Hyderabad Contributions Language independent system: tested on 4 different languages. Scalable to huge datasets: tested on 1 million word images. Handles noisy document images: demonstrated performance on a dataset where commercial OCR fails. Enhancements on baseline results: query expansion, text query support, document specific sparse coding. A document specific descriptor is proposed.

107 IIIT Hyderabad Future Work Test on different font dataset Similar method for handwritten, camera based datasets Learning character level visual word automatically using annotated data Multi Keyword support Combine both recognition based and recognition free methods Improve patch based descriptor.

108 IIIT Hyderabad Related Publications Ravi Shekhar and C. V. Jawahar, “Word Image Retrieval using Bag of Visual Words”, In Proceedings of 10th IAPR International Workshop on Document Analysis Systems (DAS), 2012. Praveen Krishnan, Ravi Shekhar and C. V. Jawahar, “Content Level Access to Digital Library of India Pages”, In Proceedings of 8th Indian Conference on Vision, Graphics and Image Processing (ICVGIP), 2012. Ravi Shekhar and C. V. Jawahar, “Document Specific Sparse Coding for Word Retrieval”, In Proceedings of 12th International Conference on Document Analysis and Recognition (ICDAR), 2013.

109 IIIT Hyderabad Thanks !!!

