Near Duplicate Image Detection: min-Hash and tf-idf weighting


Near Duplicate Image Detection: min-Hash and tf-idf weighting Ondřej Chum Center for Machine Perception Czech Technical University in Prague co-authors: James Philbin and Andrew Zisserman

Outline
- Near duplicate detection and large databases (find all groups of near duplicate images in a database)
- min-Hash review
- Novel similarity measures
- Results on TrecVid 2006
- Results on the University of Kentucky database (Nister & Stewenius)
- Beyond near duplicates

Scalable Near Duplicate Image Detection
- Images perceptually (almost) identical but not identical (noise, compression level, small motion, small occlusion)
- Similar images of the same object / scene
- Large databases
- Fast: linear in the number of duplicates
- Store a small, constant amount of data per image

Image Representation
Feature detector, SIFT descriptor [Lowe'04], vector quantization against a visual vocabulary. Each image is then represented either as a bag of words (frequencies recorded) or as a set of words.

min-Hash
min-Hash is a locality sensitive hashing (LSH) function m that selects an element m(A1) from set A1 and m(A2) from set A2 so that

P{m(A1) == m(A2)} = sim(A1, A2)

Image similarity is measured as set overlap (Jaccard similarity): sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|. Spatially related images share visual words.
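The collision property above is easy to check numerically. The sketch below is an illustrative implementation, not the paper's code: each hash function is a random ordering of a toy vocabulary, and the fraction of agreeing min-Hashes is compared with the exact set overlap.

```python
import random

def jaccard(a, b):
    """Exact set overlap |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def min_hash(s, order):
    """min-Hash of set s: its first element under the random ordering."""
    return min(s, key=order.__getitem__)

rng = random.Random(0)
vocab = list(range(1000))

# Each hash function is an independent random ordering of the vocabulary.
orderings = []
for _ in range(400):
    perm = vocab[:]
    rng.shuffle(perm)
    orderings.append({w: rank for rank, w in enumerate(perm)})

A = set(range(0, 60))
B = set(range(30, 90))   # |A ∩ B| = 30, |A ∪ B| = 90, overlap = 1/3
agree = sum(min_hash(A, o) == min_hash(B, o) for o in orderings)
estimate = agree / len(orderings)
print(jaccard(A, B), estimate)   # estimate should be close to 1/3
```

With 400 hash functions the estimate typically lands within a few percent of the true overlap, which is why a few hundred min-Hashes per image suffice.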

min-Hash: Example
Vocabulary: {A, B, C, D, E, F}. Set A = {A, B, C}, Set B = {B, C, D}, Set C = {A, E, F}.
Each hash function f1, ..., f4 assigns every vocabulary word an independent value ~ Un(0, 1), inducing a random ordering of the vocabulary; the min-Hash of a set is its first element under that ordering.
Estimated overlaps over the four functions (true set overlap in parentheses):
overlap(A, B) = 3/4 (1/2)
overlap(A, C) = 1/4 (1/5)
overlap(B, C) = 0 (0)

min-Hash Retrieval
A sketch is an s-tuple of min-Hashes (s = size of the sketch); k hash tables are used, one per sketch.
Probability of a sketch collision: sim(A, B)^s
Probability of retrieval (at least one sketch collision across the k hash tables): 1 – (1 – sim(A, B)^s)^k
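Plugging numbers into these two formulas shows how sketching turns a gradual similarity into a sharp retrieval threshold. The helper names below are mine; the s = 3, k = 512 setting is the one used on the next slide.

```python
def sketch_collision_prob(sim, s):
    """A sketch of s min-Hashes collides only if all s agree."""
    return sim ** s

def retrieval_prob(sim, s, k):
    """At least one of the k independent sketches collides."""
    return 1 - (1 - sim ** s) ** k

# s = 3, k = 512: unrelated pairs (low overlap) are almost never
# retrieved, while near duplicates are retrieved almost surely.
for sim in (0.05, 0.10, 0.35, 0.60):
    print(f"sim={sim:.2f}  P(retrieval)={retrieval_prob(sim, 3, 512):.4f}")
```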

Probability of Retrieving an Image Pair
[Plot: probability of retrieval vs. similarity (set overlap), for s = 3, k = 512. Near duplicate images and images of the same object are retrieved with high probability; unrelated images are not.]

More Complex Similarity Measures

Document / Image / Object Retrieval
Term Frequency – Inverse Document Frequency (tf-idf) weighting scheme:

idf_w = log(# documents / # documents containing X_w)

Words common to many documents are less informative. The frequency of each word in a document is recorded (good for repeated structures, textures, etc.).

[1] Baeza-Yates, Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
[2] Sivic, Zisserman. Video Google: A text retrieval approach to object matching in videos. ICCV'03.
[3] Nister, Stewenius. Scalable recognition with a vocabulary tree. CVPR'06.
[4] Philbin, Chum, Isard, Sivic, Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR'07.
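As a concrete illustration of the idf term (toy documents of my own, not the paper's data): a word that occurs in every document gets weight zero, while rarer words get positive weight.

```python
import math

def idf(documents):
    """idf_w = log(# documents / # documents containing word w)."""
    n = len(documents)
    df = {}   # document frequency of each word
    for doc in documents:
        for w in set(doc):   # count each word once per document
            df[w] = df.get(w, 0) + 1
    return {w: math.log(n / d) for w, d in df.items()}

docs = [["sky", "tower"], ["sky", "tree"],
        ["sky", "car", "tower"], ["sky", "road"]]
w = idf(docs)
print(w["sky"], w["tower"])   # "sky" is everywhere -> 0; "tower" -> log 2
```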

More Complex Similarity Measures
- Set of words representation with different importance of visual words: importance d_w of word X_w (weighted set overlap)
- Bag of words representation (word frequency is recorded): histogram intersection similarity measure
- Both combined: weighted histogram intersection with word importance d_w

Word Weighting for min-Hash
For a hash function that assigns each word an independent value ~ Un(0, 1), all words X_w have the same chance of being the min-Hash; this estimates set overlap similarity.
For a hash function whose value distribution depends on the word weights, the probability of X_w being the min-Hash is made proportional to its importance d_w; this estimates weighted set overlap.
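One standard construction that achieves this (an assumption on my part; the slide's exact formula is not recoverable from this transcript) is h(X_w) = -log(u_w) / d_w with u_w ~ Un(0, 1), i.e. an exponential variable with rate d_w; the minimum then falls on X_w with probability d_w / Σ d_v.

```python
import math
import random

def weighted_min_hash(word_set, weights, rng):
    """h(X_w) = -log(u_w) / d_w; the argmin is X_w with prob. ∝ d_w."""
    return min(word_set, key=lambda w: -math.log(rng.random()) / weights[w])

rng = random.Random(1)
weights = {"a": 1.0, "b": 1.0, "c": 8.0}   # word "c" is 8x more important
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10000):
    counts[weighted_min_hash({"a", "b", "c"}, weights, rng)] += 1
print(counts)   # "c" should be the min-Hash about 8/10 of the time
```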

Histogram Intersection Using min-Hash
Idea: represent a histogram as a set and reuse the min-Hash set machinery.
Visual vocabulary: (A, B, C, D). Bag of words A with t_A = (2, 1, 3, 0) becomes set A' = {A1, A2, B1, C1, C2, C3}; bag of words B with t_B = (0, 2, 3, 1) becomes set B' = {B1, B2, C1, C2, C3, D1}.
Extended min-Hash vocabulary: A1, A2, B1, B2, C1, C2, C3, D1.
The set overlap of A' and B' is the histogram intersection of A and B.
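The expansion is easy to verify directly with the slide's toy vocabulary: the intersection of the expanded sets has the same size as the histogram intersection of the original bags.

```python
def expand(bag):
    """Replicate each word per its count: {'C': 3} -> {('C',1), ('C',2), ('C',3)}."""
    return {(w, i) for w, c in bag.items() for i in range(1, c + 1)}

tA = {"A": 2, "B": 1, "C": 3}   # t_A = (2, 1, 3, 0) over vocabulary (A, B, C, D)
tB = {"B": 2, "C": 3, "D": 1}   # t_B = (0, 2, 3, 1)

A_, B_ = expand(tA), expand(tB)
inter = len(A_ & B_)            # set intersection of the expanded sets
hist = sum(min(tA.get(w, 0), tB.get(w, 0)) for w in set(tA) | set(tB))
print(inter, hist)              # both equal 4
```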

Results Quality of the retrieval Speed – the number of documents considered as near-duplicates

TRECVid Challenge
165 hours of news footage, different channels, different countries
146,588 key-frames, 352×240 pixels
No ground truth on near duplicates

min-Hash on TrecVid
DoG features, vocabulary of 64,635 visual words
192 min-Hashes, 3 min-Hashes per sketch, 64 sketches
Similarity threshold 35%
Examples of images with 24 – 45 near duplicates, showing results common to both measures / set overlap only / weighted set overlap only. The quality of the retrieval appears to be similar.

Comparison of Similarity Measures
Images only sharing uninformative visual words do not generate sketch collisions for the proposed similarity measures.
[Plot: number of sketch collisions vs. image pair similarity, for set overlap, weighted set overlap, and weighted histogram intersection.]

University of Kentucky Dataset 10,200 images in groups of four Querying by each image in turn Average number of correct retrievals in top 4 is measured

Evaluation
Vocabulary sizes 30k and 100k; number of min-Hashes 512, 640, 768, and 896; 2 min-Hashes per sketch; number of sketches 0.5, 1, 2, and 3 times the number of min-Hashes.

Score on average:
- weighted histogram intersection 4.6% better than weighted set overlap
- weighted set overlap 1.5% better than set overlap

Number of considered documents on average:
- weighted histogram intersection 1.7 times fewer than weighted set overlap
- weighted set overlap 1.5 times fewer than set overlap

Absolute numbers for weighted histogram intersection:

          min-Hashes  sketches  score 30k  score 100k  docs 30k  docs 100k
  Usable  640                   2.928      2.889        488.2     117.6
  Best    896         2688      3.090      3.166       1790.8     452.8

Retrieval with tf-idf flat scoring [Nister & Stewenius]: score 3.16; number of considered documents (non-zero tf-idf) 10,089.9 (30k) and 9,659.4 (100k).

Query Examples
Query image followed by results for set overlap, weighted set overlap, and weighted histogram intersection.

Beyond Near Duplicate Detection

Discovery of Spatially Related Images
Find and match ALL groups (clusters) of spatially related images in a large database, using only visual information, i.e. not using (Flickr) tags, EXIF info, GPS, …
Chum, Matas: Large Scale Discovery of Spatially Related Images, TR May 2008, available at http://cmp.felk.cvut.cz/~chum/Publ.htm

Probability of Retrieving an Image Pair
[Plot: probability of retrieval vs. similarity (set overlap), again marking near duplicate images and images of the same object.]

Image Clusters as Connected Components
Randomized clustering method:
1. Seed Generation – hashing (fast, low recall): characterize images by pseudo-random numbers stored in a hash table; time complexity equal to the sum of the second moments of a Poisson random variable, which is linear in the database size up to D ≈ 2^40.
2. Seed Growing – retrieval (thorough, high recall): complete the clusters by querying only for cluster members; for c cluster members, c << D, the complexity is O(cD).
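Once seed generation and seed growing have produced verified image matches, the clusters are just connected components over those matches. A minimal union-find sketch (illustrative, with hypothetical image indices, not the paper's implementation):

```python
def find(parent, x):
    """Root of x with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def components(n_images, matched_pairs):
    """Connected components over images linked by verified matches."""
    parent = list(range(n_images))
    for a, b in matched_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb   # union the two clusters
    groups = {}
    for i in range(n_images):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())

# images 0-1-2 linked by matches, 3-4 linked, 5 isolated
print(components(6, [(0, 1), (1, 2), (3, 4)]))   # [[0, 1, 2], [3, 4], [5]]
```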

Clustering of 100k Images
Images downloaded from Flickr; includes 11 Oxford landmarks with manually labelled ground truth:
All Souls, Ashmolean, Balliol, Bodleian, Christ Church, Cornmarket, Hertford, Keble, Magdalen, Pitt Rivers, Radcliffe Camera

Results on 100k Images
Number of images: 104,844
Timing: 17 min + 16 min = 0.019 sec / image
Component Recall (CR) per landmark:

  Landmark          Good   OK   CR (%)
  All Souls          24    54    97.44
  Ashmolean          12    13    68.00
  Balliol             5     7    33.33
  Bodleian           11     1    95.83
  Christ Church      51    27    89.74
  Cornmarket          4          66.67
  Hertford           35    19    96.30
  Keble               6          85.71
  Magdalen           41           5.56
  Pitt Rivers         3         100
  Radcliffe Camera  105   116    98.64

Chum, Matas TR, May 2008

Results on 100k Images: Comparison with Philbin, Sivic, Zisserman (BMVC 2008)
Same setting as above: 104,844 images; timing 17 min + 16 min = 0.019 sec / image.
Per-landmark CR for Philbin, Sivic, Zisserman (BMVC 2008), as listed: 96, 60, 33, 71, 67, 65, 57, 20, 100, 98
Chum, Matas TR May 2008 vs. Philbin, Sivic, Zisserman BMVC 2008

Conclusions
New similarity measures were derived for the min-Hash framework:
- Weighted set overlap
- Histogram intersection
- Weighted histogram intersection
Experiments show that the similarity measures are superior to the state of the art:
- in the quality of the retrieval (up to 7% on the University of Kentucky dataset)
- in the speed of the retrieval (up to 2.5 times)
min-Hash is a very useful tool for randomized image clustering.

Thank you!