CBIR Challenges
- How to represent the visual content of images
  - What are "visual contents"? Colors, shapes, textures, objects, or meta-data (e.g., tags) derived from images
  - Which type of "visual content" should be used to represent an image? It is difficult to understand the information needs of a user from a query image alone
- How to retrieve images efficiently
  - A linear scan of the entire database should be avoided
Image Representation
In increasing degree of difficulty:
- Similar color distribution: histogram matching
- Similar texture pattern: texture analysis
- Similar shape/pattern: image segmentation, pattern recognition
- Similar real content: life-time goal :-)
Vector-based Image Representation
- Represent an image by a vector with a fixed number of elements
- Color histogram: discretize the color space and count the pixels falling into each color bin
- Texture: Gabor filter texture features
- …
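The color-histogram idea above can be sketched in a few lines. This is an illustrative sketch, not the project's code: it assumes RGB pixels with values in 0..255 and quantizes each channel into a small number of bins, producing a fixed-length vector regardless of image size.

```python
# Sketch: a color histogram as a fixed-length vector.
# Each RGB channel is discretized into bins_per_channel bins, so the
# histogram has bins_per_channel**3 entries (one per color cell).

def color_histogram(pixels, bins_per_channel=4):
    """pixels: list of (r, g, b) tuples with values in 0..255.
    Returns a vector of bins_per_channel**3 pixel counts."""
    hist = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel          # width of each channel bin
    for r, g, b in pixels:
        idx = (r // step) * bins_per_channel ** 2 \
            + (g // step) * bins_per_channel \
            + (b // step)
        hist[idx] += 1
    return hist

# A toy 2-pixel "image": one dark pixel, one bright red pixel
print(color_histogram([(0, 0, 0), (255, 0, 0)], bins_per_channel=2))
# → [1, 0, 0, 0, 1, 0, 0, 0]
```

Because the vector length depends only on the bin count, images of different sizes become directly comparable.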
Challenges in CBIR
That's what it's like to be a CBIR system!
- You get drunk, REALLY drunk
- Hit over the head
- Kidnapped to another city, in a country on the other side of the world
- When you wake up, you try to figure out what city you are in and what is going on
Near Duplicate Image Retrieval
- Given a query image, identify gallery images with high visual similarity.
Appearance-based Image Matching
- Parts-based image representation: parts (appearance) + shape (spatial relations)
- Parts: local features found by an interest point operator
- Shape: graphical models or neighborhood relationships
Interest Point Detection
- Local features have been shown to be effective for representing images
- They are image patterns that differ from their immediate neighborhood
- They can be points, edges, or small patches
- We call these local features the key points, or interest points, of an image
Interest Point Detection
[Figure: an example image with key points detected by a corner detector]
Interest Point Detection
- The detection of interest points needs to be robust to various geometric transformations
[Figure: original image vs. scaling + rotation + translation vs. projection]
Interest Point Detection
- The detection of interest points also needs to be robust to imaging conditions, e.g., lighting and blurring.
Descriptor
- Represent each detected key point by taking measurements (e.g., texture, shape, …) from a region centered on the interest point
- Each descriptor is a vector with a fixed length; e.g., a SIFT descriptor is a vector of 128 dimensions
Descriptor
- The descriptor should also be robust under different image transformations: corresponding key points in transformed images should have similar descriptors
Image Representation
- Bag-of-features representation: an example where each descriptor is 5-dimensional
[Figure: original image → detected key points → descriptors of the key points]
Retrieval
- How to measure the similarity between two images?
- Count the number of matches!
- If the distance between two descriptor vectors is smaller than a threshold, we count one match
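The matching rule above can be sketched directly. This is an illustrative sketch under simplified assumptions (2-dimensional descriptors instead of 128, brute-force comparison): a query descriptor counts as a match if any database descriptor lies within the distance threshold.

```python
# Sketch: count matches between two sets of descriptors.
# Two descriptors match when their Euclidean distance is below a threshold.

import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def count_matches(query_desc, db_desc, threshold):
    """Count query descriptors that match at least one database descriptor."""
    matches = 0
    for q in query_desc:
        if any(euclidean(q, d) < threshold for d in db_desc):
            matches += 1
    return matches

q  = [[0.0, 0.0], [5.0, 5.0]]
db = [[0.1, 0.0], [9.0, 9.0]]
print(count_matches(q, db, threshold=1.0))  # → 1
```

The nested loop makes the quadratic cost of pairwise matching visible, which motivates the efficiency concerns on the next slide.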
Problems
- Computationally expensive: requires a linear scan of the entire database
- Example: match a query image against a database of 1 million images
  - 0.1 second to compute the match between two images
  - More than one day to answer a single query
Bag-of-words Model
- Compare to the bag-of-words representation in text retrieval:
  - A document is a collection of the words in the document
  - An image is a collection of the key points of the image
- What is the difference?
Bag-of-words
- The same word appears in many documents
- There is no "same key point"; instead, "similar key points" appear in many images that have similar visual content
- Idea: group similar key points from different images into "visual words"
Bag-of-words Model
- Group key points into visual words (e.g., b1, b2, …, b8)
- Represent images by histograms of visual words
Bag-of-words
- The grouping is usually done by clustering: cluster the key points of all images into a number of cluster centers (e.g., 100,000 clusters)
- Each cluster center is called a "visual word"
- The collection of all cluster centers is called the "visual vocabulary"
Retrieval by the Bag-of-words Model
- Generate the visual vocabulary
- Represent each key point by its nearest visual word
- Represent an image by a bag of visual words
- Text retrieval techniques can then be applied directly
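The quantization step above can be sketched end to end. This is an illustrative sketch, not the project's code: the project uses FLANN for the nearest-neighbor search, while this brute-force version (with a tiny 2-word vocabulary and 2-dimensional key points) just shows the idea.

```python
# Sketch: quantize key points against a small "visual vocabulary"
# and build a bag-of-words histogram for one image.

def nearest_word(keypoint, centers):
    """Return the index of the closest cluster center (visual word)."""
    best, best_dist = 0, float("inf")
    for i, c in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(keypoint, c))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

def bag_of_words(keypoints, centers):
    """Histogram of visual-word counts for one image."""
    hist = [0] * len(centers)
    for kp in keypoints:
        hist[nearest_word(kp, centers)] += 1
    return hist

centers   = [[0.0, 0.0], [10.0, 10.0]]           # a 2-word vocabulary
keypoints = [[1.0, 0.0], [9.0, 9.0], [11.0, 10.0]]
print(bag_of_words(keypoints, centers))          # → [1, 2]
```

Once every image is reduced to such a histogram, standard text-retrieval machinery (e.g., tf-idf) applies unchanged.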
Project
Build a system for near duplicate image retrieval:
- A database with 10,000 images
- Construct a bag-of-words model for each database image (offline)
- Construct a bag-of-words model for a query image
- Retrieve the 10 visually most similar images from the database for the given query
Step 1: Dataset
- 10,000 color images under the folder './img'
- The key points of each image have already been extracted
- The key points of all images are saved in a single file './feature/esp.feature'
  - Each line corresponds to one key point with 128 attributes
  - The attributes in each line are separated by tabs
Step 1: Dataset
- To locate the key points of individual images, two other files are needed:
  - './imglist.txt': the order of the images when their key points were saved
  - './feature/esp.size': the number of key points each image has
Step 1: Dataset
Example: three images imgA, imgB, imgC; imgA has 2 key points, imgB has 3, and imgC has 2.

imglist.txt    esp.size    esp.feature
imgB.jpg       3           imgB - key point 1
imgC.jpg       2           imgB - key point 2
imgA.jpg       2           imgB - key point 3
                           imgC - key point 1
                           imgC - key point 2
                           imgA - key point 1
                           imgA - key point 2
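The bookkeeping implied by the example above can be sketched as follows. This is an illustrative sketch: the file contents are inlined as Python lists; in the project you would read './imglist.txt' and './feature/esp.size' instead.

```python
# Sketch: compute which lines of esp.feature belong to each image,
# using the image order (imglist.txt) and key-point counts (esp.size).

imglist = ["imgB.jpg", "imgC.jpg", "imgA.jpg"]   # order from imglist.txt
sizes   = [3, 2, 2]                              # counts from esp.size

def feature_ranges(imglist, sizes):
    """Map image name -> (first_line, end_line) in esp.feature,
    0-based and end-exclusive."""
    ranges, offset = {}, 0
    for name, n in zip(imglist, sizes):
        ranges[name] = (offset, offset + n)
        offset += n
    return ranges

print(feature_ranges(imglist, sizes))
# imgA.jpg's 2 key points are lines 5 and 6 of esp.feature
```

A running offset is all that is needed, because esp.feature stores the key points in exactly the order given by imglist.txt.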
Step 2: Key Point Quantization
Represent each image by a bag of visual words:
- Construct the visual vocabulary
  - Cluster all key points into 10,000 clusters
  - Each cluster center is a visual word
- Map each key point to a visual word
  - Find the nearest cluster center for each key point (nearest neighbor search)
Step 2: Key Point Quantization
Continuing the example from Step 1:
- Cluster the 7 key points into 3 clusters; the cluster centers are cnt1, cnt2, cnt3
- Each center is a visual word: w1, w2, w3
- Find the nearest center for each key point
Step 2: Key Point Quantization
Mapping key points to visual words:
- imgA.jpg: 1st key point → w2, 2nd key point → w1
- imgB.jpg: 1st key point → w3, 2nd key point → w3, 3rd key point → w2
- imgC.jpg: 1st key point → w3, 2nd key point → w2

Bag-of-words representation:
- imgA.jpg: w2 w1
- imgB.jpg: w3 w3 w2
- imgC.jpg: w3 w2
Step 2: Key Point Quantization
We provide the FLANN library for clustering and nearest neighbor search. For clustering, use:

  flann_compute_cluster_centers(
      float* dataset,    // your key points
      int rows,          // number of key points
      int cols,          // 128, the dimension of a key point
      int clusters,      // number of clusters
      float* result,     // cluster centers
      struct IndexParameters* index_params,
      struct FLANNParameters* flann_params);
Step 2: Key Point Quantization
For nearest neighbor search:
- Build an index for the cluster centers:

  flann_build_index(
      float* dataset,    // your cluster centers
      int rows,
      int cols,
      float* speedup,
      struct IndexParameters* index_params,
      struct FLANNParameters* flann_params);

- For each key point, search for the nearest cluster center:

  flann_find_nearest_neighbors_index(
      FLANN_INDEX index_id,  // your index above
      float* testset,        // your key points
      int trows,
      int* result,
      int nn,
      int checks,
      struct FLANNParameters* flann_params);
Step 2: Key Point Quantization
In this step, you need to save:
- the cluster centers to a file (you will use these later to quantize the key points of query images)
- the bag-of-words representation of each image in "trec" format

Bag-of-words representation:
  imgA.jpg: w2 w1
  imgB.jpg: w3 w3 w2
  imgC.jpg: w3 w2

"trec" format:
  <DOC>
  <DOCNO>imgB</DOCNO>
  <TEXT>w3 w3 w2</TEXT>
  </DOC>
  <DOC>
  <DOCNO>imgA</DOCNO>
  <TEXT>w2 w1</TEXT>
  </DOC>
  <DOC>
  <DOCNO>imgC</DOCNO>
  <TEXT>w3 w2</TEXT>
  </DOC>
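Writing the "trec" format shown above is mechanical; here is an illustrative sketch using the toy example from this step (it returns the formatted string rather than writing a file, purely for demonstration).

```python
# Sketch: serialize bag-of-words representations into the "trec" format
# expected by the Lemur indexer.

def to_trec(bags):
    """bags: list of (docno, list-of-visual-words) pairs.
    Returns one <DOC>...</DOC> block per image."""
    lines = []
    for docno, words in bags:
        lines += ["<DOC>",
                  "<DOCNO>%s</DOCNO>" % docno,
                  "<TEXT>%s</TEXT>" % " ".join(words),
                  "</DOC>"]
    return "\n".join(lines)

bags = [("imgB", ["w3", "w3", "w2"]),
        ("imgA", ["w2", "w1"]),
        ("imgC", ["w3", "w2"])]
print(to_trec(bags))
```

Each visual word simply becomes a "term" of the document, so repeated words (like w3 in imgB) carry their counts into the text index.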
Step 3: Build the Index Using Lemur
- The same as what we did in the previous homework
- Use the "KeyfileIncIndex" index
- No stemming
- No stop words
Step 4: Extract Key Points for a Query
- Three sample query images are under './sample query/'
- The query images are in .pgm format
- The extraction tool is under './sift tool/'
  - For Windows, use "siftW32.exe"
  - For Linux, use "sift"
- Example command:
  sift < input.pgm > output.keypoints
Step 5: Generate a Bag-of-words Model for a Query
Map each key point of the given query to a visual word:
- Use the cluster center file generated in Step 2
- Build an index for the cluster centers using flann_build_index()
- For each key point, search for the nearest cluster center using flann_find_nearest_neighbors_index()
Step 5: Generate a Bag-of-words Model for a Query
Write the bag-of-words model of the query image in the Lemur format:

  <DOC 1>
  the mapped cluster ID for the 1st key point
  the mapped cluster ID for the 2nd key point
  …
  </DOC>
Step 6: Image Retrieval by Lemur
Use the Lemur command 'RetEval' as:

  RetEval <parameter_file>

An example parameter file:

  <parameters>
  <index>/home/user1/myindex/myindex.key</index>
  <retModel>tfidf</retModel>
  <textQuery>/home/user1/query/q1.query</textQuery>
  <resultFile>/home/user1/result/ret.result</resultFile>
  <TRECResultFormat>1</TRECResultFormat>
  <resultCount>10</resultCount>
  </parameters>
Step 7: Graphical User Interface
Build a GUI for the image retrieval system:
- Browse the image database
- Select an image from the database to query the database and display the top 10 retrieved results
  - Extract the bag-of-words representation of the query
  - Write it to a file in the format specified in Step 5
  - Run the "RetEval" command for retrieval
- Load in an external query image, search the images in the database, and display the top 10 retrieved results
Step 8: Evaluation
- Demo your system in class during the last week
- We will provide a number of test query images
- Run your GUI, load in each test query image, and display the ten most similar images from the database