1 Image and video descriptors Advanced Topics in Computer Vision, Spring 2010, Weizmann Institute of Science. Oded Shahar and Gil Levi
2 Outline Overview; Image Descriptors (Histograms of Oriented Gradients Descriptors, Shape Descriptors, Color Descriptors); Video Descriptors
3 Overview - Motivation The problem we are trying to solve is image similarity. Given two images (or image regions) – are they similar or not?
4 Overview - Motivation Solution: image descriptors. An image descriptor "describes" a region in an image. To compare two such regions, we compare their descriptors.
5 Overview - Descriptor To compare two images, we will compare their descriptors. Descriptor Function → Similar?
6 Overview - Similarity But what is similar to you? It depends on the application!
7 Overview Image (or region) similarity is used in many CV applications, for example: object recognition, scene classification, image registration, image retrieval, robot localization, template matching, panorama building, and many more.
8 Overview Example – 3D reconstruction from stereo images. Comparing the raw pixel values of the two views will not work!
9 Overview Descriptors provide a means for comparing images or image regions. Descriptors allow certain differences between the regions – scale, rotation, illumination changes, noise, shape, etc.
10 Overview - Motivation Again, we cannot compare the raw pixels alone. Descriptor Function → Similar?
11 Overview Descriptors are commonly used as follows: 1. Extract features from the image as small regions. 2. Describe each region using a feature descriptor. 3. Use the descriptors in the application (comparison, training a classifier, etc.).
12 Overview Main problems: Feature detection – where to compute the descriptors? (covered briefly). Feature description (descriptors) – how to compute the descriptors? (today's topic). Feature comparison – how to compare two descriptors? (covered briefly).
13 Overview - Feature Detection Where to compute the descriptors? Detection methods: Grid (possibly multiscale, snapped to edges, or with uniform areas); Key-points; Global descriptors (used for large image databases where speed and memory are limited).
14 Overview - Feature Detection Key-points as detector output can be points or regions (of different orientation, scale and affine transformation): squares, ellipses, circles, etc.
15 Overview – Descriptor Comparison Given two region descriptions, how do we compare them? Usually a descriptor comes with its own distance function; many descriptors use the L2 distance.
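The comparison step can be sketched directly. A minimal Python/NumPy example of two distance functions commonly paired with histogram-style descriptors (the chi-square form shown is one common variant; the exact normalization differs between papers):

```python
import numpy as np

def l2_distance(d1, d2):
    """Euclidean (L2) distance between two descriptor vectors."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    return np.linalg.norm(d1 - d2)

def chi2_distance(d1, d2, eps=1e-10):
    """Chi-square distance, often preferred for histogram descriptors
    because it down-weights differences in well-populated bins."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    return 0.5 * np.sum((d1 - d2) ** 2 / (d1 + d2 + eps))

a = np.array([1.0, 0.0, 2.0])
b = np.array([1.0, 2.0, 0.0])
print(l2_distance(a, b))    # sqrt(8) ≈ 2.828
print(chi2_distance(a, b))  # 0.5 * (0 + 4/2 + 4/2) = 2.0
```

Both distances treat the descriptor as a plain vector; which one works better depends on the descriptor and the application.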
16 Overview – Descriptor Invariance Different descriptors measure different kinds of similarity. Descriptors can be invariant to visual effects: illumination, noise, colors, texture. The final point: it all depends on the application – what is similar for you? Different applications require different invariances, and therefore different descriptors.
17 Outline Overview; Image Descriptors (Histograms of Oriented Gradients Descriptors, Shape Descriptors, Color Descriptors); Video Descriptors
18 Descriptor To compare two images, we will compare their descriptors. Descriptor Function → Similar?
19 Descriptors Types of descriptors: intensity-based (histograms), gradient-based, color-based, frequency-based, shape-based, and combinations of the above.
20 Descriptors Why not use patches? Very large representation. Not invariant to small deformations in the descriptor location. Not invariant to changes in illumination.
21 Descriptors Intensity histogram (over pixel values 0–255). Not invariant to light intensity change; does not capture geometric information.
22 Descriptors Histogram of image gradients. Normalize for light-intensity invariance. Still does not capture geometric information.
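A minimal sketch of such a normalized gradient-direction histogram in Python/NumPy (the 8-bin quantization and the use of np.gradient are illustrative choices, not the exact computation used by any particular descriptor):

```python
import numpy as np

def gradient_orientation_histogram(patch, n_bins=8):
    """Histogram of gradient directions over a grayscale patch,
    weighted by gradient magnitude and normalized to unit length."""
    patch = np.asarray(patch, float)
    gy, gx = np.gradient(patch)                 # per-pixel gradients
    mag = np.hypot(gx, gy)                      # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)      # direction in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist    # unit length

# A vertical step edge: all gradient energy falls in one direction bin.
patch = np.tile([0, 0, 10, 10], (4, 1))
h = gradient_orientation_histogram(patch)
```

Because gradients scale linearly with the pixel values and the histogram is normalized to unit length, multiplying the whole patch by a constant leaves the histogram unchanged – the light-intensity invariance mentioned above.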
23 Descriptors Solution: divide the area into sections and compute a separate histogram for each section. SIFT – David Lowe, 1999.
24 Descriptors - SIFT How to compute the SIFT descriptor. Input: an image and a location at which to compute the descriptor. Step 1: Warp the image to the correct orientation and scale, and then extract the feature as a 16x16 pixel patch.
25 Descriptors - SIFT Step 2: Compute the gradient (direction and magnitude) for each pixel. Step 3: Divide the pixels into sixteen 4x4 squares.
26 Descriptors - SIFT Step 4: For each square, compute a gradient-direction histogram over 8 directions. The result: a 128-dimensional feature vector (16 squares × 8 bins).
27 Descriptors - SIFT Summary: warp the feature into a 16x16 square; divide it into sixteen 4x4 squares; for each square, compute a histogram of the gradient directions over 8 bins, giving a 128-dimensional feature vector. The gradients are weighted by magnitude and by a Gaussian window, so that pixels closer to the center have a higher weight in the descriptor. Finally, the feature vector is normalized to achieve invariance to light-intensity change.
28 Descriptors - SIFT Weight by magnitude and a Gaussian window (σ is half the window size). Normalize the feature to a unit vector. Use the L2 distance to compare features; other distance functions can also be used, e.g. χ² (chi-square) or the earth mover's distance.
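Putting the steps of the last few slides together, here is a rough Python/NumPy sketch of a SIFT-like descriptor. The orientation/scale warping and the soft trilinear binning of real SIFT are omitted; the bin counts and the Gaussian σ follow the slides:

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_bins=8):
    """128-D SIFT-style descriptor for a 16x16 grayscale patch:
    Gaussian-weighted gradient histograms over a 4x4 grid of cells,
    normalized to a unit vector for light-intensity invariance."""
    patch = np.asarray(patch, float)
    size = patch.shape[0]                       # assume a square patch
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)

    # Gaussian window centred on the patch, sigma = half the window size
    ys, xs = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    sigma = size / 2.0
    mag = mag * np.exp(-((xs - c) ** 2 + (ys - c) ** 2) / (2 * sigma ** 2))

    cell = size // grid
    desc = []
    for i in range(grid):                       # 4x4 grid of cells
        for j in range(grid):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            bins = (ang[sl] / (2 * np.pi) * n_bins).astype(int) % n_bins
            desc.append(np.bincount(bins.ravel(), weights=mag[sl].ravel(),
                                    minlength=n_bins))
    desc = np.concatenate(desc)                 # 16 cells x 8 bins = 128
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

patch = np.random.default_rng(0).random((16, 16))
d = sift_like_descriptor(patch)
print(d.shape)   # (128,)
```

The result is 128-dimensional and unit-length, so multiplying the patch by a constant intensity factor leaves the descriptor unchanged.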
29 Descriptors - SIFT Invariance to illumination: gradients are invariant to a light-intensity shift (adding a scalar to all pixels), and normalization to unit length adds invariance to light-intensity change (multiplying all pixels by a scalar). Invariance to shift and rotation: a single histogram does not contain any geometric information; using 16 histograms preserves some geometric information.
30 Descriptors - GLOH Similar to SIFT, but divides the feature into log-polar bins instead of squares: 17 log-polar location bins × 16 orientation bins = 272 dimensions, with bins farther from the center larger than closer ones. PCA is then applied, keeping 128 components. Mikolajczyk K., Schmid C. A performance evaluation of local descriptors. TPAMI, 2005.
31 SURF Uses integral images to detect and describe SIFT-like features. SURF describes an image about 3 times faster than SIFT, but is not as good as SIFT in invariance to illumination and viewpoint changes.
32 Descriptors Histograms of Oriented Gradients descriptors: SIFT (David Lowe, 1999), GLOH (Mikolajczyk K., Schmid C., 2005), SURF (Bay H., Ess A., Tuytelaars T., Van Gool L., 2008). Gradient-based descriptors were the significant breakthrough in image descriptors, starting with the introduction of the SIFT descriptor by David Lowe in 1999.
33 Outline Overview; Image Descriptors (Histograms of Oriented Gradients Descriptors, Shape Descriptors, Color Descriptors); Video Descriptors
39 Complex Notion of Similarity A good similarity measure between objects in images or actions in videos is important in many Computer Vision tasks: object recognition, detection, retrieval, action recognition, etc. In many cases the notion of similarity can be quite complex, as the following four images show. You can probably all see that these are four images of hearts, and you do it quite easily despite the extreme differences in appearance between the objects – the images do not share similar edge, intensity, color, or texture patterns. So what makes them look similar to us? Here is another example (peace symbols). Again, these images contain completely different local patterns – in one image there are people standing on a mountain, and in another there are skulls. What makes these images similar is that the geometric layout of local repetitive image patterns is similar, while the image patterns themselves can be completely different. In other words, the images share local self-similarity of image patterns, even though the patterns themselves are not shared across the images. We would like to use this self-similarity property to match such challenging images.
40 The Local Self-Similarity Descriptor Input image → correlation surface → image descriptor. Let us see how we can exploit this property to match such challenging objects in images. By correlating a small patch around a point with its surrounding region using a simple SSD, and transforming the distances to similarities, we obtain a correlation surface around that point. This surface represents how similar the patch is to its surrounding patches. Next we quantize the correlation surface using log-polar binning, such that bins farther from the point are larger than closer ones. We thus obtain a compact descriptor vector that captures the self-similarity property at a point, and it can be computed at every pixel in the image.
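A rough Python/NumPy sketch of this construction. The patch/region sizes, the number of log-polar bins, and the SSD-to-similarity constant (85.0, standing in for an assumed noise variance) are illustrative choices, not the exact parameters of the published descriptor:

```python
import numpy as np

def self_similarity_descriptor(img, y, x, patch=2, region=10,
                               n_angles=4, n_radii=3):
    """Sketch of a local self-similarity descriptor: SSD-correlate the
    small patch at (y, x) against its surrounding region, map SSD to
    similarity, then take the max in each log-polar bin."""
    img = np.asarray(img, float)
    ref = img[y - patch:y + patch + 1, x - patch:x + patch + 1]

    # Correlation surface: similarity of the patch to every shifted patch
    corr = np.zeros((2 * region + 1, 2 * region + 1))
    for dy in range(-region, region + 1):
        for dx in range(-region, region + 1):
            cy, cx = y + dy, x + dx
            cand = img[cy - patch:cy + patch + 1, cx - patch:cx + patch + 1]
            ssd = np.sum((ref - cand) ** 2)
            corr[dy + region, dx + region] = np.exp(-ssd / 85.0)

    # Log-polar binning: keep the MAX similarity in each (angle, radius) bin
    desc = np.zeros((n_angles, n_radii))
    for dy in range(-region, region + 1):
        for dx in range(-region, region + 1):
            r = np.hypot(dy, dx)
            if r == 0 or r > region:
                continue
            a = int((np.arctan2(dy, dx) % (2 * np.pi))
                    / (2 * np.pi) * n_angles) % n_angles
            ri = min(int(np.log1p(r) / np.log1p(region) * n_radii),
                     n_radii - 1)
            desc[a, ri] = max(desc[a, ri], corr[dy + region, dx + region])
    return desc.ravel()

img = np.random.default_rng(1).random((40, 40))
d = self_similarity_descriptor(img, 20, 20)
print(d.shape)   # (12,)
```

Taking the maximum within each log-polar bin (rather than the sum) is what later gives the descriptor its robustness to small non-rigid deformations.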
41 The Local Self-Similarity Descriptor Consider the following images, and focus on three points at corresponding locations on the objects. Note that although these objects are composed of completely different patterns, their self-similarity descriptors at matching locations are very similar.
42 The Local Self-Similarity Descriptor Properties & benefits: A unified treatment of repetitive patterns, color, texture, and edges – it can capture repetitive patterns, a region of uniform color, a textured region, or an edge pattern (up to some variation in the edge orientation). Captures the shape of the local region, regardless of whether it is a color region, a textured region, or an edge – as opposed to common local region descriptors. Invariant to appearance. Accounts for small local affine and non-rigid deformations: affine invariance comes from the log-polar quantization (similar to the shape context descriptor), and non-rigid invariance is achieved by taking the maximal value in each bin, which makes the descriptor insensitive to the exact location of the peak within the bin.
44 Descriptors Shape Descriptors allow measuring shape similarity. Shape Context: Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002. Local Self-Similarity: Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007. Geometric Blur: Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001. Shape descriptors have been shown to outperform the commonly used SIFT in object classification tasks: Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models.
45 Outline Overview; Image Descriptors (Histograms of Oriented Gradients Descriptors, Shape Descriptors, Color Descriptors); Video Descriptors
47 Color Descriptors Color spaces: RGB, HSV, Opponent. Many descriptor evaluation and comparison works have appeared in the last few years; we will show what we think is the most relevant and recent one: "Evaluating Color Descriptors for Object and Scene Recognition", which tests different kinds of color-based descriptors on the task of object and scene recognition. A common framework for object and scene recognition: 1. Extract features from the images. 2. Compute descriptors for the features. 3. Create a "codebook" or "bag of features" for each image. 4. Use standard machine-learning algorithms for recognition.
48 Color Descriptors Opponent color space: intensity information is represented by channel O3; color information is represented by channels O1 and O2. O1 and O2 are invariant to an intensity offset.
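The transform itself is a fixed linear map; a small Python/NumPy sketch using the standard definition (O1 = (R−G)/√2, O2 = (R+G−2B)/√6, O3 = (R+G+B)/√3):

```python
import numpy as np

def rgb_to_opponent(rgb):
    """RGB -> opponent color space.
    O1 and O2 carry color information; O3 carries intensity."""
    r, g, b = np.moveaxis(np.asarray(rgb, float), -1, 0)
    o1 = (r - g) / np.sqrt(2)
    o2 = (r + g - 2 * b) / np.sqrt(6)
    o3 = (r + g + b) / np.sqrt(3)
    return np.stack([o1, o2, o3], axis=-1)

# Adding a constant offset to all three RGB channels changes only O3:
px = np.array([[[0.2, 0.5, 0.1]]])
shifted = rgb_to_opponent(px + 0.1) - rgb_to_opponent(px)
```

The offset cancels in O1 (R and G shift together) and in O2 (R+G shifts by twice the offset, as does 2B), which is exactly the offset invariance of O1 and O2 stated above.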
49 Color Descriptors RGB color histogram; Opponent (O1, O2) histogram; Color moments – use all generalized color moments up to the second degree and the first order, which give information on the distribution of the colors.
50 Color Descriptors RGB-SIFT – SIFT descriptors computed for every RGB channel independently, with each channel normalized separately; invariant to light color change. rg-SIFT – SIFT descriptors over the r and g channels of the normalized-RGB space (2x128 dimensions per descriptor). OpponentSIFT – describes all the channels in the opponent color space. C-SIFT – uses O1/O3 and O2/O3 of the opponent color space (2x128 dimensions per descriptor); scale-invariant with respect to light intensity, but due to the definition of the color space, the offset does not cancel out when taking the derivative. G. J. Burghouts and J. M. Geusebroek. Performance evaluation of local color invariants, 2009.
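The normalized-RGB (rg) channels underlying rg-SIFT are simple to compute; a small Python/NumPy sketch (the epsilon guard against division by zero is an implementation detail, not part of the definition):

```python
import numpy as np

def normalized_rg(rgb, eps=1e-10):
    """Chromaticity channels of the normalized-RGB space:
    r = R/(R+G+B), g = G/(R+G+B)."""
    r, g, b = np.moveaxis(np.asarray(rgb, float), -1, 0)
    s = r + g + b + eps
    return np.stack([r / s, g / s], axis=-1)

px = np.array([[[0.2, 0.5, 0.1]]])
# Multiplying all channels by a scalar leaves r and g unchanged:
same = np.allclose(normalized_rg(px), normalized_rg(3 * px))
print(same)  # True
```

Dividing by the channel sum cancels any global intensity scaling, which is why rg-based descriptors are scale-invariant with respect to light intensity.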
51 Color Descriptors The study analyzes the invariance properties and the distinctiveness of color descriptors under different illumination models: light intensity change; light intensity shift; light intensity shift and change; light color change; light color change and shift (the most complex model).
59 Descriptors Summary (Name – Description – Captures): SIFT – gradient histograms – texture, gradients. GLOH – variant of SIFT with a log-polar layout. SURF – faster variant of SIFT with lower performance. Shape Context – histogram of edges, good for shape description – shape, edges. Self-Similarity – higher-level shape description, invariant to appearance – shape. RGB-SIFT – SIFT descriptors computed for every RGB channel independently. C-SIFT – SIFT based on the opponent color space, shown to be better than SIFT for object and scene recognition – texture, gradients, color.
60 Outline Overview; Image Descriptors (Histograms of Oriented Gradients Descriptors, Shape Descriptors, Color Descriptors); Video Descriptors
61 Video Descriptors Application: action recognition. Video is more than just a sequence of images – we want to capture temporal information as well.
62 Video Descriptors Space-Time SIFT: 64-direction gradient histograms over space-time. P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition.
64 3D Shape Context Represents an action in a video sequence by a 3D point cloud, extracted by sampling 2D silhouettes over time. M. Grundmann, F. Meier, and I. Essa (2008). "3D Shape Context and Distance Transform for Action Recognition".
65 The Local Self-Similarity Descriptor in Video Input video → correlation volume → video descriptor: the 2D image patch becomes a space-time patch, and the surrounding region becomes a space-time region over (x, y, time). Application: action detection.
66 Video Descriptors On Space-Time Interest Points; Ivan Laptev. Local image features provide compact and abstract representations of images, e.g. corners. The idea: extend the concept of a spatial corner detector to a spatio-temporal corner detector.
67 Space-Time Interest Points Consider a synthetic sequence of a ball moving towards a wall and colliding with it. An interest point is detected at the collision point.
68 Space-Time Interest Points Consider a synthetic sequence of two balls moving towards each other. Different interest points are detected at different spatial and temporal scales.
69 Conclusion The problem we are trying to solve is similarity between images and videos. Descriptors provide a solution.
70 Conclusion Tradeoff between keeping the geometric structure and obtaining invariance properties (perturbations & rotations). Tradeoff between preserving information and obtaining invariance.