
Image and video descriptors


1 Image and video descriptors
Advanced Topics in Computer Vision, Spring 2010, Weizmann Institute of Science. Oded Shahar and Gil Levi

2 Outline
Overview
Image Descriptors – Histograms of Oriented Gradients descriptors, Shape descriptors, Color descriptors
Video Descriptors

3 Overview - Motivation The problem we are trying to solve is image similarity. Given two images (or image regions) – are they similar or not?

4 Overview - Motivation Solution: Image Descriptors.
An image descriptor "describes" a region in an image. To compare two such regions, we compare their descriptors.

5 Overview - Descriptor To compare two images, we compare their descriptors.

6 Overview - Similarity But what is similar to you?
It depends on the application!

7 Overview Image (or region) similarity is used in many CV applications, for example: Object recognition Scene classification Image registration Image retrieval Robot localization Template matching Building panoramas And many more…

8 Overview Example – 3D reconstruction from stereo images.
Comparing the pixels as they are will not work!

9 Overview Descriptors provide a means for comparing images or image regions. Descriptors allow certain differences between the regions – scale, rotation, illumination changes, noise, shape, etc.

10 Overview - Motivation Again, we can't compare the pixels alone…

11 Overview Commonly used as follows:
1. Extract features from the image as small regions
2. Describe each region using a feature descriptor
3. Use the descriptors in the application (comparison, training a classifier, etc.)

12 Overview Main problems:
Feature Detection – where to compute the descriptors? (covered briefly)
Feature Description (Descriptors) – how to compute the descriptors? (today's focus)
Feature Comparison – how to compare two descriptors? (covered briefly)

13 Overview - Features Detection
Detection methods (where to compute the descriptors?):
Grid / Grid multiscale
Key-points (snap to edges, uniform area)
Global descriptors – used for large image databases where speed and memory are limited

14 Overview - Features Detection Key-points as detector output can be: points, regions (of different orientation, scale and affine transformation), squares, ellipses, circles, etc.

15 Overview – Descriptor Comparison
Given two region descriptions, how do we compare them? Usually a descriptor comes with its own distance function; many descriptors use the L2 distance.
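As a concrete illustration, comparing two descriptor vectors with the L2 distance (and the chi-square distance often used for histograms) can be sketched as follows; this is a generic sketch, and the example vectors are made up:

```python
import numpy as np

def l2_distance(d1, d2):
    """Euclidean (L2) distance between two descriptor vectors."""
    return np.linalg.norm(d1 - d2)

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance, often used for histogram descriptors."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

a = np.array([0.2, 0.5, 0.3])
b = np.array([0.1, 0.6, 0.3])
print(round(l2_distance(a, b), 4))   # 0.1414
```

A small distance means "similar" under that descriptor; what counts as small enough is, again, application-dependent.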

16 Overview – Descriptor Invariance
Different descriptors measure different kinds of similarity, and descriptors can be invariant to various visual effects: illumination, noise, color, texture. In the end it all depends on the application – what is similar for you? Different applications require different invariances, and therefore different descriptors.

17 Outline
Overview
Image Descriptors – Histograms of Oriented Gradients descriptors, Shape descriptors, Color descriptors
Video Descriptors

18 Descriptor To compare two images, we compare their descriptors.

19 Descriptors Types of descriptors: intensity based, histogram, gradient based, color based, frequency, shape, or a combination of the above.

20 Descriptors Why not use raw patches? Very large representation; not invariant to small deformations in the descriptor location; not invariant to changes in illumination.

21 Descriptors Intensity Histogram
Not invariant to light intensity change; does not capture geometric information.
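A minimal numpy sketch of the intensity histogram, showing the lack of invariance to a global intensity shift (the bin count and patch values are arbitrary choices for illustration):

```python
import numpy as np

def intensity_histogram(patch, bins=16):
    """Histogram of pixel intensities in [0, 255], normalized to sum to 1."""
    h, _ = np.histogram(patch, bins=bins, range=(0, 256))
    return h / h.sum()

rng = np.random.default_rng(0)
patch = rng.integers(0, 200, size=(16, 16))
shifted = patch + 50                  # add a scalar to all pixels
h1 = intensity_histogram(patch)
h2 = intensity_histogram(shifted)
print(np.allclose(h1, h2))            # False: the histogram moves with the shift
```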

22 Descriptors Histogram of image gradients
Normalize for light intensity invariance. Still does not capture geometric information.
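The steps above can be sketched in a few lines of numpy: a magnitude-weighted histogram of gradient orientations, normalized to unit length so that scaling all intensities by a constant cancels out (bin count and patch size are illustrative choices):

```python
import numpy as np

def gradient_orientation_histogram(patch, bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude,
    normalized to unit length for light-intensity-scaling invariance."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)          # orientation in [-pi, pi]
    h, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    n = np.linalg.norm(h)
    return h / n if n > 0 else h

rng = np.random.default_rng(1)
patch = rng.random((16, 16))
h1 = gradient_orientation_histogram(patch)
h2 = gradient_orientation_histogram(2.5 * patch)   # multiply all pixels
print(np.allclose(h1, h2))            # True: normalization cancels the scale
```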

23 Descriptors Solution: divide the area into sections and compute a separate histogram for each section. SIFT – David Lowe, 1999.

24 Descriptors - SIFT How to compute the SIFT descriptor
Input: an image and a location at which to compute the descriptor. Step 1: Warp the image to the correct orientation and scale, then extract the feature as a 16x16 pixel patch.

25 Descriptors - SIFT Step 2: Compute the gradient (direction and magnitude) for each pixel. Step 3: Divide the 16x16 pixels into 16 squares of 4x4 pixels each.

26 Descriptors - SIFT Step 4: For each square, compute a gradient direction histogram over 8 directions. The result: a 128-dimensional feature vector.

27 Descriptors - SIFT Warp the feature into a 16x16 square. Divide it into 16 squares of 4x4 pixels. For each square, compute a histogram of the gradient directions over 8 directions; concatenating the 16 histograms gives a 128-dimensional feature vector. The gradients are weighted by magnitude and by a Gaussian window, so that pixels closer to the center have a higher weight in the descriptor. Finally, the feature vector is normalized to achieve invariance to light intensity change. => Feature vector (128)

28 Descriptors - SIFT Weighted by magnitude and by a Gaussian window (σ is half the window size). Normalize the feature to a unit vector. Use the L2 distance to compare features; other distance functions can also be used, e.g. χ² (chi-square) or the Earth Mover's Distance.
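Putting steps 1–4 together, the binning scheme can be sketched as follows. This is only a sketch of the spatial/orientation binning: real SIFT also normalizes for orientation and scale, uses trilinear interpolation across bins, and clips large histogram entries, none of which is done here:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Sketch of SIFT's binning on a 16x16 patch: per-pixel gradients,
    Gaussian + magnitude weighting, 4x4 cells x 8 orientation bins = 128-D,
    normalized to unit length."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)       # orientation in [0, 2*pi)
    # Gaussian window centered on the patch, sigma = half the window size
    ys, xs = np.mgrid[0:16, 0:16]
    g = np.exp(-((xs - 7.5) ** 2 + (ys - 7.5) ** 2) / (2 * 8.0 ** 2))
    w = mag * g
    desc = []
    for by in range(4):                               # 4x4 grid of cells
        for bx in range(4):
            sl = (slice(4 * by, 4 * by + 4), slice(4 * bx, 4 * bx + 4))
            h, _ = np.histogram(ang[sl], bins=8, range=(0, 2 * np.pi),
                                weights=w[sl])
            desc.extend(h)
    desc = np.array(desc)
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc

rng = np.random.default_rng(2)
d = sift_like_descriptor(rng.random((16, 16)))
print(d.shape)   # (128,)
```

Two such descriptors would then be compared with the L2 distance, as on this slide.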

29 Descriptors - SIFT Invariance to illumination: gradients are invariant to a light intensity shift (i.e., adding a scalar to all pixels), and normalization to unit length adds invariance to a light intensity change (i.e., multiplying all pixels by a scalar). Invariance to shift and rotation: a single histogram does not contain any geometric information; using 16 histograms preserves coarse geometric information.
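The first claim is easy to verify numerically: adding a constant to every pixel leaves the gradients exactly unchanged (the patch values below are arbitrary):

```python
import numpy as np

patch = np.array([[10., 20., 40.],
                  [10., 30., 50.],
                  [10., 40., 60.]])
gy1, gx1 = np.gradient(patch)
gy2, gx2 = np.gradient(patch + 100)   # light intensity shift: add a scalar
print(np.allclose(gx1, gx2) and np.allclose(gy1, gy2))   # True
```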

30 Descriptors - GLOH Similar to SIFT, but divides the feature into log-polar bins instead of squares: 17 log-polar location bins and 16 orientation bins, giving 17x16=272 dimensions. The log-polar binning makes bins farther from the point larger than closer ones. PCA is then applied to the 272-dimensional vector, keeping 128 components. K. Mikolajczyk, C. Schmid. A performance evaluation of local descriptors. TPAMI 2005.

31 SURF Uses integral images to detect and describe SIFT-like features. SURF computes descriptors about 3 times faster than SIFT, but is not as invariant as SIFT to illumination and viewpoint changes.

32 Descriptors Histograms of Oriented Gradients Descriptors
SIFT – David Lowe, 1999. GLOH – Mikolajczyk K., Schmid C., 2005. SURF – Bay H., Ess A., Tuytelaars T., Van Gool L., 2008. Gradient-based descriptors were the significant breakthrough in image descriptors, starting with the introduction of the SIFT descriptor by David Lowe in 1999.

33 Outline
Overview
Image Descriptors – Histograms of Oriented Gradients descriptors, Shape descriptors, Color descriptors
Video Descriptors

34 Descriptors

35 Descriptors - Shape Context
Assume we have a good edge detector. Take a patch of edges? Not invariant to small deformations in the shape.

36 Descriptors - Shape Context
Quantize the edge map using log-polar binning; in each bin, count the number of edge points.
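The log-polar count described above can be sketched as follows for a single reference point; the numbers of radial and angular bins and the radius range are illustrative choices, not the exact values from the Shape Context paper:

```python
import numpy as np

def shape_context(points, center, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Sketch of a shape-context histogram: log-polar binning of edge points
    relative to one reference point; returns bin counts, row-major."""
    d = points - center
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    # log-spaced radial bin edges: far bins are larger than near ones
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    hist = np.zeros((n_r, n_theta))
    r_bin = np.digitize(r, r_edges) - 1
    t_bin = (theta / (2 * np.pi) * n_theta).astype(int) % n_theta
    for rb, tb in zip(r_bin, t_bin):
        if 0 <= rb < n_r:             # ignore points outside the rings
            hist[rb, tb] += 1
    return hist.ravel()

pts = np.array([[0.5, 0.0], [0.0, 0.5], [-0.5, 0.0], [3.0, 3.0]])
h = shape_context(pts, center=np.array([0.0, 0.0]))
print(int(h.sum()))   # 3: the far point falls outside the outer ring
```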

37 Descriptors - Shape Context

38 Descriptors - Shape Context

39 Complex Notion of Similarity
A good similarity measure between objects in images or actions in videos is important in many Computer Vision tasks, such as object recognition, detection, retrieval and action recognition. In many cases the notion of similarity between objects or actions can be quite complex.
Consider four images of hearts: we recognize all of them as hearts quite easily, despite extreme differences in appearance – the images do not share similar edge, intensity, color or texture patterns. So what makes them look similar to us?
Another example: peace symbols. Again, these images contain completely different local patterns – in one image there are people standing on a mountain, in another there are skulls.
What makes these images similar is that the geometric layout of local repetitive image patterns is similar, while the image patterns themselves can be completely different (the different patterns are repeated locally in a similar geometric layout across the images). In other words, these images share local self-similarity of image patterns, where the patterns themselves are NOT shared across the images. We would like to use this self-similarity property to match such challenging images.

40 The Local Self-Similarity Descriptor
Let's see how we can exploit this property to match such challenging objects in images. By correlating a small patch against its surrounding region using a simple SSD and transforming the distances to similarities, we obtain a correlation surface around that point. This surface represents how similar the patch is to its surrounding patches. Next we quantize the correlation surface using log-polar binning, such that bins farther from the point are larger than closer ones. We thus obtain a compact descriptor vector that captures the self-similarity property at a point, and it can be computed at each and every pixel in the image.
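The pipeline just described can be sketched as follows. The SSD-to-similarity transform and all sizes (patch, region, bin counts) here are simplified illustrative choices, not the exact parameters of Shechtman and Irani's descriptor:

```python
import numpy as np

def self_similarity_descriptor(img, y, x, patch=5, region=20, n_r=4, n_theta=8):
    """Sketch of a local self-similarity descriptor: SSD of a central patch
    against surrounding patches -> correlation surface -> max per log-polar bin."""
    p = patch // 2
    center = img[y - p:y + p + 1, x - p:x + p + 1].astype(float)
    ys, xs = np.mgrid[-region:region + 1, -region:region + 1]
    ssd = np.zeros(ys.shape)
    for i in range(ys.shape[0]):
        for j in range(ys.shape[1]):
            cy, cx = y + ys[i, j], x + xs[i, j]
            q = img[cy - p:cy + p + 1, cx - p:cx + p + 1].astype(float)
            ssd[i, j] = np.sum((q - center) ** 2)
    sim = np.exp(-ssd / (ssd.max() + 1e-10))      # correlation surface in (0, 1]
    r = np.hypot(ys, xs)
    theta = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    r_edges = np.logspace(0, np.log10(region), n_r + 1)
    desc = np.zeros((n_r, n_theta))
    for rb in range(n_r):
        for tb in range(n_theta):
            mask = ((r >= r_edges[rb]) & (r < r_edges[rb + 1]) &
                    (theta >= tb * 2 * np.pi / n_theta) &
                    (theta < (tb + 1) * 2 * np.pi / n_theta))
            if mask.any():
                desc[rb, tb] = sim[mask].max()    # max within each log-polar bin
    return desc.ravel()

rng = np.random.default_rng(3)
img = rng.random((64, 64))
d = self_similarity_descriptor(img, 32, 32)
print(d.shape)   # (32,)
```

Taking the maximum in each bin (rather than the sum) is what gives robustness to small deformations, as the next slides explain.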

41 The Local Self-Similarity Descriptor
Consider the following images, and focus on three points at corresponding locations on these objects. Note that although the objects are composed of completely different patterns, their self-similarity descriptors at matching locations are very similar.

42 The Local Self-Similarity Descriptor
Properties & benefits:
A unified treatment of repetitive patterns, color, texture and edges – the descriptor can capture repetitive patterns, a region of uniform color, a textured region, and an edge pattern (up to some variation in the orientation of the edge).
Captures the shape of a local region, regardless of whether it is a color region, a textured region or an edge – as opposed to common local region descriptors.
Invariant to appearance.
Accounts for small local affine deformations – due to the log-polar quantization, similar to the shape context descriptor.
Accounts for small non-rigid deformations – achieved by taking the maximal value in each log-polar bin, which makes the descriptor invariant to the exact location of the peak within the bin.

43 Template image:

44 Descriptors Shape Descriptors Allow measuring shape similarity.
Shape Context – Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002.
Local Self-Similarity – Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007.
Geometric Blur – Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001.
Shown to outperform the commonly used SIFT in an object classification task: Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models.

45 Outline
Overview
Image Descriptors – Histograms of Oriented Gradients descriptors, Shape descriptors, Color descriptors
Video Descriptors

46 Color Descriptors

47 Color Descriptors Color spaces: RGB, HSV, Opponent
Many descriptor evaluation and comparison works have appeared in the last few years; we will show what we think is the most relevant and recent one, "Evaluating Color Descriptors for Object and Scene Recognition", which tests different kinds of color-based descriptors on the task of object and scene recognition. A common framework for object and scene recognition is to: extract features from the images; compute descriptors for the features; create a "codebook" (or "bag of features") for each image; and use standard machine learning algorithms for recognition.

48 Color Descriptors Opponent color space
Intensity information is represented by channel O3; color information is represented by channels O1 and O2. O1 and O2 are invariant to an intensity offset.
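The standard RGB-to-opponent conversion and the offset invariance of O1, O2 can be sketched directly (a minimal sketch; the example pixel value is arbitrary):

```python
import numpy as np

def rgb_to_opponent(rgb):
    """Opponent color space: O1, O2 carry color, O3 carries intensity."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2)
    O2 = (R + G - 2 * B) / np.sqrt(6)
    O3 = (R + G + B) / np.sqrt(3)
    return np.stack([O1, O2, O3], axis=-1)

px = np.array([[[100.0, 120.0, 80.0]]])
diff = rgb_to_opponent(px + 50) - rgb_to_opponent(px)   # equal offset on R, G, B
print(np.allclose(diff[..., :2], 0))   # True: O1 and O2 ignore the offset
```

Only O3 changes under the offset, which is exactly why it is the intensity channel.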

49 Color Descriptors RGB color histogram; Opponent (O1, O2) histogram; Color moments
Color moments use all generalized color moments up to the second degree and the first order, and give information on the distribution of the colors.

50 Color Descriptors
RGB-SIFT – SIFT descriptors computed for every RGB channel independently, with each channel normalized separately; invariant to light color change.
rg-SIFT – SIFT descriptors over the r and g channels of the normalized-RGB space (2x128 dimensions per descriptor).
OpponentSIFT – describes all channels of the opponent color space.
C-SIFT – uses O1/O3 and O2/O3 of the opponent color space (2x128 dimensions per descriptor); scale-invariant with respect to light intensity, but due to the definition of the color space, the offset does not cancel out when taking the derivative.
G. J. Burghouts and J. M. Geusebroek. Performance evaluation of local color invariants. 2009.

51 Color Descriptors The study examines the invariance properties and the distinctiveness of color descriptors under different conditions: light intensity change; light intensity shift; light intensity change and shift; light color change; and light color change and shift – the most complex model.

52 Color Descriptors

53 Color Descriptors

54 Color Descriptors Increased invariance can reduce discriminative power

55 Color Descriptors Descriptor performance on image benchmark

56 Color Descriptors

57 Descriptors How to choose your descriptor?
What is the similarity that you need for your application?

58 Descriptors

59 Descriptors
SIFT – Gradient histograms; captures texture, gradients
GLOH – Variant of SIFT with a log-polar descriptor
SURF – Faster variant of SIFT with lower performance
Shape Context – Histogram of edges, good for shape description; captures shape, edges
Self-Similarity – Higher-level shape description, invariant to appearance; captures shape
RGB-SIFT – SIFT descriptors computed for every RGB channel independently
C-SIFT – SIFT based on the opponent color space, shown to be better than SIFT for object and scene recognition; captures texture, gradients, color

60 Outline
Overview
Image Descriptors – Histograms of Oriented Gradients descriptors, Shape descriptors, Color descriptors
Video Descriptors

61 Video Descriptors Application: action recognition
Video is more than just a sequence of images – we want to capture temporal information as well.

62 Video Descriptors Space-Time SIFT: 64-direction histogram
P. Scovanner, S. Ali, M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition.

63 Video Descriptors Actions as Space-Time Shapes

64 3D Shape Context Represent an action in a video sequence by a 3D point cloud extracted by sampling 2D silhouettes over time M. Grundmann, F. Meier, and I. Essa (2008) “3D Shape Context and Distance Transform for Action Recognition”

65 The Local Self-Similarity Descriptor in Video
The same idea extends to video: a space-time patch of the input video is compared against its surrounding space-time region over (x, y, time), yielding a correlation volume that is binned into a video descriptor. Application: action detection.

66 Video Descriptors On Space-Time Interest Points; Ivan Laptev
Local image features provide compact and abstract representations of images, e.g. corners. The idea is to extend the concept of a spatial corner detector to a spatio-temporal corner detector.

67 Space-Time Interest Points
Consider a synthetic sequence of a ball moving towards a wall and colliding with it: an interest point is detected at the collision point.

68 Space-Time Interest Points
Consider a synthetic sequence of two balls moving towards each other: different interest points are detected at different spatial and temporal scales.

69 Conclusion The problem we are trying to solve is similarity between images and videos; descriptors provide a solution.

70 Conclusion Tradeoff between keeping the geometric structure and obtaining invariance properties (perturbations & rotations). Tradeoff between preserving information and obtaining invariance.

71 Thank You

