
1 Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …

2 Computer Vision 2 Tentative class schedule:
– Jan 16/18: Introduction
– Jan 23/25: Cameras / Radiometry
– Jan 30/Feb 1: Sources & Shadows / Color
– Feb 6/8: Linear filters & edges / Texture
– Feb 13/15: Multi-View Geometry / Stereo
– Feb 20/22: Optical flow / Project proposals
– Feb 27/Mar 1: Affine SfM / Projective SfM
– Mar 6/8: Camera Calibration / Segmentation
– Mar 13/15: Spring break
– Mar 20/22: Fitting / Prob. Segmentation
– Mar 27/29: Silhouettes and Photoconsistency / Linear tracking
– Apr 3/5: Project Update / Non-linear Tracking
– Apr 10/12: Object Recognition
– Apr 17/19: Range data
– Apr 24/26: Final project

3 Computer Vision 3 Last class: Recognition by matching templates – Classifiers: decision boundaries, not probability densities – PCA: dimensionality reduction – LDA: maximize discrimination

4 Computer Vision 4 Last class: Recognition by matching templates – Neural Networks: universal approximation property – Support Vector Machines: optimal separating hyperplane (OSH), support vectors, convex problem! (also for non-linear boundaries)

5 Computer Vision 5 SVMs for 3D object recognition (Pontil & Verri, PAMI'98) – Consider images as vectors – Compute pairwise OSHs using linear SVMs – Support vectors are representative views of the considered object (relative to the others) – Tournament-like classification: competing classes are grouped in pairs, classes that are not selected are discarded, until only one class is left – Complexity linear in the number of classes – No pose estimation
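
The tournament is easy to sketch. The following is only an illustration of the pairing scheme, using scikit-learn's LinearSVC on flattened image vectors; the data layout (images_by_class) is an assumption for the example, not Pontil and Verri's actual implementation.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_pairwise_svms(images_by_class):
        """Train one linear SVM per pair of classes; images are given as
        (n, d) arrays of flattened pixel vectors per class."""
        svms = {}
        classes = sorted(images_by_class)
        for i, a in enumerate(classes):
            for b in classes[i + 1:]:
                X = np.vstack([images_by_class[a], images_by_class[b]])
                y = np.array([0] * len(images_by_class[a]) + [1] * len(images_by_class[b]))
                svms[(a, b)] = LinearSVC().fit(X, y)
        return svms

    def tournament_classify(x, classes, svms):
        """Pair up surviving classes, keep the winner of each pairwise SVM,
        repeat until one class is left (classification cost is linear in the
        number of classes, even though training is quadratic)."""
        alive = list(classes)
        while len(alive) > 1:
            survivors = []
            for a, b in zip(alive[0::2], alive[1::2]):
                pair = (a, b) if (a, b) in svms else (b, a)
                pred = svms[pair].predict(x.reshape(1, -1))[0]
                survivors.append(pair[pred])      # 0 -> first class, 1 -> second
            if len(alive) % 2:                    # odd one out gets a bye
                survivors.append(alive[-1])
            alive = survivors
        return alive[0]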

6 Computer Vision 6 Vision applications – Reliable, simple classifier: use it wherever you need a classifier – Commonly used for face finding – Pedestrian finding: many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time – Classify image regions, searching over scales – But what are the features? Compute wavelet coefficients for pedestrian windows and average over pedestrians; a coefficient whose average differs clearly from zero is probably strongly associated with pedestrians
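
A rough numpy-only sketch of that feature-selection idea. A plain one-level Haar decomposition stands in for the wavelet features of the actual systems, and the window normalisation and threshold are illustrative assumptions.

    import numpy as np

    def haar_level1(window):
        """One level of a 2-D Haar decomposition of a grayscale window
        (window dimensions are assumed to be even)."""
        a = window[0::2, 0::2]; b = window[0::2, 1::2]
        c = window[1::2, 0::2]; d = window[1::2, 1::2]
        return {
            "horizontal": (a + b - c - d) / 4.0,
            "vertical":   (a - b + c - d) / 4.0,
            "diagonal":   (a - b - c + d) / 4.0,
        }

    def informative_coefficients(pedestrian_windows, threshold=0.1):
        """Average each wavelet coefficient over all (intensity-normalised)
        pedestrian windows; coefficients whose mean is clearly different from
        zero respond consistently on pedestrians and make useful features."""
        sums = None
        for w in pedestrian_windows:
            bands = haar_level1(w.astype(float))
            if sums is None:
                sums = {k: np.zeros_like(v) for k, v in bands.items()}
            for k, v in bands.items():
                sums[k] += v
        means = {k: s / len(pedestrian_windows) for k, s in sums.items()}
        return {k: np.abs(m) > threshold for k, m in means.items()}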

7 Computer Vision 7 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.

8 Computer Vision 8 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.

9 Computer Vision 9 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998. Copyright 1998, IEEE.

10 Computer Vision 10 Latest results on pedestrian detection: Viola, Jones and Snow's paper (ICCV'03, Marr prize) – Combines static and dynamic features – Cascade for efficiency (4 frames/s) – 5 best features out of 55k (AdaBoost) – 5 best static features out of 28k (AdaBoost) – Some positive examples used for training
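
The cascade idea itself is easy to sketch. The code below is a generic attentional cascade, not the detector from the paper; the per-stage classifiers are assumed to be supplied and ordered from cheapest to most expensive.

    def cascade_detect(window, stages):
        """stages: list of (score_fn, threshold) pairs, ordered from cheapest
        to most expensive. A window counts as a detection only if it passes
        every stage; most windows are rejected after the first few cheap
        stages, which is where the speed comes from."""
        for score_fn, threshold in stages:
            if score_fn(window) < threshold:
                return False        # rejected: no further stages evaluated
        return True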

11 Computer Vision 11 Dynamic detection – false detection rate: typically 1 in 400,000 (about one false positive every two frames at 360×240)

12 Computer Vision 12 Static detection

13 Computer Vision 13 Matching by relations – Idea: find bits, then say the object is present if the bits are OK – Advantage: objects with complex configuration spaces don't make good templates (internal degrees of freedom, aspect changes, (possibly) shading variations in texture, etc.)

14 Computer Vision 14 Simplest approach – Define a set of local feature templates (could find these with filters, corner detector + filters, etc.) – Think of objects as patterns – Each template votes for all patterns that contain it – The pattern with the most votes wins
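
A toy sketch of the voting scheme, assuming some front-end has already reported which templates were found in the image; the data structures are illustrative.

    from collections import Counter

    def recognise_by_voting(found_templates, patterns):
        """patterns: dict mapping pattern name -> set of template ids that the
        pattern contains. Every template found in the image votes for every
        pattern containing it; the pattern with the most votes wins."""
        votes = Counter()
        for template in found_templates:
            for name, templates in patterns.items():
                if template in templates:
                    votes[name] += 1
        return votes.most_common(1)[0][0] if votes else None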

15 Computer Vision 15 Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997. Copyright 1997, IEEE.

16 Computer Vision 16 Probabilistic interpretation – Write … – Assume … – Likelihood of image given pattern …

17 Computer Vision 17 Possible alternative strategies Notice: –different patterns may yield different templates with different probabilities –different templates may be found in noise with different probabilities

18 Computer Vision 18 Employ spatial relations. Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997. Copyright 1997, IEEE.

19 Computer Vision 19 Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997. Copyright 1997, IEEE.

20 Computer Vision 20 Example Training examples Test image

21 Computer Vision 21

22 Computer Vision 22

23 Computer Vision 23 Finding faces using relations – Strategy: a face is eyes, nose, mouth, etc., with appropriate relations between them – Build a specialised detector for each of these (template matching) and look for groups with the right internal structure – Once we've found enough of a face, there is little uncertainty about where the other bits could be

24 Computer Vision 24 Finding faces using relations – Strategy: compare – Notice that once some facial features have been found, the position of the rest is quite strongly constrained. Figure from "Finding faces in cluttered scenes using random labelled graph matching," by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995. Copyright 1995, IEEE.

25 Computer Vision 25 Detection This means we compare

26 Computer Vision 26 Issues – Plugging in values for the position of the nose, eyes, etc.: search for the next one given what we've found – When to stop searching: when nothing that could be added to the group would change the decision, i.e. it's not a face whatever features are added, or it's a face and anything you can't find is occluded – What to do next: look for another eye? or a nose? probably look for the easiest to find – What if there's no nose response: marginalize

27 Computer Vision 27 Figure from "Finding faces in cluttered scenes using random labelled graph matching," by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995. Copyright 1995, IEEE.

28 Computer Vision 28 Pruning – Prune using a classifier; crude criterion: if this small assembly doesn't work, there is no need to build on it – Example: finding people without clothes on: find skin, find extended skin regions, construct groups that pass local classifiers (e.g. lower arm, upper arm), give these to broader-scale classifiers (e.g. girdle)

29 Computer Vision 29 Pruning – Prune using a classifier; better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop – Equivalent to projecting classifier boundaries

30 Computer Vision 30 Horses

31 Computer Vision 31 Hidden Markov Models Elements of sign language understanding –the speaker makes a sequence of signs –Some signs are more common than others –the next sign depends (roughly, and probabilistically) only on the current sign –there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values Many problems share these properties –tracking is like this, for example

32 Computer Vision 32 Hidden Markov Models Now in each state we could emit a measurement, with probability depending on the state and the measurement We observe these measurements

33 Computer Vision 33 HMM’s - dynamics

34 Computer Vision 34 HMM’s - the Joint and Inference
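
For reference, the standard joint distribution of an HMM over hidden states X_{1:T} and measurements Y_{1:T}, which the trellis-based inference on the next slides exploits, is (in the usual notation, not necessarily the slide's):

    P(X_{1:T}, Y_{1:T}) = P(X_1)\, P(Y_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\, P(Y_t \mid X_t)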

35 Computer Vision 35 Trellises – Each column corresponds to a measurement in the sequence – The trellis makes the collection of legal paths obvious – Now we would like to get the path with the smallest negative log-posterior (i.e. the most probable state sequence) – The trellis makes this easy, as follows.
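
A minimal sketch of that dynamic program (Viterbi in negative-log space). The prior, transition and emission arrays are assumed to be given as probabilities, with transition[s, s2] = P(s2 | s).

    import numpy as np

    def viterbi(prior, transition, emission, observations):
        """prior[s], transition[s, s2] = P(s2 | s), emission[s, o]: HMM
        probabilities; observations: sequence of integer symbols. Sweeps the
        trellis column by column, keeping for every state the cheapest
        (smallest negative-log-probability) path that reaches it."""
        n_states, T = len(prior), len(observations)
        nl = lambda p: -np.log(np.asarray(p, dtype=float) + 1e-300)   # avoid log(0)
        cost = np.full((T, n_states), np.inf)
        back = np.zeros((T, n_states), dtype=int)
        cost[0] = nl(prior) + nl(emission[:, observations[0]])
        for t in range(1, T):
            for s in range(n_states):
                step = cost[t - 1] + nl(transition[:, s])
                back[t, s] = int(np.argmin(step))
                cost[t, s] = step[back[t, s]] + nl(emission[s, observations[t]])
        path = [int(np.argmin(cost[-1]))]            # best final state
        for t in range(T - 1, 0, -1):                # trace back through the trellis
            path.append(int(back[t, path[-1]]))
        return path[::-1]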

36 Computer Vision 36

37 Computer Vision 37 Fitting an HMM – I have: a sequence of measurements, a collection of states, the topology – I want: state transition probabilities, measurement emission probabilities – Straightforward application of EM: discrete variables give the state for each measurement; the M step is just averaging, etc.
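
A sketch of just the M step under the hard-assignment reading the slide suggests (every measurement already labelled with a state): transitions come from counting, emission parameters (here just a mean per state) from averaging. Full EM (Baum-Welch) would use soft state posteriors instead; the function and its inputs are illustrative.

    import numpy as np

    def m_step(state_labels, measurements, n_states):
        """state_labels: one integer state per time step; measurements: 1-D
        array of the same length. Transition probabilities are relative
        counts of observed state pairs; emission means are plain averages."""
        states = np.asarray(state_labels)
        meas = np.asarray(measurements, dtype=float)
        trans = np.ones((n_states, n_states))            # add-one smoothing
        for a, b in zip(states[:-1], states[1:]):
            trans[a, b] += 1
        trans /= trans.sum(axis=1, keepdims=True)
        means = np.array([meas[states == s].mean() if np.any(states == s) else 0.0
                          for s in range(n_states)])
        return trans, means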

38 Computer Vision 38 HMM’s for sign language understanding-1 Build an HMM for each word

39 Computer Vision 39 HMM’s for sign language understanding-2 Build an HMM for each word Then build a language model

40 Computer Vision 40 Figure from "Real time American sign language recognition using desk and wearable computer based video," T. Starner et al., Proc. Int. Symp. on Computer Vision, 1995. Copyright 1995, IEEE. User gesturing. For both isolated word recognition tasks and for recognition using a language model that has five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland's system displays a word accuracy of the order of 90%. Values are slightly larger or smaller, depending on the features, the task, etc.

41 Computer Vision 41 HMM’s can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents correspondence between an image token and a body part, and do DP on this trellis.
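
A small sketch of that dynamic program, assuming a star-shaped model (arm and leg each depend only on the torso) and user-supplied appearance and pairwise cost functions; the part names and cost interfaces are illustrative, not Felzenszwalb and Huttenlocher's formulation.

    def best_body_configuration(tokens, appearance_cost, pair_cost):
        """tokens: candidate image tokens (e.g. limb-like segments);
        appearance_cost(part, token) scores how well a token looks like a part,
        pair_cost(limb_token, torso_token) scores their relative placement.
        Because the arm and the leg each depend only on the torso, the best
        token for each limb can be chosen independently once a torso candidate
        is fixed, so the search is a simple sweep."""
        best_total, best_config = float("inf"), None
        for torso in tokens:
            arm = min(tokens, key=lambda t: appearance_cost("arm", t) + pair_cost(t, torso))
            leg = min(tokens, key=lambda t: appearance_cost("leg", t) + pair_cost(t, torso))
            total = (appearance_cost("torso", torso)
                     + appearance_cost("arm", arm) + pair_cost(arm, torso)
                     + appearance_cost("leg", leg) + pair_cost(leg, torso))
            if total < best_total:
                best_total, best_config = total, {"torso": torso, "arm": arm, "leg": leg}
        return best_config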

42 Computer Vision 42

43 Computer Vision 43 Figure from "Efficient Matching of Pictorial Structures," P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000. Copyright 2000, IEEE.

44 Computer Vision 44 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint Tuytelaars and Van Gool, BMVC2000

45 Computer Vision 45 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination

46 Computer Vision 46 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination –Background

47 Computer Vision 47 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination –Background –and Occlusions

48 Computer Vision 48 Recognition using local affine and photometric invariant features – Hybrid approach that aims to deal with large variations in viewpoint, illumination, background and occlusions → use local invariant features – Invariant features = features that are preserved under a specific group of transformations – Robust to occlusions and changes in background – Robust to changes in viewpoint and illumination

49 Computer Vision 49 Transformations for planar objects – Affine geometric deformations – Linear photometric changes
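
For a planar patch, these transformations are commonly written as an affine map of the image coordinates together with an independent linear (scale and offset) change of each colour band; in LaTeX form (notation assumed, not taken from the slide):

    \begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + t,
    \qquad R' = s_R R + o_R, \quad G' = s_G G + o_G, \quad B' = s_B B + o_B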

50 Computer Vision 50 Local invariant features ‘Affine invariant neighborhood’

51 Computer Vision 51 Local invariant features

52 Computer Vision 52 Local invariant features Geometry-based region extraction –Curved edges –Straight edges Intensity-based region extraction

53 Computer Vision 53 Geometry-based method (curved edges)

54 Computer Vision 54 Geometry-based method (curved edges) 1. Harris corner detection

55 Computer Vision 55 Geometry-based method (curved edges) 2. Canny edge detection

56 Computer Vision 56 Geometry-based method (curved edges) 3. Evaluate a relative affine invariant parameter along the two edges

57 Computer Vision 57 Geometry-based method (curved edges) 4. Construct a 1-dimensional family of parallelogram-shaped regions

58 Computer Vision 58 Geometry-based method (curved edges) 5. Select parallelograms based on local extrema of an invariant function f

59 Computer Vision 59 Geometry-based method (curved edges) 5. Select parallelograms based on local extrema of the invariant function
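
The first two steps are standard detectors, so they can be shown with OpenCV; the remaining steps (the relative affine invariant parameterisation of the edges and the selection of extremal parallelograms) are specific to the method and are not sketched here. The file name is a placeholder.

    import cv2
    import numpy as np

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # placeholder input image

    # 1. Harris corner detection (response map, thresholded to corner points)
    response = cv2.cornerHarris(np.float32(img), 2, 3, 0.04)
    corners = np.argwhere(response > 0.01 * response.max())

    # 2. Canny edge detection (binary edge map)
    edges = cv2.Canny(img, 100, 200)

    print(len(corners), "corner points,", int((edges > 0).sum()), "edge pixels")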

60 Computer Vision 60 Geometry-based method (straight edges) Relative affine invariant parameters are identically zero!

61 Computer Vision 61 Geometry-based method (straight edges) 1. Harris corner detection

62 Computer Vision 62 Geometry-based method (straight edges) 2. Canny edge detection

63 Computer Vision 63 Geometry-based method (straight edges) 3. Fit lines to the edges

64 Computer Vision 64 Geometry-based method (straight edges) 4. Select parallelograms based on local extrema of invariant functions

65 Computer Vision 65 Geometry-based method (straight edges) 4. Select parallelograms based on local extrema of invariant functions

66 Computer Vision 66 Intensity-based method 1. Search for intensity extrema 2. Observe the intensity profile along rays 3. Search for the maximum of an invariant function f(t) along each ray 4. Connect the local maxima 5. Fit an ellipse 6. Double the ellipse size

67 Computer Vision 67 Intensity based method

68 Computer Vision 68 Comparison – Intensity-based method: more robust – Geometry-based method: fewer computations, works in more environments

69 Computer Vision 69 Robustness – "Correct" detection of a single environment cannot be guaranteed: non-planar region, noise, quantization errors, non-linear photometric distortion, perspective distortion, … – All regions of an object / image should be considered simultaneously

70 Computer Vision 70 Search for corresponding regions 1. Extract affine invariant regions 2. Describe each region with a feature vector of moment invariants, e.g. …

71 Computer Vision 71 Search for corresponding regions 1. Extract affine invariant regions 2. Describe each region with a feature vector of moment invariants 3. Search for corresponding regions based on the Mahalanobis distance 4. Check cross-correlation (after normalization) 5. Check consistency of correspondences
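
A sketch of step 3, assuming each region has already been reduced to a moment-invariant feature vector; the covariance is estimated from the model-image descriptors and each test region is matched to the closest model region under the Mahalanobis distance.

    import numpy as np

    def match_regions(model_desc, test_desc):
        """model_desc, test_desc: (n, d) and (m, d) arrays of moment-invariant
        feature vectors. Returns, for each test region, the index of the
        closest model region under the (squared) Mahalanobis distance."""
        cov = np.cov(model_desc, rowvar=False)
        cov_inv = np.linalg.pinv(cov)                 # pseudo-inverse for stability
        matches = []
        for x in test_desc:
            diff = model_desc - x
            dist = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
            matches.append(int(np.argmin(dist)))
        return matches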

72 Computer Vision 72 Semi-local constraints = check consistency of correspondences – Epipolar constraint (RANSAC), based on 7 points – Geometric and photometric constraints, based on a combination of only 2 regions
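
The epipolar check can be sketched with OpenCV's robust fundamental-matrix estimator (the RANSAC sample size is handled internally, so the seven-point detail is abstracted away here); pts1 and pts2 are assumed to be the matched region centres.

    import cv2
    import numpy as np

    def epipolar_filter(pts1, pts2, threshold=3.0):
        """pts1, pts2: (n, 2) arrays of corresponding points (e.g. matched
        region centres). Fits a fundamental matrix robustly with RANSAC and
        keeps only the matches consistent with it."""
        pts1 = np.asarray(pts1, dtype=np.float32)
        pts2 = np.asarray(pts2, dtype=np.float32)
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, threshold)
        if F is None or mask is None:                 # estimation can fail
            return None, pts1[:0], pts2[:0]
        inliers = mask.ravel().astype(bool)
        return F, pts1[inliers], pts2[inliers]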

73 Computer Vision 73 Experimental validation: number of correct and symmetric matches as a function of viewpoint angle (degrees)

74 Computer Vision 74 Experimental validation: number of correct and symmetric matches as a function of scale

75 Computer Vision 75 Experimental validation: number of correct and symmetric matches under illumination changes, relative to a reference image

76 Computer Vision 76 Object recognition and localization – 'Appearance'-based approach = objects are modeled by a set of reference images – Voting principle based on the number of similar regions – More invariance = fewer reference images required
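
A toy sketch of the voting principle, assuming a region matcher (such as the Mahalanobis matcher sketched earlier) that can count how many test regions agree with each reference image; the interface is illustrative.

    def recognise_object(test_regions, reference_images, count_matching_regions):
        """reference_images: dict mapping object name -> region descriptors of
        one or more reference images; count_matching_regions returns how many
        regions of the test image correspond to a given reference. The object
        collecting the most matching regions wins the vote."""
        votes = {name: count_matching_regions(test_regions, refs)
                 for name, refs in reference_images.items()}
        winner = max(votes, key=votes.get)
        return winner, votes[winner]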

77 Computer Vision 77 Object recognition and localization

78 Computer Vision 78 Object recognition and localization

79 Computer Vision 79 Wide-baseline stereo

80 Computer Vision 80 Wide-baseline stereo

81 Computer Vision 81 Wide-baseline stereo

82 Computer Vision 82 Content-based image retrieval from a database = searching for 'similar' images in a database based on image content – Local features – Similarity = images contain the same object or the same scene – Voting principle based on the number of similar regions

83 Computer Vision 83 Content-based image retrieval from database – Database (> 450 images) – Search image

84 Computer Vision 84 Content-based image retrieval from database

85 Computer Vision 85 Content-based image retrieval from database

86 Computer Vision 86 Application: virtual museum guide

87 Computer Vision 87 Next class: Range data Reading: Chapter 21

88 Computer Vision 88 Talk 4pm tomorrow: Jean Ponce, "Three-Dimensional Computer Vision: Challenges and Opportunities." Jean Ponce (ponce@cs.uiuc.edu), University of Illinois at Urbana-Champaign and École Normale Supérieure, Paris. http://www-cvr.ai.uiuc.edu/ponce_grp/ Abstract: This talk addresses two of the main challenges of computer vision: automatically recognizing three-dimensional (3D) object categories in photographs despite potential within-class variations, viewpoint changes, occlusion, and clutter; and recovering accurate models of 3D shapes observed in multiple images. I will first present a new approach to 3D object recognition that exploits local, semi-local, and global constraints to learn visual models of texture, object, and scene categories, and identify instances of these models in photographs. I will then discuss a novel algorithm that uses the geometric and photometric constraints associated with multiple calibrated photographs to construct high-fidelity solid models of complex 3D shapes in the form of carved visual hulls. I will conclude with a brief discussion of new application domains and wide open research issues. Joint work with Yasutaka Furukawa, Akash Kushal, Svetlana Lazebnik, Kenton McHenry, Fred Rothganger, and Cordelia Schmid.

