Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …
Computer Vision 2 Tentative class schedule:
Jan 16/18: Introduction
Jan 23/25: Cameras / Radiometry
Jan 30/Feb 1: Sources & Shadows / Color
Feb 6/8: Linear filters & edges / Texture
Feb 13/15: Multi-View Geometry / Stereo
Feb 20/22: Optical flow / Project proposals
Feb 27/Mar 1: Affine SfM / Projective SfM
Mar 6/8: Camera Calibration / Segmentation
Mar 13/15: Spring break
Mar 20/22: Fitting / Prob. Segmentation
Mar 27/29: Silhouettes and Photoconsistency / Linear tracking
Apr 3/5: Project Update / Non-linear Tracking
Apr 10/12: Object Recognition
Apr 17/19: Range data
Apr 24/26: Final project
Computer Vision 3 Last class: Recognition by matching templates. Classifiers learn decision boundaries, not probability densities. PCA: dimensionality reduction. LDA: maximize discrimination.
Computer Vision 4 Last class: Recognition by matching templates. Neural networks: universal approximation property. Support Vector Machines: optimal separating hyperplane (OSH), support vectors; a convex problem, also for non-linear boundaries.
Computer Vision 5 SVMs for 3D object recognition (Pontil & Verri, PAMI '98). Consider images as vectors; compute the pairwise OSH using a linear SVM. The support vectors are representative views of the considered object (relative to the other). Tournament-like classification: competing classes are grouped in pairs, classes not selected are discarded, until only one class is left. Complexity is linear in the number of classes. No pose estimation.
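The tournament could be sketched as follows; `pairwise_winner` stands in for evaluating a trained pairwise linear SVM on the query image, and the toy decision rule is purely illustrative, not from the paper:

```python
# Hypothetical sketch of tournament-style multi-class classification:
# pairwise one-vs-one classifiers, losers discarded until one class remains.

def tournament_classify(classes, pairwise_winner):
    """Reduce `classes` pairwise until a single winner remains.

    pairwise_winner(a, b) -> a or b, e.g. the sign of a pairwise linear
    SVM's decision function evaluated on the query image vector.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        next_round = []
        # Pair up competing classes; an odd one out advances automatically.
        for i in range(0, len(remaining) - 1, 2):
            next_round.append(pairwise_winner(remaining[i], remaining[i + 1]))
        if len(remaining) % 2:
            next_round.append(remaining[-1])
        remaining = next_round
    return remaining[0]

# Toy demo: a stand-in decision rule under which class "c" beats everything.
winner = tournament_classify(["a", "b", "c", "d"],
                             lambda x, y: "c" if "c" in (x, y) else x)
```

With n classes, each round halves the field, so n - 1 pairwise evaluations are needed in total, which is the linear complexity mentioned above.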
Computer Vision 6 Vision applications. A reliable, simple classifier: use it wherever you need a classifier. Commonly used for face finding. Pedestrian finding: many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time; classify image regions, searching over scales. But what are the features? Compute wavelet coefficients for pedestrian windows and average over pedestrians. If the average is different from zero, that coefficient is probably strongly associated with pedestrians.
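That feature-selection step can be illustrated with synthetic data standing in for real Haar wavelet coefficients; the sizes, threshold, and planted signal below are assumptions for illustration only:

```python
import numpy as np

# Sketch of the idea above: average each wavelet coefficient over many
# pedestrian windows and keep the coefficients whose average is clearly
# different from zero. The data here is synthetic: the first four
# "coefficients" are given a consistent non-zero response.
rng = np.random.default_rng(0)
n_windows, n_coeffs = 200, 16
coeffs = rng.normal(0.0, 1.0, size=(n_windows, n_coeffs))
coeffs[:, :4] += 2.0          # planted: consistently non-zero coefficients

mean_response = coeffs.mean(axis=0)               # average over pedestrians
informative = np.flatnonzero(np.abs(mean_response) > 1.0)
```

Only the planted coefficients survive the threshold; the rest average out to roughly zero, which is exactly the selection criterion described on the slide.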
Computer Vision 7 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE
Computer Vision 8 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE
Computer Vision 9 Figure from "A general framework for object detection," by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE
Computer Vision 10 Latest results on pedestrian detection: Viola, Jones and Snow (ICCV '03, Marr Prize). Combines static and dynamic features, with a cascade for efficiency (4 frames/s). Shown: the 5 best of 55k dynamic features and the 5 best static features of 28k (selected by AdaBoost), plus some positive examples used for training.
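The cascade idea can be shown as a minimal sketch; the stage tests, thresholds, and window fields below are invented placeholders, while the real detector uses boosted rectangle filters on image and frame-difference data:

```python
# Sketch of a detector cascade: cheap stages reject most windows early,
# so only windows passing every stage pay for the expensive classifier.

def passes_cascade(window, stages):
    """True iff `window` passes every stage, evaluated cheapest first."""
    return all(stage(window) for stage in stages)

stages = [
    lambda w: w["contrast"] > 0.1,   # cheap static test
    lambda w: w["motion"] > 0.05,    # dynamic (frame-difference) feature
    lambda w: w["score"] > 0.9,      # expensive boosted classifier
]

windows = [
    {"contrast": 0.5, "motion": 0.2, "score": 0.95},   # plausible pedestrian
    {"contrast": 0.01, "motion": 0.3, "score": 0.99},  # rejected at stage 1
]
hits = [w for w in windows if passes_cascade(w, stages)]
```

Because `all` short-circuits, a window rejected by the first stage never reaches the later, costlier ones, which is what makes frame-rate detection possible.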
Computer Vision 11 Dynamic detection. False detection rate: typically 1/400,000 (about 1 every 2 frames at 360x240).
Computer Vision 12 Static detection
Computer Vision 13 Matching by relations. Idea: find parts, then say the object is present if the parts are OK. Advantage: objects with complex configuration spaces don't make good templates; they have internal degrees of freedom, aspect changes, (possibly) shading, variations in texture, etc.
Computer Vision 14 Simplest approach. Define a set of local feature templates; these could be found with filters, a corner detector + filters, etc. Think of objects as patterns. Each template votes for all patterns that contain it, and the pattern with the most votes wins.
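The voting scheme above can be sketched in a few lines; the template and pattern names are made up for illustration:

```python
from collections import Counter

# Each local template found in the image votes for every pattern (object
# model) that contains it; the pattern with the most votes wins.

pattern_templates = {
    "cup":  {"ellipse", "handle", "rim"},
    "book": {"corner", "edge", "rim"},
}

def recognize(found_templates, pattern_templates):
    """Return the pattern receiving the most votes (None if no votes)."""
    votes = Counter()
    for t in found_templates:
        for pattern, templates in pattern_templates.items():
            if t in templates:
                votes[pattern] += 1
    return votes.most_common(1)[0][0] if votes else None

# "rim" is ambiguous and votes for both patterns; the others break the tie.
result = recognize({"ellipse", "handle", "rim"}, pattern_templates)
```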
Computer Vision 15 Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
Computer Vision 16 Probabilistic interpretation: write the posterior over patterns, make a simplifying assumption, and obtain the likelihood of the image given the pattern (equations on the slide).
Computer Vision 17 Possible alternative strategies. Notice: different patterns may yield different templates with different probabilities, and different templates may be found in noise with different probabilities.
Computer Vision 18 Employ spatial relations. Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
Computer Vision 19 Figure from "Local grayvalue invariants for image retrieval," by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, copyright 1997, IEEE
Computer Vision 20 Example: training examples and a test image.
Computer Vision 21
Computer Vision 22
Computer Vision 23 Finding faces using relations. Strategy: a face is eyes, nose, mouth, etc., with appropriate relations between them. Build a specialised detector for each of these (template matching) and look for groups with the right internal structure. Once we've found enough of a face, there is little uncertainty about where the other bits could be.
Computer Vision 24 Finding faces using relations. Notice that once some facial features have been found, the position of the rest is quite strongly constrained. Figure from "Finding faces in cluttered scenes using random labelled graph matching," by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
Computer Vision 25 Detection: this means we compare the probabilities of the competing hypotheses.
Computer Vision 26 Issues. Plugging in values for the position of nose, eyes, etc.: search for the next feature given what we've found. When to stop searching: when nothing that could be added to the group would change the decision, i.e. it's not a face whatever features are added, or it's a face and anything you can't find is occluded. What to do next: look for another eye, or a nose? Probably look for whichever is easiest to find. What if there's no nose response? Marginalize.
Computer Vision 27 Figure from "Finding faces in cluttered scenes using random labelled graph matching," by T. Leung, M. Burl and P. Perona, Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE
Computer Vision 28 Pruning. Prune using a classifier; crude criterion: if this small assembly doesn't work, there is no need to build on it. Example: finding people without clothes on. Find skin; find extended skin regions; construct groups that pass local classifiers (i.e. lower arm, upper arm); give these to broader-scale classifiers (e.g. girdle).
Computer Vision 29 Pruning. Prune using a classifier; better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop. This is equivalent to projecting classifier boundaries.
Computer Vision 30 Horses
Computer Vision 31 Hidden Markov Models. Elements of sign language understanding: the speaker makes a sequence of signs; some signs are more common than others; the next sign depends (roughly, and probabilistically) only on the current sign; and there are measurements, which may be inaccurate, with different signs tending to generate different probability densities on measurement values. Many problems share these properties; tracking is like this, for example.
Computer Vision 32 Hidden Markov Models. In each state we can emit a measurement, with probability depending on the state and the measurement; we observe these measurements.
Computer Vision 33 HMMs: dynamics
Computer Vision 34 HMMs: the joint and inference
Computer Vision 35 Trellises. Each column corresponds to a measurement in the sequence, and the trellis makes the collection of legal paths obvious. We would like the path with the smallest negative log-posterior (i.e. the most probable path); the trellis makes this easy, as follows.
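The trellis computation is standard Viterbi dynamic programming: each column holds the best cost (negative log-probability) of reaching each state given the measurements so far, and backpointers recover the best path. The two-state model and numbers below are toy values, not from the lecture:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Minimum-cost (negative log-probability) state path through the trellis.

    log_pi: log initial state probabilities, shape (n_states,)
    log_A:  log transition probabilities, log_A[i, j] = log P(j | i)
    log_B:  log emission probabilities, log_B[s, o] = log P(o | s)
    """
    cost = -(log_pi + log_B[:, obs[0]])        # first trellis column
    back = []
    for o in obs[1:]:
        # step[i, j]: cost of being in j now, having come from i
        step = cost[:, None] - log_A - log_B[:, o][None, :]
        back.append(step.argmin(axis=0))
        cost = step.min(axis=0)
    path = [int(cost.argmin())]                # backtrack from the best end
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

log = np.log
pi = log([0.6, 0.4])
A = log([[0.7, 0.3], [0.4, 0.6]])
B = log([[0.9, 0.1], [0.2, 0.8]])   # rows: states, columns: symbols
path = viterbi(pi, A, B, [0, 0, 1])
```

State 0 prefers symbol 0 and state 1 prefers symbol 1, so the observation sequence 0, 0, 1 is best explained by the state path 0, 0, 1.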
Computer Vision 36
Computer Vision 37 Fitting an HMM. Given: a sequence of measurements, a collection of states, and a topology. Wanted: state transition probabilities and measurement emission probabilities. This is a straightforward application of EM: the discrete hidden variables give a state for each measurement, and the M-step is just counting and averaging.
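To see why the M-step "is just averaging", consider a hard (Viterbi-style) E-step that labels each measurement with a state; the M-step then reduces to normalized counts. The labels and observations below are toy values:

```python
import numpy as np

# E-step output (assumed here): one state label per measurement,
# plus the observed discrete measurement symbols.
states = [0, 0, 1, 1, 0, 1]
symbols = [0, 0, 1, 1, 0, 1]
n_states, n_symbols = 2, 2

# M-step, transitions: count state pairs, then normalize each row.
A = np.zeros((n_states, n_states))
for s, s_next in zip(states, states[1:]):
    A[s, s_next] += 1
A /= A.sum(axis=1, keepdims=True)

# M-step, emissions: count symbols per state, then normalize each row.
B = np.zeros((n_states, n_symbols))
for s, o in zip(states, symbols):
    B[s, o] += 1
B /= B.sum(axis=1, keepdims=True)
```

Full Baum-Welch replaces the hard labels with posterior state probabilities, so the counts become weighted averages, but the structure of the M-step is the same.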
Computer Vision 38 HMMs for sign language understanding (1): build an HMM for each word.
Computer Vision 39 HMMs for sign language understanding (2): build an HMM for each word, then build a language model.
Computer Vision 40 Figure from "Real time American sign language recognition using desk and wearable computer based video," T. Starner et al., Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE. User gesturing. For both isolated word recognition tasks and for recognition using a language model with five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland's system displays a word accuracy of the order of 90%. Values are slightly larger or smaller depending on the features, the task, etc.
Computer Vision 41 HMMs can be spatial rather than temporal; for example, in a simple model the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis where each node represents a correspondence between an image token and a body part, and do dynamic programming on this trellis.
Computer Vision 42
Computer Vision 43 Figure from "Efficient Matching of Pictorial Structures," P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE
Computer Vision 44 Recognition using local affine and photometric invariant features (Tuytelaars and Van Gool, BMVC 2000): a hybrid approach that aims to deal with large variations in viewpoint.
Computer Vision 48 Recognition using local affine and photometric invariant features: a hybrid approach that aims to deal with large variations in viewpoint, illumination, background, and occlusions. It uses local invariant features (invariant features = features that are preserved under a specific group of transformations), which are robust to occlusions and changes in background, and to changes in viewpoint and illumination.
Computer Vision 49 Transformations for planar objects: affine geometric deformations and linear photometric changes.
Computer Vision 50 Local invariant features ‘Affine invariant neighborhood’
Computer Vision 51 Local invariant features
Computer Vision 52 Local invariant features: geometry-based region extraction (curved edges, straight edges) and intensity-based region extraction.
Computer Vision 53 Geometry-based method (curved edges)
Computer Vision 54 Geometry-based method (curved edges). 1. Harris corner detection.
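A minimal Harris-response sketch for this step, using finite differences and a uniform 3x3 window in place of Gaussian derivatives and smoothing, and with no thresholding or non-maximum suppression:

```python
import numpy as np

def box3(a):
    """Sum of each pixel's 3x3 neighbourhood (zero padding)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx, Syy, Sxy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

img = np.zeros((10, 10))
img[5:, 5:] = 1.0                     # one bright quadrant -> one corner
R = harris(img)
corner = np.unravel_index(R.argmax(), R.shape)
```

The response peaks where the structure tensor has two large eigenvalues (both gradient directions present), is negative along edges, and is zero in flat regions.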
Computer Vision 55 Geometry-based method (curved edges). 2. Canny edge detection.
Computer Vision 56 Geometry-based method (curved edges). 3. Evaluate a relative affine invariant parameter along the two edges.
Computer Vision 57 Geometry-based method (curved edges). 4. Construct a one-dimensional family of parallelogram-shaped regions.
Computer Vision 58 Geometry-based method (curved edges). 5. Select parallelograms based on local extrema of an invariant function f.
Computer Vision 60 Geometry-based method (straight edges). Along straight edges, the relative affine invariant parameters are identically zero!
Computer Vision 61 Geometry-based method (straight edges). 1. Harris corner detection.
Computer Vision 62 Geometry-based method (straight edges). 2. Canny edge detection.
Computer Vision 63 Geometry-based method (straight edges). 3. Fit lines to the edges.
Computer Vision 64 Geometry-based method (straight edges). 4. Select parallelograms based on local extrema of invariant functions.
Computer Vision 66 Intensity-based method. 1. Search for intensity extrema. 2. Observe the intensity profile along rays. 3. Search for the maximum of an invariant function f(t) along each ray. 4. Connect the local maxima. 5. Fit an ellipse. 6. Double the ellipse size.
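Step 3 can be sketched in one dimension. Along a ray starting at an intensity extremum, an invariant of the form f(t) = |I(t) - I0| / max(mean over s <= t of |I(s) - I0|, eps) peaks where the intensity suddenly changes. The exact function used by Tuytelaars and Van Gool differs in detail, so treat this form and the synthetic profile below as illustrative assumptions:

```python
import numpy as np

def ray_maximum(profile, eps=1e-6):
    """Index of the maximum of f(t) along one ray from an extremum.

    profile[0] is the intensity I0 at the extremum; profile[t] is the
    intensity t pixels out along the ray.
    """
    d = np.abs(profile - profile[0])                  # |I(t) - I0|
    running_mean = np.cumsum(d) / np.arange(1, len(d) + 1)
    f = d / np.maximum(running_mean, eps)
    return int(f.argmax())

# Synthetic ray: flat near the extremum, then a sharp step at t = 4.
profile = np.array([10.0, 10.0, 10.0, 10.0, 60.0, 60.0, 60.0])
t_star = ray_maximum(profile)
```

Connecting the selected point on every ray (step 4) outlines the region boundary, to which an ellipse is then fitted.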
Computer Vision 67 Intensity-based method.
Computer Vision 68 Comparison. Intensity-based method: more robust. Geometry-based method: fewer computations, applicable in more environments.
Computer Vision 69 Robustness. "Correct" detection of a single region cannot be guaranteed: non-planar regions, noise, quantization errors, non-linear photometric distortion, perspective distortion, etc. Therefore all regions of an object / image should be considered simultaneously.
Computer Vision 71 Search for corresponding regions. 1. Extract affine invariant regions. 2. Describe each region with a feature vector of moment invariants. 3. Search for corresponding regions based on the Mahalanobis distance. 4. Check cross-correlation (after normalization). 5. Check consistency of the correspondences.
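Step 3 can be sketched with toy two-dimensional descriptors; real moment-invariant vectors are longer and the covariance would be estimated over many regions, so the vectors and the small regularizer below are assumptions:

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between descriptors x and y."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

# Toy model descriptors (one feature vector of invariants per region).
model = np.array([[0.0, 0.0],
                  [5.0, 5.0],
                  [0.0, 5.0]])

# Inverse covariance of the descriptors, lightly regularized.
cov_inv = np.linalg.inv(np.cov(model.T) + 1e-6 * np.eye(2))

query = np.array([0.2, 4.9])   # descriptor of a region in the test image
best = min(range(len(model)),
           key=lambda i: mahalanobis(query, model[i], cov_inv))
```

Using the Mahalanobis distance rather than the Euclidean one accounts for the different scales and correlations of the invariant components.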
Computer Vision 72 Semi-local constraints = check consistency of the correspondences. Epipolar constraint (RANSAC), based on 7 points. Geometric and photometric constraints, based on a combination of only 2 regions.
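The RANSAC step has the generic shape below. For the epipolar constraint the minimal sample is 7 correspondences and the model is a fundamental matrix; for brevity, this sketch plugs in a 2-point line model on toy data instead:

```python
import random

def ransac(data, fit, residual, n_min, n_iter, thresh, seed=0):
    """Generic RANSAC: sample minimal sets, keep the model with most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        model = fit(rng.sample(data, n_min))
        inliers = [d for d in data if residual(model, d) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy data: points on y = 2x + 1, plus two gross outliers.
pts = [(x, 2 * x + 1) for x in range(8)] + [(1.0, 9.0), (2.0, -4.0)]

def fit(sample):                      # line through two points
    (x0, y0), (x1, y1) = sample
    a = (y1 - y0) / (x1 - x0) if x1 != x0 else float("inf")
    return a, y0 - a * x0

def residual(m, p):                   # vertical distance to the line
    a, b = m
    return abs(p[1] - (a * p[0] + b))

model, inliers = ransac(pts, fit, residual, n_min=2, n_iter=50, thresh=0.1)
```

For the real epipolar check, `fit` would be the 7-point fundamental matrix estimator and `residual` a point-to-epipolar-line distance; the sampling-and-consensus structure is unchanged.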
Computer Vision 73 Experimental validation: number of correct and symmetric matches as a function of viewpoint angle (degrees).
Computer Vision 74 Experimental validation: number of correct and symmetric matches, and error, as a function of scale.
Computer Vision 75 Experimental validation: number of correct and symmetric matches under illumination changes, relative to a reference image.
Computer Vision 76 Object recognition and localization. 'Appearance'-based approach = objects are modeled by a set of reference images. Voting principle based on the number of similar regions. More invariance = fewer reference images required.
Computer Vision 77 Object recognition and localization
Computer Vision 78 Object recognition and localization
Computer Vision 79 Wide-baseline stereo
Computer Vision 80 Wide-baseline stereo
Computer Vision 81 Wide-baseline stereo
Computer Vision 82 Content-based image retrieval from a database = searching for 'similar' images in a database based on image content, using local features. Similarity = the images contain the same object or the same scene. Voting principle: based on the number of similar regions.
Computer Vision 83 Content-based image retrieval from a database: a search image is matched against a database of more than 450 images.
Computer Vision 84 Content-based image retrieval from database
Computer Vision 85 Content-based image retrieval from database
Computer Vision 86 Application: virtual museum guide
Computer Vision 87 Next class: Range data. Reading: Chapter 21.
Computer Vision 88 Talk 4pm tomorrow: Jean Ponce, "Three-Dimensional Computer Vision: Challenges and Opportunities." Jean Ponce (ponce@cs.uiuc.edu), University of Illinois at Urbana-Champaign and Ecole Normale Supérieure, Paris. http://www-cvr.ai.uiuc.edu/ponce_grp/ Abstract: This talk addresses two of the main challenges of computer vision: automatically recognizing three-dimensional (3D) object categories in photographs despite potential within-class variations, viewpoint changes, occlusion, and clutter; and recovering accurate models of 3D shapes observed in multiple images. I will first present a new approach to 3D object recognition that exploits local, semi-local, and global constraints to learn visual models of texture, object, and scene categories, and identify instances of these models in photographs. I will then discuss a novel algorithm that uses the geometric and photometric constraints associated with multiple calibrated photographs to construct high-fidelity solid models of complex 3D shapes in the form of carved visual hulls. I will conclude with a brief discussion of new application domains and wide open research issues. Joint work with Yasutaka Furukawa, Akash Kushal, Svetlana Lazebnik, Kenton McHenry, Fred Rothganger, and Cordelia Schmid.