Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …

Slides:



Advertisements
Similar presentations
Computer Vision - A Modern Approach Set: Recognition by relations Slides by D.A. Forsyth Matching by relations Idea: –find bits, then say object is present.
Advertisements

Road-Sign Detection and Recognition Based on Support Vector Machines Saturnino, Sergio et al. Yunjia Man ECG 782 Dr. Brendan.
Detecting Faces in Images: A Survey
Recognition by finding patterns
Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,
Learning to estimate human pose with data driven belief propagation Gang Hua, Ming-Hsuan Yang, Ying Wu CVPR 05.
Announcements Final Exam May 13th, 8 am (not my idea).
Instructor: Mircea Nicolescu Lecture 13 CS 485 / 685 Computer Vision.
1 CS6825: Recognition 8. Hidden Markov Models 2 Hidden Markov Model (HMM) HMMs allow you to estimate probabilities of unobserved events HMMs allow you.
Matching with Invariant Features
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
Lecture 5 Template matching
Announcements Final Exam May 16 th, 8 am (not my idea). Practice quiz handout 5/8. Review session: think about good times. PS5: For challenge problems,
Real-time Embedded Face Recognition for Smart Home Fei Zuo, Student Member, IEEE, Peter H. N. de With, Senior Member, IEEE.
1 Interest Operators Find “interesting” pieces of the image –e.g. corners, salient regions –Focus attention of algorithms –Speed up computation Many possible.
Computer Vision Fitting Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Darrel, A. Zisserman,...
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
A Study of Approaches for Object Recognition
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
Rodent Behavior Analysis Tom Henderson Vision Based Behavior Analysis Universitaet Karlsruhe (TH) 12 November /9.
1 Interest Operator Lectures lecture topics –Interest points 1 (Linda) interest points, descriptors, Harris corners, correlation matching –Interest points.
Feature matching and tracking Class 5 Read Section 4.1 of course notes Read Shi and Tomasi’s paper on.
Multi-view stereo Many slides adapted from S. Seitz.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.
3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
Computer Vision Segmentation Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Darrel,...
Feature tracking Class 5 Read Section 4.1 of course notes Read Shi and Tomasi’s paper on good features.
Multiple View Geometry Marc Pollefeys University of North Carolina at Chapel Hill Modified by Philippos Mordohai.
Classifiers for Recognition Reading: Chapter 22 (skip 22.3) Slide credits for this chapter: Frank Dellaert, Forsyth & Ponce, Paul Viola, Christopher Rasmussen.
Distinctive image features from scale-invariant keypoints. David G. Lowe, Int. Journal of Computer Vision, 60, 2 (2004), pp Presented by: Shalomi.
Stepan Obdrzalek Jirı Matas
Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …
Template matching and object recognition
CS 223B Assignment 1 Help Session Dan Maynes-Aminzade.
Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, …
Computer Vision Non-linear tracking Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, M. Isard, T. Darrell …
Some slides and illustrations from D. Forsyth, …
1 CS6825: Recognition – a sample of Applications.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.
Computer Vision Segmentation Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Darrel,...
Face Recognition: An Introduction
1 Interest Operators Find “interesting” pieces of the image Multiple possible uses –image matching stereo pairs tracking in videos creating panoramas –object.
Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.
Final Exam Review CS485/685 Computer Vision Prof. Bebis.
EADS DS / SDC LTIS Page 1 7 th CNES/DLR Workshop on Information Extraction and Scene Understanding for Meter Resolution Image – 29/03/07 - Oberpfaffenhofen.
Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.
Recognition and Matching based on local invariant features Cordelia Schmid INRIA, Grenoble David Lowe Univ. of British Columbia.
Object Tracking/Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition.
Local invariant features Cordelia Schmid INRIA, Grenoble.
Detecting Pedestrians Using Patterns of Motion and Appearance Paul Viola Microsoft Research Irfan Ullah Dept. of Info. and Comm. Engr. Myongji University.
Template matching and object recognition. CS8690 Computer Vision University of Missouri at Columbia Recognition by finding patterns We have seen very.
Object Recognition in Images Slides originally created by Bernd Heisele.
Template matching and object recognition. CS8690 Computer Vision University of Missouri at Columbia Matching by relations Idea: –find bits, then say object.
Vision-based human motion analysis: An overview Computer Vision and Image Understanding(2007)
MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.
Computer Vision Lecture #10 Hossam Abdelmunim 1 & Aly A. Farag 2 1 Computer & Systems Engineering Department, Ain Shams University, Cairo, Egypt 2 Electerical.
Local invariant features Cordelia Schmid INRIA, Grenoble.
Human Activity Recognition at Mid and Near Range Ram Nevatia University of Southern California Based on work of several collaborators: F. Lv, P. Natarajan,
Vision Overview  Like all AI: in its infancy  Many methods which work well in specific applications  No universal solution  Classic problem: Recognition.
Computer Vision Set: Object Recognition Slides by C.F. Olson 1 Object Recognition.
Features, Feature descriptors, Matching Jana Kosecka George Mason University.
 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.
Local features: detection and description
Lecture 9 Feature Extraction and Motion Estimation Slides by: Michael Black Clark F. Olson Jean Ponce.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Correspondence and Stereopsis. Introduction Disparity – Informally: difference between two pictures – Allows us to gain a strong sense of depth Stereopsis.
Paper Presentation: Shape and Matching
Brief Review of Recognition + Context
Recognition and Matching based on local invariant features
Presentation transcript:

Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …

Computer Vision 2 Jan 16/18-Introduction Jan 23/25CamerasRadiometry Jan 30/Feb1Sources & ShadowsColor Feb 6/8Linear filters & edgesTexture Feb 13/15Multi-View GeometryStereo Feb 20/22Optical flowProject proposals Feb27/Mar1Affine SfMProjective SfM Mar 6/8Camera CalibrationSegmentation Mar 13/15Springbreak Mar 20/22FittingProb. Segmentation Mar 27/29Silhouettes and Photoconsistency Linear tracking Apr 3/5Project UpdateNon-linear Tracking Apr 10/12Object Recognition Apr 17/19Range data Apr 24/26Final project Tentative class schedule

Computer Vision 3 Last class: Recognition by matching templates Classifiers PCA LDA decision boundaries, not prob.dens. dimensionality reduction maximize discrimination

Computer Vision 4 Last class: Recognition by matching templates Neural Networks Support Vector Machines Universal approximation property Optimal separating hyperplane (OSH) support vectors Convex problem! also for non-linear boundaries

Computer Vision 5 SVMs for 3D object recognition - Consider images as vectors - Compute pairwise OSH using linear SVM - Support vectors are representative views of the considered object (relative to other) - Tournament like classification - Competing classes are grouped in pairs - Not selected classes are discarded - Until only one class is left - Complexity linear in number of classes - No pose estimation (Pontil & Verri PAMI’98)

Computer Vision 6 Vision applications Reliable, simple classifier, –use it wherever you need a classifier Commonly used for face finding Pedestrian finding –many pedestrians look like lollipops (hands at sides, torso wider than legs) most of the time –classify image regions, searching over scales –But what are the features? –Compute wavelet coefficients for pedestrian windows, average over pedestrians. If the average is different from zero, probably strongly associated with pedestrian

Computer Vision 7 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision 8 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision 9 Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. Int. Conf. Computer Vision, 1998, copyright 1998, IEEE

Computer Vision 10 Latest results on Pedestrian Detection: Viola, Jones and Snow’s paper (ICCV’03: Marr prize) Combine static and dynamic features cascade for efficiency (4 frames/s) 5 best out of 55k (AdaBoost) 5 best static out of 28k (AdaBoost) some positive examples used for training

Computer Vision 11 Dynamic detection false detection: typically 1/400,000 (=1 every 2 frames for 360x240)

Computer Vision 12 Static detection

Computer Vision 13 Matching by relations Idea: –find bits, then say object is present if bits are ok Advantage: –objects with complex configuration spaces don’t make good templates internal degrees of freedom aspect changes (possibly) shading variations in texture etc.

Computer Vision 14 Simplest Define a set of local feature templates –could find these with filters, etc. –corner detector+filters Think of objects as patterns Each template votes for all patterns that contain it Pattern with the most votes wins

Computer Vision 15 Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision 16 Probabilistic interpretation Write Assume Likelihood of image given pattern

Computer Vision 17 Possible alternative strategies Notice: –different patterns may yield different templates with different probabilities –different templates may be found in noise with different probabilities

Computer Vision 18 Employ spatial relations Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision 19 Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision 20 Example Training examples Test image

Computer Vision 21

Computer Vision 22

Computer Vision 23 Finding faces using relations Strategy: –Face is eyes, nose, mouth, etc. with appropriate relations between them –build a specialised detector for each of these (template matching) and look for groups with the right internal structure –Once we’ve found enough of a face, there is little uncertainty about where the other bits could be

Computer Vision 24 Finding faces using relations Strategy: compare Notice that once some facial features have been found, the position of the rest is quite strongly constrained. Figure from, “Finding faces in cluttered scenes using random labelled graph matching,” by Leung, T. ;Burl, M and Perona, P., Proc. Int. Conf. on Computer Vision, 1995 copyright 1995, IEEE

Computer Vision 25 Detection This means we compare

Computer Vision 26 Issues Plugging in values for position of nose, eyes, etc. –search for next one given what we’ve found when to stop searching –when nothing that is added to the group could change the decision –i.e. it’s not a face, whatever features are added or –it’s a face, and anything you can’t find is occluded what to do next –look for another eye? or a nose? –probably look for the easiest to find What if there’s no nose response –marginalize

Computer Vision 27 Figure from, “Finding faces in cluttered scenes using random labelled graph matching,” by Leung, T. ;Burl, M and Perona, P., Proc. Int. Conf. on Computer Vision, 1995 copyright 1995, IEEE

Computer Vision 28 Pruning Prune using a classifier –crude criterion: if this small assembly doesn’t work, there is no need to build on it. Example: finding people without clothes on –find skin –find extended skin regions –construct groups that pass local classifiers (i.e. lower arm, upper arm) –give these to broader scale classifiers (e.g. girdle)

Computer Vision 29 Pruning Prune using a classifier –better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop –equivalent to projecting classifier boundaries.

Computer Vision 30 Horses

Computer Vision 31 Hidden Markov Models Elements of sign language understanding –the speaker makes a sequence of signs –Some signs are more common than others –the next sign depends (roughly, and probabilistically) only on the current sign –there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values Many problems share these properties –tracking is like this, for example

Computer Vision 32 Hidden Markov Models Now in each state we could emit a measurement, with probability depending on the state and the measurement We observe these measurements

Computer Vision 33 HMM’s - dynamics

Computer Vision 34 HMM’s - the Joint and Inference

Computer Vision 35 Trellises Each column corresponds to a measurement in the sequence Trellis makes the collection of legal paths obvious Now we would like to get the path with the largest negative log- posterior Trellis makes this easy, as follows.

Computer Vision 36

Computer Vision 37 Fitting an HMM I have: –sequence of measurements –collection of states –topology I want –state transition probabilities –measurement emission probabilities Straightforward application of EM –discrete vars give state for each measurement –M step is just averaging, etc.

Computer Vision 38 HMM’s for sign language understanding-1 Build an HMM for each word

Computer Vision 39 HMM’s for sign language understanding-2 Build an HMM for each word Then build a language model

Computer Vision 40 Figure from “Real time American sign language recognition using desk and wearable computer based video,” T. Starner, et al. Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE User gesturing For both isolated word recognition tasks and for recognition using a language model that has five word sentences (words always appearing in the order pronoun verb noun adjective pronoun ), Starner and Pentland’s displays a word accuracy of the order of 90%. Values are slightly larger or smaller, depending on the features and the task, etc.

Computer Vision 41 HMM’s can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents correspondence between an image token and a body part, and do DP on this trellis.

Computer Vision 42

Computer Vision 43 Figure from “Efficient Matching of Pictorial Structures,” P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition2000, copyright 2000, IEEE

Computer Vision 44 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint Tuytelaars and Van Gool, BMVC2000

Computer Vision 45 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination

Computer Vision 46 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination –Background

Computer Vision 47 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination –Background –and Occlusions

Computer Vision 48 Recognition using local affine and photometric invariant features Hybrid approach that aims to deal with large variations in –Viewpoint –Illumination –Background –and Occlusions  Use local invariant features Invariant features = features that are preserved under a specific group of transformations Robust to occlusions and changes in background Robust to changes in viewpoint and illumination

Computer Vision 49 Affine geometric deformations Linear photometric changes Transformations for planar objects

Computer Vision 50 Local invariant features ‘Affine invariant neighborhood’

Computer Vision 51 Local invariant features

Computer Vision 52 Local invariant features Geometry-based region extraction –Curved edges –Straight edges Intensity-based region extraction

Computer Vision 53 Geometry-based method (curved edges)

Computer Vision 54 Geometry-based method (curved edges) 1.Harris corner detection

Computer Vision 55 Geometry-based method (curved edges) 2.Canny edge detection

Computer Vision 56 Geometry-based method (curved edges) 3.Evaluation relative affine invariant parameter along two edges

Computer Vision 57 Geometry-based method (curved edges) 4.Construct 1-dimensional family of parallelogram shaped regions

Computer Vision 58 Geometry-based method (curved edges) f 5.Select parallelograms based on local extrema of invariant function

Computer Vision 59 Geometry-based method (curved edges) 5.Select parallelograms based on local extrema of invariant function

Computer Vision 60 Geometry-based method (straight edges) Relative affine invariant parameters are identically zero!

Computer Vision 61 Geometry-based method (straight edges) 1.Harris corner detection

Computer Vision 62 Geometry-based method (straight edges) 2.Canny edge detection

Computer Vision 63 Geometry-based method (straight edges) 3.Fit lines to edges

Computer Vision 64 Geometry-based method (straight edges) 4.Select parallelograms based on local extrema of invariant functions

Computer Vision 65 Geometry-based method (straight edges) 4.Select parallelograms based on local extrema of invariant functions

Computer Vision 66 Intensity based method 1.Search intensity extrema 2.Observe intensity profile along rays 3.Search maximum of invariant function f(t) along each ray 4.Connect local maxima 5.Fit ellipse 6.Double ellipse size

Computer Vision 67 Intensity based method

Computer Vision 68 Comparison Intensity-based method More robust Geometry-based method Less computations More environments

Computer Vision 69 Robustness “Correct” detection of single environment cannot be guaranteed –Non-planar region –Noise, quantization errors –Non-linear photometric distortion –Perspective-distortion –… All regions of an object / image should be considered simultaneously

Computer Vision 70 1.Extract affine invariant regions 2.Describe region with feature vector of moment invariants e.g. Search for corresponding regions

Computer Vision 71 1.Extract affine invariant regions 2.Describe region with feature vector of moment invariants 3.Search for corresponding regions based on Mahalanobis distance 4.Check cross-correlation (after normalization) 5.Check consistency of correspondences Search for corresponding regions

Computer Vision 72 Semi-local constraints = check consistency of correspondences Epipolar constraint ( RANSAC ) based on 7 points Geometric constraints Photometric constraints based on a combination of only 2 regions

Computer Vision 73 Experimental validation degrees symmetric correct Number of matches

Computer Vision 74 Experimental validation symmetric correct scale Number of matches error

Computer Vision 75 Experimental validation symmetric correct Number of matches illumination reference

Computer Vision 76 Object recognition and localization ‘Appearance’-based approach = objects are modeled by a set of reference images Voting principle based on number of similar regions More invariance = requires less reference images

Computer Vision 77 Object recognition and localization

Computer Vision 78 Object recognition and localization

Computer Vision 79 Wide-baseline stereo

Computer Vision 80 Wide-baseline stereo

Computer Vision 81 Wide-baseline stereo

Computer Vision 82 = Searching of ‘similar’ images in a database based on image content Local features Similarity = images contain the same object or the same scene Voting principle –Based on the number of similar regions Content-based image retrieval from database

Computer Vision 83 Database ( > 450 images) Search image Content-based image retrieval from database

Computer Vision 84 Content-based image retrieval from database

Computer Vision 85 Content-based image retrieval from database

Computer Vision 86 Application: virtual museum guide

Computer Vision 87 Next class: Range data Reading: Chapter 21

Computer Vision 88 Talk 4pm tomorrow: Jean Ponce Three-Dimensional Computer Vision: Challenges and Opportunities Jean Ponce University of Illinois at Urbana-Champaign Ecole Normale Supirieure, Paris Abstract: This talk addresses two of the main challenges of computer vision: automatically recognizing three-dimensional (3D) object categories in photographs despite potential within-class variations, viewpoint changes, occlusion, and clutter; and recovering accurate models of 3D shapes observed in multiple images. I will first present a new approach to 3D object recognition that exploits local, semi-local, and global constraints to learn visual models of texture, object, and scene categories, and identify instances of these models in photographs. I will then discuss a novel algorithm that uses the geometric and photometric constraints associated with multiple calibrated photographs to construct high-fidelity solid models of complex 3D shapes in the form of carved visual hulls. I will conclude with a brief discussion of new application domains and wide open research issues. Joint work with Yasutaka Furukawa, Akash Kushal, Svetlana Lazebnik, Kenton McHenry, Fred Rothganger, and Cordelia