Recognition of 3D Objects or, 3D Recognition of Objects Alec Rivers.

Overview 3D object recognition was dead, now it’s coming back – These papers are within the last 2 years Doesn’t really work yet, but it’s just a beginning

Papers
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects – CVPR 2006
3D LayoutCRF for Multi-View Object Class Recognition and Segmentation – CVPR 2007
3D Generic Object Categorization, Localization and Pose Estimation – ICCV 2007

The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects John Winn Microsoft Research Cambridge Jamie Shotton University of Cambridge

Introduction Needed to understand next paper – It’s 2D What does it try to solve? – Recognize one class of object at one pose and one scale, but with occlusions Does it work? – Yes, really well, especially given occlusions

Introduction What is interesting about it? – Segments objects – Interesting methods No sliding windows – Multiple instances for free

Overview Instead of sparse parts at features, use a densely covering part grid [Fischler & Elschlager 73] [Winn & Shotton 06]

Recognizing New Image – Overview Walk through an example

Recognizing a New Image – Overview 1. Pixels guess their part

Recognizing a New Image – Overview 2. Maximize layout consistency

Layout Consistency Defined pairwise between two pixels: (P_I, P_J) => Bool Means pixels I and J could be part of one instance Toy example: Object: 1,2,3,4,5 Image: 2,3,4,5,0,0,1,2,3,4,5,2,3,4,5,0,0 Figure labels: occlusion, instance 2, instance 3, instance 1
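The toy example can be sketched in a few lines. This is a minimal 1D sketch, not the paper's formulation: it assumes label 0 means background, that two neighbouring object pixels are consistent when the right one carries the same part or the next part, and that instances are cut wherever neighbours are inconsistent. The function names are illustrative.

```python
def layout_consistent(left, right):
    """Assumed 1D rule: neighbours agree if both are background (0),
    or the right pixel repeats the left part or continues to the next part."""
    if left == 0 or right == 0:
        return left == right
    return right in (left, left + 1)


def split_instances(labels):
    """Group object pixels into instances, cutting at background pixels
    and at layout-inconsistent neighbours."""
    instances = []
    current = []
    previous = 0
    for position, label in enumerate(labels):
        if label == 0:
            # Background closes whatever instance was being built.
            if current:
                instances.append(current)
                current = []
        else:
            # An inconsistent neighbour marks an occluding border.
            if current and not layout_consistent(previous, label):
                instances.append(current)
                current = []
            current.append((position, label))
        previous = label
    if current:
        instances.append(current)
    return instances


image = [2, 3, 4, 5, 0, 0, 1, 2, 3, 4, 5, 2, 3, 4, 5, 0, 0]
for instance in split_instances(image):
    print([label for _, label in instance])
```

On the toy image this yields three groups; the jump from part 5 back to part 2 is the inconsistency that separates the last two.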

Layout Consistency In 2D, consistent IFF their relative assignments could exist in a deformed regular grid Formally:

Overview 2. Maximize layout consistency

Layout Consistency 3. Find consistent regions; create instances Possible due to layout inconsistency at occluding borders

Overview 1. Pixels guess parts 2. Maximize layout consistency 3. Create instances [Winn & Shotton 06]

Implementation Details Trained on manually segmented data Crux of algorithm is conditional distribution – Like a probability for each possibility, or a score Algorithm is just finding maximum

Part Appearance Each pixel prefers parts that match surrounding image data Randomized decision trees – Multiple trees, each trained on a subset of the data – Each node is the maximal-information-gain binary test on two nearby pixels’ intensities – Each leaf holds a histogram of part possibilities – Actual preference is the average over all trees
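The prediction side of such a forest can be sketched as follows. This is an illustration under assumptions, not the authors' code: training is omitted, the node test is assumed to compare the intensity difference of two offset pixels against a threshold, and the names `Leaf`, `Split`, `tree_histogram`, and `forest_histogram` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    histogram: List[float]  # one weight per part label


@dataclass
class Split:
    offset_a: Tuple[int, int]  # (dy, dx) of the first probe pixel
    offset_b: Tuple[int, int]  # (dy, dx) of the second probe pixel
    threshold: float
    below: "Node"
    above: "Node"


Node = Union[Leaf, Split]


def pixel(image, y, x):
    """Read an intensity, clamping coordinates to the image border."""
    y = min(max(y, 0), len(image) - 1)
    x = min(max(x, 0), len(image[0]) - 1)
    return image[y][x]


def tree_histogram(node, image, y, x):
    """Walk one tree from the root to a leaf for the pixel at (y, x)."""
    while isinstance(node, Split):
        a = pixel(image, y + node.offset_a[0], x + node.offset_a[1])
        b = pixel(image, y + node.offset_b[0], x + node.offset_b[1])
        node = node.above if a - b > node.threshold else node.below
    return node.histogram


def forest_histogram(trees, image, y, x):
    """Average the leaf histograms of all trees."""
    histograms = [tree_histogram(tree, image, y, x) for tree in trees]
    return [sum(column) / len(histograms) for column in zip(*histograms)]
```

Averaging over trees smooths out the quirks of any single tree, since each one saw a different subset of the training data.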

Deformed Training Part Labelings Fits parts tighter 1. Label by grid 2. Learn from data 3. Apply to data 4. Set guesses as truth 5. Relearn

Part Layout Preference for layout consistency plus additional pairwise costs: Helps remove noise Align edges along image edges

Part Layout Return to toy example Just appearance: 1,2,0,4,5,0,0,1,2,3,3,4,0,0,1,0 With layout costs: 1,2,3,4,5,0,0,1,2,3,3,4,0,0,0,0 Figure labels: instance 2, instance 1

Instance Layout Apply weak force trying to keep parts at sane positions relative to instance data (centroid, L/R flip) Toy example: 0,1,1,1,1,1,2,3,4,5 is bad!

Implementation Theoretically, finding the global maximum of the conditional distribution This is “MAP” estimation – MAP = Maximum A Posteriori In reality, using tricks to find a local maximum – α-expansion, annealed expansion move

Approximating MAP Estimation Global maximum is intractable α-expansion – Start with given configuration – For a given new label, ask each pixel: do you want to switch? – Can be solved efficiently with graph cuts Repeat over all part labels Annealed expansion move – Relabel grid, but offset to avoid local maxima
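The expansion move can be sketched on a 1D chain of pixels. This is a minimal sketch under stated assumptions: the energy is a sum of per-pixel costs and neighbour costs (to be minimised, which is equivalent to maximising the probability), and the binary keep-or-switch subproblem is solved exactly with dynamic programming rather than with graph cuts, which is what a 2D grid would need. The names `unary`, `pairwise`, `expansion_move`, and `alpha_expansion` are illustrative.

```python
def energy(labels, unary, pairwise):
    """Total cost: per-pixel costs plus costs between chain neighbours."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(pairwise(a, b) for a, b in zip(labels, labels[1:]))
    return total


def expansion_move(labels, alpha, unary, pairwise):
    """Best labeling in which every pixel keeps its label or switches to alpha.
    Solved exactly on a chain by dynamic programming (stand-in for graph cuts)."""
    n = len(labels)
    candidates = [(labels[i], alpha) for i in range(n)]
    cost = [unary[0][c] for c in candidates[0]]
    back = []
    for i in range(1, n):
        new_cost, pointers = [], []
        for c in candidates[i]:
            options = [cost[k] + pairwise(candidates[i - 1][k], c) for k in range(2)]
            k_best = 0 if options[0] <= options[1] else 1
            new_cost.append(options[k_best] + unary[i][c])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last pixel.
    k = 0 if cost[0] <= cost[1] else 1
    result = [None] * n
    for i in range(n - 1, -1, -1):
        result[i] = candidates[i][k]
        if i > 0:
            k = back[i - 1][k]
    return result


def alpha_expansion(labels, num_labels, unary, pairwise, max_sweeps=10):
    """Repeat expansion moves over all labels until no move lowers the energy."""
    labels = list(labels)
    best = energy(labels, unary, pairwise)
    for _ in range(max_sweeps):
        improved = False
        for alpha in range(num_labels):
            candidate = expansion_move(labels, alpha, unary, pairwise)
            candidate_energy = energy(candidate, unary, pairwise)
            if candidate_energy < best:
                labels, best, improved = candidate, candidate_energy, True
        if not improved:
            break
    return labels, best
```

Each accepted move strictly lowers the energy, so the loop terminates at a labeling that no single expansion can improve: a local optimum, not necessarily the global one.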

Results

Oh, snap!

Thoughts Bottom-up system is great – No sliding windows – Multiple instances for free Information about segment boundaries: occlusion vs. completion – Reason about complete segment boundaries?

3D LayoutCRF for Multi-View Object Class Recognition and Segmentation Derek Hoiem Carnegie Mellon University Carsten Rother Microsoft Research Cambridge John Winn Microsoft Research Cambridge

Introduction What does it try to solve? – Extend LayoutCRF to be pose and scale invariant Does it work? – Improvements to LayoutCRF work; 3D information does little What is interesting about it? – One method for combining 2D methods with a 3D framework – The improvements to 2D are good

Overview Generate rough 3D model of class Parts created over 3D model

Overview Probability distribution

Refinements Part layout, instance layout take into account 3D position

Refinements New term: Instance cost

Instance Cost Eliminates false positives – LayoutCRF: object-background cost Explain multiple groups with one instance

Refinements New term: Instance appearance

Instance appearance Learn color distribution for each instance Separate groups of pixels: definitely object, definitely background Use these to learn colors Apply cost to non-standard-color pixels This would fail…

Implementation Details Parts are learned separately for each 45° viewing range, and for different scales Instance layout is also discretized by viewpoint

Results – Comparison to LCRF A little better (+ 8% recall) BUT they actually turn off 3D information for this comparison Better segmentation

Results – PASCAL % precision-recall – Previous best: 45% – But, reduced test set Without 3D: -5% Without color: -5%

Thoughts Color, instance costs very nice Shoehorns LCRF into 3D without much success LCRF is already somewhat viewpoint-invariant: segments can stretch

3D Generic Object Categorization, Localization and Pose Estimation Silvio Savarese University of Illinois at Urbana-Champaign Fei-Fei Li Princeton University

Introduction What does it try to solve? – Multiclass pose-invariant, scale-invariant object recognition Does it work? – Not well. But it may be due to implementation Why is it interesting? – Attempts to learn the actual 3D structure of an object – Interesting data structure for 3D info

Overview – Data Structure Decompose object into large parts; find “canonical view” Relate parts by mutual appearance

Related Work – Aspect Graphs Represent stable views rather than parts Image [Khoh & Kovesi, 99] Aspect graph of a cube:

Data Structure for Cube Left Front Right Bottom Top Back

Related Work Constellation models Similar, but wraps around in 3D vs.

Implementation – Links Link from canonical P_I to P_J consists of a matrix H_IJ The matrix defines the transformation needed to observe P_J when P_I is viewed canonically A_IJ is skew, t_IJ is translation
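The link matrix is not written out here. A minimal sketch, assuming the usual homogeneous layout for a transformation built from a 2×2 block and a translation vector (this block form is an assumption, not taken from the slide):

```latex
H_{IJ} =
\begin{pmatrix}
A_{IJ} & t_{IJ} \\
\mathbf{0}^{\top} & 1
\end{pmatrix},
\qquad A_{IJ} \in \mathbb{R}^{2 \times 2},\quad t_{IJ} \in \mathbb{R}^{2},
\qquad
H_{IJ}\begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix}
=
\begin{pmatrix} A_{IJ}\,\mathbf{x} + t_{IJ} \\ 1 \end{pmatrix}
```

Under that assumption, applying the link to an image point x skews it by A_IJ and then shifts it by t_IJ.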

Implementation – Links H_IJ Part J canonical view Part I canonical view

Implementation – Links Part J canonical view H_JI Part I canonical view

Overview Learn data structure from images (unsupervised) Apply to new image by recognizing parts and selecting model that best accounts for their appearances

Implementation – Learning Parts Tricky implementation! Part = collection of SIFT features For each pair of images of the same instance: 1. Find set M of shared SIFT features 2. RANSAC M to find a group of pairs that transform together 3. Group close-together parts of M into candidate parts

Background: What is RANSAC? Finds subset of data that is accounted for by some model; ignores outliers 1. Guess points 2. Fit model 3. Select matching points 4. Calculate error Repeat!
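The loop can be sketched with a simpler model than a homography. This minimal sketch fits a 2D line y = slope * x + intercept; the tolerance, iteration count, and function names are illustrative choices, and candidate models are scored by the size of their consensus set rather than by a separate error value.

```python
import random


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Return ((slope, intercept), inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        sample = rng.sample(points, 2)            # 1. guess points
        if sample[0][0] == sample[1][0]:
            continue                              # vertical pair: no y = ax + b model
        slope, intercept = fit_line(sample)       # 2. fit model
        inliers = [(x, y) for x, y in points      # 3. select matching points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):      # 4. score by consensus size
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no model found")
    # Refit on all inliers of the best model; outliers are ignored.
    return fit_line(best_inliers), best_inliers


points = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -20), (8, 40)]
(slope, intercept), inliers = ransac_line(points)
print(slope, intercept, len(inliers))
```

Swapping `fit_line` for a homography estimator and the residual for a reprojection distance gives the variant used for matching features between two images.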

RANSAC In our case: find points for which a homographic transformation of the points in image I yields the points in image J

Implementation – Canonical Views Goal: front-facing view of part Construct directed graph – Direction means “more front-facing” Traverse to find canonical view How to go from pairwise-defined to graph?
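One simple reading of "traverse to find canonical view" is to look for views that nothing beats. This is a hypothetical sketch, not the paper's procedure: it assumes the pairwise comparisons are given as (less frontal, more frontal) pairs and returns the sinks of the resulting directed graph.

```python
def canonical_views(comparisons):
    """Return views with no outgoing 'more front-facing' edge.

    comparisons: iterable of (less_frontal, more_frontal) view names.
    """
    views = set()
    has_more_frontal = set()
    for less_frontal, more_frontal in comparisons:
        views.update((less_frontal, more_frontal))
        has_more_frontal.add(less_frontal)
    return sorted(views - has_more_frontal)


print(canonical_views([("v1", "v2"), ("v2", "v3"), ("v4", "v3")]))
```

If the comparisons contain a cycle, every view on it has an outgoing edge and none is returned, which is one way the pairwise definition can fail to produce a clean graph.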

Implementation Upshot: a collection of parts with canonical views and links

Recognizing a New Image 1. Extract SIFT features 2. Use scanning windows to get 5 best canonical part matches 3. For every pair of found parts, for each model, score how well the model accounts for their relative appearances 4. Select the model with the best score

Results Not stellar New test set – Overfit? – Comparison?

Results

Thoughts Low performance may make it useless as a system, but the data structure is very nice Implementation has a lot of tricky parts – Doesn’t seem to select great canonical parts – I wonder if there’s a simpler way – Are SIFT features the right choice?

Extremely Confusing Figure “Each dashed box indicates a particular view. A subset of the canonical parts is presented for each view. Part relationships are denoted by arrows.”

Overall Conclusions 3D is just starting out. Doesn’t work too well right now, but neither did MV at the beginning. LayoutCRF: – Nice method to learn 2D patches 3D Object Categorization: – Nice conceptual model relating 3D parts Possible to combine strengths of both?