Jo˜ao Carreira, Abhishek Kar, Shubham Tulsiani and Jitendra Malik University of California, Berkeley CVPR2015 Virtual View Networks for Object Reconstruction.

Slides:



Advertisements
Similar presentations
Shape Matching and Object Recognition using Low Distortion Correspondence Alexander C. Berg, Tamara L. Berg, Jitendra Malik U.C. Berkeley.
Advertisements

Coherent Laplacian 3D protrusion segmentation Oxford Brookes Vision Group Queen Mary, University of London, 11/12/2009 Fabio Cuzzolin.
Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.
Simultaneous surveillance camera calibration and foot-head homology estimation from human detection 1 Author : Micusic & Pajdla Presenter : Shiu, Jia-Hau.
Presented by Xinyu Chang
Human Identity Recognition in Aerial Images Omar Oreifej Ramin Mehran Mubarak Shah CVPR 2010, June Computer Vision Lab of UCF.
Carolina Galleguillos, Brian McFee, Serge Belongie, Gert Lanckriet Computer Science and Engineering Department Electrical and Computer Engineering Department.
Patch to the Future: Unsupervised Visual Prediction
Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,
Automatic Feature Extraction for Multi-view 3D Face Recognition
Kiyoshi Irie, Tomoaki Yoshida, and Masahiro Tomono 2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center.
Watching Unlabeled Video Helps Learn New Human Actions from Very Few Labeled Snapshots Chao-Yeh Chen and Kristen Grauman University of Texas at Austin.
Ghunhui Gu, Joseph J. Lim, Pablo Arbeláez, Jitendra Malik University of California at Berkeley Berkeley, CA
Instructor: Mircea Nicolescu Lecture 13 CS 485 / 685 Computer Vision.
Computer Vision Group University of California Berkeley Shape Matching and Object Recognition using Shape Contexts Jitendra Malik U.C. Berkeley (joint.
The Visual Recognition Machine Jitendra Malik University of California at Berkeley Jitendra Malik University of California at Berkeley.
One-Shot Multi-Set Non-rigid Feature-Spatial Matching
Effective Image Database Search via Dimensionality Reduction Anders Bjorholm Dahl and Henrik Aanæs IEEE Computer Society Conference on Computer Vision.
Recognition using Regions CVPR Outline Introduction Overview of the Approach Experimental Results Conclusion.
A Study of Approaches for Object Recognition
Direct Methods for Visual Scene Reconstruction Paper by Richard Szeliski & Sing Bing Kang Presented by Kristin Branson November 7, 2002.
Face Recognition Based on 3D Shape Estimation
3D Hand Pose Estimation by Finding Appearance-Based Matches in a Large Database of Training Views
Distinctive Image Feature from Scale-Invariant KeyPoints
Scale Invariant Feature Transform (SIFT)
Image Matching via Saliency Region Correspondences Alexander Toshev Jianbo Shi Kostas Daniilidis IEEE Conference on Computer Vision and Pattern Recognition.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
Shape Classification Using the Inner-Distance Haibin Ling David W. Jacobs IEEE TRANSACTION ON PATTERN ANAYSIS AND MACHINE INTELLIGENCE FEBRUARY 2007.
PhD Thesis. Biometrics Science studying measurements and statistics of biological data Most relevant application: id. recognition 2.
Creating and Exploring a Large Photorealistic Virtual Space INRIA / CSAIL / Adobe First IEEE Workshop on Internet Vision, associated with CVPR 2008.
Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Performance Evaluation of Grouping Algorithms Vida Movahedi Elder Lab - Centre for Vision Research York University Spring 2009.
The Three R’s of Vision Jitendra Malik.
Final Exam Review CS485/685 Computer Vision Prof. Bebis.
Efficient Algorithms for Robust Feature Matching Mount, Netanyahu and Le Moigne November 7, 2000 Presented by Doe-Wan Kim.
3D Fingertip and Palm Tracking in Depth Image Sequences
A General Framework for Tracking Multiple People from a Moving Camera
Xiaoguang Han Department of Computer Science Probation talk – D Human Reconstruction from Sparse Uncalibrated Views.
Dense correspondences across scenes and scales Tal Hassner The Open University of Israel CVPR’14 Tutorial on Dense Image Correspondences for Computer Vision.
A Statistical Approach to Speed Up Ranking/Re-Ranking Hong-Ming Chen Advisor: Professor Shih-Fu Chang.
Detection, Segmentation and Fine-grained Localization
Scale-less Dense Correspondences Tal Hassner The Open University of Israel ICCV’13 Tutorial on Dense Image Correspondences for Computer Vision.
Wenqi Zhu 3D Reconstruction From Multiple Views Based on Scale-Invariant Feature Transform.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.
1 Image Matching using Local Symmetry Features Daniel Cabrini Hauagge Noah Snavely Cornell University.
COS429 Computer Vision =++ Assignment 4 Cloning Yourself.
Demosaicking for Multispectral Filter Array (MSFA)
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Implicit Active Shape Models for 3D Segmentation in MR Imaging M. Rousson 1, N. Paragio s 2, R. Deriche 1 1 Odyssée Lab., INRIA Sophia Antipolis, France.
Presented by David Lee 3/20/2006
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Recognizing specific objects Matching with SIFT Original suggestion Lowe, 1999,2004.
Jianchao Yang, John Wright, Thomas Huang, Yi Ma CVPR 2008 Image Super-Resolution as Sparse Representation of Raw Image Patches.
Deformation Modeling for Robust 3D Face Matching Xioguang Lu and Anil K. Jain Dept. of Computer Science & Engineering Michigan State University.
Spectral Methods for Dimensionality
Presented by David Lee 3/20/2006
Nonparametric Semantic Segmentation
Feature description and matching
Unsupervised Face Alignment by Robust Nonrigid Mapping
Modeling the world with photos
Finding Clusters within a Class to Improve Classification Accuracy
Outline Nonlinear Dimension Reduction Brief introduction Isomap LLE
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
KFC: Keypoints, Features and Correspondences
Outline Background Motivation Proposed Model Experimental Results
Presentation transcript:

Jo˜ao Carreira, Abhishek Kar, Shubham Tulsiani and Jitendra Malik University of California, Berkeley CVPR2015 Virtual View Networks for Object Reconstruction

Outline Introduction Related work Virtual View Networks Reconstruction Experiments Conclusions

Structure from motion(SfM) multiple images with overlapping fields of view(O) Only a image or images but viewpoints far apart(X)

Similar objects

Virtual View generation

Goal Reconstruct an object from a single image using SfM on virtual views Training set : Using 2D data set (PASCAL VOC data)

challenges Align the target object with every different object in a collection increasing robustness to the multitude of noise sources we have by extrapolating synthetic inliers using domain knowledge making the resulting reconstructions more specific,by emphasizing collection objects more similar to the target object

Related work

2D alignment approaches  class-specific sparse ones: keypoints [11,5]  class-agnostic dense ones: SIFTflow [36,42]  Denser uniform grid of points inside each object (>100) [5] P. N. Belhumeur, D.W. Jacobs, D. Kriegman, and N. Kumar.Localizing parts of faces using a consensus of exemplars. InCVPR, pages 545–552. IEEE, 2011 [11] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models-their training and application. CVIU, 61(1):38–59, [36] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978– 994, [42] M. Rubinstein, A. Joulin, J. Kopf, and C. Liu. Unsupervised joint object discovery and segmentation in internet images. In CVPR, pages 1939–1946, 2013.

Virtual View Networks

a collection of training images: Keypoints for each image : rotation matrices: grid of 2D locations for matching in each image:

Network Construction match separately each object in the collection to a fixed number of nearest neighbors in pose space (30- NN) with distance cost of matching points: point u match v in i and j images : weighting parameter

thin plate splines

Docking to the Network Docking target image to network: Change to fit to map between the 4 corners of the bounding boxes of the target object and a docking object (since no keypoints)

Fast Alignment at Test Time Assuming KNN matching is used and hence we only need to identify points in training objects having minimum geodesics to points in target objects Precompute the KNN matchings using network distances between all pairs of objects in the network Simply selecting the outgoing edges having minimum weight Assuming there is at most a single edge from a test object point to each network node in docking objects

we manage to align a test object to a collection of roughly 1000 objects having points in around half a second on a modern desktop computer, instead of in more than a minute using Dijkstra algorithm.

Reconstruction

challenges Integrating sparse far apart views of the target object noise in virtual views synthesized from training objects

Synthetic Inlier Extrapolation we sample a constant number of equally spaced points along 2D lines connecting all pairs of ground truth keypoints in the training images

Blue: reconstructing points from an image Red: reconstructing points from an image its mirrored version Green:extrapolated synthetic inliers

Observation matrix

Building up Target Specificity strategies for increasing the specificity of the reconstruction Resampling: simply resampling important instances’s rows in the observation matrix xy-Snapping: The points from the target object are the only ones we can trust blindly.We enforce this as post-processing by snapping the points of the target object back to their original coordinates in a reference image after reconstruction

Experiments

All experiments used PASCAL VOC [16], where there are segmentations and around 10 keypoints available for all objects in each class [20] 9087 fully visible objects 80% training data and 20% test data built virtual view networks on the training images and their mirrored versions [16] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge.International journal of computer vision, 88(2):303–338,2010. [20] B. Hariharan, P. Arbel´aez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 991–998. IEEE, 2011.

2D Alignment resized the image of each object to be 150 pixels tall computed a grid of features by AlexNet convolutional network[34] 640 dimensional feature vectors at each grid location SIFT features

Average per-pair matching error each ground truth keypoint u on the test image C is the set of corresponding points on a training image according to the matching is the position of ground truth keypoint u on the training image

Euclidean and Network Distance

Pose Prediction and Alignment

Reconstruction automatic pose prediction SIFTflow

Conclusions Introduced a framework for shape reconstruction from a single image of a target object a method for 2D alignment that builds a network over the image collection in order to achieve wide viewpoint variation