From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains Baochen Sun and Kate Saenko UMass Lowell.

Slides:

Advertisements

Similar presentations

Semantic Contours from Inverse Detectors Bharath Hariharan et.al. (ICCV-11)

Advertisements

Joint Face Alignment The Recognition Pipeline

Learning visual representations for unfamiliar environments Kate Saenko, Brian Kulis, Trevor Darrell UC Berkeley EECS & ICSI.

Change Detection C. Stauffer and W.E.L. Grimson, “Learning patterns of activity using real time tracking,” IEEE Trans. On PAMI, 22(8): , Aug 2000.

Recognizing Surfaces using Three-Dimensional Textons Thomas Leung and Jitendra Malik Computer Science Division University of California at Berkeley.

Sami Romdhani Volker Blanz Thomas Vetter University of Freiburg

Vision Sensing. Multi-View Stereo for Community Photo Collections Michael Goesele, et al, ICCV 2007 Venus de Milo.

Large Scale Visual Recognition Challenge (ILSVRC) 2013: Detection spotlights.

Robust 3D Head Pose Classification using Wavelets by Mukesh C. Motwani Dr. Frederick C. Harris, Jr., Thesis Advisor December 5 th, 2002 A thesis submitted.

Internet Vision - Lecture 3 Tamara Berg Sept 10. New Lecture Time Mondays 10:00am-12:30pm in 2311 Monday (9/15) we will have a general Computer Vision.

Robust Object Tracking via Sparsity-based Collaborative Model

Rob Fergus Courant Institute of Mathematical Sciences New York University A Variational Approach to Blind Image Deconvolution.

Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.

Problem: SVM training is expensive – Mining for hard negatives, bootstrapping Solution: LDA (Linear Discriminant Analysis). – Extremely fast training,

DISCRIMINATIVE DECORELATION FOR CLUSTERING AND CLASSIFICATION ECCV 12 Bharath Hariharan, Jitandra Malik, and Deva Ramanan.

The Viola/Jones Face Detector (2001)

Multimodal Templates for Real-Time Detection of Texture-less Objects in Heavily Cluttered Scenes Stefan Hinterstoisser, Stefan Holzer, Cedric Cagniart,

Advanced Computer Vision Introduction Goal and objectives To introduce the fundamental problems of computer vision. To introduce the main concepts and.

LYU0603 A Generic Real-Time Facial Expression Modelling System Supervisor: Prof. Michael R. Lyu Group Member: Cheung Ka Shun ( ) Wong Chi Kin ( )

Face detection and recognition Many slides adapted from K. Grauman and D. Lowe.

Face Recognition Based on 3D Shape Estimation

Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.

5/30/2006EE 148, Spring Visual Categorization with Bags of Keypoints Gabriella Csurka Christopher R. Dance Lixin Fan Jutta Willamowski Cedric Bray.

Object recognition under varying illumination. Lighting changes objects appearance.

PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang1;2, Manohar Paluri1, Marc’Aurelio Ranzato1, Trevor Darrell2, Lubomir Bourdev1 1: Facebook.

Opportunities of Scale, Part 2 Computer Vision James Hays, Brown Many slides from James Hays, Alyosha Efros, and Derek Hoiem Graphic from Antonio Torralba.

Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques

Matthias Wimmer, Bernd Radig, Michael Beetz Chair for Image Understanding Computer Science Technische Universität München Adaptive.

Bag-of-Words based Image Classification Joost van de Weijer.

The Three R’s of Vision Jitendra Malik.

Overcoming Dataset Bias: An Unsupervised Domain Adaptation Approach Boqing Gong University of Southern California Joint work with Fei Sha and Kristen Grauman.

Online Tracking of Outdoor Lighting Variations for Augmented Reality with Moving Cameras Yanli Liu 1,2 and Xavier Granier 2,3,4 1: College of Computer.

Accidental pinhole and pinspeck cameras: revealing the scene outside the picture A. Torralba and W. T. Freeman Proceedings of 25 IEEE Conference on Computer.

MESA LAB Two papers in icfda14 Guimei Zhang MESA LAB MESA (Mechatronics, Embedded Systems and Automation) LAB School of Engineering, University of California,

Interactively Modeling with Photogrammetry Pierre Poulin Mathieu Ouimet Marie-Claude Frasson Dép. Informatique et recherche opérationnelle Université de.

A Statistically Selected Part-Based Probabilistic Model for Object Recognition Zhipeng Zhao, Ahmed Elgammal Department of Computer Science, Rutgers, The.

Scale-less Dense Correspondences Tal Hassner The Open University of Israel ICCV’13 Tutorial on Dense Image Correspondences for Computer Vision.

Computer Vision Why study Computer Vision? Images and movies are everywhere Fast-growing collection of useful applications –building representations.

Learning Collections of Parts for Object Recognition and Transfer Learning University of Illinois at Urbana- Champaign.

1 Webcam Mouse Using Face and Eye Tracking in Various Illumination Environments Yuan-Pin Lin et al. Proceedings of the 2005 IEEE Y.S. Lee.

Object Detection with Discriminatively Trained Part Based Models

Object Recognition in Images Slides originally created by Bernd Heisele.

MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.

UNBIASED LOOK AT DATASET BIAS Antonio Torralba Massachusetts Institute of Technology Alexei A. Efros Carnegie Mellon University CVPR 2011.

Geodesic Flow Kernel for Unsupervised Domain Adaptation Boqing Gong University of Southern California Joint work with Yuan Shi, Fei Sha, and Kristen Grauman.

Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,

Determining the location and orientation of webcams using natural scene variations Nathan Jacobs.

Recognition Using Visual Phrases

Yizhou Yu Texture-Mapping Real Scenes from Photographs Yizhou Yu Computer Science Division University of California at Berkeley Yizhou Yu Computer Science.

Tal Amir Advanced Topics in Computer Vision May 29 th, 2015 COUPLED MOTION- LIGHTING ANALYSIS.

Hybrid Classiﬁers for Object Classiﬁcation with a Rich Background M. Osadchy, D. Keren, and B. Fadida-Specktor, ECCV 2012 Computer Vision and Video Analysis.

Carl Vondrick, Aditya Khosla, Tomasz Malisiewicz, Antonio Torralba Massachusetts Institute of Technology

Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,

Presenter: Jae Sung Park

Jo˜ao Carreira, Abhishek Kar, Shubham Tulsiani and Jitendra Malik University of California, Berkeley CVPR2015 Virtual View Networks for Object Reconstruction.

Object detection with deformable part-based models

University of Ioannina

An Additive Latent Feature Model

Rogerio Feris 1, Ramesh Raskar 2, Matthew Turk 1

Recognition: Face Recognition

Real-Time Human Pose Recognition in Parts from Single Depth Image

HOGgles Visualizing Object Detection Features

Outline Peter N. Belhumeur, Joao P. Hespanha, and David J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,”

Image Based Modeling and Rendering (PI: Malik)

Brief Review of Recognition + Context

Object Detection + Deep Learning

Outline H. Murase, and S. K. Nayar, “Visual learning and recognition of 3-D objects from appearance,” International Journal of Computer Vision, vol. 14,

Pose Estimation for non-cooperative Spacecraft Rendevous using CNN

Outline Background Motivation Proposed Model Experimental Results

Liyuan Li, Jerry Kah Eng Hoe, Xinguo Yu, Li Dong, and Xinqi Chu

Presentation transcript:

From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains Baochen Sun and Kate Saenko UMass Lowell

Back pack model Test Images Conventional Approach : collect training images Bicycle model Our Approach : generate 2D detector directly from 3D model Virtual Back pack model Virtual Bicycle model Back pack model Bicycle model Adapt

Motivation

Where is “mug”? Detector fails! Solution? Solution #1: Get more training data from the web: Problem: limited data, rare poses

Where is “mug”? Detector fails! Solution? many models available any pose possible! Solution #1: Get more training data from the web: Solution #2: Synthetic data from crowdsourced CAD models …but, can we generate realistic images?

real or fake?

real fake

Kate Saenko, UMass Lowell CAD models + Photorealistic Rendering Example images Project Cars

Bad news  -photorealism is hard to achieve -must model BRDF, albedo, lighting, etc. -also model scene: cast shadows, etc. -most free models don’t Good news -we don’t need photorealism! -model the information we have, ignore the rest* -model: 3D shape, pose -ignore: scene, textures, color *obviously, this will not work for all objects, but it will work for many

Back pack model Test Images Conventional Approach : collect training images Bicycle model Our Approach : generate 2D detector directly from 3D model Virtual Back pack model Virtual Bicycle model Back pack model Bicycle model Adapt

Virtual Data Generation

Kate Saenko, UMass Lowell Trimble 3D Warehouse: 1-12

Example models Collected models for 31 objects in “Office” dataset Selected 2 models per category manually from first page of results

Kate Saenko, UMass Lowell Virtual data generation For each CAD model, 15 random poses were generated by rotating the original 3D model by a random angle from 0 to 20 degrees in each of the three axes Note, we do not attempt to model multiview components in this paper, so pose variation is intentionally small Then two types of background and texture are applied to each pose to get the ﬁnal 2D virtual image: Virtual or Virtual-Gray

Two types of virtual data generated “Virtual”“Virtual-Gray” random background and texture from ImageNetgray texture and white background

Back pack model Test Images Conventional Approach : collect training images Bicycle model Our Approach : generate 2D detector directly from 3D model Virtual Back pack model Virtual Bicycle model Back pack model Bicycle model Adapt

Virtual-to-Real Adaptation

Goal: model what we have, ignore the rest color 3D shape light direction light spectrum specularity viewpoint background camera noise, etc… Question: How to ignore difference between resulting non- realistic images and real test images? Answer: domain adaptation

Kate Saenko, UMass Lowell Detection approach We start with the standard discriminative scoring function for image I and some region b Keep regions with high score In our case, is learned via Linear Discriminant Analysis (LDA) features are HOG (could use others)

Kate Saenko, UMass Lowell Hariharan, Malik and Ramanan, ECCV 2012 Inspired by this work Assume generative model: Class-conditional feature distribution Assume covariance S is same for all classes Then classifier is efficiently computed as Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classiﬁcation. In Computer Vision–ECCV 2012.

Main idea Source points x: labeled virtual images Target points u: unlabeled real-domain images

Main idea (a) source (d) decorrelated target

Kate Saenko, UMass Lowell Main idea if T is target covariance, S is source covariance, then Note, this corresponds to different whitening operation Also, if source and target are the same, this reduces to

Kate Saenko, UMass Lowell Main idea if T is target covariance, S is source covariance, then Note, this corresponds to different whitening operation Also, if source and target are the same, this reduces to

Effect of mismatched statistics

Kate Saenko, UMass Lowell Synthetic vs. Real Bike model example Synthetic “bike” performs better on webcam target images!

Experiments

Data *[Saenko et al, ECCV2010] Training: Test: webcam

Results: compare to [10] [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classiﬁcation. In Computer Vision–ECCV 2012.

Results: compare to [7] [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classiﬁcation. In Computer Vision–ECCV [7] Daniel Goehring, Judy Hoffman, Erik Rodner, Kate Saenko, and Trevor Darrell. Inter- active adaptation of real-time object detectors. In International Conference on Robotics and Automation (ICRA), 2014.

Training on Imagenet

Related Work

Kate Saenko, UMass Lowell Related work: Synthetic data in Computer Vision Kaneva’11: local features Optic Flow benchmark Evaluation of Image Features Using a Photorealistic VirtualWorld Biliana Kaneva, Antonio Torralba, William T. Freeman, ICCV 2011

Kate Saenko, UMass Lowell Related work: CAD models for Object Detection Stark, Michael, Michael Goesele, and Bernt Schiele. "Back to the Future: Learning Shape Models from 3D CAD Data." BMVC. Vol. 2. No limited to cars Wohlkinger, Walter, et al. "3DNet: Large-scale object class recognition from CAD models." Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, Point cloud test data

Future work

Kate Saenko, UMass Lowell Next Steps: more factors Generate multi-component models – Pose variation – Intra-category variation Analytic model of illumination? – illumination manifolds can be computed from just 3 linearly independent sources (Nayar and Murase ‘96) Model of other reconstructive factors?

Kate Saenko, UMass Lowell Next steps: “go deep” rCNN 60,000 images of 20 categories

Thanks!