
1 From Virtual to Reality: Fast Adaptation of Virtual Object Detectors to Real Domains. Baochen Sun and Kate Saenko, UMass Lowell

2 Approach overview. Conventional approach: collect real training images for each category (backpack model, bicycle model) and apply the detector to test images. Our approach: generate the 2D detector directly from a virtual 3D model (virtual backpack, virtual bicycle), then adapt it to the real domain.

3 Motivation

4 Where is the "mug"? The detector fails! Solution #1: get more training data from the web. Problem: limited data, rare poses.

5 Where is the "mug"? The detector fails! Solution #1: get more training data from the web. Solution #2: synthetic data from crowdsourced CAD models: many models available, any pose possible! ...but can we generate realistic images?

6 real or fake?

7

8 real / fake

9 CAD models + photorealistic rendering. Example images: Project Cars.

10 Bad news:
- photorealism is hard to achieve
- must model BRDF, albedo, lighting, etc.
- must also model the scene: cast shadows, etc.
- most free models don't
Good news:
- we don't need photorealism!
- model the information we have, ignore the rest*
- model: 3D shape, pose
- ignore: scene, textures, color
*Obviously, this will not work for all objects, but it will work for many.

11 Approach overview (repeated): the conventional approach collects real training images per category; our approach generates the 2D detector directly from virtual 3D models, then adapts it to the real domain.

12 Virtual Data Generation

13 Trimble 3D Warehouse: example models 1-12.

14 Example models. Collected models for the 31 object categories in the "Office" dataset; selected 2 models per category manually from the first page of search results.

15 Virtual data generation. For each CAD model, 15 random poses were generated by rotating the original 3D model by a random angle from 0 to 20 degrees around each of the three axes. Note that we do not attempt to model multiview components in this paper, so pose variation is intentionally small. Two types of background and texture are then applied to each pose to obtain the final 2D virtual image: Virtual or Virtual-Gray (see the sketch below).
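
A minimal Python sketch of this generation loop; the `render` helper and its texture/background arguments are hypothetical placeholders for the actual rendering pipeline, and only the 15-pose count and the 0-20 degree range come from the slide:

```python
import random

N_POSES = 15        # poses per CAD model (from the slide)
MAX_ANGLE = 20.0    # maximum rotation per axis, in degrees

def random_pose():
    """Random rotation of 0-20 degrees around each of the three axes."""
    return tuple(random.uniform(0.0, MAX_ANGLE) for _ in range(3))

def generate_virtual_images(model, imagenet_backgrounds, render):
    """Yield (image, variant) pairs for one CAD model.

    `render` stands in for the actual rendering pipeline and is assumed to
    take a model, a rotation, a texture spec and a background.
    """
    for _ in range(N_POSES):
        rotation = random_pose()
        # "Virtual": random background and texture sampled from ImageNet
        background = random.choice(imagenet_backgrounds)
        yield render(model, rotation, texture="imagenet", background=background), "Virtual"
        # "Virtual-Gray": gray texture on a white background
        yield render(model, rotation, texture="gray", background="white"), "Virtual-Gray"
```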

16 Two types of virtual data generated: "Virtual" (random background and texture from ImageNet) and "Virtual-Gray" (gray texture and white background).

17 Approach overview (repeated): the conventional approach collects real training images per category; our approach generates the 2D detector directly from virtual 3D models, then adapts it to the real domain.

18 Virtual-to-Real Adaptation

19 Goal: model what we have, ignore the rest. Factors: color, 3D shape, light direction, light spectrum, specularity, viewpoint, background, camera noise, etc. Question: how do we ignore the difference between the resulting non-realistic images and the real test images? Answer: domain adaptation.

20 Detection approach. We start with the standard discriminative scoring function s(I, b) = wᵀφ(I, b) for image I and candidate region b, and keep regions with a high score. In our case, w is learned via Linear Discriminant Analysis (LDA); the features φ are HOG (other features could be used).
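
A minimal sketch of this scoring step in Python, assuming HOG features have already been extracted for each candidate region; the function name and the threshold value are illustrative, not taken from the paper:

```python
import numpy as np

def score_regions(w, region_features, threshold=0.0):
    """Score each candidate region with the linear detector s = w . phi(I, b)
    and keep regions whose score exceeds the threshold.

    region_features: iterable of (region, hog_vector) pairs, where hog_vector
    is the HOG feature phi(I, b) for that region.
    """
    detections = []
    for region, phi in region_features:
        score = float(np.dot(w, phi))
        if score > threshold:
            detections.append((region, score))
    # highest-scoring regions first
    detections.sort(key=lambda d: d[1], reverse=True)
    return detections
```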

21 Inspired by Hariharan, Malik and Ramanan, ECCV 2012. Assume a generative model with class-conditional feature distribution x | y ∼ N(μ_y, S), where the covariance S is assumed to be the same for all classes. The classifier is then efficiently computed as w = S⁻¹(μ_pos − μ_neg). [Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. ECCV 2012.]
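
A sketch of that closed-form training step in Python, assuming precomputed background statistics (μ_neg, S) in the spirit of Hariharan et al.; the small ridge regularizer is an added assumption for numerical stability, not a detail from the slide:

```python
import numpy as np

def lda_detector_weights(pos_features, neg_mean, neg_cov, ridge=1e-3):
    """Closed-form LDA detector: w = S^{-1} (mu_pos - mu_neg).

    pos_features: (n, d) array of HOG features from positive (here, virtual) windows.
    neg_mean, neg_cov: shared background feature statistics (mu_neg, S).
    """
    mu_pos = pos_features.mean(axis=0)
    S = neg_cov + ridge * np.eye(neg_cov.shape[0])   # regularize before solving
    return np.linalg.solve(S, mu_pos - neg_mean)
```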

22 Main idea. Source points x: labeled virtual images. Target points u: unlabeled real-domain images.

23 Main idea (figure panels: (a) source; (d) decorrelated target).

24 Main idea: if T is the target covariance and S is the source covariance, the classifier is computed by whitening with the target statistics in place of the source-only statistics. Note that this corresponds to a different whitening operation. Also, if source and target are the same, this reduces to the standard LDA classifier w = S⁻¹(μ_pos − μ_neg).
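
The exact combination of S and T used in the paper appears on the slide image and is not reproduced here; the Python sketch below only illustrates the general idea of decorrelating with statistics estimated from the unlabeled target domain, using the simplest possible substitution (T in place of S). Treat it as an assumption-laden illustration, not the paper's formula:

```python
import numpy as np

def target_whitened_weights(pos_features, neg_mean, target_features, ridge=1e-3):
    """Illustrative variant only: decorrelate with the target covariance T,
    estimated from unlabeled real-domain features, instead of the source
    covariance S. If the domains match (T == S), this coincides with the
    standard LDA detector w = S^{-1} (mu_pos - mu_neg).
    """
    mu_pos = pos_features.mean(axis=0)
    T = np.cov(target_features, rowvar=False)   # target feature covariance
    T = T + ridge * np.eye(T.shape[0])          # regularize before solving
    return np.linalg.solve(T, mu_pos - neg_mean)
```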

26 Effect of mismatched statistics

27 Synthetic vs. real: bike model example. The synthetic "bike" detector performs better on the webcam target images!

28 Experiments

29 Data* (*[Saenko et al., ECCV 2010]). Training: example images shown on slide. Test: webcam.

30 Results: comparison to [10]. [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. ECCV 2012.

31 Results: comparison to [7]. [10] Bharath Hariharan, Jitendra Malik, and Deva Ramanan. Discriminative decorrelation for clustering and classification. ECCV 2012. [7] Daniel Goehring, Judy Hoffman, Erik Rodner, Kate Saenko, and Trevor Darrell. Interactive adaptation of real-time object detectors. ICRA 2014.

32 Training on ImageNet

33 Related Work

34 Related work: synthetic data in computer vision. Kaneva '11: local features; optic flow benchmarks. Biliana Kaneva, Antonio Torralba, and William T. Freeman. Evaluation of Image Features Using a Photorealistic Virtual World. ICCV 2011.

35 Related work: CAD models for object detection. Michael Stark, Michael Goesele, and Bernt Schiele. Back to the Future: Learning Shape Models from 3D CAD Data. BMVC 2010 (limited to cars). Walter Wohlkinger et al. 3DNet: Large-scale object class recognition from CAD models. ICRA 2012 (point cloud test data).

36 Future work

37 Next steps: more factors
- Generate multi-component models: pose variation, intra-category variation
- Analytic model of illumination? Illumination manifolds can be computed from just 3 linearly independent sources (Nayar and Murase '96); see the sketch after this list
- Model of other reconstructive factors?
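
A brief LaTeX sketch of why three linearly independent sources suffice, assuming a Lambertian, shadow-free model; the notation and the Lambertian assumption are added here for context and are not taken from the slide:

```latex
% Lambertian, shadow-free image formation at pixel p under a distant light s:
%   I_s(p) = \rho(p)\, n(p)^{\top} s        (albedo, surface normal, light vector)
% Any light can be written as s = a_1 s_1 + a_2 s_2 + a_3 s_3 for three
% linearly independent sources s_1, s_2, s_3, so by linearity
\[
  I_s(p) \;=\; \rho(p)\, n(p)^{\top} \sum_{i=1}^{3} a_i s_i
         \;=\; \sum_{i=1}^{3} a_i\, I_{s_i}(p),
\]
% i.e. every image under varying illumination lies in the span of the three
% basis images I_{s_1}, I_{s_2}, I_{s_3}.
```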

38 Next steps: "go deep" with R-CNN (60,000 images of 20 categories).

39 Thanks!

