3D Scene Models 6.870 Object recognition and scene understanding Krista Ehinger.



Questions
What makes a good 3D scene model? How accurate does it need to be? How far can you get with automatic surface detection? Where do you need human input?

Modelling the scene Real scenes have way too many surfaces

Modelling the scene Option 1: Diorama world

Tour Into the Picture (TIP)
Model the scene as 5 planes + foreground objects. Easy implementation: planes and objects are defined by humans.
Y. Horry, K.I. Anjyo and K. Arai. "Tour Into the Picture: Using a spidery mesh user interface to make animation from a single image". ACM SIGGRAPH 1997

TIP Implementation
The user defines the vanishing point and the rear wall of the scene (inner rectangle). Given some assumptions about the camera, the position and size of all planes can be computed...

Defining the box
Define planes: floor -> y = 0, ceiling -> y = H. Given the horizon (vanishing point), the corners of the floor and ceiling can be computed from their 2D image positions.
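This floor geometry follows from simple pinhole-camera algebra. A minimal sketch (my own illustration, not the paper's code), assuming a camera at height `cam_h` looking parallel to the floor, focal length `f` in pixels, principal point column `cx`, and the horizon at image row `v0` (rows increase downward):

```python
import numpy as np

def floor_point_3d(u, v, f, cam_h, v0, cx):
    """Back-project image pixel (u, v) that lies on the floor plane (y = 0).

    Pinhole model: a floor point at depth z projects to row v = v0 + f * cam_h / z,
    so z = f * cam_h / (v - v0). Only valid for pixels below the horizon (v > v0).
    """
    z = f * cam_h / (v - v0)   # depth along the viewing direction
    x = (u - cx) * z / f       # lateral offset from the optical axis
    return np.array([x, 0.0, z])

# Example: camera 1.6 m above the floor, f = 500 px, horizon at row 240.
p = floor_point_3d(u=420, v=400, f=500, cam_h=1.6, v0=240, cx=320)
# depth p[2] = 500 * 1.6 / 160 = 5.0 m
```

The same relation, run in reverse, is what fixes the size of the box once the inner rectangle is chosen.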

Defining the box
Once the positions of the planes are known, compute the texture of each plane.
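Computing a plane's texture amounts to warping that plane's image quadrilateral into a rectangle, i.e. applying a homography. A minimal sketch of the standard 4-point DLT estimate (illustrative, not the paper's implementation):

```python
import numpy as np

def homography_4pt(src, dst):
    """Solve for the 3x3 homography H mapping 4 src points to 4 dst points.

    Each correspondence (x, y) -> (u, v) contributes two linear equations in
    the 8 unknown entries of H (h33 is fixed to 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Map the unit square to a square of side 2 (a pure scaling).
H = homography_4pt([(0, 0), (1, 0), (1, 1), (0, 1)],
                   [(0, 0), (2, 0), (2, 2), (0, 2)])
p = H @ np.array([1.0, 1.0, 1.0])
p = p / p[2]   # (1, 1) maps to (2, 2)
```

In practice the src points would be the plane's four image corners and the dst points the corners of the output texture rectangle; sampling the image under the inverse warp fills in the texture.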

What about foreground objects?
Model each foreground object as a quadrangle attached to the floor; compute its attachment points and upper points. Foreground objects form a hierarchical model.
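Under the same pinhole assumptions as before (camera at height `cam_h` looking parallel to the floor, horizon at row `v0`), the attachment row fixes the quadrangle's depth and the upper row fixes its height. A hypothetical sketch:

```python
import numpy as np

def billboard_extent(v_bottom, v_top, f, cam_h, v0):
    """Depth and height of a vertical 'billboard' object standing on the floor.

    Pinhole model: a point at depth z and height y above the floor projects to
    image row v = v0 + f * (cam_h - y) / z (rows increase downward).
    """
    z = f * cam_h / (v_bottom - v0)        # depth, from the floor attachment row
    y_top = cam_h - z * (v_top - v0) / f   # height of the object's upper edge
    return z, y_top

# Object whose base is at row 400 and top at row 200; horizon at row 240.
z, h = billboard_extent(v_bottom=400, v_top=200, f=500, cam_h=1.6, v0=240)
# z = 500 * 1.6 / 160 = 5.0 m;  h = 1.6 - 5.0 * (-40) / 500 = 2.0 m
```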

Extracting foreground objects
Foreground objects are removed and added to a mask. Holes in the background are filled in using photo-completion software.

TIP Demonstration

TIP Discussion
Pros:
- Accurate model (thanks to human input)
- Deals with foreground objects and occlusions
Cons:
- Requires human input; not automatic
- Model is too simple for many real-world scenes

Modelling the scene Option 2: Pop-up book world

Automatic Photo Pop-Up
Three classes of surface: ground, sky, vertical. Not just a box: can model more kinds of scenes. Automatic classification; no manual labeling.
D. Hoiem, A.A. Efros, and M. Hebert, "Automatic Photo Pop-up", ACM SIGGRAPH 2005.

Photo Pop-Up Implementation
Pixels -> superpixels -> constellations. Constellations are automatically labeled as ground, vertical, or sky. The angles of vertical planes are defined using their attachment to the ground, and textures are mapped to the vertical planes (as in TIP).

Superpixels and constellations
Superpixels are groups of neighboring pixels with nearly the same color (Tao et al., 2001). Superpixels are assigned to constellations according to how likely they are to share a label (ground, vertical, sky), based on the difference between their feature vectors.
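The grouping step can be illustrated with a toy greedy procedure: assign each superpixel to an existing constellation if its feature vector is close to the constellation's running mean, otherwise start a new constellation. This is a simplification of the paper's likelihood-based grouping; the threshold and the 2-D features are invented for illustration:

```python
import numpy as np

def group_superpixels(features, thresh=0.5):
    """Greedily group superpixel feature vectors into constellations.

    features: (N, D) array, one row per superpixel.
    Returns a list of index lists, one per constellation.
    """
    constellations = []   # each: member indices + running mean feature
    for i, f in enumerate(features):
        best, best_d = None, thresh
        for c in constellations:
            d = np.linalg.norm(f - c["mean"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            constellations.append({"members": [i], "mean": f.astype(float)})
        else:
            best["members"].append(i)
            n = len(best["members"])
            best["mean"] += (f - best["mean"]) / n   # update running mean
    return [c["members"] for c in constellations]

feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
groups = group_superpixels(feats)   # two constellations: [0, 1] and [2, 3]
```

The real system replaces the Euclidean threshold with a learned pairwise same-label likelihood, but the control flow is the same.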

Feature vectors
- Color features: RGB, hue, saturation
- Texture features: difference of oriented Gaussians, textons
- Location (absolute and percentile)
- Number of superpixels in the constellation
- Line and intersection detectors
Not used: constellation shape (contiguous, N sides), some texture features.

Training process
For each of the 82 labeled training images:
- Compute superpixels, features, and pairwise likelihoods
- Form a set of N constellations (N = 3 to 25), each labeled with ground truth
- Compute constellation features
- Compute the constellation label and homogeneity likelihood

Training process
AdaBoost weak classifiers learn to estimate whether two superpixels have the same label (based on their feature vectors). Another set of AdaBoost weak classifiers learns the constellation label and homogeneity likelihood (expressed as percent ground, vertical, sky, or mixed). Emphasis is placed on classifying larger constellations correctly.
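AdaBoost with decision stumps fits in a few lines. This toy version (not the paper's exact classifier) learns a weighted vote of one-feature threshold tests, re-weighting training examples after each round:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """Train AdaBoost with decision stumps. X: (N, D) features, y: +/-1 labels."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # example weights
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):                  # search (feature, threshold, sign) stumps
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = np.where(X[:, j] >= t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = max(err, 1e-10)                   # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)   # stump weight
        pred = np.where(X[:, j] >= t, s, -s)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified examples
        w /= w.sum()
        stumps.append((alpha, j, t, s))
    return stumps

def predict(stumps, X):
    score = sum(a * np.where(X[:, j] >= t, s, -s) for a, j, t, s in stumps)
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_stumps(X, y, rounds=3)
# predict(model, X) recovers y exactly on this separable toy set
```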

Building the 3D model
Fit line segments along the vertical/ground boundary (Hough transform); the goal is the simplest shape (fewest lines). Project lines up from the corners of the boundary segments, then cut and fold.
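The Hough transform step can be sketched directly: each boundary point votes for every (theta, rho) line parameterization passing through it, and peaks in the accumulator are the fitted lines. A toy version (illustrative only, with integer-quantized rho):

```python
import numpy as np

def hough_lines(points, rho_max=200):
    """Accumulate votes for lines x*cos(theta) + y*sin(theta) = rho.

    points: iterable of (x, y). Returns the accumulator and the (theta_deg, rho)
    of the strongest line. rho is rounded to integers and offset by rho_max so
    negative values index into the array.
    """
    acc = np.zeros((180, 2 * rho_max), dtype=int)
    thetas = np.deg2rad(np.arange(180))
    for x, y in points:
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[np.arange(180), rho + rho_max] += 1   # one vote per theta bin
    t, r = np.unravel_index(np.argmax(acc), acc.shape)
    return acc, (t, r - rho_max)

# Points on the line y = x: all of them satisfy rho = 0 at theta = 135 degrees.
pts = [(10 * i, 10 * i) for i in range(6)]
acc, (theta, rho) = hough_lines(pts)
```

Finding the "fewest lines" then amounts to repeatedly taking the strongest peak and removing the points it explains.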

Photo Pop-Up Demonstration

Photo Pop-Up Discussion
Pros:
- Automatic
- Can handle a variety of scenes, not just boxes
Cons:
- No handling of foreground objects
- Misclassification leads to very strange models
- Only two kinds of surface: ground and vertical

Modelling the scene Option 3: Actually try to model surface angles

3D Scene Structure from a Still Image
Compute a surface normal for each surface. No right-angle assumptions; surfaces can have any orientation. Automatic (trained on images with known depth maps).

3D Scene Implementation
Segment the image into superpixels. Estimate the surface normal of each superpixel (using a Markov Random Field model). Optionally detect and extract foreground objects, then map textures to the planes.
(Figure: original image and modeled depth map)
A. Saxena, M. Sun, A. Y. Ng. "Learning 3-D Scene Structure from a Single Still Image". In ICCV workshop on 3D Representation for Recognition (3dRR-07), 2007

Image features
Superpixel features (x_i):
- Color and texture features as in Photo Pop-Up
- The vector also includes features of neighboring superpixels
Boundary features (x_ij):
- Color difference, texture difference, edge detector response

Markov Random Field Model
First term: models planes in terms of the image features of individual superpixels. Second term: models planes in terms of pairs of superpixels, subject to constraints...

Model constraints
- Connected structure: except where there is an occlusion, neighboring superpixels are likely to be connected
- Coplanar structure: except where there are folds, neighboring superpixels are likely to lie on the same plane
- Co-linearity: long straight lines in the image correspond to straight lines in 3D
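These constraints enter the MRF as pairwise penalty terms alongside the per-superpixel data term. A schematic energy, with invented weights and a dot-product distance chosen purely to show the shape of the objective (not the paper's actual formulation):

```python
import numpy as np

def scene_energy(normals, data_normals, neighbors, occluded, lam=1.0):
    """Toy MRF-style energy over per-superpixel plane normals.

    normals:       (N, 3) candidate unit normals, one per superpixel.
    data_normals:  (N, 3) unit normals predicted from image features (data term).
    neighbors:     list of (i, j) adjacent superpixel pairs.
    occluded:      set of pairs with a detected occlusion boundary; the
                   smoothness penalty is switched off there.
    """
    data = sum(1 - normals[i] @ data_normals[i] for i in range(len(normals)))
    smooth = sum(1 - normals[i] @ normals[j]
                 for i, j in neighbors if (i, j) not in occluded)
    return data + lam * smooth

n = np.array([[0, 1, 0], [0, 1, 0], [1, 0, 0]], float)  # two coplanar + one fold
energy = scene_energy(n, n, [(0, 1), (1, 2)], occluded={(1, 2)})
# energy == 0.0: the data term matches and the fold pair is exempted
```

Marking the (1, 2) pair as occluded removes the penalty that would otherwise discourage the fold; inference then searches for the normals minimizing this energy.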

Foreground objects
Automatically detected foreground objects may be removed from the model (for example, pedestrians detected with the Dalal & Triggs detector). Detected objects also add 3D cues: pedestrians are essentially vertical and occlude other surfaces.

3D Scene Demonstration

Results

3D Scene Discussion
Pros:
- Handles a variety of scene types
- Fairly accurate (about 2/3 of scenes correct)
- Automatic
- Handles foreground objects
Cons:
- Still fails on about 1/3 of scenes

Discussion
Simple 3D models are adequate for many scenes. You can get quite far without human input, though results would still be better with human annotation of scenes. Extensions?
- Use photo-completion techniques to handle occlusions?
- Massive training sets -> better 3D models?