RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments
Peter Henry¹, Michael Krainin¹, Evan Herbst¹, Xiaofeng Ren², and Dieter Fox¹,²
¹University of Washington, Computer Science and Engineering
²Intel Labs Seattle

RGB-D Data
The Microsoft Kinect has recently been in the news ($150); we have been using a PrimeSense reference design over the past year.
- 640 x 480 color and dense depth at 30 Hz
- Projects an IR texture onto the scene to obtain dense depth values even in textureless areas (as opposed to passive stereo)
Limitations:
- Field of view (60 deg)
- Depth range (5 m, up to 10 m with reduced accuracy)
- Depth accuracy (varies over a ~3 cm range at 3 m)
- 640 x 480 image resolution is low
- Motion blur, rolling shutter, and depth-to-image registration
The depth-to-3D conversion is sketched below.
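For concreteness, each valid depth pixel becomes a 3D point through standard pinhole back-projection. A minimal C++ sketch; the intrinsics below are typical for a 640 x 480 PrimeSense-class sensor and are illustrative assumptions, not the calibration used by the authors.

```cpp
#include <cstdio>

// Pinhole intrinsics. These values are illustrative assumptions for a
// 640x480 PrimeSense/Kinect-class sensor, not the paper's calibration.
struct Intrinsics { double fx, fy, cx, cy; };

struct Point3 { double x, y, z; };

// Back-project pixel (u, v) with metric depth d into a camera-frame
// 3D point. Each valid depth pixel becomes one point in the cloud.
Point3 backProject(const Intrinsics& K, double u, double v, double d) {
    return { (u - K.cx) * d / K.fx,
             (v - K.cy) * d / K.fy,
             d };
}

int main() {
    Intrinsics K{525.0, 525.0, 319.5, 239.5};
    Point3 p = backProject(K, 400, 300, 2.0);  // pixel seen at 2 m depth
    std::printf("%.3f %.3f %.3f\n", p.x, p.y, p.z);
}
```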

Here is a flythrough of one of our maps.
Applications:
- Robotic navigation and manipulation
- Semantic mapping / object recognition
- Remote telepresence / search and rescue
- Architectural models and remodeling
How did we get this map?

Related Work

SLAM:
- [Davison et al., PAMI 2007] (monocular): "MonoSLAM: Real-Time Single Camera SLAM"
- [Konolige et al., IJRR 2010] (stereo): "View-Based Maps"
- [Pollefeys et al., IJCV 2007] (multi-view stereo): "Detailed Real-Time Urban 3D Reconstruction from Video"
- [Borrmann et al., RAS 2008] (3D laser): "Globally consistent 3D mapping with scan matching"
- [May et al., JFR 2009] (time-of-flight sensor): "Three-dimensional mapping with time-of-flight cameras"

Loop closure detection:
- [Nister et al., CVPR 2006]: "Scalable Recognition with a Vocabulary Tree"
- [Paul et al., ICRA 2010]: "FAB-MAP 3D: Topological Mapping with Spatial and Visual Appearance"

Photo collections:
- [Snavely et al., SIGGRAPH 2006]: "Photo Tourism: Exploring Photo Collections in 3D"
- [Furukawa et al., ICCV 2009]: "Reconstructing Building Interiors from Images"

How our setting differs:
- Monocular SLAM: suffers from scale ambiguity (we need large-scale environments); we are robust to a lack of visual features, and our resulting map is dense
- Stereo SLAM: we have lower resolution, but are robust to frames with no visual features
- 3D laser SLAM: we have a limited field of view, limited depth range, and limited depth accuracy, but we want to take advantage of visual features
- Loop closure detection: useful for global consistency; we want a metrically accurate map, so the map must be corrected after detection
- Photo collections: use sparse features; filling in with multi-view stereo does not work in textureless areas and is computationally expensive

System Overview
- Frame-to-frame alignment: RANSAC (visual) and ICP (shape)
- Loop closure: detection and correction
- Map representation: surfel maps that fuse all measurements
Implemented in ROS (Willow Garage), in C++. We examine each component in more detail below; a skeleton of the pipeline follows.
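To make the data flow concrete, here is a minimal compilable skeleton of the loop this slide describes. Every type and function name is an illustrative placeholder, not the actual RGB-D Mapping implementation.

```cpp
#include <vector>

// Illustrative skeleton of the pipeline; all names are placeholders.
struct Frame {};                       // RGB image + depth + extracted features
struct Pose { double T[16] = {}; };    // 4x4 camera-to-world transform
struct SurfelMap {};

// Stubs standing in for the stages described on this slide.
Pose alignRgbdIcp(const Frame&, const Frame&) { return {}; }      // RANSAC + ICP
bool detectLoopClosure(const Frame&, const std::vector<Frame>&,
                       int* matchedKeyframe) { return false; }
void optimizePoseGraph(std::vector<Pose>&) {}                     // e.g. TORO
void fuseIntoSurfelMap(SurfelMap&, const Frame&, const Pose&) {}

int main() {
    std::vector<Frame> frames(10);     // pretend these came from the camera
    std::vector<Pose>  poses(1);       // first frame defines the origin
    std::vector<Frame> keyframes;
    SurfelMap map;

    for (size_t i = 1; i < frames.size(); ++i) {
        // 1. Frame-to-frame alignment against the previous frame.
        poses.push_back(alignRgbdIcp(frames[i - 1], frames[i]));

        // 2. Loop closure detection; a match adds a pose-graph constraint.
        int kf = -1;
        if (detectLoopClosure(frames[i], keyframes, &kf))
            optimizePoseGraph(poses);  // 3. restore global consistency

        // 4. Fuse the frame into the surfel map under its (corrected) pose.
        fuseIntoSurfelMap(map, frames[i], poses.back());
    }
}
```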

Alignment (3D SIFT RANSAC)
- SIFT features (from the image) lifted into 3D (from the depth map)
- RANSAC alignment requires no initial estimate
- Requires sufficiently many matched features
We use SiftGPU (150 ms per frame for features, 80 ms for RANSAC); SURF and FAST/Calonder features are also implemented. A sketch of the alignment step follows below.
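The core geometric step here is estimating a rigid transform from matched 3D feature points inside a RANSAC loop. A minimal sketch assuming the Eigen library; the iteration count and inlier threshold are illustrative, not the authors' values, and feature matching itself is assumed done.

```cpp
#include <Eigen/Dense>
#include <vector>
#include <random>

using Vec3 = Eigen::Vector3d;

struct RigidTransform { Eigen::Matrix3d R; Vec3 t; };

// Least-squares rigid transform from point correspondences (Horn's method).
RigidTransform fitRigid(const std::vector<Vec3>& src, const std::vector<Vec3>& dst) {
    Vec3 cs = Vec3::Zero(), cd = Vec3::Zero();
    for (size_t i = 0; i < src.size(); ++i) { cs += src[i]; cd += dst[i]; }
    cs /= double(src.size()); cd /= double(dst.size());
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < src.size(); ++i)
        H += (src[i] - cs) * (dst[i] - cd).transpose();
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0) {                 // repair a possible reflection
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1;
        R = V * svd.matrixU().transpose();
    }
    return { R, cd - R * cs };
}

// RANSAC: sample 3 matches, fit, count inliers, keep the best model.
// Assumes src/dst are nonempty matched 3D feature points.
RigidTransform ransacAlign(const std::vector<Vec3>& src, const std::vector<Vec3>& dst,
                           int iters = 500, double inlierDist = 0.03) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<size_t> pick(0, src.size() - 1);
    RigidTransform best{Eigen::Matrix3d::Identity(), Vec3::Zero()};
    int bestInliers = -1;
    for (int it = 0; it < iters; ++it) {
        std::vector<Vec3> s, d;
        for (int k = 0; k < 3; ++k) {          // minimal sample of 3 matches
            size_t i = pick(rng);
            s.push_back(src[i]); d.push_back(dst[i]);
        }
        RigidTransform T = fitRigid(s, d);
        int inliers = 0;
        for (size_t i = 0; i < src.size(); ++i)
            if ((T.R * src[i] + T.t - dst[i]).norm() < inlierDist) ++inliers;
        if (inliers > bestInliers) { bestInliers = inliers; best = T; }
    }
    return best;  // in practice: refit on all inliers of the best model
}
```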

RANSAC Failure Case

Alignment (ICP)
Iterative Closest Point (ICP):
- Does not need visual features or color
- Requires sufficient geometry and an initial estimate
Drawbacks in our setting: limited field of view, needs geometry, needs a good initial estimate, and is slower. A sketch of one point-to-plane step follows below.
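For reference, one linearized point-to-plane ICP step reduces to a small 6x6 linear solve. A sketch assuming Eigen and already-associated correspondences; the KD-tree search, outlier rejection, and outer iteration loop are omitted, and this is not the authors' code.

```cpp
#include <Eigen/Dense>
#include <vector>

using Vec3 = Eigen::Vector3d;
using Vec6 = Eigen::Matrix<double, 6, 1>;
using Mat6 = Eigen::Matrix<double, 6, 6>;

// One linearized point-to-plane step: given source points p_i, associated
// target points q_i and target normals n_i, solve for the small rotation
// and translation minimizing sum_i ((R p_i + t - q_i) . n_i)^2.
Eigen::Isometry3d pointToPlaneStep(const std::vector<Vec3>& p,
                                   const std::vector<Vec3>& q,
                                   const std::vector<Vec3>& n) {
    Mat6 A = Mat6::Zero();
    Vec6 b = Vec6::Zero();
    for (size_t i = 0; i < p.size(); ++i) {
        Vec6 J;
        J << p[i].cross(n[i]), n[i];           // d(residual)/d(omega, t)
        double r = (q[i] - p[i]).dot(n[i]);    // signed point-to-plane distance
        A += J * J.transpose();
        b += J * r;
    }
    Vec6 x = A.ldlt().solve(b);                // x = (omega, t)

    Eigen::Isometry3d T = Eigen::Isometry3d::Identity();
    Vec3 w = x.head<3>();
    double angle = w.norm();
    if (angle > 1e-12)
        T.linear() = Eigen::AngleAxisd(angle, w / angle).toRotationMatrix();
    T.translation() = x.tail<3>();
    return T;  // apply to the source cloud, re-associate, and iterate
}
```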

ICP Failure Case

Joint Optimization (RGBD-ICP)
- Fixed feature matches (from RANSAC)
- Dense point-to-plane term (from ICP)
Details:
- The point-to-plane term requires Levenberg-Marquardt, but needs fewer iterations
- KD-tree for nearest-neighbor search
- Downsample the source and target clouds (10% still works)
- Discard a percentage of outliers, and correspondences beyond a maximum distance
- Discard boundary points
The combined objective is sketched below.
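The two terms on this slide suggest a combined objective of roughly the following form (a hedged reconstruction, not quoted from the paper): α is the weight swept in Experiment 1 below, A_f the fixed feature matches, A_d the dense associations, and n the target-surface normals.

$$
T^{*} = \operatorname*{argmin}_{T}\;
\alpha \, \frac{1}{|A_f|} \sum_{i \in A_f} \left\lVert T(f_s^i) - f_t^i \right\rVert^2
\;+\;
(1-\alpha) \, \frac{1}{|A_d|} \sum_{j \in A_d} \left( \left( T(p_s^j) - p_t^j \right) \cdot n_t^j \right)^2
$$

With α = 1 only the sparse feature matches matter (pure RANSAC alignment); with α = 0 the objective reduces to dense point-to-plane ICP.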

Loop Closure
- Sequential alignments accumulate error
- Revisiting a previous location results in an inconsistent map

Loop Closure Detection
Detect loop closures by running RANSAC against previous frames. Pre-filter options (for efficiency):
- Use only a subset of frames (keyframes), chosen either as every nth frame or by declaring a new keyframe whenever RANSAC fails directly against the previous keyframe
- Test only keyframes with a similar estimated 3D pose
- Place recognition using a vocabulary tree (added since the paper's publication)
A sketch of this keyframe filtering follows below.
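A minimal sketch of the pose-based pre-filter followed by RANSAC confirmation. All names, the pose representation, and the distance thresholds are illustrative placeholders, not the authors' implementation.

```cpp
#include <vector>
#include <cmath>

struct Pose { double x = 0, y = 0, z = 0, yaw = 0; };
struct Keyframe { Pose pose; /* features, descriptors, ... */ };

// Stub: would run feature RANSAC between the two frames and report success.
bool ransacAgainst(const Keyframe&, const Keyframe&) { return false; }

// Cheap pre-filter: only consider keyframes whose estimated pose is close.
bool poseNearby(const Pose& a, const Pose& b,
                double maxDist = 3.0, double maxYaw = 0.8) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) < maxDist &&
           std::fabs(a.yaw - b.yaw) < maxYaw;
}

// Returns the index of a matched keyframe (a new pose-graph edge), or -1.
int findLoopClosure(const Keyframe& current, const std::vector<Keyframe>& keyframes) {
    for (size_t i = 0; i < keyframes.size(); ++i)
        if (poseNearby(current.pose, keyframes[i].pose) &&
            ransacAgainst(current, keyframes[i]))
            return static_cast<int>(i);
    return -1;
}
```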

Loop Closure Correction
TORO [Grisetti 2007]:
- Constraints between camera locations form a pose graph
- Optimization yields maximum-likelihood global camera poses
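The input to a pose-graph optimizer such as TORO has a simple shape: nodes are camera poses, edges are relative-pose measurements from sequential alignment and loop closures. A structural sketch with illustrative types; TORO's actual API differs.

```cpp
#include <vector>

struct Pose6 { double x, y, z, roll, pitch, yaw; };  // one node per (key)frame

struct Edge {
    int from, to;                    // node indices
    Pose6 relative;                  // measured relative transform (from -> to)
    double information[6][6] = {};   // confidence of the measurement
};

struct PoseGraph {
    std::vector<Pose6> nodes;        // camera poses to be optimized
    std::vector<Edge>  edges;        // sequential + loop-closure constraints
};
// The optimizer adjusts `nodes` to maximize the likelihood of all `edges`;
// after a loop closure is added, accumulated drift is redistributed.
```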

Map Representation: Surfels
Surface elements (surfels) [Pfister 2000, Weise 2009, Krainin 2010]:
- Circular surface patches
- Accumulate color, orientation, and size information
- Incremental, independent updates
- Incorporate occlusion reasoning
- 750 million points reduced to 9 million surfels
Used as a post-processing step for the final representation. (Pfister: rendering; Weise: modeling objects held in a human hand; Krainin: modeling objects held in a robot hand, ICRA workshops and International Journal of Robotics Research, to appear.)
Each frame contains up to 300,000 3D points: we have too much data. Alternatives:
- Downsampling: loses data
- Meshes: incremental updates are difficult, overlap makes updates potentially inefficient, and they do not parallelize (GPU)
- Volumetric representations: memory makes it difficult to obtain detail at the size of our environments
A minimal surfel sketch follows below.
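A minimal surfel sketch assuming Eigen: a circular patch with position, normal, color, and radius, fused incrementally by confidence-weighted averaging. The field choices and update rule are illustrative, not the paper's exact formulation.

```cpp
#include <Eigen/Dense>

struct Surfel {
    Eigen::Vector3d position = Eigen::Vector3d::Zero();
    Eigen::Vector3d normal   = Eigen::Vector3d::UnitZ();
    Eigen::Vector3d color    = Eigen::Vector3d::Zero();  // RGB in [0, 1]
    double radius = 0.0;   // patch size; depends on viewing distance/angle
    double weight = 0.0;   // accumulated measurement confidence

    // Fuse one new measurement (point p, normal n, color c, radius r,
    // confidence w) into the surfel as a running weighted mean. Each surfel
    // updates independently, which is what makes the map incremental.
    void update(const Eigen::Vector3d& p, const Eigen::Vector3d& n,
                const Eigen::Vector3d& c, double r, double w) {
        double total = weight + w;
        position = (weight * position + w * p) / total;
        normal   = ((weight * normal + w * n) / total).normalized();
        color    = (weight * color + w * c) / total;
        radius   = (weight * radius + w * r) / total;
        weight   = total;
    }
};
```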

Experiment 1: RGBD-ICP
- 16 locations marked in the environment (~7 m apart)
- Camera carried on a tripod between markers and placed at each one
- Error measured between the ground-truth distance and the accumulated distance from frame-to-frame alignment

α weight        0.0 (ICP)  0.2   0.5   0.8   1.0 (RANSAC)
Mean error (m)  0.22       0.19  0.16  0.18  1.29

Experiment 2: Robot Arm
Camera held in a WAM arm to obtain ground-truth poses.

Method              Mean Translational Error (m)  Mean Angular Error (deg)
RANSAC              0.168                         4.5
ICP                 0.144                         3.9
RGBD-ICP (α=0.5)    0.123                         3.3
RGBD-ICP + TORO     0.069                         2.8

Map Accuracy: Architectural Drawing
A 114 m long path; 2130 frames.

Map Accuracy: 2D Laser Map
A second building; the walls really are curved. 70 m sides; 2625 frames.

Conclusion
- Kinect-style depth cameras have recently become available as consumer products
- RGB-D Mapping can generate rich 3D maps using these cameras
- RGBD-ICP combines visual and shape information for robust frame-to-frame alignment
- Global consistency is achieved via loop closure detection and optimization (RANSAC, TORO, SBA)
- Surfels provide a compact map representation

Open Questions
- Which features are best to use?
- How can we find more loop closure constraints between frames?
- What is the right representation (point clouds, surfels, meshes, volumetric, geometric primitives, objects)?
- How can we generate increasingly photorealistic maps?
- Can we explore autonomously for map completeness?
- Can we use these rich 3D maps for semantic mapping?

Thank You!