Presentation transcript:

1 RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments
Peter Henry¹, Michael Krainin¹, Evan Herbst¹, Xiaofeng Ren², and Dieter Fox¹,²
¹University of Washington, Computer Science and Engineering  ²Intel Labs Seattle

2 RGB-D Data
The Microsoft Kinect has recently been in the news ($150). We have been using a PrimeSense reference design over the past year:
640 x 480 color and dense depth at 30 Hz
Projects an IR texture onto the scene to obtain dense depth values even in textureless areas (as opposed to passive stereo)
Limitations:
Field of view (60 deg)
Depth range (5 m; up to 10 m with reduced accuracy)
Depth accuracy (varies over a ~3 cm range at 3 m)
640 x 480 image resolution is low
Motion blur / rolling shutter / depth-to-image registration
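A depth image like this is turned into a 3D point cloud by standard pinhole back-projection. A minimal NumPy sketch; the intrinsics (fx, fy, cx, cy) below are placeholder values, not the PrimeSense calibration, and pixels with no IR return (depth 0) are simply dropped:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an Nx3 point cloud
    using the pinhole camera model. Pixels with zero depth (no IR
    return) are discarded."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.column_stack((x, y, z))

# Toy 2x2 depth image; one pixel has no depth reading.
depth = np.array([[1.0, 0.0],
                  [2.0, 1.5]])
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(pts.shape)  # (3, 3): three valid pixels
```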

3 Applications
Here is a flythrough of one of our maps.
Robotic navigation and manipulation
Semantic mapping / object recognition
Remote telepresence / search and rescue
Architectural models and remodeling
How did we get this map?

4 Related Work
SLAM:
"MonoSLAM: Real-Time Single Camera SLAM" [Davison et al, PAMI 2007] (monocular)
"View-Based Maps" [Konolige et al, IJR 2010] (stereo)
"Detailed Real-Time Urban 3D Reconstruction from Video" [Pollefeys et al, IJCV 2007] (multi-view stereo)
"Globally Consistent 3D Mapping with Scan Matching" [Borrmann et al, RAS 2008] (3D laser)
"Three-Dimensional Mapping with Time-of-Flight Cameras" [May et al, JFR 2009] (ToF sensor)
Loop closure detection:
"Scalable Recognition with a Vocabulary Tree" [Nister et al, CVPR 2006]
"FAB-MAP 3D: Topological Mapping with Spatial and Visual Appearance" [Paul et al, ICRA 2010]
Photo collections:
"Photo Tourism: Exploring Photo Collections in 3D" [Snavely et al, SIGGRAPH 2006]
"Reconstructing Building Interiors from Images" [Furukawa et al, ICCV 2009]
Notes:
Monocular SLAM: scale ambiguity (we need large-scale environments); we are robust to a lack of visual features, and our resulting map is dense.
Stereo SLAM: we have lower resolution but are robust to frames with no visual features.
3D laser SLAM: we have a limited field of view, limited depth range, and limited depth accuracy, but we want to take advantage of visual features.
Loop closure detection: useful for global consistency; we want a metrically accurate map (need to correct after detection).
Photo collections: sparse features; filling in with multi-view stereo doesn't work in textureless areas and is computationally expensive.

5 System Overview
Frame-to-frame alignment
Loop closure detection and global consistency
Map representation
Implemented in ROS (Willow Garage), C++
An outline of our system:
Alignment: RANSAC (visual) and ICP (shape)
Loop closure: detect and correct
Surfel maps: fuse all measurements
We now examine the components in more detail.

6 Alignment (3D SIFT RANSAC)
SIFT features (from the image) placed in 3D (from depth)
RANSAC alignment requires no initial estimate
Requires sufficient matched features
We use SiftGPU (150 ms per frame for features, 80 ms for RANSAC)
SURF and FAST / Calonder features are also implemented
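The RANSAC step can be sketched as: sample three putative 3D feature matches, fit a rigid transform in closed form (the SVD / Horn method), and keep the hypothesis with the most inliers. This is an illustrative NumPy sketch, not the SiftGPU-based implementation; the threshold and iteration count are assumptions:

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst,
    computed in closed form via SVD (Horn/Umeyama method)."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:       # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_align(src, dst, iters=200, thresh=0.03, rng=None):
    """RANSAC over 3-point samples of putative 3D feature matches.
    Keeps the hypothesis with the most inliers, then refits on them."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return rigid_transform(src[best_inliers], dst[best_inliers])

# Synthetic demo: 30 matches, one of them a gross outlier.
rng = np.random.default_rng(1)
src = rng.random((30, 3))
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.05])
dst = src @ R_true.T + t_true
dst[0] += 1.0                      # one bad feature match
R, t = ransac_align(src, dst)      # recovers (R_true, t_true)
```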

7 RANSAC Failure Case

8 Alignment (ICP)
ICP (iterative closest point) does not need visual features or color
Requires sufficient geometry and an initial estimate
Drawbacks: limited FOV, needs geometry, needs a good initial estimate, slower
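A minimal point-to-point ICP loop, illustrating why the method needs a reasonable initial estimate and sufficient geometry. This sketch uses brute-force nearest neighbors and the same closed-form alignment step as the RANSAC sketch; the paper's variant is point-to-plane with a k-d tree:

```python
import numpy as np

def best_fit(src, dst):
    """Closed-form rigid transform src -> dst (SVD / Horn method)."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Point-to-point ICP: alternate nearest-neighbor association
    (brute force here; a k-d tree in practice) with closed-form
    alignment. Starts from the identity, so it only converges when
    the initial misalignment is small."""
    R_acc, t_acc = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(1)]        # nearest target point each
        R, t = best_fit(cur, matched)
        cur = cur @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
    return R_acc, t_acc

# Demo: a small rotation + translation, recoverable from identity init.
rng = np.random.default_rng(2)
src = rng.random((50, 3))
c, s = np.cos(0.05), np.sin(0.05)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.01, 0.02, -0.01])
dst = src @ R_true.T + t_true
R, t = icp(src, dst)
```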

9 ICP Failure Case

10 Joint Optimization (RGBD-ICP)
Fixed feature matches
Dense point-to-plane
Details:
Point-to-plane requires Levenberg-Marquardt but fewer iterations
KD-tree for nearest-neighbor lookup
Downsample the source and target clouds (10% still works)
Discard a percentage of outliers and matches over a maximum distance
Discard boundary points
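The combined cost can be sketched as an α-weighted sum of a sparse point-to-point term over the fixed feature matches and a dense point-to-plane term over the ICP associations. Variable names and the per-term averaging are my own assumptions; the actual optimization runs Levenberg-Marquardt over an objective of this shape:

```python
import numpy as np

def rgbd_icp_cost(R, t, feat_src, feat_dst, dense_src, dense_dst,
                  dense_normals, alpha=0.5):
    """Joint RGBD-ICP-style objective: an alpha-weighted sum of a
    sparse point-to-point term (fixed visual feature matches) and a
    dense point-to-plane term (shape associations)."""
    f = feat_src @ R.T + t - feat_dst
    sparse = (f ** 2).sum(1).mean()                    # ||T p - q||^2
    d = dense_src @ R.T + t - dense_dst
    dense = ((d * dense_normals).sum(1) ** 2).mean()   # (n . (T p - q))^2
    return alpha * sparse + (1 - alpha) * dense

# Tiny demo: zero cost at perfect alignment, positive cost after a shift.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nrm = np.tile([0.0, 0.0, 1.0], (3, 1))
aligned_cost = rgbd_icp_cost(np.eye(3), np.zeros(3), pts, pts, pts, pts, nrm)
shifted_cost = rgbd_icp_cost(np.eye(3), np.array([0.0, 0.0, 0.1]),
                             pts, pts, pts, pts, nrm)
```

Note how the point-to-plane term only penalizes motion along each normal, which is what lets flat regions slide freely and makes the visual feature term valuable.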

11 Loop Closure Sequential alignments accumulate error
Revisiting a previous location results in an inconsistent map

12 Loop Closure Detection
Detect by running RANSAC against previous frames
Pre-filter options (for efficiency):
Only a subset of frames (keyframes): every nth frame, or a new keyframe when RANSAC fails directly against the previous keyframe
Only keyframes with a similar estimated 3D pose
Place recognition using a vocabulary tree (added since the paper's publication)
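An illustrative pre-filter in the spirit of this slide: promote keyframes, then propose loop-closure candidates only among keyframes whose estimated positions are close but which are not temporal neighbors. The distance-based keyframe rule here is a hypothetical stand-in for the deck's RANSAC-failure criterion, and all thresholds are made up:

```python
import numpy as np

def keyframes_and_candidates(poses, keyframe_spacing=1.0,
                             loop_radius=2.0, min_gap=10):
    """Promote a frame to keyframe once it has moved far enough from
    the last keyframe, then pair up keyframes with nearby estimated
    positions that are far apart in time (loop closure candidates).
    `poses` is an (N, d) array of estimated positions."""
    keys = [0]
    for i in range(1, len(poses)):
        if np.linalg.norm(poses[i] - poses[keys[-1]]) > keyframe_spacing:
            keys.append(i)
    candidates = []
    for a_idx, a in enumerate(keys):
        for b in keys[a_idx + 1:]:
            if b - a >= min_gap and \
               np.linalg.norm(poses[a] - poses[b]) < loop_radius:
                candidates.append((a, b))
    return keys, candidates

# Demo trajectory: out along x, then back on a slightly offset line,
# so the end of the path revisits the start.
out = np.column_stack([np.arange(0.0, 10.5, 0.5), np.zeros(21)])
back = np.column_stack([np.arange(10.0, -0.5, -0.5), np.full(21, 0.1)])
poses = np.vstack([out, back])
keys, candidates = keyframes_and_candidates(poses)
```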

13 Loop Closure Correction
TORO [Grisetti 2007]: constraints between camera locations in a pose graph
Produces maximum-likelihood global camera poses
Global optimization for global consistency
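To show what pose-graph correction does, here is a toy 1D relaxation: sequential odometry constraints accumulate drift, and a single loop-closure constraint pulls the chain back toward consistency. This is not TORO itself (which runs stochastic gradient descent on a tree parameterization over full 6-DoF poses), only the underlying least-squares idea:

```python
import numpy as np

def optimize_pose_graph(constraints, n_poses, iters=500, lr=0.1):
    """Toy 1D pose-graph relaxation: poses are scalars, constraints
    are (i, j, measured_offset) triples, and we run gradient descent
    on the sum of squared residuals with pose 0 fixed as the anchor."""
    x = np.zeros(n_poses)
    for _ in range(iters):
        g = np.zeros(n_poses)
        for i, j, z in constraints:
            r = (x[j] - x[i]) - z    # residual of this constraint
            g[j] += r
            g[i] -= r
        g[0] = 0.0                   # anchor the first pose
        x -= lr * g
    return x

# Three odometry steps that each overestimate by 0.1, plus one loop
# closure saying the total displacement is 3.0. The optimum spreads
# the disagreement evenly: x = [0, 1.025, 2.05, 3.075].
constraints = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1), (0, 3, 3.0)]
x = optimize_pose_graph(constraints, 4)
```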

14

15

16 Map Representation: Surfels
Surface elements [Pfister 2000, Weise 2009, Krainin 2010]:
Circular surface patches
Accumulate color / orientation / size information
Incremental, independent updates
Incorporate occlusion reasoning
750 million points reduced to 9 million surfels
Used as a post-processing step for the final representation.
Pfister: rendering; Weise: human hand modeling; Krainin: modeling objects in a robot's hand (ICRA workshops; International Journal of Robotics Research, to appear)
Each frame contains up to 300,000 3D points, so we have too much data. Alternatives:
Downsampling: loses data
Meshes: incremental updates are difficult; overlap makes updates potentially inefficient; not parallel (GPU)
Volumetric: memory makes it difficult to obtain detail at the scale of our environments
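The incremental, independent update of a surfel can be sketched as a running average per attribute, with the measurement count serving as confidence. The fields and weighting below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

class Surfel:
    """Minimal surfel: position, normal, color, and a measurement
    count used as confidence. New measurements are fused by running
    average, so each surfel updates incrementally and independently
    of its neighbors."""
    def __init__(self, pos, normal, color):
        self.pos = np.asarray(pos, float)
        self.normal = np.asarray(normal, float)
        self.color = np.asarray(color, float)
        self.count = 1

    def update(self, pos, normal, color):
        w = self.count / (self.count + 1)      # weight of old estimate
        self.pos = w * self.pos + (1 - w) * np.asarray(pos, float)
        n = w * self.normal + (1 - w) * np.asarray(normal, float)
        self.normal = n / np.linalg.norm(n)    # keep unit length
        self.color = w * self.color + (1 - w) * np.asarray(color, float)
        self.count += 1

# Fuse a second, slightly different observation of the same patch.
s = Surfel([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [100.0, 0.0, 0.0])
s.update([0.02, 0.0, 0.0], [0.0, 0.0, 1.0], [110.0, 0.0, 0.0])
```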

17

18 Experiment 1: RGBD-ICP
16 locations marked in the environment (~7 m apart)
Camera on a tripod, carried between and placed at the markers
Error between the ground-truth distance and the accumulated distance from frame-to-frame alignment:

α weight        0.0 (ICP)  0.2   0.5   0.8   1.0 (RANSAC)
Mean error (m)  0.22       0.19  0.16  0.18  1.29

19 Experiment 2: Robot Arm
Camera held in a WAM arm for ground-truth pose

Method             Mean Translational Error (m)  Mean Angular Error (deg)
RANSAC             0.168                         4.5
ICP                0.144                         3.9
RGBD-ICP (α=0.5)   0.123                         3.3
RGBD-ICP + TORO    0.069                         2.8

20 Map Accuracy: Architectural Drawing
Path length: 114 m; 2130 frames

21 Map Accuracy: 2D Laser Map
A second building; the walls are truly curved
70 m sides; 2625 frames

22 Conclusion
Kinect-style depth cameras have recently become available as consumer products
RGB-D Mapping can generate rich 3D maps using these cameras
RGBD-ICP combines visual and shape information for robust frame-to-frame alignment
Global consistency is achieved via loop closure detection and optimization (RANSAC, TORO, SBA)
Surfels provide a compact map representation

23 Open Questions
Which are the best features to use?
How can we find more loop closure constraints between frames?
What is the right representation (point clouds, surfels, meshes, volumetric, geometric primitives, objects)?
How can we generate increasingly photorealistic maps?
Can autonomous exploration improve map completeness?
Can we use these rich 3D maps for semantic mapping?

24 Thank You!


Download ppt "Peter Henry1, Michael Krainin1, Evan Herbst1,"

Similar presentations


Ads by Google