B659: Principles of Intelligent Robot Motion Spring 2013 Kris Hauser

3D Sensing and Mapping

Agenda
A high-level overview of visual sensors and perception algorithms. Core concepts:
- Camera / projective geometry
- Point clouds
- Occupancy grids
- Iterative closest point (ICP) algorithm

Proprioceptive: sense one's own body
- Motor encoders (absolute or relative)
- Contact switches (joint limits)
Inertial: sense accelerations of a link
- Accelerometers
- Gyroscopes
- Inertial Measurement Units (IMUs)
Visual: sense the 3D scene with reflected light
- RGB: cameras (monocular, stereo)
- Depth: lasers, radar, time-of-flight, stereo + projection
- Infrared, etc.
Tactile: sense forces
- Contact switches
- Force sensors
- Pressure sensors
Other
- Motor current feedback: sense effort
- GPS
- Sonar

Sensing vs. Perception
Sensing: acquisition of signals from hardware.
Perception: processing of "raw" signals into "meaningful" representations.
Example: reading pixels from a camera is sensing. Declaring "it's a rounded shape with skin color", "it's a face", or "it's a smiling face" are different levels of perception.
Especially at lower levels of perception, such as signal processing, the line is blurry, and the processed results can often essentially be considered "sensed".

3D Perception Topics
Sensors: visible-light cameras, depth sensors, laser sensors
(Some) perception tasks:
- Stereo reconstruction
- Object recognition
- 3D mapping
- Object pose recognition
Key issues:
- How to represent and optimize camera transforms?
- How to fit models in the presence of noise?
- How to represent large 3D models?

Visual Sensors
Visible-light cameras: cheap, low power, high resolution, high frame rates
- Data: 2D field of RGB pixels
- Stereo cameras
Depth field sensors
- Two major types: infrared pattern projection (Kinect, ASUS, PrimeSense) and time-of-flight
- Data: 2D field of depth values (SwissRanger)
Sweeping laser sensors (Hokuyo, SICK, Velodyne)
- Data: 1D field of depth values
- Can be mounted on a tilt/spin mount to get a 3D field of view
[Images: Bumblebee stereo camera, ASUS Xtion depth sensor, Hokuyo laser sensor]

Sensors vary in strengths / weaknesses
Velodyne (DARPA Grand Challenge): 1.3 million readings/s, $75k price tag

Image formation
Light bounces off an object, passes through a lens, and lands on a CCD pixel on the image plane.
- Depth of focus
- Illumination and aperture
Color: accomplished through use of filters, e.g. a Bayer filter; each channel's in-between pixels are interpolated.

Idealized projective geometry
Let:
- Zim: distance from the image plane to the focal point along the depth axis
- (X, Y, Z): point in 3D space relative to the focal point, Z > 0
Then the image-space point is:
- Xc = Zim * X / Z
- Yc = Zim * Y / Z
...which gets scaled and offset to obtain pixel coordinates.
[Figure: a point (X, Z) in front of the focal point projects to (Xc, Zim) on the image plane]
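A minimal sketch of this projection in Python; the pixel scale and offset values are made-up placeholders for whatever the camera calibration provides.

```python
import numpy as np

def project_point(X, Y, Z, Zim=1.0, sx=500.0, sy=500.0, cx=320.0, cy=240.0):
    """Project a 3D point (camera frame, Z > 0) through an ideal pinhole.

    Zim is the image-plane distance from the focal point; sx, sy, cx, cy are
    the (placeholder) pixel scale and offset that map image-plane coordinates
    to pixel coordinates.
    """
    assert Z > 0, "point must be in front of the camera"
    Xc = Zim * X / Z          # image-plane coordinates (formulas above)
    Yc = Zim * Y / Z
    u = sx * Xc + cx          # scale and offset to pixel coordinates
    v = sy * Yc + cy
    return u, v

print(project_point(0.2, -0.1, 2.0))   # (370.0, 215.0)
```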

Issues with real sensors
- Motion blur
- Distortion caused by lenses
- Non-square pixels
- Exposure
- Noise: film grain, salt-and-pepper noise, shot noise
[Images: motion blur, distortion, exposure]

Calibration
Determine the camera's intrinsic parameters:
- Focal length
- Field of view
- Pixel dimensions
- Radial distortion
These determine the mapping from image pixels to an idealized pinhole camera (rectification).

Stereo vision processing
Dense reconstruction: given two rectified images, find the binocular disparity at each pixel.
- Take a small image patch around each pixel in the left image, and search for the best horizontally shifted copy in the right image.
- What size patch? What search size? What matching criterion?
- Works best for highly textured scenes.
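A deliberately brute-force sketch of dense block matching on a rectified grayscale pair using sum-of-squared-differences; the patch size, search range, and matching criterion are exactly the arbitrary choices the questions above refer to, and real implementations vectorize this and add subpixel refinement and consistency checks.

```python
import numpy as np

def disparity_ssd(left, right, patch=5, max_disp=64):
    """SSD block matching: for each left-image pixel, search horizontally in
    the right image for the best-matching patch; the shift is the disparity."""
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(np.float32)
                ssd = np.sum((ref - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```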

Point Clouds
Unordered list of 3D points P = {p1, ..., pn}
Each point optionally annotated with:
- Color (RGB)
- Sensor reading ID# (why?)
- Estimated surface normal (nx, ny, nz)
No information about objects, occlusions, or topology.
Point Cloud Library (PCL): http://pointclouds.org
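One simple in-memory layout for such a cloud (not PCL's own data structure), with the optional annotations as parallel arrays:

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional

@dataclass
class PointCloud:
    """Unordered point set; extra per-point attributes are optional."""
    points: np.ndarray                      # (n, 3) XYZ
    colors: Optional[np.ndarray] = None     # (n, 3) RGB
    scan_ids: Optional[np.ndarray] = None   # (n,) which sensor reading produced each point
    normals: Optional[np.ndarray] = None    # (n, 3) estimated surface normals

cloud = PointCloud(points=np.random.rand(1000, 3))
print(cloud.points.shape)   # (1000, 3)
```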

3D Mapping
Each frame of a depth sensor gives a narrow snapshot of the world geometry from a given position. 3D mapping is the process of stitching multiple views into a global model.

Three scenarios
Consider two raw point clouds P1 and P2 from cameras with transformations T1 and T2. Goal: build point cloud P in frame T1 (assume identity).
Case 1: relative transformations known
- Simple union: P = P1 ∪ (T2⁻¹ ⋅ P2)
Case 2: small transformation
- Pose registration problem
- Vast majority of points correspond between scenes
Case 3: large transformation
- Significant fraction of points do not correspond; lighting differences; more occlusions
- Pose registration must be more robust to outliers
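Case 1 in code, assuming 4x4 homogeneous transforms and assuming the given transform maps frame-2 points into frame 1 (conventions differ, so treat that direction as an assumption):

```python
import numpy as np

def transform_points(T, pts):
    """Apply a 4x4 homogeneous transform to an (n, 3) array of points."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ T.T)[:, :3]

def merge_known_poses(P1, P2, T_1_from_2):
    """Case 1: relative transform known. T_1_from_2 maps frame-2 points into
    frame 1 (assumed convention); the merged cloud is a simple union."""
    return np.vstack([P1, transform_points(T_1_from_2, P2)])
```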

Case 2: Small transformations
Visual odometry: estimate the relative motion of subsequent frames using optical flow.
- Define feature points in P1 (e.g., with a corner detector)
- Estimate a transformation of an image patch around each feature that best matches P2 (this defines the optical flow field)
- Transformation: translation, rotation, scale
- Fit T2 to match these feature transforms
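A hedged sketch of this pipeline using OpenCV: Shi-Tomasi corners, pyramidal Lucas-Kanade flow, and a RANSAC fit of a similarity transform (translation + rotation + uniform scale); the parameter values are arbitrary.

```python
import cv2
import numpy as np

def estimate_frame_motion(prev_gray, next_gray):
    """Track feature patches between consecutive frames and fit a
    translation + rotation + scale transform to the resulting flow field."""
    # Corner-like feature points in the previous frame
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=8)
    # Pyramidal Lucas-Kanade optical flow: where did each patch move to?
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good_prev = pts[status.ravel() == 1]
    good_next = nxt[status.ravel() == 1]
    # Similarity transform (rotation, uniform scale, translation) with RANSAC
    M, inliers = cv2.estimateAffinePartial2D(good_prev, good_next,
                                             method=cv2.RANSAC)
    return M   # 2x3 matrix [s*R | t]
```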

Case 3: Large transformations
Iterative closest point (ICP) algorithm
Input: initial guess for T2
Repeat until convergence:
- Find nearest-neighbor pairings between P1 and T2 ⋅ P2
- Select the pairs that satisfy some distance criterion, rejecting the rest (outlier rejection)
- Assign an error metric and optimize T2 to minimize it
What metric? What criteria? How to minimize?
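A minimal point-to-point ICP sketch, assuming a k-d tree for the nearest-neighbor step and SciPy's Kabsch-style `Rotation.align_vectors` for the per-iteration alignment; the rejection threshold and iteration count are arbitrary, and no convergence test is shown.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def icp(P1, P2, R0=np.eye(3), t0=np.zeros(3), max_iters=50, reject_dist=0.05):
    """Point-to-point ICP: estimate (R, t) aligning P2 (n2 x 3) to P1 (n1 x 3),
    starting from an initial guess (R0, t0)."""
    R, t = R0, t0
    tree = cKDTree(P1)
    for _ in range(max_iters):
        Q = P2 @ R.T + t                      # current estimate of T2 * P2
        dists, idx = tree.query(Q)            # nearest neighbor in P1 for each point
        mask = dists < reject_dist            # crude outlier rejection
        if mask.sum() < 3:
            break
        src, dst = P2[mask], P1[idx[mask]]
        # Closed-form best rotation about the centroids, then the translation
        src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
        rot, _ = Rotation.align_vectors(dst - dst_c, src - src_c)
        R = rot.as_matrix()
        t = dst_c - R @ src_c
    return R, t
```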

What metrics for matching?
- Position
- Surface normal
- Color
Nearest-neighbor methods
- Fast data structures, e.g. k-d trees
- For large scans, usually want a constant-sized subsample
Projection-based methods
- Render the scene from the perspective of T1 to determine a match
- Very fast (used in the KinectFusion algorithm)
- Only uses position information

What criteria for outlier rejection?
- Distance too large (e.g., top X%)
- Inconsistencies with neighboring pairs
- On the boundary of the scan

How to optimize?
Want to find the rotation R and translation t of T2 that minimize some error function.
Sum of squared point-to-point differences:
- E(R, t) = Σ_{i=1..n} || p_i − (R q_i + t) ||²
- Closed-form solution (SVD); very fast per step
Sum of squared point-to-plane differences:
- E(R, t) = Σ_{i=1..n} ( n_iᵀ p_i − n_iᵀ (R q_i + t) )²
- Must use numerical methods to deal with the rotation variable
- Tends to lead ICP to converge in fewer iterations
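The point-to-point closed form (cross-covariance plus SVD, as in Arun et al. / Kabsch), with a small synthetic check; the point-to-plane objective has no such closed form and is the case that needs numerical methods.

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Closed-form minimizer of E(R, t) = sum_i || p_i - (R q_i + t) ||^2
    for paired points P, Q (both n x 3), via SVD of the cross-covariance."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    H = (Q - q_bar).T @ (P - p_bar)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = p_bar - R @ q_bar
    return R, t

# Tiny check: recover a known rotation + translation from noiseless pairs
rng = np.random.default_rng(0)
Q = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
P = Q @ R_true.T + np.array([0.5, -0.2, 1.0])
R_est, t_est = best_rigid_transform(P, Q)
assert np.allclose(R_est, R_true) and np.allclose(t_est, [0.5, -0.2, 1.0])
```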

Other applications of ICP
Fitting 3D triangulated models to point clouds for object recognition / pose estimation

Stitching multiple scans
In its most basic form, multi-view 3D mapping is simply a repetition of the two-camera case. But there are two major issues:
- Drift and "closing the loop"
- Point clouds become massive after many scans

Point cloud growth problem
With N points per frame, at F frames/sec, and T seconds of run time, NFT points are gathered.
- Kinect: N = 307200, F = 30, T = 60 => 552,960,000 points
- With RGB in 4 bytes and XYZ in 12 bytes => ~8 GB / min
Solutions:
- Forget earlier scans (short-term memory)
- Build a persistent, "collapsed" representation of the environment geometry: polygon meshes, occupancy grids
Key issue: how to estimate with a low number of points and update later?
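The data-rate arithmetic above as a quick check (numbers taken directly from the slide):

```python
N, F, T = 307200, 30, 60             # points/frame, frames/s, seconds
points = N * F * T                   # 552,960,000 points per minute
bytes_per_point = 4 + 12             # RGB packed in 4 bytes + XYZ in 12 bytes
gb_per_min = points * bytes_per_point / 1e9
print(points, round(gb_per_min, 1))  # 552960000, ~8.8 GB per minute
```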

Occupancy grids
- Store a grid G with a fixed minimum resolution
- Mark which cells (voxels) are occupied by a point
- Representation size is independent of T
Two options for updating on a new scan:
- Compute ICP to align the current scan to the prior scan, then add points to G
- Modify ICP to work directly with the representation G
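A minimal sketch of the first option: align each scan with ICP elsewhere, then insert its points into a sparse grid keyed by voxel index; the resolution value is an arbitrary choice.

```python
import numpy as np

class OccupancyGrid:
    """Sparse binary occupancy grid: only occupied voxels are stored,
    so memory depends on the explored volume, not on run time T."""
    def __init__(self, resolution=0.05):
        self.res = resolution
        self.occupied = set()          # set of integer (i, j, k) voxel indices

    def voxel(self, p):
        return tuple(np.floor(np.asarray(p, float) / self.res).astype(int))

    def add_scan(self, points_world):
        """Insert an already-aligned scan (e.g. after ICP against the map)."""
        for p in points_world:
            self.occupied.add(self.voxel(p))

    def is_occupied(self, p):
        return self.voxel(p) in self.occupied
```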

Probabilistic Occupancy Grids with Ray Casting
Scans are noisy, so simply adding points is likely to overestimate the occupied cells.
Ray casting approach:
- Each cell has a probability of being free / occupied / unseen
- Each scan defines a line segment that passes through free space and ends in an occupied cell (or near one)
- Walk along the segment, increasing P(free(c)) for each encountered cell c, and finally increase P(occupied(c)) for the terminal cell c
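A sketch of this update. The slide phrases it as raising P(free) along the ray and P(occupied) at the endpoint; a common way to implement that (e.g., in OctoMap) is additive log-odds per cell, which is assumed here. The ray is traversed by crude sub-voxel sampling rather than a proper 3D DDA, and the update constants are arbitrary.

```python
import numpy as np
from collections import defaultdict

LOG_ODDS_MISS = -0.4   # applied to cells the ray passes through ("more free")
LOG_ODDS_HIT = 0.85    # applied to the cell containing the endpoint ("more occupied")

class LogOddsGrid:
    def __init__(self, resolution=0.05):
        self.res = resolution
        self.logodds = defaultdict(float)   # unseen cells default to log-odds 0 (P = 0.5)

    def voxel(self, p):
        return tuple(np.floor(np.asarray(p, float) / self.res).astype(int))

    def integrate_ray(self, origin, endpoint):
        """Walk from the sensor origin to the measured endpoint, marking the
        traversed cells as (more likely) free and the terminal cell as occupied.
        A cell may be visited more than once with this crude sampling; a proper
        voxel traversal (3D DDA) avoids that."""
        origin, endpoint = np.asarray(origin, float), np.asarray(endpoint, float)
        length = np.linalg.norm(endpoint - origin)
        n_steps = max(1, int(length / (self.res * 0.5)))   # ~half-voxel sampling
        end_cell = self.voxel(endpoint)
        for s in np.linspace(0.0, 1.0, n_steps, endpoint=False):
            cell = self.voxel(origin + s * (endpoint - origin))
            if cell != end_cell:
                self.logodds[cell] += LOG_ODDS_MISS
        self.logodds[end_cell] += LOG_ODDS_HIT

    def p_occupied(self, p):
        l = self.logodds.get(self.voxel(p), 0.0)
        return 1.0 / (1.0 + np.exp(-l))
```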

Compact geometry representations within a cell
- On-line averaging
- On-line least-squares estimation of a fitting plane
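A generic sketch of per-cell running statistics: an on-line mean plus on-line second moments, from which a least-squares plane normal can be recovered as the smallest-eigenvalue eigenvector of the covariance. This is a standard formulation, not any particular library's.

```python
import numpy as np

class CellStats:
    """Per-voxel running statistics: points are folded in one at a time and
    then discarded, so the representation stays constant-size."""
    def __init__(self):
        self.n = 0
        self.sum = np.zeros(3)            # running sum of points
        self.outer = np.zeros((3, 3))     # running sum of outer products p p^T

    def add(self, p):
        p = np.asarray(p, float)
        self.n += 1
        self.sum += p
        self.outer += np.outer(p, p)

    def mean(self):
        return self.sum / self.n

    def plane_normal(self):
        """Least-squares plane through the accumulated points: the normal is
        the eigenvector of the covariance with the smallest eigenvalue."""
        mu = self.mean()
        cov = self.outer / self.n - np.outer(mu, mu)
        eigvals, eigvecs = np.linalg.eigh(cov)
        return eigvecs[:, 0]              # eigh sorts eigenvalues ascending
```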

Handling large 3D grids
Problem: tabular 3D grid storage grows as O(N³); a 1024³ grid is already about a billion cells (roughly 1 GB even at one byte per cell).
Solutions:
- Store a hash table of only the occupied cells
- Octree data structure: OctoMap library (http://octomap.sourceforge.net)

Dynamic Environments
Real environments have people, animals, and objects that move around.
Two options:
- Map the static parts, assuming dynamic objects will average out as noise over time (probabilistic occupancy grids)
- Segment (and possibly model) the dynamic objects

Related topics
- Sensor fusion
- Object segmentation and recognition
- Simultaneous Localization and Mapping (SLAM)
- Next-best-view planning

Next time
Kalman filtering: Welch and Bishop (2001); Principles Ch. 8
Zeeshan