- Introduction: Robot Vision (Philippe Martinet)
- Unifying Vision and Control (Selim Benhimane)
- Efficient Keypoint Recognition (Vincent Lepetit)
- Multi-camera and Model-based Robot Vision (Andrew Comport)
- Visual SLAM for Spatially Aware Robots (Walterio Mayol-Cuevas)
- Outdoor Visual SLAM for Robotics (Kurt Konolige)
- Advanced Vision in Deformable Environments (Adrien Bartoli)
Tutorial organized by Andrew Comport and Adrien Bartoli. Nice, September 22.

Visual SLAM and Spatial Awareness
SLAM = Simultaneous Localisation and Mapping
- An overview of some methods currently used for SLAM using computer vision.
- Recent work on enabling more stable and/or robust mapping in real-time.
- Work aiming to provide better scene understanding in the context of SLAM: spatial awareness.
- Here we concentrate on "small" working areas, where GPS, odometry and other traditional sensors are not operational or available.

Spatial Awareness
- SA: a key cognitive competence that permits efficient motion and task planning.
  - Even from an early age we use spatial awareness: the toy has not vanished, it is behind the sofa.
  - I can point to where the entrance to the building is, but I can't tell how many doors there are from here to there.
- SLAM offers a rigorous way to implement and manage SA.

Wearable personal assistants
Mayol, Davison and Murray 2003
Video at

SLAM
- Key historical reference:
  - Smith, R.C. and Cheeseman, P., "On the Representation and Estimation of Spatial Uncertainty", The International Journal of Robotics Research 5 (4).
- Proposed a stochastic framework to maintain the relationships (uncertainties) between features in the map.
- "Our knowledge of the spatial relationships among objects is inherently uncertain. A manmade object does not match its geometric model exactly because of manufacturing tolerances. Even if it did, a sensor could not measure the geometric features, and thus locate the object exactly, because of measurement errors. And even if it could, a robot using the sensor cannot manipulate the object exactly as intended, because of hand positioning errors…" [Smith, Self and Cheeseman 1986]

SLAM
- A problem that has been studied for several years; it is central to mobile robot navigation and branches into other fields such as wearable computing and augmented reality.

SLAM – Simultaneous Localisation And Mapping
[Diagram: a camera observes 3D point features under perspective projection; as the camera moves, feature locations are predicted, then the camera location and the feature positions are updated.]
Aim to:
- Localise the camera (6 DOF: rotation and translation from a reference view).
- Simultaneously estimate a 3D map of features (e.g. 3D points).
Implemented using: Extended Kalman Filter, particle filters, SIFT, edgelets, etc.
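To make the predict/update loop concrete, here is a minimal sketch of one EKF-SLAM iteration; this is not code from any of the systems cited here, and the models f and h, their Jacobians, and the noise terms Q and R are placeholders:

```python
import numpy as np

def ekf_slam_step(x, P, u, z, f, F_jac, h, H_jac, Q, R):
    """One predict/update cycle of EKF-based SLAM.

    x : state vector (camera pose + 3D feature positions)
    P : full state covariance (keeps camera-feature correlations)
    u : motion input, z : stacked feature measurements
    f/h : motion and measurement models, F_jac/H_jac their Jacobians
    Q/R : process and measurement noise covariances
    """
    # Predict: move the camera, inflate the uncertainty.
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q

    # Update: project map features into the image and correct.
    z_pred = h(x_pred)                    # predicted measurements (perspective projection)
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance -> search regions
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - z_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```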

State representation as in [Davison 2003]

SLAM with first-order uncertainty representation, as in [Davison 2003]
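For reference, the [Davison 2003] state stacks the camera state with the 3D feature estimates, and a single full covariance matrix carries the first-order uncertainty, including all camera-feature and feature-feature correlations. In the notation commonly used for MonoSLAM:

```latex
\hat{\mathbf{x}} =
\begin{pmatrix} \hat{\mathbf{x}}_v \\ \hat{\mathbf{y}}_1 \\ \hat{\mathbf{y}}_2 \\ \vdots \end{pmatrix},
\qquad
P =
\begin{pmatrix}
P_{xx}    & P_{x y_1}   & P_{x y_2}   & \cdots \\
P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \cdots \\
P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \cdots \\
\vdots    & \vdots      & \vdots      & \ddots
\end{pmatrix},
\qquad
\hat{\mathbf{x}}_v =
\begin{pmatrix} \mathbf{r}^W \\ \mathbf{q}^{WR} \\ \mathbf{v}^W \\ \boldsymbol{\omega}^R \end{pmatrix}
```

Here r^W is the camera position, q^{WR} its orientation quaternion, v^W and ω^R its linear and angular velocities, and each y_i a 3D point feature.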

Challenges for visual SLAM
- On the computer vision side, improving data association:
  - Ensuring a match is a true positive.
- Representations and parameterizations that enhance mapping while staying within real-time limits.
- Alternative frameworks for mapping:
  - Can we extend the area of operation?
  - Better scene understanding.

For data association, an earlier approach
- Small (e.g. 11x11) image patches around salient points represent features.
- Normalized Cross Correlation (NCC) detects the features.
- Small patches + accurate search regions lead to fast camera pose estimation.
- Depth is obtained by projecting the feature hypothesis at different depths.
See: A. Davison, Real-Time Simultaneous Localisation and Mapping with a Single Camera, ICCV 2003.
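A minimal sketch of the NCC step, assuming grayscale numpy arrays; the rectangular search region stands in for the elliptical regions the filter actually predicts, and all names are illustrative:

```python
import numpy as np

def ncc(template, patch):
    """Normalized cross-correlation between two equal-size patches."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    return (t * p).sum() / denom if denom > 0 else 0.0

def match_in_region(image, template, region, threshold=0.8):
    """Scan the small search region predicted by the filter and return the
    best-scoring patch centre, or None if nothing exceeds the threshold."""
    h, w = template.shape
    x0, y0, x1, y1 = region
    best, best_xy = threshold, None
    for y in range(y0, y1 - h):
        for x in range(x0, x1 - w):
            s = ncc(template, image[y:y + h, x:x + w])
            if s > best:
                best, best_xy = s, (x + w // 2, y + h // 2)
    return best_xy
```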

However
- Simple patches are insufficient under large viewpoint or scale variations.
- Small patches help speed, but are prone to mismatches.
- Search regions can't always be trusted (camera occlusion, motion blur).
Possible solutions: use better feature descriptions, or other types of features, e.g. edge information.

SIFT [D. Lowe, IJCV 2004]
- Find maxima in scale space to locate keypoints.
- Around each keypoint, build an invariant local descriptor from gradient histograms: a 128-element vector.
- If used for tracking, this may be wasteful!
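As a point of reference, OpenCV exposes both the detector and the 128-element descriptor; a minimal sketch, assuming an opencv-python build that includes SIFT (version 4.4 or later):

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()

# Detect scale-space extrema, then build the 128-D gradient-histogram
# descriptor around each keypoint.
keypoints, descriptors = sift.detectAndCompute(img, None)
print(len(keypoints), descriptors.shape)  # N keypoints, (N, 128) array
```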

- Uses SIFT-like descriptors (histograms of gradients) around Harris corners.
- Gets the scale from SLAM: "predictive SIFT".
[Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]

Video at

[Eade and Drummond, BMVC2006]
- Edgelets: locally straight sections of the gradient image.
- Parameterized as a 3D point + direction.
- Avoid regions of conflict (e.g. close parallel edges).
- Deal with multiple matches through robust estimation.
Video at

RANSAC [Fischler and Bolles 1981]
RANdom SAmple Consensus
[Figure: point set with gross "outliers"; the least-squares fit is pulled off by them, while the RANSAC fit is not.]
- Select a random sample of points.
- Propose a model (hypothesis) based on the sample.
- Assess the fitness of the hypothesis on the rest of the data.
- Repeat until the maximum number of iterations or a fitness threshold is reached.
- Keep the best hypothesis, and potentially refine it with all inliers.
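The same procedure in a minimal sketch for the 2D line-fitting example (iteration count and tolerances are illustrative):

```python
import numpy as np

def ransac_line(points, iters=100, inlier_tol=1.0, min_inliers=10):
    """Robustly fit a 2D line a*x + b*y + c = 0 to an (N, 2) point array."""
    best_inliers = None
    for _ in range(iters):
        # Minimal sample: two points define a line hypothesis.
        i, j = np.random.choice(len(points), 2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        a, b = y2 - y1, x1 - x2
        norm = np.hypot(a, b)
        if norm == 0:
            continue  # degenerate sample
        c = -(a * x1 + b * y1)
        # Consensus: points within inlier_tol of the hypothesised line.
        d = np.abs(points @ np.array([a, b]) + c) / norm
        inliers = points[d < inlier_tol]
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the best hypothesis by least squares over all its inliers.
    if best_inliers is not None and len(best_inliers) >= min_inliers:
        return np.polyfit(best_inliers[:, 0], best_inliers[:, 1], 1)
    return None
```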

OK, but…
- Having rich descriptors, or even multiple kinds of features, may still lead to wrong data associations (mismatches).
- If we pass every measurement we think is good to the SLAM system, the result can be catastrophic.
- Better to be able to recover from failure than to assume it won't happen!

[Williams, Smith and Reid ICRA2007]
- Camera relocalization using small 2D patches + RANSAC to compute the pose.
- Adds a "supervisor" between the visual measurements and the SLAM system.
- Uses the 3-point algorithm, which yields up to 4 possible poses.
- Verifies using Matas' T_{d,d} test.

Also see recent work [Williams, Klein and Reid ICCV2007] using randomised trees rather than simple 2D patches.
In brief, while within the real-time limit:
[Flowchart: is the camera lost? If so, select 3 matches, compute the pose, and test consistency; if consistent, carry on tracking, otherwise repeat.]
[Williams, Smith and Reid ICRA2007] Video at
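The loop in the flowchart, written as Python-style pseudocode (every helper name here is a hypothetical stand-in for a stage in the slide, not the authors' code):

```python
def relocalise(frame, map_points, time_budget):
    """One frame of the supervisor: hypothesise-and-test camera poses
    until relocalised or out of real-time budget."""
    while time_budget.remaining():
        matches = select_three_matches(frame, map_points)   # hypothetical helper
        for pose in three_point_pose(matches):              # up to 4 pose solutions
            # Matas' T_{d,d} test: check d extra random matches and accept
            # only if all of them agree with the hypothesised pose.
            if tdd_test(pose, frame, map_points):           # hypothetical helper
                return pose   # consistent: carry on tracking from here
    return None               # still lost; try again on the next frame
```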

Relocalisation based on appearance hashing
- Use a hash function to index similar descriptors (Brown et al 2005).
- Fast and memory efficient (only an index needs to be saved per descriptor).
- Quantize the result of Haar masks. [Chekhlov et al 2008]
Video at:
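A sketch of the indexing idea: reduce each patch to the signs of a few Haar-mask responses and use the resulting bits as the hash key. The mask set and bucket structure here are illustrative, not the authors' implementation:

```python
import numpy as np

index = {}  # hash key -> list of candidate feature ids

def haar_hash(patch, masks):
    """Quantize the sign of each Haar-mask response into one bit."""
    bits = 0
    for k, mask in enumerate(masks):   # masks: list of +/-1 templates
        if (patch * mask).sum() > 0:
            bits |= 1 << k
    return bits

def insert(feature_id, patch, masks):
    # Only the integer key is stored per descriptor: memory efficient.
    index.setdefault(haar_hash(patch, masks), []).append(feature_id)

def query(patch, masks):
    # Constant-time lookup of features with similar appearance.
    return index.get(haar_hash(patch, masks), [])
```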

Parallel Tracking and Mapping
[Klein and Murray, Parallel Tracking and Mapping for Small AR Workspaces, Proc. International Symposium on Mixed and Augmented Reality, 2007]
- Decouple mapping from tracking; run them in separate threads on a multi-core CPU.
- Mapping is based on key-frames, processed using batch Bundle Adjustment.
- The map is initialised from a stereo pair (using the 5-point algorithm).
- New points are initialised with an epipolar search.
- Large numbers (thousands) of points can be mapped in a small workspace.

[Klein and Murray, 2007] Parallel Tracking and Mapping
[Diagram: CPU1 runs the per-frame tracking loop (detect features, compute camera pose, draw graphics); CPU2 runs the mapping loop (update map) in parallel.]
Video at
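Structurally, the two loops look roughly like this (a sketch only; the camera interface and the tracking/mapping helpers are hypothetical stand-ins for the stages in the diagram):

```python
import threading, queue

keyframes = queue.Queue()  # tracker -> mapper hand-off

def tracking_thread(camera, slam_map):
    # CPU1: runs once per frame and must stay real-time.
    while True:
        frame = camera.grab()                        # hypothetical camera interface
        pose = compute_camera_pose(frame, slam_map)  # detect features, estimate pose
        draw_graphics(frame, pose)
        if is_keyframe(frame, pose):                 # well-separated views only
            keyframes.put((frame, pose))

def mapping_thread(slam_map):
    # CPU2: may spend many frames' worth of time per map update.
    while True:
        frame, pose = keyframes.get()                # blocks until a new keyframe
        add_points_by_epipolar_search(slam_map, frame, pose)
        bundle_adjust(slam_map)                      # batch optimisation over keyframes
```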

So far we have mentioned that
- Maps are sparse collections of low-level features:
  - Points (Davison et al., Chekhlov et al.)
  - Edgelets (Eade and Drummond)
  - Lines (Smith et al., Gee and Mayol-Cuevas)
- There is full correlation between features and camera:
  - Maintain the full covariance matrix.
  - Loop closure: the effects of measurements propagate to all features in the map.
- The increase in state size limits the number of features.

Commonly in Visual SLAM
- Emphasis on localization and less on the mapping output.
- SLAM should avoid making "beautiful" maps (there are other, better methods for that!).
- Very few examples exist of improving the awareness element, e.g. Castle and Murray, BMVC 07, on known-object recognition within SLAM.

Better spatial awareness through higher-level structural inference
Types of structure:
- Coplanar points → planes
- Collinear edgelets → lines
- Intersecting lines → junctions
Our contribution:
- A method for augmenting the SLAM map with planar and line structures.
- An evaluation of the method in a simulated scene: we discover a trade-off between efficiency and accuracy.

Discovering structure within SLAM
Gee, Chekhlov, Calway and Mayol-Cuevas, 2008

Plane Representation
[Diagram: camera viewing a plane in world frame O.]
Plane parameters: a point (x, y, z), the normal, and basis vectors c(θ1, φ1) and c(θ2, φ2).
Gee et al 2007

Plane Initialisation
1. Discover planes using RANSAC over a thresholded subset of the map.
2. Initialise the plane in the state using the best-fit plane parameters found from an SVD of the inliers.
3. Augment the state covariance P with the new plane:
   - Append the measurement covariance R0 to the covariance matrix.
   - Multiplication with the Jacobian populates the cross-covariance terms.
   - The state size increases by 7 after adding a plane.
Gee et al 2007
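The covariance step is the standard EKF state-augmentation form; as an equation (a sketch consistent with the slide, with x the old state, p the new plane and z its measurement):

```latex
P \;\leftarrow\; J \begin{pmatrix} P & 0 \\ 0 & R_0 \end{pmatrix} J^{\top},
\qquad
J = \frac{\partial (\mathbf{x}, \mathbf{p})}{\partial (\mathbf{x}, \mathbf{z})}
```

The 7 new parameters are the plane point (x, y, z) plus the two angle pairs (θ1, φ1) and (θ2, φ2) defining the basis vectors, which is why the state grows by 7.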

Adding Points to the Plane
[Diagram panels: "Add point to plane" and "Add other points to plane", showing the distance d to the plane and the threshold σ_max in world frame O.]
1. Decide whether the point lies on the plane.
2. Add the point by projecting it onto the plane and transforming the state and covariance.
3. Decide whether to fix the point on the plane.
- The state size decreases by 1 after adding a point to the plane.
- Fixing points in the plane reduces the state size by 2 for each fixed point.
- The state is smaller than the original state if more than 7 points are added to the plane.
Gee et al 2007
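The projection in step 2 is plain geometry: with plane origin o and unit normal n̂, a point y is moved onto the plane by removing its offset along the normal,

```latex
\mathbf{y}' = \mathbf{y} - \big((\mathbf{y} - \mathbf{o}) \cdot \hat{\mathbf{n}}\big)\,\hat{\mathbf{n}}
```

Re-parameterising y' by its two in-plane coordinates then replaces three world coordinates with two, giving the saving of 1 per point; once more than 7 points lie on the plane, this outweighs the 7 parameters the plane itself added.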

Plane Observation
1. A direct observation of the plane cannot be made.
2. Transform points into 3D world space.
3. Project points into the image and match them with the predicted observations.
4. The covariance matrix embodies the constraints between plane, camera and points.
Gee et al 2007

Discovering planes in SLAM
Gee et al
Video at:

Mean error & state reduction, planes
[Plots: mean error and state-size reduction, averaged over 30 runs.]
Gee et al 2008

Discovering 3D lines
Video at:

An example application
Chekhlov et al
Video at

Other interesting recent work
- Active search and matching, or: know what to measure.
  - Davison ICCV 2005; Chli and Davison ECCV 2008
- Submapping: managing the scalability problem better.
  - Clemente et al RSS 2007
  - Eade and Drummond BMVC 2008
- And the work presented in this tutorial:
  - Randomised trees: Vincent Lepetit
  - SFM: Andrew Comport

Software tools:

Recommended intro reading:
- Yaakov Bar-Shalom, X. Rong Li and Thiagalingam Kirubarajan, Estimation with Applications to Tracking and Navigation, Wiley-Interscience.
- Hugh Durrant-Whyte and Tim Bailey, Simultaneous Localisation and Mapping (SLAM): Part I, The Essential Algorithms, Robotics and Automation Magazine, June.
- Tim Bailey and Hugh Durrant-Whyte, Simultaneous Localisation and Mapping (SLAM): Part II, State of the Art, Robotics and Automation Magazine, September.
- Andrew Davison, Ian Reid, Nicholas Molton and Olivier Stasse, MonoSLAM: Real-Time Single Camera SLAM, IEEE Trans. PAMI.
- Andrew Calway, Andrew Davison and Walterio Mayol-Cuevas, slides of the Tutorial on Visual SLAM, BMVC 2007, available at:

Some Challenges
- Deal with larger maps.
- Obtain maps that are task-meaningful (manipulation, AR, metrology).
- Use different kinds of features in an informed way.
- Benefit from other approaches such as SFM, but keep efficiency.
- Incorporate semantics and beyond-geometric scene understanding.
Fin