
1 o Introduction: Robot Vision, Philippe Martinet o Unifying Vision and Control, Selim Benhimane o Efficient Keypoint Recognition, Vincent Lepetit o Multi-camera and Model-based Robot Vision, Andrew Comport o Visual SLAM for Spatially Aware Robots, Walterio Mayol-Cuevas o Outdoor Visual SLAM for Robotics, Kurt Konolige o Advanced Vision in Deformable Environments, Adrien Bartoli. Tutorial organized by Andrew Comport and Adrien Bartoli. Nice, September 22.

2 Visual SLAM and Spatial Awareness SLAM = Simultaneous Localisation and Mapping o An overview of some methods currently used for SLAM using computer vision. o Recent work on enabling more stable and/or robust mapping in real-time. o Work aiming to provide better scene understanding in the context of SLAM: Spatial Awareness. o Here we concentrate on “small” working areas where GPS, odometry and other traditional sensors are not operational or available.

3 Spatial Awareness o SA: A key cognitive competence that permits efficient motion and task planning. n Even from an early age we use spatial awareness: the toy has not vanished, it is behind the sofa. n I can point to where the entrance to the building is, but can't tell how many doors there are from here to there. SLAM offers a rigorous way to implement and manage SA.

4 Wearable personal assistants Mayol, Davison and Murray 2003 Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.02/Videos/wearableslam2.mpg

5 SLAM o Key historical reference: n Smith, R.C. and Cheeseman, P. "On the Representation and Estimation of Spatial Uncertainty". The International Journal of Robotics Research 5 (4): 56-68. 1986. o Proposed a stochastic framework to maintain the relationship (uncertainties) between features in the map. o “Our knowledge of the spatial relationships among objects is inherently uncertain. A manmade object does not match its geometric model exactly because of manufacturing tolerances. Even if it did, a sensor could not measure the geometric features, and thus locate the object exactly, because of measurement errors. And even if it could, a robot using the sensor cannot manipulate the object exactly as intended, because of hand positioning errors…” [Smith, Self and Cheeseman 1986]

6 SLAM o A problem that has been identified for several years, central in mobile robot navigation and branching into other fields like wearable computing and augmented reality.

7 SLAM – Simultaneous Localisation And Mapping Aim to: localise the camera (6DOF, rotation and translation from a reference view) and simultaneously estimate a 3D map of features (e.g. 3D points). (Diagram: as the camera moves, feature locations are predicted via perspective projection, then the camera location and feature positions are updated from the matches.) Implemented using: Extended Kalman Filter, particle filters, SIFT, edgelets, etc.
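The predict/update cycle above can be sketched with a minimal one-dimensional Kalman filter. This is a toy stand-in for the full EKF over camera pose and map: the constant-position motion model and the noise values Q and R are illustrative assumptions, not the values used in any of the systems cited here.

```python
def predict(x, P, Q=0.01):
    # Motion model: constant position, so the estimate is unchanged
    # but the uncertainty grows by the process noise Q.
    return x, P + Q

def update(x, P, z, R=0.1):
    # Fuse a measurement z (with measurement noise R) into the state.
    K = P / (P + R)        # Kalman gain: how much to trust the measurement
    x = x + K * (z - x)    # correct the prediction by the innovation (z - x)
    P = (1 - K) * P        # uncertainty shrinks after measuring
    return x, P

# One cycle: vague prior at 0.0, then a measurement at 1.0.
x, P = predict(0.0, 1.0)
x, P = update(x, P, z=1.0)
```

In the real system x is the full stacked state (camera pose plus features) and P the full covariance matrix, so the update propagates information between the camera and every feature.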

8 State representation as in [Davison 2003]

9 SLAM with first order uncertainty representation as in [Davison 2003]


11 Challenges for visual SLAM o On the computer vision side, improving data association: n Ensuring a match is a true positive. o Representations and parameterizations that enhance mapping while staying within real-time constraints. o Alternative frameworks for mapping: n Can we extend the area of operation? n Better scene understanding.

12 For data association, an earlier approach o Small (e.g. 11x11) image patches around salient points to represent features. o Normalized Cross Correlation (NCC) to match features within predicted search regions. o Small patches + accurate search regions lead to fast camera pose estimation. o Depth recovered by projecting the feature hypothesis at different depths. See: A. Davison, Real-Time Simultaneous Localisation and Mapping with a Single Camera, ICCV 2003.
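Normalized cross correlation can be sketched as follows, with patches flattened to plain lists of intensities (the real system correlates 11x11 pixel patches inside the predicted search region):

```python
def ncc(a, b):
    # a, b: equal-length lists of pixel intensities (flattened patches).
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n           # patch means
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5  # patch standard deviations
    db = sum((y - mb) ** 2 for y in b) ** 0.5  # (up to a common 1/n factor)
    return num / (da * db)                     # score in [-1, 1]
```

Subtracting the mean and dividing by the norm makes the score invariant to affine intensity changes (gain and offset), which is why NCC tolerates lighting variation that raw SSD matching does not.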

13 However o Simple patches are insufficient for large viewpoint or scale variations. o Small patches help speed but are prone to mismatches. o Search regions can’t always be trusted (camera occlusion, motion blur). Possible solutions: use better feature description, or other types of features, e.g. edge information.

14 SIFT [D. Lowe, IJCV 2004] Find maxima in scale space to locate keypoints. Around each keypoint, build an invariant local descriptor using gradient histograms: a 128-element vector. If used only for tracking, this may be wasteful!
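The building block of the descriptor, a gradient-orientation histogram, can be sketched like this. SIFT concatenates 16 such 8-bin histograms from a 4x4 grid of subregions to form the 128-element vector; the sampling pattern, Gaussian weighting, interpolation and normalisation steps are omitted here as this is only an illustration:

```python
import math

def orientation_histogram(grads, bins=8):
    # grads: list of (gx, gy) image-gradient samples around the keypoint.
    hist = [0.0] * bins
    for gx, gy in grads:
        mag = math.hypot(gx, gy)                   # gradient magnitude
        ang = math.atan2(gy, gx) % (2 * math.pi)   # orientation in [0, 2*pi)
        # Each sample votes for its orientation bin, weighted by magnitude.
        hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    return hist
```

Because the histogram pools gradients over a region, it is far more tolerant of small misregistrations and viewpoint change than direct patch correlation.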

15 Uses SIFT-like descriptors (histogram of gradients) around Harris corners. Get scale from SLAM = “predictive SIFT”. [Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]

16 Video at http://www.cs.bris.ac.uk/Publications/attachment-delivery.jsp?id=9

17 [Eade and Drummond, BMVC2006] Edgelets: locally straight sections of the gradient image. Parameterized as a 3D point + direction. Avoid regions of conflict (e.g. close parallel edges). Deal with multiple matches through robust estimation. Video at http://mi.eng.cam.ac.uk/~ee231/bmvcmovie.avi

18 RANSAC – RANdom SAmple Consensus [Fischler and Bolles 1981] (Figure: a least squares fit is dragged off by gross “outliers”; the RANSAC fit is not.) o Select a random sample of points. o Propose a model (hypothesis) based on the sample. o Assess the fitness of the hypothesis on the rest of the data. o Repeat until the maximum number of iterations or a fitness threshold is reached. o Keep the best hypothesis and potentially refine it with all inliers.
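The loop above can be sketched for the simplest case, fitting a 2D line; the tolerance, iteration count and fixed seed are illustrative choices, not values from the paper:

```python
import random

def fit_line(p, q):
    # Line through two points, returned as (slope, intercept).
    m = (q[1] - p[1]) / (q[0] - p[0])
    return m, p[1] - m * p[0]

def ransac_line(points, iters=200, tol=0.1, seed=0):
    rng = random.Random(seed)              # fixed seed for reproducibility
    best_model, best_inliers = None, []
    for _ in range(iters):
        p, q = rng.sample(points, 2)       # 1. select a random sample
        if p[0] == q[0]:
            continue                       # vertical pair: cannot fit y = mx + c
        m, c = fit_line(p, q)              # 2. propose a hypothesis
        inliers = [(x, y) for x, y in points
                   if abs(y - (m * x + c)) < tol]  # 3. assess fitness
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers        # best hypothesis + its consensus set
```

A final refinement step, re-fitting the model by least squares over all inliers, is omitted here. In the relocalisation work discussed next, the sampled model is a camera pose from three point correspondences rather than a line.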

19 OK but… o Having rich descriptors or even multiple kinds of features may still lead to wrong data associations (mismatches). o Passing every measurement we think is good to the SLAM system can be catastrophic. o Better to be able to recover from failure than to assume it won’t fail!

20 [Williams, Smith and Reid ICRA2007] Camera relocalization using small 2D patches + RANSAC to compute pose. Adds a “supervisor” between visual measurements and the SLAM system. Uses the 3-point algorithm -> up to 4 possible poses. Verifies using Matas’ T_{d,d} test.

21 [Williams, Smith and Reid ICRA2007] In brief, while within the real-time limit do: select 3 matches; compute pose; if consistent, carry on, otherwise remain lost and retry. Also see recent work [Williams, Klein and Reid ICCV2007] using randomised trees rather than simple 2D patches. Video at http://www.robots.ox.ac.uk/ActiveVision/Projects/Vslam/vslam.04/Videos/relocalisation_icra_07.mpg

22 Relocalisation based on appearance hashing o Use a hash function to index similar descriptors (Brown et al 2005): quantize the responses of Haar masks. o Fast and memory efficient (only an index needs to be saved per descriptor). Chekhlov et al 2008. Video at: http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000939
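The indexing idea can be sketched as follows. The quantisation step and descriptor layout here are hypothetical placeholders, not the actual Haar-mask scheme of Chekhlov et al; the point is only that coarse quantisation makes similar descriptors collide into the same hash bucket:

```python
def quantise(descriptor, step=0.25):
    # Coarse quantisation: nearby response values map to the same key,
    # so small appearance changes still hit the same bucket.
    return tuple(int(v // step) for v in descriptor)

class AppearanceIndex:
    def __init__(self):
        self.table = {}

    def add(self, feature_id, descriptor):
        # Only the key (an index) is stored per descriptor: memory efficient.
        self.table.setdefault(quantise(descriptor), []).append(feature_id)

    def lookup(self, descriptor):
        # Constant-time retrieval of candidate map features for relocalisation.
        return self.table.get(quantise(descriptor), [])
```

The returned candidates still need geometric verification (e.g. the RANSAC pose check of the previous slides) before being fed back to the SLAM filter.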

23 Parallel Tracking and Mapping [Klein and Murray, Parallel Tracking and Mapping for Small AR Workspaces, Proc. International Symposium on Mixed and Augmented Reality, 2007] o Decouple mapping from tracking, running them in separate threads on a multi-core CPU. o Mapping is based on key-frames, processed using batch Bundle Adjustment. o Map is initialised from a stereo pair (using the 5-point algorithm). o New points are initialised with an epipolar search. o Large numbers (thousands) of points can be mapped in a small workspace.

24 [Klein and Murray, 2007] Parallel Tracking and Mapping (Diagram: CPU1 runs the per-frame loop of detect features, compute camera pose, draw graphics; CPU2 updates the map in parallel.) Video at http://www.robots.ox.ac.uk/ActiveVision/Videos/index.html
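The producer/consumer decoupling in the diagram can be sketched with two threads and a queue. The keyframe-selection heuristic (every 5th frame) and the frame representation are toy assumptions; PTAM selects keyframes by distance and tracking quality:

```python
import threading
import queue

keyframe_queue = queue.Queue()
map_keyframes = []

def tracking_thread(frames):
    # Fast per-frame loop: pose estimation at camera rate,
    # occasionally handing a keyframe to the mapper.
    for i, frame in enumerate(frames):
        # ... detect features, compute camera pose, draw graphics ...
        if i % 5 == 0:                 # toy keyframe heuristic (assumption)
            keyframe_queue.put(frame)
    keyframe_queue.put(None)           # sentinel: tracking finished

def mapping_thread():
    # Slow background loop: integrate keyframes into the map at its own pace.
    while True:
        kf = keyframe_queue.get()
        if kf is None:
            break
        # ... epipolar search for new points, bundle adjustment ...
        map_keyframes.append(kf)

t = threading.Thread(target=tracking_thread, args=(list(range(12)),))
m = threading.Thread(target=mapping_thread)
t.start(); m.start()
t.join(); m.join()
```

The key design point is that tracking never blocks on mapping: expensive batch Bundle Adjustment runs in the background while pose estimation keeps frame rate.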

25 So far we have mentioned that o Maps are sparse collections of low-level features: n Points (Davison et al., Chekhlov et al.) n Edgelets (Eade and Drummond) n Lines (Smith et al., Gee and Mayol-Cuevas) o Full correlation between features and camera n Maintain full covariance matrix n Loop closure: effects of measurements propagated to all features in map o Increase in state size limits number of features

26 Commonly in Visual SLAM o Emphasis on localization and less on the mapping output. o SLAM should avoid aiming for “beautiful” maps (there are other, better methods for that!). o Very few examples exist of improving the awareness element, e.g. Castle and Murray BMVC 07 on known-object recognition within SLAM.

27 Better spatial awareness through higher level structural inference Types of structure: coplanar points → planes; collinear edgelets → lines; intersecting lines → junctions. Our contribution: a method for augmenting the SLAM map with planar and line structures, and an evaluation of the method in a simulated scene, discovering the trade-off between efficiency and accuracy.

28 Discovering structure within SLAM Gee, Chekhlov, Calway and Mayol-Cuevas, 2008

29 Plane Representation (Figure: a plane at position (x,y,z) with its normal and two basis vectors c(θ1,φ1) and c(θ2,φ2), shown relative to the world frame O and the observing camera.) Gee et al 2007

30 Plane Initialisation 1. Discover planes using RANSAC over a thresholded subset of the map. 2. Initialise the plane in the state using best-fit plane parameters found from an SVD of the inliers. 3. Augment the state covariance P with the new plane: append the measurement covariance R0 to the covariance matrix; multiplication with the Jacobian populates the cross-covariance terms. State size increases by 7 after adding a plane. Gee et al 2007
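The RANSAC plane-discovery step above can be sketched with its two geometric primitives: a plane hypothesis from a minimal sample of three map points, and the point-to-plane distance used for the inlier test. The actual system then refines the winning hypothesis with an SVD best fit over all inliers, which is omitted here:

```python
def plane_from_points(p, q, r):
    # Hypothesise a plane through three 3D points: normal n = (q-p) x (r-p),
    # normalised, and offset d such that n . x = d for points x on the plane.
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],     # cross product u x v
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    n = [c / norm for c in n]           # unit normal
    d = sum(n[i] * p[i] for i in range(3))
    return n, d

def point_plane_distance(pt, n, d):
    # Unsigned distance of pt from the plane (n, d); used as the inlier test.
    return abs(sum(n[i] * pt[i] for i in range(3)) - d)
```

Map points whose distance falls below a threshold form the consensus set for the hypothesised plane, exactly as in the line-fitting RANSAC sketched earlier.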

31 Adding Points to the Plane 1. Decide whether a point lies on the plane (within a distance threshold σmax). 2. Add the point by projecting it onto the plane and transforming the state and covariance. 3. Decide whether to fix the point on the plane. State size decreases by 1 after adding a point to the plane; fixing points in the plane reduces the state size by 2 for each fixed point. The state is smaller than the original if more than 7 points are added to the plane. Gee et al 2007
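Step 2 above, projecting a point onto the plane, is a one-line geometric operation, sketched here for a unit normal n and offset d (n . x = d on the plane); the accompanying covariance transformation is omitted:

```python
def project_to_plane(pt, n, d):
    # Remove the out-of-plane component: p' = p - (n.p - d) n, with |n| = 1.
    s = sum(n[i] * pt[i] for i in range(3)) - d   # signed distance to plane
    return [pt[i] - s * n[i] for i in range(3)]
```

The state saving comes from what is stored afterwards: a projected point needs only two in-plane coordinates instead of three world coordinates (hence the decrease by 1), and a point fixed rigidly to the plane needs no coordinates of its own at all. The signed distance s is also what gets compared against σmax in step 1.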

32 Plane Observation 1. Cannot make a direct observation of the plane. 2. Transform points to 3D world space. 3. Project points into the image and match with predicted observations. 4. The covariance matrix embodies the constraints between plane, camera and points. Gee et al 2007

33 Discovering planes in SLAM Gee et al. 2007 Video at: http://www.cs.bris.ac.uk/~gee


35 Mean error and state reduction with planes (average over 30 runs). Gee et al 2008

36 Discovering 3D lines Video at: http://www.cs.bris.ac.uk/~gee

37 An example application Chekhlov et al. 2007 Video at http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000745

38 Other interesting recent work o Active search and matching: knowing what to measure. n Davison ICCV 2005 and Chli and Davison ECCV 2008. o Submapping: managing the scalability problem better. n Clemente et al RSS 2007. n Eade and Drummond BMVC 2008. o And the work presented in this tutorial: n Randomised trees: Vincent Lepetit. n SFM: Andrew Comport.

39 Software tools: o http://www.doc.ic.ac.uk/~ajd/Scene/index.html o http://www.robots.ox.ac.uk/~gk/PTAM/ o http://www.openslam.org/ o http://www.robots.ox.ac.uk/~SSS06/

40 Recommended intro reading: o Yaakov Bar-Shalom, X. Rong Li and Thiagalingam Kirubarajan, Estimation with Applications to Tracking and Navigation, Wiley-Interscience, 2001. o Hugh Durrant-Whyte and Tim Bailey, Simultaneous Localisation and Mapping (SLAM): Part I The Essential Algorithms. Robotics and Automation Magazine, June 2006. o Tim Bailey and Hugh Durrant-Whyte, Simultaneous Localisation and Mapping (SLAM): Part II State of the Art. Robotics and Automation Magazine, September 2006. o Andrew Davison, Ian Reid, Nicholas Molton and Olivier Stasse, MonoSLAM: Real-Time Single Camera SLAM, IEEE Trans. PAMI 2007. o Andrew Calway, Andrew Davison and Walterio Mayol-Cuevas, Slides of Tutorial on Visual SLAM, BMVC 2007, available at: http://www.cs.bris.ac.uk/Research/Vision/Realtime/bmvctutorial/

41 Some Challenges o Deal with larger maps. o Obtain maps that are task-meaningful (manipulation, AR, metrology). o Use different feature kinds in an informed way. o Benefit from other approaches such as SFM while keeping efficiency. o Incorporate semantics and beyond-geometric scene understanding. Fin

