Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian Tan, Minh-Tri Pham, Liang-Tien Chia) Center for Multimedia & Network Technology (CeMNet) School of Computer Engineering Nanyang Technological University, Singapore
Urban Landmarks
Some are easy to recognize; some aren't.
(Photos: © Anirudh Koul, © kevincole, © qureyoon)
“Back-to-Basics” Map Reading!
What is used:
- An image or images taken from a single location, at probe time
- A plan-view outline map
GPS is not considered: reception is poor in high-rise urban areas, and GPS can be jammed or spoofed.
Related Work and Differences
Appearance-based matching in urban areas: Robertson & Cipolla BMVC04; Yeh et al. CVPR04; Zhang & Košecká 3DPVT06
General wide-baseline stereo / multi-view (but not targeted at searching through significant-sized datasets): Bay et al. CVPR05; Mičušík et al. CVPR08; Schindler et al. 3DPVT06; Schmid & Zisserman IJCV00; Werner & Zisserman ECCV02
Key differences here:
- No prior appearance information: only a 2D plan-view geometric map is available
- No stereo / multi-view: images are taken from a single location
A Geometric Matching Paradigm
Assume buildings are vertical planar extrusions. Match building corners in the map to vertical corner lines in the rectified image, using significant building corners rather than façade details or painted edges. The matched corner structure forms a geometric signature.
2D Geometric Image Features
Basic Lines (2D)
2½D Geometric Image Features
David Marr's bottom-up visual perception framework: Image → Primal Sketch → 2½D Sketch → 3D model.
Augmented Lines (2D + adjacent 3D normals)
2½D Geometric Image Features
David Marr's bottom-up visual perception framework: Image → Primal Sketch → 2½D Sketch → 3D model.
Elemental Planes (2D + fixed depth ratios of vertical boundaries)
2½D Geometric Image Features
David Marr's bottom-up visual perception framework: Image → Primal Sketch → 2½D Sketch → 3D model.
Structural Fragments (piecewise 3D structures with unknown scales)
Geometric Signatures – Uniqueness Analysis Under Ideal Conditions
Chart comparing the signature types:
- Structural Fragments (3D structure with unknown scale): strong match
- Elemental Planes (2D + fixed depth ratios)
- Basic Lines (2D)
- Augmented Lines (2D + 3D normals): poor match
Overview of Localization Method
BOTTOM-UP (from the query image):
- Extract vertical corners + normals
- Recover elemental planes with 3D normals
- Link into plan-view structural fragments (modulo similarity)
- Calibration from vanishing points
TOP-DOWN (from the 2D map to the camera pose):
- Geometric hashing lookup for correspondence candidates
- Voting-based estimate of the optimal camera pose
Estimation of Quasi-Manhattan Vanishing Points
Use the EM algorithm of Schindler et al. (3DPVT 2006); details in the paper. Image rectification: 3D verticals become parallel to the image y-axis.
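The rectification step can be sketched as follows. Assuming the intrinsic matrix K and the vertical vanishing point are already available (both are outputs of the slide's calibration/EM stage; the function name and inputs here are illustrative), a rotation that maps the corresponding 3D direction onto the camera y-axis gives a homography H = K R K⁻¹ under which 3D verticals become parallel to the image y-axis:

```python
import numpy as np

def rectifying_homography(K, v_vertical):
    """Homography H = K R K^-1 that maps the vertical vanishing point
    to the image y-direction at infinity, so that 3D verticals become
    parallel to the image y-axis."""
    # Back-project the vanishing point to a 3D direction in the camera frame.
    d = np.linalg.solve(K, v_vertical)
    d /= np.linalg.norm(d)
    if d[1] < 0:               # orient the direction consistently "up"
        d = -d
    y = np.array([0.0, 1.0, 0.0])
    # Rotation taking d onto the camera y-axis (Rodrigues' formula).
    axis = np.cross(d, y)
    s = np.linalg.norm(axis)   # sin of the rotation angle
    c = d @ y                  # cos of the rotation angle
    if s < 1e-12:
        R = np.eye(3)
    else:
        k = axis / s
        Kx = np.array([[0, -k[2], k[1]],
                       [k[2], 0, -k[0]],
                       [-k[1], k[0], 0]])
        R = np.eye(3) + s * Kx + (1 - c) * (Kx @ Kx)
    return K @ R @ np.linalg.inv(K)
```

Applying H to the vertical vanishing point sends it to the point at infinity in the y-direction, which is exactly the "verticals become parallel to the image y-axis" condition.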
Vertical Corner Line Hypothesis (VCLH)
Hypotheses for corners of buildings, based on heuristics. 3 categories: uni-normal augmented line, bi-normal augmented line, basic line.
Elemental Planes
An elemental plane is 2 VCLHs connected by groups of collinear horizontal edges, with the same plane normals on the linked sides. Each elemental plane carries an invariant depth ratio between its two vertical boundaries.
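The depth-ratio formula itself appears on the slide only as an equation image, but the underlying idea can be illustrated (this is a hypothetical derivation, not necessarily the paper's exact formulation): for a plan-view plane n·X = c with unit normal n and unknown offset c, a vertical boundary seen along unit ray r lies at depth λ = c / (n·r), so the ratio of the two boundary depths, (n·r₂)/(n·r₁), is independent of c, i.e. of overall scale:

```python
import numpy as np

def depth_ratio(n, r1, r2):
    """Ratio of depths of two points on the plane n·X = c, seen along
    unit rays r1, r2: lambda_i = c / (n·r_i), so the ratio
    (n·r2)/(n·r1) does not depend on the unknown offset c."""
    return (n @ r2) / (n @ r1)

# Numerical check of the invariance: intersect the rays with the plane
# for two different offsets c and compare the resulting depth ratios.
n = np.array([0.6, 0.8])                    # plan-view plane normal
r1 = np.array([np.cos(0.3), np.sin(0.3)])   # ray to the 1st boundary
r2 = np.array([np.cos(1.1), np.sin(1.1)])   # ray to the 2nd boundary
for c in (1.0, 5.0):                        # plane offset = overall scale
    l1, l2 = c / (n @ r1), c / (n @ r2)
    assert np.isclose(l1 / l2, depth_ratio(n, r1, r2))
```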
Structural Fragments
A structural fragment is a sequence of adjacent elemental planes sharing bi-normal VCLHs; it encodes full 3D structure (modulo scale).
More Examples: Elemental Planes and Structural Fragments
Matching with Structural Fragments
Exhaustive testing: a correspondence maps a structural fragment of l planes to l linked building edges. Best-fit matching with an error measure; consensus support C from the other VCLHs. Votes are cast in a pose-space accumulator array, with the vote score derived from the consensus support. Complexity: O(n), where n = number of building corners in the map; about 8 s per search in Matlab.
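The voting step can be sketched in a simplified form. Assumptions beyond the slides: the fragment is already expressed in plan view at the correct scale, so each hypothesized fragment-edge-to-map-edge correspondence yields a single 2D rigid pose (θ, t), and the slide's consensus-support score is replaced here by a plain vote count in a quantized pose accumulator:

```python
import numpy as np
from collections import Counter

def edge_pose(p_img, q_img, p_map, q_map):
    """2D rigid transform (theta, t) mapping segment p_img->q_img
    onto map segment p_map->q_map."""
    a = q_img - p_img
    b = q_map - p_map
    theta = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return theta, p_map - R @ p_img

def vote_pose(frag, map_pts, cell=0.25, acell=np.deg2rad(5)):
    """Exhaustively hypothesize fragment-edge -> map-edge correspondences
    and vote in a quantized (theta, tx, ty) accumulator; the modal bin
    gives the pose estimate."""
    acc = Counter()
    for i in range(len(frag) - 1):
        for j in range(len(map_pts) - 1):
            theta, t = edge_pose(frag[i], frag[i + 1],
                                 map_pts[j], map_pts[j + 1])
            acc[(round(theta / acell), round(t[0] / cell), round(t[1] / cell))] += 1
    (ka, kx, ky), _ = acc.most_common(1)[0]
    return ka * acell, np.array([kx * cell, ky * cell])
```

When the fragment is a rigidly transformed stretch of the map contour, the correct pose bin collects one vote per fragment edge, while incorrect hypotheses scatter across the accumulator.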
Matching Example with Structural Fragments
Inconsistent matches vs. consistent matches.
Experiments – Dataset I
- Woodstock neighborhood of the Bronx
- Google Street View images (212 total): 53 unique locations, 4 images per location (shown in quads)
- Manually created building-outline plan-view map: 111 buildings with 885 corners
Experiments – Dataset II
- Singapore public-housing (HDB) estate
- Self-collected images (120 total): 30 unique locations, 4 images per location
- Manually created building-outline plan-view map: 20 mega-buildings with 659 corners
Matching Results
Compare the probe signature against the signatures at 3600 grid locations, sort the matching scores, and find the rank of the ground truth. The curve plots the % of test probes whose correct pose is better than a given rank, over the 0-10% selectivity range of match ranks.
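The evaluation protocol above can be sketched directly: score the probe against every grid location, rank the ground-truth location among them, and report the fraction of probes whose correct pose falls within a rank cutoff. The scores below are illustrative, not the paper's data:

```python
import numpy as np

def correct_pose_ranks(score_matrix, gt_index):
    """score_matrix: (n_probes, n_locations) matching scores, higher = better.
    gt_index: ground-truth location index for each probe.
    Returns the rank of the correct pose per probe (1 = best)."""
    ranks = []
    for scores, gt in zip(score_matrix, gt_index):
        # rank = 1 + number of locations scoring strictly better than truth
        ranks.append(1 + int(np.sum(scores > scores[gt])))
    return np.array(ranks)

def fraction_within_top(ranks, k):
    """Fraction of test probes whose correct pose ranks within the top k."""
    return float(np.mean(ranks <= k))
```

With 3600 grid locations, "top-1% selectivity" corresponds to `fraction_within_top(ranks, 36)`.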
Dataset II Example Correct Matches
Example results for matching. The 3D models are used only for visualizing results.
Observations
This is a start at solving a challenging problem, one that is difficult even for humans. Results are mixed:
- Selectivity is very high: 57-70% of correct poses fall within the top-1% selectivity (36 out of 3600), but this needs to be higher to be end-usable
- Yet under ideal conditions the signatures appear very discriminative
Main challenges:
- False VCLH negatives (some): building corners not detected due to poor resolution, etc.
- False VCLH positives (many): windows and other façade features often misdetected as corners
- Architectural designs are seldom perfect extrusions: overhangs, balconies, fire escapes, etc.
Concluding Remarks
- Geometric features can be powerful for discriminating locations; prior appearance data is not always necessary
- Intelligent extension to geometric 2½D features: 2D → 2D + normals → 2D + depth ratios → 3D (mod scale)
- Informal tests under ideal conditions show excellent discriminating power
- The key challenge lies in more robust image analysis: robustness to noise and to minor deviations from the map is needed
Future work: use the existing results to bootstrap more advanced (and costly) registration techniques, e.g. top-down bundle adjustment working directly on raw image intensities rather than on detected edgels.
Credits
Joint work with: Arridhana Ciptadi, Wei-Chian Tan, Minh-Tri Pham, Clement Liang-Tien Chia
Thanks: Teck-Khim Ng, Zahoor Zafrulla, Rudianto Sugiyarto
Research sponsor: Project Tacrea grant, Defence Science & Technology Agency (DSTA), Singapore
Thank You
Scene Assumptions
Quasi-Manhattan world: the vertical direction is orthogonal to all horizontal directions, but horizontal directions need not be orthogonal to each other.
Vertical extrusion model: each building is a vertical extrusion of a ground-plane cross-section, which implies buildings have simple vertical planar facades.
Case 1: Matching Basic Line VCLHs
Possible procedure: 3 pairwise basic-line correspondences between image and map give a unique basis for pose. RANSAC: randomly select any 3 pairs of line correspondences, then verify with the other lines. The exhaustive-search computational cost grows with n, the number of building corners in the map.
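The pose basis from 3 basic-line correspondences amounts to bearing-only resection: in plan view, each matched map corner (xᵢ, yᵢ) seen at camera-relative bearing βᵢ constrains atan2(yᵢ − y, xᵢ − x) = θ + βᵢ. A sketch of the solver using a coarse grid search over position rather than a closed-form resection (the surrounding RANSAC loop is omitted; the grid span is assumed to cover the map area):

```python
import numpy as np

def pose_from_bearings(pts, bearings, span=10.0, steps=201):
    """Estimate camera pose (x, y, theta) from map points `pts` seen at
    camera-relative `bearings`: atan2(yi - y, xi - x) = theta + beta_i.
    Coarse grid search over position; theta follows by circular averaging."""
    xs = np.linspace(-span, span, steps)
    best = None
    for x in xs:
        for y in xs:
            # Heading implied by each correspondence at this candidate position.
            th = np.arctan2(pts[:, 1] - y, pts[:, 0] - x) - bearings
            # Circular mean / spread of the implied headings.
            z = np.exp(1j * th)
            spread = 1.0 - abs(z.mean())   # 0 when all headings agree
            if best is None or spread < best[0]:
                best = (spread, x, y, np.angle(z.mean()))
    _, x, y, theta = best
    return x, y, theta
```

With 3 correspondences the true pose zeroes the spread, so the grid minimum lands at (or next to) the correct position; a finer local refinement could follow.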
Case 2: Matching Augmented Line VCLHs
Possible procedure: 2 pairwise augmented-line correspondences give a unique pose (pose basis: 2 positions + 1 normal); the excess normals serve as a quick compatibility test. RANSAC: verify with the other lines only when the compatibility tests pass. The exhaustive-search cost is reduced by a combined factor k < 1, accounting for the cheap compatibility test and the (unknown) fraction of bases that pass it.
Case 3: Matching with Elemental Planes
Possible procedure, exhaustive testing: a correspondence maps 1 elemental plane to 1 map building edge, which serves as a basis for computing pose. Voting-based pose estimation: votes are cast in a pose-space accumulator array, each vote score coming from the consensus support of the other VCLHs. The computational cost scales with m, the number of map building edges (comparable to n, the number of building corners).
Geometric Hashing
Geometric hashing is explored as a rapid shortlisting stage, producing correspondence candidates between detected structural fragments and map building contours.
Offline preprocessing phase:
- For each building contour in the map, use each contour edge as a basis
- Transform the building's other corners into the canonical frame
- Each corner logs (building_id, basis) in the related hash bin
Online lookup phase:
- For each structural fragment in the image, select an elemental plane as the basis
- For the other VCLHs in the structural fragment, look up the hash bins and vote for (building, basis)
- Building + basis pairs with high votes become correspondence candidates
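The two phases can be sketched concretely. Assumptions for illustration only: buildings are closed 2D contours, the canonical frame is the similarity frame of a contour edge (edge mapped to the unit x-segment), and the fragment's first edge plays the role of the basis elemental plane:

```python
import numpy as np
from collections import defaultdict

BIN = 0.25  # coarse hash bins, as on the slides, to limit false negatives

def canonical(p, b0, b1):
    """Coordinates of p in the similarity frame of basis edge b0->b1
    (edge rotated to +x and scaled to unit length)."""
    d = b1 - b0
    L = np.hypot(*d)
    c, s = d / L
    R = np.array([[c, s], [-s, c]]) / L
    return R @ (p - b0)

def build_hash(buildings):
    """Offline phase: for every contour edge of every building, log every
    other corner's quantized canonical coordinates -> (building_id, basis)."""
    H = defaultdict(list)
    for bid, contour in enumerate(buildings):
        n = len(contour)
        for basis in range(n):
            b0, b1 = contour[basis], contour[(basis + 1) % n]
            for k in range(n):
                if k in (basis, (basis + 1) % n):
                    continue
                q = canonical(contour[k], b0, b1)
                H[(round(q[0] / BIN), round(q[1] / BIN))].append((bid, basis))
    return H

def lookup(H, frag):
    """Online phase: use the fragment's first edge as the basis and vote
    for (building_id, basis) candidates via the remaining corners."""
    votes = defaultdict(int)
    b0, b1 = frag[0], frag[1]
    for p in frag[2:]:
        q = canonical(p, b0, b1)
        for cand in H.get((round(q[0] / BIN), round(q[1] / BIN)), []):
            votes[cand] += 1
    return max(votes, key=votes.get) if votes else None
```

Because the canonical frame is similarity-invariant, a fragment that is any rotated, scaled, and translated copy of part of a stored contour votes consistently for the correct (building, basis) pair.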
Dataset I Results – Geometric Hashing
Large hash bins are used to minimize missed correct matches (false negatives). Speed: without hashing, 8 seconds per image in Matlab; with hashing, 4 seconds (the actual hashing lookup takes under 1 s). Shortlisting selectivity:

Links per structural fragment:       3        4       5
Avg # of correspondence candidates:  212.98   91.06   113.33
Selectivity (out of 885):            24.1%    10.3%   12.8%
Geometric Signatures – Uniqueness Analysis Under Ideal Conditions
How discriminative are geometric signatures comprising 2½D features? Compute the matching score of the signature at an arbitrary spot against the signatures at all other locations, under ideal noise-free conditions: all building corners are perfectly detectable, and probe signatures are generated directly from the plan-view map. This is an informal analysis; uniqueness is expected to depend on the specific location and on the design of the buildings.
Geometric Signatures – Uniqueness Analysis Under Ideal Conditions
Chart comparing the signature types:
- Structural Fragments (3D structure with unknown scale): strong match
- Elemental Planes (2D + fixed depth ratios)
- Basic Lines (2D)
- Augmented Lines (2D + 3D normals): poor match
Reduction to 1D Image Matching
With vertical lines parallel to the image y-axis, the problem reduces to matching points in a 2D space against a 1D point signature. The 2D camera extrinsic parameters are the 2D position of the optical center and the angle of the optical axis; signatures are normalized to unit focal length.
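The 1D signature can be sketched: after rectification, each vertical line is just an image x-coordinate, and normalizing by the focal length turns it into a bearing angle about the optical axis (the function name and the sorting convention are illustrative):

```python
import math

def bearing_signature(x_coords, f, cx):
    """Reduce vertical lines (given by their image x-coordinates) to a
    1D signature of bearing angles about the optical axis, normalized
    to unit focal length: beta = atan((x - cx) / f)."""
    return sorted(math.atan2(x - cx, f) for x in x_coords)
```

Because each bearing depends only on (x − cx)/f, the signature is independent of the actual focal length once normalized, which is what makes signatures from different cameras comparable.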
Camera Calibration from Vanishing Points
In a Manhattan world, the 3 vanishing points of the 3 orthogonal directions also determine the focal length and the principal point. In a quasi-Manhattan world, only the ground-vertical v.p. is orthogonal to the other v.p.'s, so the principal point is assumed to be at the image center, and the 3D v.p. directions are obtained w.r.t. the camera reference frame. In reality, calibration is integrated into the iterative v.p. estimation process.
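For the Manhattan-style case, the focal-length computation from a pair of orthogonal vanishing points can be sketched: with square pixels and a known principal point (cx, cy), the back-projected rays (u − cx, v − cy, f) of two orthogonal directions must themselves be orthogonal, which fixes f. A minimal sketch:

```python
import math

def focal_from_orthogonal_vps(vp1, vp2, pp):
    """Focal length from two vanishing points of orthogonal 3D directions,
    assuming square pixels and principal point pp: orthogonality of the
    back-projected rays (u - cx, v - cy, f) gives
    f^2 = -[(u1-cx)(u2-cx) + (v1-cy)(v2-cy)]."""
    (u1, v1), (u2, v2), (cx, cy) = vp1, vp2, pp
    f2 = -((u1 - cx) * (u2 - cx) + (v1 - cy) * (v2 - cy))
    if f2 <= 0:
        raise ValueError("vanishing points inconsistent with orthogonal directions")
    return math.sqrt(f2)
```

For example, a camera with f = 800 and principal point (320, 240) viewing the orthogonal directions (1, 0, 1) and (-1, 0, 1) produces vanishing points (1120, 240) and (-480, 240), from which the formula recovers f = 800.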
Potential Future Directions
- Exploit a localized architectural design "language": priors to improve geometric feature detection in poor-quality images, and to predict occluded parts of the higher-order geometric features that form the local architectural "vocabulary"
- Investigate whether it is reasonable to assume a prior distribution in which nearby buildings have similar geometric designs