Presentation on theme: "Multi-View Stereo for Community Photo Collections"— Presentation transcript:
1. Multi-View Stereo for Community Photo Collections. Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steven M. Seitz
2. Photos vary substantially in lighting, foreground clutter, and scale due to different cameras, times, and weather.
3. Images of Notre Dame (a variation in sampling rate of more than 1,000).
4. Images taken in the wild show wide variety:
- Lots of photographers
- Different cameras
- Different sampling rates
- Occlusion
- Different times of day, weather
- Post-processing
5. The problem statement: design an adaptive view selection process.
- Given the massive number of images, find a compatible subset
- Multi-View Stereo (MVS): reconstruct robust and accurate depth maps from this subset
6. Previous work:
- Global view selection: assumes a relatively uniform viewpoint distribution and simply chooses the k nearest images for each reference view
- Local view selection: uses shiftable windows in time to adaptively choose frames to match
7. Community photo collections (CPCs) are non-uniformly distributed in the 7D viewpoint space (translation, rotation, focal length) and represent an extreme case of unorganized image sets. Algorithm overview:
- Calibrating Internet photos
- Global view selection
- Local view selection
- Multi-view stereo reconstruction
8. Calibrating Internet Photos:
- PTLens extracts camera and lens information and corrects for radial distortion based on a database of camera and lens properties
- Discard images that cannot be corrected
- Remaining images are entered into a robust, metric structure-from-motion (SfM) system (using the SIFT feature detector), which generates a sparse scene reconstruction from the matched features and, for each feature, a list of images where it was detected
- Remove radiometric distortions: map all input images into a linear radiometric space (via the sRGB color space)
9. Global View Selection: for each reference view R, global view selection seeks a set N of neighboring views that are good candidates for stereo matching in terms of scene content, appearance, and scale. SIFT selects features with similar appearance, but this raises two problems:
- Shared feature points: a collocation problem
- Scale invariance: a stereo matching problem
We need a measurement that handles both problems.
10. A global score g_R(V) = Σ_{f ∈ F_V ∩ F_R} w_N(f) · w_s(f) is computed for each view V within a candidate neighborhood N (which includes R):
- F_V: set of feature points in view V
- F_V ∩ F_R: feature points common to views V and R
- w_N(f): measures the angular separation of two views; the larger, the more separated in angle
- w_s(f): measures the similarity in scale of two views; the larger, the more similar in scale
11. Calculating w_N(f): α is the angle between the lines of sight from V_i and V_j to f; α_max is set to 10 degrees.
12. Calculating w_s(f): r = s_R(f) / s_V(f), where s_V(f) is the diameter of a sphere centered at f whose projected diameter in view V equals the pixel spacing in V (s_R(f) is defined analogously for R). The weight favors the case 1 ≤ r < 2.
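The two weights and the summed score can be sketched in Python. This is a simplified reading of the slides (α_max = 10 degrees, w_s favoring 1 ≤ r < 2); in particular, the paper multiplies the angular weight over all view pairs in the neighborhood, whereas this sketch takes a single angle per feature, and the exact piecewise form of w_s is an assumption.

```python
def w_alpha(alpha_deg, alpha_max=10.0):
    """Angular-separation weight: grows with the angle alpha between
    the lines of sight to feature f, saturating at alpha_max."""
    return min(alpha_deg / alpha_max, 1.0)

def w_scale(r):
    """Scale-compatibility weight for resolution ratio r = s_R(f)/s_V(f);
    full weight for 1 <= r < 2, falling off outside that range
    (assumed piecewise form)."""
    if r >= 2.0:
        return 2.0 / r
    if r >= 1.0:
        return 1.0
    return r

def global_score(shared_features):
    """Sum the product of weights over the features in F_V ∩ F_R.
    shared_features: list of (alpha_deg, r) tuples, one per shared
    feature (a simplification of the paper's pairwise product)."""
    return sum(w_alpha(a) * w_scale(r) for a, r in shared_features)
```

A view whose shared features are both well separated in angle and similar in scale accumulates the highest score.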
13. Add the scores over all feature points for each view V and select the top views as N. Rescaling views:
- If scale_R(V_min) is smaller than 0.6 (threshold), meaning roughly that a 5x5 window in R corresponds to a 3x3 window in V, rescaling is needed: find the lowest-resolution view V_min and resample R to match it
- Resample any view whose scale_R(V) > 1.2 to match the scale of R
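The rescaling decision above can be sketched as a small helper. The thresholds 0.6 and 1.2 come from the slide; the function name and return convention are illustrative assumptions.

```python
def rescale_plan(scale_ratios, low=0.6, high=1.2):
    """Decide resampling from scale_R(V), the resolution of each
    neighbor V relative to the reference R.

    Returns (view_to_match_by_resampling_R, views_to_downsample):
    - if the lowest-resolution view falls below `low`, R itself is
      resampled down to match it;
    - any view above `high` is downsampled to match R.
    """
    vmin = min(scale_ratios, key=scale_ratios.get)
    resample_ref_to = vmin if scale_ratios[vmin] < low else None
    downsample = [v for v, s in scale_ratios.items() if s > high]
    return resample_ref_to, downsample
```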
14. Local View Selection: global view selection determines a set N of good matching candidates for a reference view R. From it, select a smaller set A ⊂ N (|A| = 4) of active views for stereo matching at each particular location in the reference view.
16. Stereo Matching: use an n×n window centered on a point in R. Goal: maximize the photometric consistency of this patch with its projections into the neighboring views. Two components: a scene geometry model and a photometric model.
17. Scene Geometry Model: a window is centered at pixel (s, t). o_R is the center of projection of view R, and r_R(s,t) is the normalized ray direction through the pixel. The pixel in the reference view corresponds to a point x_R(s,t) = o_R + h(s,t) · r_R(s,t) at distance h(s,t) along the viewing ray.
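The geometry model is a one-liner: back-project a pixel to 3D by walking the depth h along its viewing ray. A minimal sketch (normalizing the ray defensively, which the model assumes is already done):

```python
import numpy as np

def point_on_ray(o_R, r_R, h):
    """x_R(s,t) = o_R + h(s,t) * r_R(s,t): the 3D point at depth h
    along the normalized viewing ray through pixel (s, t)."""
    r = np.asarray(r_R, dtype=float)
    r = r / np.linalg.norm(r)          # ensure the ray is unit length
    return np.asarray(o_R, dtype=float) + h * r
```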
19. Photometric Model: a simple model for reflectance effects, namely a color scale factor c_k for each patch projected into the k-th neighboring view. It models Lambertian reflectance under constant illumination over planar surfaces, but fails for shadow boundaries, caustics, specular highlights, and bumpy surfaces.
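One hedged reading of the color-scale model: for each neighboring view k, fit the scalar (per channel) c_k that best maps the reference patch onto its projection, in the least-squares sense. The closed form below is standard least squares, not necessarily the exact estimator used in the paper.

```python
import numpy as np

def color_scale(ref_patch, nbr_patch):
    """Per-channel color scale c_k minimizing ||nbr - c_k * ref||^2
    over the patch (assumed least-squares reading of the model).
    Patches are (n_pixels, 3) RGB arrays."""
    ref = np.asarray(ref_patch, dtype=float).reshape(-1, 3)
    nbr = np.asarray(nbr_patch, dtype=float).reshape(-1, 3)
    # Closed-form 1D least squares per channel: c = <ref, nbr> / <ref, ref>
    return (ref * nbr).sum(axis=0) / (ref * ref).sum(axis=0)
```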
32. Reconstructing Building Interiors from Images. Yasutaka Furukawa, Brian Curless, Steven M. Seitz (University of Washington, Seattle, USA); Richard Szeliski (Microsoft Research, Redmond, USA)
33. Reconstruction & Visualization of Architectural Scenes:
- Manual (semi-automatic): Google Earth & Virtual Earth; Façade [Debevec et al., 1996]; CityEngine [Müller et al., 2006, 2007]
- Automatic: ground-level images [Cornelis et al., 2008; Pollefeys et al., 2008]; aerial images [Zebedin et al., 2008]
35. Reconstruction & Visualization of Architectural Scenes: little attention has been paid to indoor scenes.
36. Our Goal: a fully automatic system for indoor and outdoor scenes that reconstructs a simple 3D model from images and provides real-time interactive visualization.
37. Challenges - Reconstruction: multi-view stereo (MVS) typically produces a dense model. We want the model to be:
- Simple, for real-time interactive visualization of a large scene (e.g., a whole house)
- Accurate, for high-quality image-based rendering
38. Challenges - Indoor Reconstruction:
- Texture-poor surfaces: hard for MVS
- Complicated visibility: blockage, depth maps
- Prevalence of thin structures (doors, walls, tables)
39. Outline:
- System pipeline (system contribution)
- Algorithmic details (technical contribution)
- Experimental results
- Conclusion and future work
50. Outline:
- System pipeline (system contribution)
- Algorithmic details (technical contribution)
- Experimental results
- Conclusion and future work
51. Axis-aligned Depth-map Merging: the basic framework is similar to volumetric MRF methods [Vogiatzis 2005; Sinha 2007; Zach 2007; Hernández 2007]. The algorithm is explained first, then the differences from competing approaches.
55-58. Key Feature 1 - Penalty Terms: in typical volumetric MRF approaches, the binary penalty encodes both smoothness and data, while the unary term is often constant (an inflation term). In this approach, the binary term encodes pure smoothness, defined as neighboring voxels having the same label, and the unary term encodes the data.
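The energy being contrasted above has the generic unary-plus-binary form shared by these volumetric MRF methods. A minimal sketch of evaluating such an energy for binary labels, assuming a Potts-style smoothness penalty (the exact terms of the paper differ):

```python
def mrf_energy(labels, unary, neighbors, lam=1.0):
    """Total MRF energy: sum of unary (data) terms plus a binary
    (smoothness) penalty `lam` for each neighboring pair whose labels
    disagree (Potts model, assumed here for illustration).

    labels:    dict node -> 0/1 label
    unary:     dict node -> (cost_if_0, cost_if_1)
    neighbors: list of (u, v) neighboring-node pairs
    """
    energy = sum(unary[n][labels[n]] for n in labels)
    energy += lam * sum(1 for u, v in neighbors if labels[u] != labels[v])
    return energy
```

Graph cuts minimize exactly this kind of energy globally when the binary term is submodular, which a Potts penalty is.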
59-62. Axis-aligned Depth-map Merging: align the voxel grid with the dominant axes; define the data term (unary) and the smoothness term (binary); solve with graph cuts.
63. Outline:
- System pipeline (system contribution)
- Algorithmic details (technical contribution)
- Experimental results
- Conclusion and future work
66. Conclusion & Future Work:
- Fully automated 3D reconstruction/visualization system for architectural scenes
- Novel depth-map merging that produces a piecewise-planar, axis-aligned model with sub-voxel accuracy
- Future work: relax the Manhattan-world assumption; handle larger scenes (e.g., a whole building)
68. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera
69. KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera. Microsoft Research.
70. A) Depth Map Conversion: reduce noise (bilateral filtering) and calibrate with the inferred camera intrinsic matrix to get the point-cloud positions in camera coordinates. [C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images", Sixth International Conference on Computer Vision, New Delhi, India, 1998.]
71. B) Camera Tracking (ICP). [Zhang, Zhengyou (1994). "Iterative point matching for registration of free-form curves and surfaces", International Journal of Computer Vision (Springer).]
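The core of each ICP iteration is a closed-form rigid alignment of the matched point pairs. A sketch of that inner step (the SVD-based Kabsch solution); the surrounding ICP loop, correspondence search, and KinectFusion's projective data association are omitted:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ R @ src + t,
    for already-matched point sets (the closed-form step inside ICP)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # correct an improper (reflected) solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t
```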
72. C) Volumetric Integration: signed distance field. Divide the world into voxels; each one stores the signed distance to the nearest surface (2D example shown). [B. Curless and M. Levoy, "A volumetric method for building complex models from range images", ACM Trans. Graph., 1996.]
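The Curless-Levoy update is a running weighted average of truncated signed distances per voxel. A minimal 1D sketch along a single viewing ray (real systems work over a 3D grid and use per-measurement weights; uniform weights are assumed here):

```python
import numpy as np

def integrate_tsdf(tsdf, weight, depth, trunc=3.0):
    """Fuse one depth observation into a TSDF along a single ray.

    tsdf, weight: per-voxel arrays indexed by distance along the ray
    depth:        observed surface depth for this ray
    Signed distance is positive in front of the surface, negative
    behind it, clipped to +/- trunc (the truncation band).
    """
    voxel_depths = np.arange(len(tsdf), dtype=float)
    sdf = np.clip(depth - voxel_depths, -trunc, trunc)
    # Running weighted average: new = (old*w + obs) / (w + 1)
    tsdf[:] = (tsdf * weight + sdf) / (weight + 1)
    weight[:] += 1
    return tsdf, weight
```

The surface is later extracted at the zero crossing of the fused field.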
77. Outcome: a system that can reconstruct 3D geometry from large, unorganized collections of photographs. It uses new distributed computer-vision algorithms for image matching and 3D reconstruction, designed to:
- Maximize parallelism at each stage of the pipeline
- Scale well with the size of the problem
- Scale well with the amount of available computation
78. Challenges: images are collected from photo-sharing websites.
- Images are unstructured: taken in no specific order, with no control over the distribution of camera viewpoints
- Images are uncalibrated: different photographers, different cameras, little knowledge of the camera settings for each image
- Scale of the project: 2-3 orders of magnitude larger than prior methods handled
- Algorithms must be fast enough to complete reconstruction in one day
79. Applications:
- Government sector uses city models: urban planning and visualization
- Academic disciplines use city models: history, archeology, geography
- Consumer mapping technology: Google Earth, GPS navigation systems, online map sites
80. Recover 3D Geometry: given scene geometry and camera geometry, we can predict where the 2D projections of each point should be in each image, then compare these projections to the original measurements.
- Scene geometry is represented as 3D points
- Camera geometry is represented as a 3D position and orientation for each camera
- Projection equation: (x, y, z) → (x/z, y/z)
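The projection equation (x, y, z) → (x/z, y/z) is directly vectorizable; a minimal sketch for points already expressed in camera coordinates (intrinsics and rotation omitted):

```python
import numpy as np

def project(points):
    """Perspective projection of 3D points: (x, y, z) -> (x/z, y/z)."""
    p = np.asarray(points, dtype=float)
    return p[..., :2] / p[..., 2:3]   # divide x and y by depth z
```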
81. Correspondence Problem. Definition: automatically estimate 2D correspondences between input images.
- Detect the most distinctive, repeatable features in each image
- Match features across image pairs by finding similar-looking features with an approximate nearest-neighbor search: for each pair of images, insert the features of one image into a k-d tree and use the features of the second image as queries; for each query, if the nearest neighbor is sufficiently far from the next-nearest neighbor, declare a match
- Clean up matches: rigid scenes impose strong geometric constraints on the locations of matching features via a 3x3 fundamental matrix F, such that corresponding points x_ij, x_ik from images j and k satisfy x_ik^T F x_ij = 0
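The nearest-versus-next-nearest test above (Lowe's ratio test) can be sketched as follows. Brute-force distance computation stands in for the k-d tree / approximate-NN search the slide describes, and the 0.8 ratio threshold is an assumed typical value:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc2 (queries) against desc1, keeping
    a match only when the nearest neighbor is clearly closer than the
    second-nearest (ratio test). Returns (index_in_desc1, index_in_desc2)
    pairs."""
    d1 = np.asarray(desc1, dtype=float)
    d2 = np.asarray(desc2, dtype=float)
    matches = []
    for j, q in enumerate(d2):
        dists = np.linalg.norm(d1 - q, axis=1)
        i1, i2 = np.argsort(dists)[:2]     # nearest and second-nearest
        if dists[i1] < ratio * dists[i2]:
            matches.append((int(i1), j))
    return matches
```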
82. City-Scale Matching. Goal: find correspondences spanning the entire collection. Solve as a graph-estimation problem over a "match graph":
- Graph vertices = images
- A graph edge exists between two vertices iff they view the same part of the scene and have a sufficient number of feature matches
Multi-round scheme: in each round, propose a set of edges in the match graph (via whole-image similarity, then query expansion) and verify each edge through feature matching.
83. City-Scale Matching: Whole-Image Similarity. Used for first-round edge proposal; a metric for the overall similarity of two images.
- Cluster features into visual words, weighted using the Term Frequency-Inverse Document Frequency (TF-IDF) method
- Apply document-retrieval algorithms to the data set: each photo is represented as a sparse histogram of visual words, and histograms are compared by taking their inner product
- For each image, determine the k1 + k2 most similar images and verify the top k1
- Result: a sparsely connected match graph. Goal: minimize connected components, so for each image, consider the next k2 images and verify the pairs that straddle different connected components
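The TF-IDF histogram and inner-product comparison can be sketched as below. This is a generic bag-of-visual-words sketch: the exact TF-IDF variant and normalization used by the system are not specified on the slide and are assumed here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF histograms over visual words.
    docs: list of visual-word lists, one per image."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({w: (c / len(d)) * math.log(n / df[w])
                     for w, c in tf.items()})
    return vecs

def similarity(u, v):
    """Inner product of two sparse histograms (dicts word -> weight)."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())
```

Images sharing many rare visual words score high; words appearing in every image contribute nothing (their IDF is zero).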
84. City-Scale Matching: Query Expansion. The result of the first round is a sparse match graph, insufficiently dense to produce a good reconstruction. Definition of query expansion: find all vertices within two steps of the query vertex; if vertices i and k are both connected to j, propose that i and k are also connected, then verify the edge (i, k).
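The two-step proposal rule can be sketched directly on an adjacency structure; candidates that are already direct neighbors (or the query itself) are excluded, since those edges need no new verification:

```python
def query_expansion(edges, query):
    """Propose candidate edges for `query`: every vertex exactly two
    steps away (if query-j and j-k are matched, propose query-k)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    direct = adj.get(query, set())
    proposals = set()
    for j in direct:
        proposals |= adj.get(j, set())
    return proposals - direct - {query}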
85. City-Scale Matching: Implementation. Stages: pre-processing, verification, track generation. The system runs on a cluster of computers ("nodes"); a "master node" makes job-scheduling decisions.
86. Implementation: Pre-processing. Images are distributed to cluster nodes in chunks of fixed size; each node down-samples its images to a fixed size and extracts features.
87. Implementation: Verification. Use whole-image similarity for the first two rounds and query expansion for the remaining rounds. Job scheduling is solved with a greedy bin-packing algorithm (a bin = the set of jobs sent to a node). Drawback: this requires multiple sweeps over the remaining image pairs; solution: consider only a fixed-size subset of image pairs for scheduling.
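A greedy bin-packing scheduler can be sketched as first-fit-decreasing; which greedy variant the system actually uses is not specified on the slide, so this is an illustrative choice:

```python
def greedy_bin_pack(jobs, capacity):
    """First-fit-decreasing bin packing: sort job sizes in descending
    order and place each into the first bin (node) with room, opening
    a new bin when none fits. Returns the list of bins."""
    bins = []                      # each entry: [used_capacity, [job sizes]]
    for size in sorted(jobs, reverse=True):
        for b in bins:
            if b[0] + size <= capacity:
                b[0] += size
                b[1].append(size)
                break
        else:
            bins.append([size, [size]])
    return [b[1] for b in bins]
```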
88. Implementation: Track Generation. Definition: a track is a group of features corresponding to a single 3D point. Combine all pairwise matching information to generate consistent tracks across images; this is solved by finding connected components in a graph whose vertices are features in images and whose edges connect matching features.
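The connected-components step can be sketched with union-find, identifying each feature by an (image, feature_index) pair; the node encoding is an illustrative choice:

```python
def feature_tracks(matches):
    """Group matched features into tracks via union-find connected
    components. matches: pairs of (image, feature_index) node ids."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)           # union the two components

    tracks = {}
    for node in parent:
        tracks.setdefault(find(node), set()).add(node)
    return list(tracks.values())
```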
89. Recover camera poses: find and reconstruct the skeletal set, a minimal subset of photographs capturing the essential geometry of the scene, then add the remaining images by estimating each camera's pose with respect to known 3D points matched to the image.
90. Multi-view Stereo: estimate depths for every pixel in every image and merge the resulting 3D points into a single model. The scale exceeds the ability of existing MVS algorithms, so photos are grouped into clusters that each reconstruct part of the scene.