Survey of Object Classification in 3D Range Scans


1 Survey of Object Classification in 3D Range Scans
Allan Zelener, The Graduate Center, CUNY. January 8th, 2015

2 Overview
Introduction
  Problem definition: Object Recognition, Object Classification, and Semantic Segmentation
  Problem domains: LiDAR scanners for outdoor scenes and RGB-D sensors for indoor scenes
Urban object classification
  Case study: Vehicle object detection and classification
Indoor object classification
  Cluttered scenes with a large variety of objects
Related Works
Comparison and Conclusions
  Criteria for evaluation: classification accuracy, range of classes, use of data
  Context through structured prediction and learned 3D feature representations

3 Object Recognition
Lai and Fox (IJRR 08); Mian, Bennamoun, Owens (IJCV 09)
[Figures: query matches in decreasing score order; model matched in a scene]

4 Object Classification
Segmentation or a sliding template is used to find candidate regions for classification.
Feature-based classification may be invariant to pose and intra-class variation.
A more compressed representation than an entire database of object models.
Detection and recognition may still work better in practice for controlled applications.
Golovinskiy, Kim, and Funkhouser (ICCV 2009)

5 Semantic Segmentation
Every point in the scene is labeled, including both objects of interest and background.
Typically a joint optimization of segmentation and classification.
Formally incorporates context in an MRF/CRF model, where by context we mean nearby regions.
Wu, Lenz, and Saxena (RSS 2014)

6 LiDAR Scans for Outdoor/Urban Scenes
Long-range sensors for outdoor scenes.
Fast scans at low resolution or slow scans at high resolution, depending on the number of individual sensors.
Moving sensors and registration of multiple scans result in unstructured point cloud data with no adjacency grid.
RGB imagery tends to be low quality, challenging to align, or simply unavailable.

7 RGB-D Images for Indoor Scenes
Short-range sensors for indoor scenes.
Real-time (30 FPS) depth maps based on structured light or infrared time of flight.
The integrated RGB camera is better aligned and provides better quality under indoor conditions than LiDAR systems.
The RGB-D image grid makes these sensors well suited to traditional 2D computer vision techniques on image frames from a single view.

8 Patterson et al. Object Detection from Large-Scale 3D Datasets Using Bottom-up and Top-down Descriptors. Patterson, Mordohai, and Daniilidis. (ECCV 2008)
[Figures: spin image; extended Gaussian image]

9 Patterson et al.
1. Compute normals for all points and spin images for a subset of sampled points.
2. Classify spin image features as either positive (object) or negative (background) points using a nearest-neighbor classifier.
3. Greedy region growing of positively classified points gives an object hypothesis.
4. Compute the EGI and constellation EGI for the object hypothesis, then compute alignment and similarity with database model objects:
   Rotation hypotheses are based on angles subtended by pairs of points.
   Translation is based on the maximum frequency of the Fourier transform of the best rotation hypothesis.
   Similarity is based on the fraction of inliers, defined as query points that are near model points with a small angle between normals after alignment.
5. If the similarity is above a threshold, the object is positively detected, and points that overlap with the database model after alignment are labeled to obtain a segmentation.
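The spin image accumulator in step 1 can be sketched as follows. This is a minimal illustration rather than Patterson et al.'s implementation; the bin count, support size, and normalization are assumed for the example:

```python
import numpy as np

def spin_image(points, p, n, bins=8, size=1.0):
    """Accumulate a spin image for basis point p with unit normal n.

    Each point q is mapped to cylindrical coordinates around the normal
    axis: alpha = radial distance from the axis, beta = signed height.
    """
    d = points - p                        # offsets from the basis point
    beta = d @ n                          # signed distance along the normal
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    # 2D histogram over (alpha, beta); beta spans [-size, size]
    img, _, _ = np.histogram2d(
        alpha, beta, bins=bins,
        range=[[0.0, size], [-size, size]])
    return img / max(img.sum(), 1.0)      # normalize for comparison
```

Nearest-neighbor matching of these descriptors against labeled examples then yields the positive/negative point labels used for region growing in step 3.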

10 Patterson et al.
Precision 0.92 and recall 0.74 for the chosen inlier threshold parameter.
Computation and comparison of EGIs is slow due to alignment.
Cost of object detection grows linearly with the size of the database.
[Figure: precision-recall curve]

11 Huber et al. Parts-based 3D Object Classification. Huber, Kapuria, Donamukkala, and Hebert. (CVPR 2004)

12 Huber et al.
Vehicles are segmented into front/middle/back parts and part classes are generated as follows:
For each part r_i, the distance between spin image features in r_i and each r_j, i ≠ j, is computed to produce p(m_{r_i} = r_j | r_i), where the event denotes a nearest-neighbor match from a feature of part r_i to a feature in part r_j.
A symmetric similarity matrix is computed as the average of the matching probabilities between all pairs of parts.
Part classes are determined by agglomerative clustering, and the features for each part class are clustered by k-means to produce a class representation.
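The symmetrization and agglomerative grouping can be sketched as below; the stopping threshold and average-linkage rule are assumptions for illustration, not the paper's exact settings:

```python
import numpy as np

def part_classes(P, threshold=0.5):
    """Cluster parts given an asymmetric matching-probability matrix P,
    where P[i, j] ~ p(m_{r_i} = r_j | r_i).

    Symmetrize into a similarity matrix, then agglomeratively merge the
    most similar clusters (average linkage) until no pair exceeds the
    threshold.
    """
    S = (P + P.T) / 2.0                   # symmetric similarity matrix
    clusters = [[i] for i in range(len(P))]

    def link(a, b):                       # average similarity of two clusters
        return np.mean([S[i, j] for i in a for j in b])

    while len(clusters) > 1:
        pairs = [(link(a, b), ai, bi)
                 for ai, a in enumerate(clusters)
                 for bi, b in enumerate(clusters) if ai < bi]
        best, ai, bi = max(pairs)
        if best < threshold:
            break
        clusters[ai] = clusters[ai] + clusters[bi]
        del clusters[bi]
    return clusters
```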

13 Huber et al.
The relationship between object class and part class is determined by Bayes' theorem:
p(O_j | R_i) = p(R_i | O_j) p(O_j) / Σ_{j'} p(R_i | O_{j'}) p(O_{j'})
p(R_i | O_j) is determined empirically from the training data and p(O_j) is assumed uniform.
The object class is determined by maximizing the likelihood over all parts:
arg max_j Σ_{R_i ∈ ℛ} π_ℛ(R_i) p(O_j | R_i)
Here π_ℛ(R_i) is determined by matching features between the query part R_i and the set of part classes ℛ, as described in the part class generation stage.
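A minimal sketch of the posterior and the part-weighted vote, assuming a uniform class prior as stated above; the function name and input layout are illustrative:

```python
import numpy as np

def object_class(likelihood, match_weight):
    """Pick the class maximizing sum_i pi(R_i) * p(O_j | R_i).

    likelihood[i, j] ~ p(R_i | O_j), estimated from training data.
    With a uniform prior p(O_j), the posterior reduces to
    p(O_j | R_i) = p(R_i | O_j) / sum_j' p(R_i | O_j').
    """
    posterior = likelihood / likelihood.sum(axis=1, keepdims=True)
    score = match_weight @ posterior      # weight each part's vote
    return int(np.argmax(score))
```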

14 Huber et al.
Excellent accuracy on simulated scans but lacks experiments on real data.
Consistent part segmentation requires recovery of the vehicle pose.
Improvement over a classifier that does not use parts.
[Figure: ROC curves; solid line: parts-based, dashed line: object-based]

15 Golovinskiy et al. Shape-based Recognition of 3D Point Clouds. Golovinskiy, Kim, and Funkhouser. (ICCV 2009)

16 Golovinskiy et al.
Localization and segmentation are based on a k-NN graph weighted by point distances.
Localization is performed by agglomerative clustering.
Segmentation is performed by min-cut using a virtual background vertex and a background radius parameter.
Contextual features use geolocation alignment with a street map and an occupancy grid of neighboring objects.
Relatively poor classification performance, perhaps due to a lack of local features.
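The k-NN graph and agglomerative localization can be sketched as below. The fixed merge distance is a simplification of the paper's clustering criterion, and the min-cut segmentation step is omitted:

```python
import numpy as np

def knn_graph(points, k=3):
    """Symmetric k-NN edge set with edges weighted by point distance."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]   # skip self at index 0
    edges = {}
    for i, row in enumerate(nbrs):
        for j in row:
            a, b = min(i, j), max(i, j)
            edges[(a, b)] = d[a, b]
    return edges

def localize(points, edges, merge_dist=0.5):
    """Greedy agglomerative localization: union-find over short edges."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path compression
            i = parent[i]
        return i

    for (a, b), w in sorted(edges.items(), key=lambda e: e[1]):
        if w <= merge_dist:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```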

17 Stamos et al. Online Algorithms for Classification of Urban Objects in 3D Point Clouds. Stamos, Hadjiliadis, Zhang, and Flynn. (3DIMPVT 2012)
Online classification of scan lines using HMMs and CUSUM hypothesis testing:
S_{n+1} = max(0, S_n + x_n − ω_n)
where ω_n is the likelihood of observation x_n under the null-hypothesis HMM.
A change is detected when S_k reaches a large value.
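The recursion can be sketched directly; here x_n is treated as the observation's score and ω_n as its score under the null model, with an assumed detection threshold:

```python
def cusum(xs, null_score, threshold=5.0):
    """Online CUSUM: S_{n+1} = max(0, S_n + x_n - w_n), where w_n is the
    score of x_n under the null hypothesis. Returns the first index where
    the statistic exceeds the threshold, or -1 if no change is detected."""
    s = 0.0
    for n, x in enumerate(xs):
        s = max(0.0, s + x - null_score(x))
        if s > threshold:
            return n
    return -1
```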

18 Stamos et al.
Simple features between points:
Signed angles: sgn(D_{i,k} ⋅ D_{i,k−1}), z^T ⋅ D_{i,p}
Line angles: consistent for collinear points
A sequence of online classifications is performed to refine from coarse to fine classes.
Each additional classifier incorporates more prior knowledge about the target class, e.g., cars should be on the street at a certain height.
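The first feature type can be sketched as below, reading D_{i,k} as consecutive difference vectors along a scan line; this reading is an assumption from the slide, not the paper's exact definition:

```python
import numpy as np

def scanline_features(pts, z=np.array([0.0, 0.0, 1.0])):
    """Per-point features along one scan line: the sign of the dot product
    of consecutive difference vectors (collinear steps give +1) and the
    vertical component z^T D of each step."""
    D = np.diff(pts, axis=0)                          # difference vectors
    signed = np.sign(np.sum(D[1:] * D[:-1], axis=1))  # sgn(D_k . D_{k-1})
    vertical = D @ z                                  # z^T D
    return signed, vertical
```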

19 Xiong et al. 3D Scene Analysis via Sequenced Predictions over Points and Regions. Xiong, Munoz, Bagnell, and Hebert. (ICRA 2011)
Context is accumulated from neighboring segments.
Context from a segment is sent down to its individual points.
Context from points is averaged and sent up to the segment.

20 Xiong et al.
Multi-Round Stacking (MRS) generates contextual features by using a sequence of weak classifiers to predict the class labels of neighbors.
Two-level hierarchy of regions: segments and points. MRS is run on one level of the hierarchy and the results are passed on to the other level.
Sensitive to the quality of labeling in training, particularly if there is a "misc" class.
[Figure: contextual features for the tree-trunk class]
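A toy two-round stacking loop, using a nearest-centroid classifier as the weak learner; note that real MRS trains each round on held-out folds rather than the same data it predicts, which this sketch omits:

```python
import numpy as np

def stack_rounds(X, y, neighbors, rounds=2):
    """Multi-round stacking sketch: each round trains a nearest-centroid
    classifier on [base features | mean predicted label of neighbors] and
    feeds its predictions to the next round as contextual features.

    neighbors[i] lists the sample indices adjacent to sample i.
    """
    context = np.zeros(len(X))            # no predictions before round 1
    preds = np.zeros(len(X), dtype=int)
    for _ in range(rounds):
        F = np.column_stack([X, context])  # augment with contextual feature
        centroids = np.array([F[y == c].mean(axis=0) for c in np.unique(y)])
        d = np.linalg.norm(F[:, None] - centroids[None, :], axis=2)
        preds = np.argmin(d, axis=1)
        context = np.array([preds[n].mean() if len(n) else 0.0
                            for n in neighbors])
    return preds
```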

21 Silberman and Fergus Indoor Scene Segmentation Using a Structured Light Sensor. Silberman and Fergus. (ICCV 2011)

22 Silberman and Fergus Conditional Random Field
E(y) = Σ_i φ(x_i, y_i; θ) + Σ_{i,j} ψ(y_i, y_j) η(i, j)
φ(⋅): color/depth features and a location prior
ψ(y_i, y_j) = 0 if y_i = y_j, 3 otherwise
η(⋅): spatial transition weight based on the image gradient
The location prior improves performance for classes in consistent configurations with respect to the camera but decreases it otherwise, e.g., bookshelves in an office vs. a library.
[Figure: 3D location priors]
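Evaluating this energy for a fixed labeling can be sketched as follows, with the Potts penalty of 3 from the slide; the edge weights η are assumed precomputed from image gradients and passed in:

```python
import numpy as np

def crf_energy(unary, labels, edges, eta, potts=3.0):
    """Evaluate E(y) = sum_i phi(x_i, y_i) + sum_{ij} psi(y_i, y_j) eta_ij
    with a Potts pairwise term: 0 when labels agree, `potts` otherwise.

    unary[i, c] is the unary cost of label c at node i; edges is a list of
    (i, j) pairs with matching eta weights.
    """
    e = sum(unary[i, labels[i]] for i in range(len(labels)))
    for (i, j), w in zip(edges, eta):
        if labels[i] != labels[j]:
            e += potts * w
    return e
```

Inference then searches for the labeling minimizing this energy (e.g. with graph cuts or message passing), which this evaluation-only sketch does not cover.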

23 Couprie et al. Indoor Semantic Segmentation Using Depth Information. Couprie, Farabet, Najman, LeCun. (ICLR 2013)

24 Couprie et al.
A simple application of a CNN framework improves accuracy on classes at consistent depths, such as walls and floors, but performance on objects of interest degrades.
Depth gradients alone are not informative; depth information must be normalized or interpreted to be invariant to such variations.

25 Anand et al. Contextually Guided Semantic Labeling and Search for 3D Point Clouds. Anand, Koppula, Joachims, and Saxena. (IJRR 2012)

26 Anand et al. MRF trained by structured SVM.
f_w(x, y) = Σ_{i∈V} Σ_{k=1}^K y_i^k [w_n^k ⋅ φ_n(i)] + Σ_{(i,j)∈E} Σ_{T_t∈T} Σ_{(l,k)∈T_t} y_i^l y_j^k [w_t^{lk} ⋅ φ_t(i,j)]
φ_n(i): unary features
φ_t(i,j): pairwise features, which may be associative or non-associative depending on T_t
Associative: features between neighboring segments of the same class; T_t has only self-loops
Object non-associative: features between related class labels of neighboring segments
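The scoring function can be sketched with integer labels in place of the indicator variables y_i^k, and a single pairwise weight tensor in place of the relation types T_t; both simplifications are for illustration only:

```python
import numpy as np

def score(y, phi_n, phi_t, w_n, w_t, edges):
    """f_w(x, y) = sum_i w_n[y_i] . phi_n[i]
                 + sum_{(i,j) in E} w_t[y_i, y_j] . phi_t[(i, j)]

    y[i] is segment i's label, phi_n[i] its unary feature vector, and
    phi_t[(i, j)] the pairwise feature vector for edge (i, j).
    """
    f = sum(w_n[y[i]] @ phi_n[i] for i in range(len(y)))
    f += sum(w_t[y[i], y[j]] @ phi_t[(i, j)] for (i, j) in edges)
    return f
```

The structured SVM learns w_n and w_t so that the true labeling scores higher than all others by a margin; inference maximizes this score over y.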

27 Anand et al.
Convexity relation between neighboring segments with centroids r_i, r_j and normals n_i, n_j:
(r_i − r_j)^T n_i ≥ 0 and (r_j − r_i)^T n_j ≥ 0
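The test can be sketched directly from the two inequalities; reading r as segment centroids and n as unit surface normals is an assumption from context:

```python
import numpy as np

def convex_pair(r_i, n_i, r_j, n_j):
    """True when (r_i - r_j)^T n_i >= 0 and (r_j - r_i)^T n_j >= 0,
    i.e. each segment's centroid lies on the outward side of the other,
    as when two planar segments meet at a convex edge."""
    d = r_i - r_j
    return bool(d @ n_i >= 0 and (-d) @ n_j >= 0)
```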

28 Anand et al.
Object part categories better exploit relationships than object categories alone.
Registered 3D scenes provide more coverage and context than single-view scenes.
Common errors include objects that lie on top of other objects, e.g., a book on a table, either the result of poor segmentation or a smoothing effect from the pairwise potentials.

29 Related Works Unsupervised Feature Learning for RGB-D Based Object Detection. Bo, Ren, and Fox (ISER 2012)

30 Related Works Convolutional-Recursive Deep Learning for 3D Object Classification. Socher, Huval, Bhat, Manning and Ng. (NIPS 2012)

31 Related Works Kahler and Reid. (ICCV 2013)
Müller and Behnke. (ICRA 2014)

32 Related Works Sliding Shapes for 3D Object Detection in Depth Images. Song and Xiao. (ECCV 2014)

33 Related Works Instance Segmentation of Indoor Scenes Using a Coverage Loss. Silberman, Sontag, and Fergus. (ECCV 2014)
[Figure panels: input; perfect semantic segmentation; correct instance segmentation; naïve region growing]

34 Related Works Hierarchical Semantic Labeling for Task-Relevant RGB-D Perception. Wu, Lenz, and Saxena. (RSS 2014)
NO-CT: non-overlapping constraints
HR-CT: hierarchical relation constraints

35 Related Works Classification of Vehicle Parts in Unstructured 3D Point Clouds. Zelener, Mordohai, and Stamos. (3DV 2014)
Unsupervised segmentation of parts by RANSAC plane fitting
Structured prediction over parts and object class by HMM and structured perceptron
Does not require pose estimation; experiments performed on real data
[Figure: HMM graphical model over parts p_1 … p_n, observations x_1 … x_n, and object class c]

36 Comparison
Fine-tuned object recognition methods still appear to work best for specific tasks, e.g., car detection in urban scenes.
Indoor scenes have many potential objects of interest; it is difficult to scale the number of classes.
Object classification requires 3D shape features that are discriminative.
Simple accumulators like the spin image are still competitive choices for features.
Learned representations may do better, but how to construct them is a challenge.
There are differences in representation between point clouds and RGB-D images.
Errors in segmentation may propagate to classification.
Semantic segmentation jointly optimizes segmentation and classification.
Structured prediction provides useful context-based relationships but can lead to false assumptions.
Context relationships are also often fixed and manually engineered.

37 Conclusions
3D shape and context-based features provide consistent improvements to classification systems.
Learned 3D representations that are aware of the unique properties of 3D shape features may improve over simple application of 2D techniques.
Structured prediction modeling the relationships between objects, their parts, and their environment also improves performance.
Sparse or hierarchical structured relationships are desirable for computational efficiency.

