Nonparametric Semantic Segmentation

Tackgeun You

Motivation
- Semantic segmentation: object detection + segmentation
- Why nonparametric semantic segmentation? It is scalable with "the number of object categories".

References
- Conferences
  - (CVPR 2009) Nonparametric Scene Parsing: Label Transfer via Dense Scene Alignment
  - (ECCV 2010) SuperParsing: Scalable Nonparametric Image Parsing with Superpixels
- Journals
  - (TPAMI 2011) Nonparametric Scene Parsing via Label Transfer
  - (IJCV 2012) SuperParsing: Scalable Nonparametric Image Parsing with Superpixels

0. “Recognition by Matching” Nonparametric Scene Parsing via Label Transfer
Pipeline:
1. Query: the query image is matched against the database of labeled images.
2. Scene retrieval: yields the retrieved set with labels.
3. Transferring GT labels by matching SIFT: yields transferred labels for the retrieved set.
4. Merging & shape constraint: yields the final labeling.

1. Scene Retrieval Nonparametric Scene Parsing via Label Transfer
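- Flow: query image + database of labeled images → scene retrieval → retrieved set with labels
- Retrieve the images in $\{k\text{-NN}\} \cap \{\epsilon\text{-NN}\}$ of the query in global feature space.
- $\epsilon$-NN: images whose distance to the query is at most $(1+\epsilon) \times d_{\min}$, where $d_{\min}$ is the distance to the nearest neighbor.
- Example: $k = 5$, $\epsilon = 1$
- Speaker note: superpixel approaches retrieve with k-NN only; this slide gives the reason for also applying an $\epsilon$-NN criterion here. The figure shows that retrieving with both criteria works better.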

1. Scene Retrieval Nonparametric Scene Parsing via Label Transfer
- Flow: query image + database of labeled images → scene retrieval → retrieved set with labels
- Retrieve the $\langle k, \epsilon \rangle$-NN images in global feature space (paper: $k = 150$, $\epsilon = 5$).
- Global features $\in \mathbb{R}^{5160}$: GIST (960) + spatial pyramid (4200)
- Select $M \le k$ of these images according to the energy achieved by SIFT flow.
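The retrieval rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `k_eps_nn`, the use of Euclidean distance, and the toy 2-D vectors are assumptions made for the example. It keeps an image only if it is both among the k nearest and within $(1+\epsilon)\, d_{\min}$ of the query.

```python
import math


def k_eps_nn(query, database, k, eps):
    """Return indices of database vectors in the <k, eps>-NN set of `query`.

    Assumption: Euclidean distance between global feature vectors.
    """
    if not database:
        return []
    dists = [math.dist(query, x) for x in database]
    # Indices sorted by increasing distance to the query.
    order = sorted(range(len(database)), key=lambda i: dists[i])
    d_min = dists[order[0]]
    # Intersection of k-NN (first k in `order`) and eps-NN (distance threshold).
    return [i for i in order[:k] if dists[i] <= (1.0 + eps) * d_min]


# Toy usage: with eps = 1 only neighbors within twice the minimum distance survive.
toy_db = [(1.0, 0.0), (1.5, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(k_eps_nn((0.0, 0.0), toy_db, k=5, eps=1.0))
```

The distance threshold adapts to how close the best match is, so a query with one very good match retrieves fewer, more relevant images than plain k-NN would.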

2-1. Dense Matching Nonparametric Scene Parsing via Label Transfer
Dense Correspondence between images

2-2. Optical Flow Nonparametric Scene Parsing via Label Transfer
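Brightness consistency assumption:
\[ I(x+\Delta x,\ y+\Delta y,\ t) = I(x, y, t-1) \]
Linearized by a first-order Taylor expansion:
\[ I(x+\Delta x,\ y+\Delta y,\ t) \approx I(x, y, t) + \nabla_x I \cdot \Delta x + \nabla_y I \cdot \Delta y \]
Combining the two, with $\nabla_t I(\mathbf{p}) = I(x, y, t) - I(x, y, t-1)$, gives one constraint per point:
\[ \begin{bmatrix} \nabla_x I(\mathbf{p}) & \nabla_y I(\mathbf{p}) \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} \approx -\nabla_t I(\mathbf{p}) \]
Aperture problem → use multiple points (an image patch):
\[ \underbrace{\begin{bmatrix} \nabla_x I(\mathbf{p}_1) & \nabla_y I(\mathbf{p}_1) \\ \vdots & \vdots \\ \nabla_x I(\mathbf{p}_n) & \nabla_y I(\mathbf{p}_n) \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}}_{\mathbf{w}} \approx \underbrace{\begin{bmatrix} -\nabla_t I(\mathbf{p}_1) \\ \vdots \\ -\nabla_t I(\mathbf{p}_n) \end{bmatrix}}_{\mathbf{b}} \]
Least-squares method:
\[ \mathbf{w}^{*} = \arg\min_{\mathbf{w}} \lVert \mathbf{A}\mathbf{w} - \mathbf{b} \rVert^{2} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{b} \]
Lucas-Kanade method:
\[ \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} \sum_i \nabla_x I(\mathbf{p}_i)^2 & \sum_i \nabla_x I(\mathbf{p}_i)\,\nabla_y I(\mathbf{p}_i) \\ \sum_i \nabla_x I(\mathbf{p}_i)\,\nabla_y I(\mathbf{p}_i) & \sum_i \nabla_y I(\mathbf{p}_i)^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i \nabla_x I(\mathbf{p}_i)\,\nabla_t I(\mathbf{p}_i) \\ -\sum_i \nabla_y I(\mathbf{p}_i)\,\nabla_t I(\mathbf{p}_i) \end{bmatrix} \]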
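As a rough sketch of the Lucas-Kanade solve for a single patch, the 2x2 normal equations can be inverted in closed form. The function name `lucas_kanade_step` and the toy gradient values are assumptions for illustration; the gradients are taken as given rather than computed from images.

```python
def lucas_kanade_step(gx, gy, gt):
    """Solve (A^T A) w = A^T b for one patch, where each row of A is
    [gx_i, gy_i] and b_i = -gt_i. Returns the displacement (dx, dy)."""
    sxx = sum(a * a for a in gx)
    sxy = sum(a * b for a, b in zip(gx, gy))
    syy = sum(b * b for b in gy)
    sxt = sum(a * c for a, c in zip(gx, gt))
    syt = sum(b * c for b, c in zip(gy, gt))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        # Gradients all point the same way: the aperture problem is unresolved.
        raise ValueError("A^T A is singular for this patch")
    # Closed-form inverse of the symmetric 2x2 matrix applied to [-sxt, -syt].
    dx = (-syy * sxt + sxy * syt) / det
    dy = (sxy * sxt - sxx * syt) / det
    return dx, dy


# Toy usage: temporal gradients generated from a known displacement (0.5, -0.25).
gx = [1.0, 0.0, 1.0, 2.0]
gy = [0.0, 1.0, 1.0, -1.0]
gt = [-(a * 0.5 + b * -0.25) for a, b in zip(gx, gy)]
print(lucas_kanade_step(gx, gy, gt))
```

The singularity check mirrors the aperture problem: if every gradient in the patch is parallel, the matrix cannot be inverted and no unique displacement exists.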

2-3. SIFT flow Nonparametric Scene Parsing via Label Transfer
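Flow vector at pixel $\mathbf{p}$: $\mathbf{w}(\mathbf{p}) = (u(\mathbf{p}), v(\mathbf{p}))$.
Match points by "SIFT", not by "patch". Find the $\mathbf{w}$ that minimizes $E(\mathbf{w})$:
\[ E(\mathbf{w}) = \sum_{\mathbf{p}} \min\big( \lVert s_1(\mathbf{p}) - s_2(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert_1,\ t \big) + \sum_{\mathbf{p}} \eta \big( |u(\mathbf{p})| + |v(\mathbf{p})| \big) + \sum_{(\mathbf{p},\mathbf{q}) \in \epsilon} \Big[ \min\big( \lambda |u(\mathbf{p}) - u(\mathbf{q})|,\ d \big) + \min\big( \lambda |v(\mathbf{p}) - v(\mathbf{q})|,\ d \big) \Big] \]
- Data term (first sum): match SIFT descriptors; cf. optical flow, $\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \lVert \mathbf{A}\mathbf{w} - \mathbf{b} \rVert^{2}$.
- Small displacement term (second sum): where there is no information, keep $\mathbf{w}(\mathbf{p})$ small.
- Smoothness term (third sum): make adjacent flow vectors similar.
[Figure: flow vector $\mathbf{w}(\mathbf{p}_1)$ with components $u(\mathbf{p}_1)$, $v(\mathbf{p}_1)$; legend comparing $\lVert x - y \rVert_1$ and $\lVert x - y \rVert_2$.]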
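To make the three terms concrete, the energy of a given flow field can be evaluated directly. The sketch below only evaluates $E(\mathbf{w})$; it does not optimize it. The function name `sift_flow_energy`, the dictionary layout, the 4-connected neighborhood, and charging the truncation value `t` when a match falls outside the second image are all assumptions made for this example.

```python
def sift_flow_energy(s1, s2, flow, t, eta, lam, d):
    """Evaluate the SIFT flow energy for integer flow vectors.

    s1, s2: dict mapping pixel (x, y) -> descriptor (list of numbers)
    flow:   dict mapping pixel (x, y) -> (u, v)
    """
    energy = 0.0
    for p, (u, v) in flow.items():
        x, y = p
        target = s2.get((x + u, y + v))
        if target is None:
            data = t  # assumption: out-of-image matches pay the truncated cost
        else:
            l1 = sum(abs(a - b) for a, b in zip(s1[p], target))
            data = min(l1, t)  # truncated L1 data term
        energy += data
        energy += eta * (abs(u) + abs(v))  # small displacement term
        # Smoothness term over right and down neighbors, so each pair counts once.
        for q in ((x + 1, y), (x, y + 1)):
            if q in flow:
                uq, vq = flow[q]
                energy += min(lam * abs(u - uq), d) + min(lam * abs(v - vq), d)
    return energy


# Toy usage on a 2-pixel image with 2-D "descriptors".
desc = {(0, 0): [1, 2], (1, 0): [3, 4]}
flow = {(0, 0): (1, 0), (1, 0): (0, 0)}
print(sift_flow_energy(desc, desc, flow, t=10, eta=0.5, lam=2, d=1.5))
```

The truncation thresholds `t` and `d` cap how much a single bad match or a single flow discontinuity can contribute, which is what lets the flow break at object boundaries.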

3. “Dense” Scene Alignment Nonparametric Scene Parsing via Label Transfer
[Figure: dense scene alignment example; panel (k) shows the ground truth.]

4. Label Transfer Nonparametric Scene Parsing via Label Transfer
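Find the labeling $\mathbf{c}$ that minimizes $J(\mathbf{c})$:
\[ J(\mathbf{c}) = \sum_{\mathbf{p}} E_L\big(\mathbf{c}(\mathbf{p});\ s, \{s'_i\}_{i=1:M}\big) + \alpha \sum_{\mathbf{p}} E_P\big(\mathbf{c}(\mathbf{p})\big) + \beta \sum_{(\mathbf{p},\mathbf{q}) \in \epsilon} E_S\big(\mathbf{c}(\mathbf{p}), \mathbf{c}(\mathbf{q});\ I\big) + \log Z \]
Likelihood:
\[ E_L\big(\mathbf{c}(\mathbf{p}) = l;\ s, \{s'_i\}\big) = \begin{cases} \min_{i \in \Omega_{\mathbf{p},l}} \lVert s(\mathbf{p}) - s_i(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert, & \Omega_{\mathbf{p},l} \neq \emptyset \\ \tau, & \Omega_{\mathbf{p},l} = \emptyset \end{cases} \]
\[ \Omega_{\mathbf{p},l} = \{\, i;\ c_i(\mathbf{p} + \mathbf{w}(\mathbf{p})) = l \,\}, \quad l = 1, \dots, L \]
Prior:
\[ E_P\big(\mathbf{c}(\mathbf{p}) = l\big) = -\log \mathrm{hist}_l(\mathbf{p}) \]
Smoothness:
\[ E_S\big(\mathbf{c}(\mathbf{p}), \mathbf{c}(\mathbf{q});\ I\big) = \delta[\mathbf{c}(\mathbf{p}) \neq \mathbf{c}(\mathbf{q})]\ \frac{\xi + e^{-\gamma \lVert I(\mathbf{p}) - I(\mathbf{q}) \rVert^{2}}}{\xi + 1} \]
where $\delta[x] = 1$ if $x$ holds ($c_i \neq c_j$) and $0$ otherwise ($c_i = c_j$).
[Figure: query image / SIFT / resulting query label; retrieved image / SIFT / label; after SIFT flow, transferred image / SIFT / label.]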
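The likelihood term for one pixel and one candidate label can be sketched as follows. This is an illustrative reading of the definition, not the paper's implementation: the name `likelihood_term`, the list-of-pairs input, and the choice of the L1 norm for the descriptor distance are assumptions (the slide does not state which norm is used).

```python
def likelihood_term(query_desc, warped, label, tau):
    """E_L for one pixel p and one candidate label.

    query_desc: SIFT descriptor s(p) of the query pixel
    warped:     list of (label, descriptor) pairs, one per retrieved image,
                read at the matched location p + w(p)
    tau:        constant cost when no retrieved image votes for `label`
    """
    # Omega_{p,l}: retrieved images whose warped label equals `label`.
    dists = [
        sum(abs(a - b) for a, b in zip(query_desc, desc))  # assumption: L1 norm
        for lab, desc in warped
        if lab == label
    ]
    return min(dists) if dists else tau


# Toy usage with three retrieved images and 2-D "descriptors".
warped = [("sky", [0, 0]), ("tree", [5, 5]), ("sky", [1, 1])]
for lab in ("sky", "tree", "car"):
    print(lab, likelihood_term([1, 2], warped, lab, tau=100))
```

A label that no retrieved image proposes at this pixel is not forbidden; it simply pays the fixed cost `tau`, so the prior and smoothness terms can still select it.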

0. Overview SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
Pipeline:
1. Query: the query image is matched against the database of labeled images.
2. Nearest scene retrieval: yields the retrieved set with labels.
3. Superpixel-wise label transfer by visual similarity: yields per-class likelihoods.
4. Merging & shape constraint: yields the final labeling.

1. Scene Retrieval SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
- Flow: query image + database of labeled images → nearest scene retrieval → retrieved set
- Retrieve the nearest 200 images in global feature space $\in \mathbb{R}^{5952}$:
  - GIST (960)
  - Spatial pyramid (4200)
  - Tiny image (768)
  - Color histogram (24)

2. Local Superpixel Likelihood SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
The likelihood ratio of superpixel $s_i$ for class $c$ (with $\bar{c}$ denoting all other classes):
\[ L(s_i, c) = \frac{P(s_i \mid c)}{P(s_i \mid \bar{c})} = \prod_{k} \frac{P(f_i^k \mid c)}{P(f_i^k \mid \bar{c})} \]
Non-parametric density estimation (count the number of samples):
\[ \frac{P(f_i^k \mid c)}{P(f_i^k \mid \bar{c})} = \frac{n(c, \mathcal{N}_i^k) / n(c, \mathcal{D})}{n(\bar{c}, \mathcal{N}_i^k) / n(\bar{c}, \mathcal{D})} = \frac{n(c, \mathcal{N}_i^k)}{n(\bar{c}, \mathcal{N}_i^k)} \times \frac{n(\bar{c}, \mathcal{D})}{n(c, \mathcal{D})} \]
- $\mathcal{N}_i^k$: all superpixels in the retrieval set whose $k$-th feature distance from $f_i^k$ is below the threshold $t_k$
- $n(c, \mathcal{D})$: the number of superpixels with class $c$ in set $\mathcal{D}$
- $\mathcal{D}$: all superpixels in the training set
- Output: per-class likelihoods for each superpixel

Superpixel features (1741 dimensions): shape (67), location (65), texture/SIFT (800), color (105), appearance (704)

Speaker notes:
- List the individual features and consider why so many are used. If any single feature differs greatly, a separating plane is likely to exist, so having several features should help linear separability.
- Consider the meaning and characteristics of non-parametric estimation.
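The counting estimate can be written out as a short sketch. The function name `likelihood_ratio`, the input layout, and the additive `smooth` constant that guards against empty counts are assumptions made for illustration; the slide itself shows only the raw count ratio.

```python
def likelihood_ratio(neighbor_labels, train_counts, c, smooth=1e-6):
    """Product over features k of the count-based ratio P(f_k|c) / P(f_k|not c).

    neighbor_labels: for each feature k, the class labels of the superpixels
                     in N_i^k (retrieval-set neighbors within threshold t_k)
    train_counts:    dict class -> n(class, D) over the training set
    smooth:          assumption, small constant to avoid division by zero
    """
    n_c_total = train_counts[c]
    n_not_c_total = sum(train_counts.values()) - n_c_total
    ratio = 1.0
    for labels in neighbor_labels:
        n_c = sum(1 for lab in labels if lab == c)   # n(c, N_i^k)
        n_not_c = len(labels) - n_c                  # n(c-bar, N_i^k)
        ratio *= ((n_c + smooth) / (n_not_c + smooth)) * (n_not_c_total / n_c_total)
    return ratio


# Toy usage: two feature types, two classes.
neighbors = [["sky", "sky", "tree"], ["sky", "tree"]]
counts = {"sky": 10, "tree": 30}
print(likelihood_ratio(neighbors, counts, "sky"))
```

The second factor, $n(\bar{c}, \mathcal{D}) / n(c, \mathcal{D})$, compensates for class imbalance: a rare class needs fewer neighbors than a frequent one to reach the same ratio.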

3. Contextual Inference SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
Markov random field:
- Given an undirected graph $G = (V, E)$, each vertex $v_i \in V$ corresponds to a random variable $X_i$.
- The variables satisfy the local Markov property.
- Given the energy, it maximizes entropy.
- Clique factorization (a clique is a vertex subset in which every two vertices are connected by an edge):
\[ P(X_1 = x_1, \dots, X_n = x_n) = \prod_{C \in \mathrm{cl}(G)} \phi_C(x_C) \]

Find the labeling $\mathbf{c}$ that minimizes $J(\mathbf{c})$:
\[ J(\mathbf{c}) = \sum_{s_i \in S} E_L(s_i, c_i) + \lambda \sum_{(s_i, s_j) \in A} E_S(c_i, c_j) \]
Likelihood (data term):
\[ E_L(s_i, c_i) = -w_i \log L(s_i, c_i) \]
Smoothing (edge term):
\[ E_S(c_i, c_j) = -\log P(c_i, c_j) \times \delta[c_i \neq c_j] \]
- $\mathbf{c}$: a vector of labels for the superpixels
- $S$: the set of superpixels
- $A$: the set of pairs of adjacent superpixels
- $\delta[x] = 1$ if $c_i \neq c_j$ and $0$ if $c_i = c_j$

Speaker notes:
- "Many state-of-the-art approaches encode such constraints with the help of CRF models. However, CRFs tend to be very costly both in terms of learning and inference."
- This slide shows the problem formulated as a Markov random field.
- The data term favors the label with the highest likelihood; on its own it is equivalent to picking the maximum-likelihood label per superpixel.
- The smoothing term gives no penalty when two adjacent labels are the same and a penalty when they differ.
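Evaluating $J(\mathbf{c})$ for a candidate labeling is straightforward and is sketched below. Only the energy evaluation is shown, not the minimization. The name `mrf_energy`, the data layout, and the unordered-pair lookup for $P(c_i, c_j)$ are assumptions made for the example.

```python
import math


def mrf_energy(labels, likelihood, weights, adjacency, cooccur, lam):
    """J(c) = sum_i -w_i log L(s_i, c_i) + lam * sum_(i,j) -log P(c_i, c_j) [c_i != c_j].

    labels:     chosen label per superpixel
    likelihood: per superpixel, dict label -> likelihood ratio L(s_i, label)
    weights:    per-superpixel weight w_i
    adjacency:  list of (i, j) index pairs of adjacent superpixels
    cooccur:    dict keyed by a sorted label pair -> P(c_i, c_j) (assumed layout)
    """
    data = sum(
        -weights[i] * math.log(likelihood[i][labels[i]])
        for i in range(len(labels))
    )
    smooth = 0.0
    for i, j in adjacency:
        if labels[i] != labels[j]:  # delta[c_i != c_j]
            key = tuple(sorted((labels[i], labels[j])))
            smooth += -math.log(cooccur[key])
    return data + lam * smooth


# Toy usage: three superpixels in a chain.
lik = [{"sky": math.e, "tree": 1.0}] * 2 + [{"sky": 1.0, "tree": math.e}]
print(mrf_energy(["sky", "sky", "tree"], lik, [1.0, 2.0, 3.0],
                 [(0, 1), (1, 2)], {("sky", "tree"): 1.0 / math.e}, lam=0.5))
```

With `lam = 0` the minimizer is the per-superpixel maximum-likelihood labeling; increasing `lam` trades likelihood for agreement between neighbors, weighted by how often the two labels co-occur.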

4. Extend to Geometric Classes SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
- Semantic ↔ geometric class matching: $c_{sky}, c_{ground}, c_{building} \leftrightarrow g_{vertical}$; $c_{sky} \leftrightarrow g_{horizontal}$
- The MRF energy function is extended to cover both semantic and geometric classes. Since $J$ is an energy, find the $\mathbf{c}, \mathbf{g}$ that minimize $H(\mathbf{c}, \mathbf{g})$:
\[ H(\mathbf{c}, \mathbf{g}) = J(\mathbf{c}) + J(\mathbf{g}) + \mu \sum_{s_i \in SP} \psi(c_i, g_i) \]
- The last term enforces coherence of each pair $(c_i, g_i)$.
- Speaker note: "clique" is pronounced like "click".

Comparison of NSS-algorithms
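Scene retrieval
- SIFT Flow: $\langle k, \epsilon \rangle$-NN on GIST + spatial pyramid of HoG, followed by SIFT flow scoring
- SuperParsing: k-NN (200 images) on GIST + spatial pyramid of SIFT + tiny images + color histogram
Prior
- SIFT Flow: sum of all labels in the dataset
- SuperParsing: not used
Likelihood
- SIFT Flow: labels transferred by SIFT flow
- SuperParsing: superpixel-wise nonparametric density estimation with several features
Contextual information
- SIFT Flow: (none listed)
- SuperParsing: (semantic, geometric) pair
MRF energy function
- SIFT Flow: $E(\mathbf{c}) = E_L(\mathbf{c}) + E_P(\mathbf{c}) + E_S(\mathbf{c})$
- SuperParsing: $E(\mathbf{c}, \mathbf{g}) = E_L(c_i) + E_S(c_i) + J(\mathbf{g}) + \psi(\mathbf{c}, \mathbf{g})$
Optimization
- SIFT Flow: belief propagation
- SuperParsing: graph cut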

Discussion - Experiments
Datasets (table columns): SIFT Flow, Barcelona, LMO, Polo, VOC 2011
Scores per method, in slide order:
- Dense Scene Alignment by SIFT flow (2009) 74.75 N/A 74.8 89.8 (2011)
- SuperParsing: Local Labeling (2010) 73.2 62.5
- SuperParsing: MRF (2010) 76.3 66.6
- SuperParsing: Joint semantic/geometric (2010) 76.9 66.9 87.9
- Semantic Segmentation using Regions and Parts (2012) 40.8
- Semantic Segmentation with Second-order Pooling (2012) 47.6
- Tensor-based High-order Semantic Relation Transfer for Semantic Scene Segmentation (2013) 77.1 94.2
- Finding Things: Image Parsing with Regions and Per-Exemplar Detectors (2013) 78.6
- Learning Hierarchical Features for Scene Labeling (2013) 78.5 67.8
- Rich feature hierarchies for accurate object detection and semantic segmentation (2014) 47.9

Discussion - Datasets

Conclusion: Non-parametric approaches
- Dataset: concentration on specific classes
- How to design the energy function in the MRF
  - Shape
  - Smoothing filter
- Additional information can be added
  - Geometric context
  - Performance enhancement by 1~2%

Semantic Segmentation using SO-NMF
[Figure: example classes "Human" and "Tree"; the step is repeated N times.]
Speaker note: could a sparsity constraint be imposed here as well?

Future Works
- Other methods
  - 2nd-order pooling
  - Hierarchical inference
  - Detecting objects
  - Deep learning features
- Reflect context information
- Experiments on large datasets

Thank you for listening.

3-4. Optimization of SIFT flow Nonparametric Scene Parsing via Label Transfer
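Match points by "SIFT", not by "patch". Find the $\mathbf{w}$ that minimizes $E(\mathbf{w})$, with $\mathbf{w}(\mathbf{p}) = (u(\mathbf{p}), v(\mathbf{p}))$:
\[ E(\mathbf{w}) = \sum_{\mathbf{p}} \min\big( \lVert s_1(\mathbf{p}) - s_2(\mathbf{p} + \mathbf{w}(\mathbf{p})) \rVert_1,\ t \big) + \sum_{\mathbf{p}} \eta \big( |u(\mathbf{p})| + |v(\mathbf{p})| \big) + \sum_{(\mathbf{p},\mathbf{q}) \in \epsilon} \Big[ \min\big( \lambda |u(\mathbf{p}) - u(\mathbf{q})|,\ d \big) + \min\big( \lambda |v(\mathbf{p}) - v(\mathbf{q})|,\ d \big) \Big] \]
Complexity, if the image has $h^2$ pixels:
- Dual-layer loopy belief propagation: $O(h^8) \rightarrow O(h^4)$
- Coarse-to-fine matching scheme: $O(h^4) \rightarrow O(h^2 \log h)$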
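To get a feel for what these reductions mean, the three growth rates can be compared numerically. This is plain arithmetic on the stated orders of growth, ignoring constant factors; the function name `operation_counts` and the choice of a base-2 logarithm are assumptions for the example.

```python
import math


def operation_counts(h):
    """Order-of-growth values for an image with h * h pixels (constants ignored)."""
    return {
        "naive O(h^8)": h ** 8,
        "dual-layer BP O(h^4)": h ** 4,
        # Assumption: base-2 logarithm; the base only changes a constant factor.
        "coarse-to-fine O(h^2 log h)": h ** 2 * math.log2(h),
    }


for name, value in operation_counts(256).items():
    print(f"{name}: {value:.3g}")
```

For a 256 x 256 image the naive count is on the order of $10^{19}$, while the coarse-to-fine scheme is on the order of $10^{5}$ to $10^{6}$, which is why dense matching between many image pairs becomes practical.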

2-2. Used Features SuperParsing: Scalable Nonparametric Image Parsing with Superpixel
Superpixel features (1741 dimensions), used to compute the per-class likelihoods:
- Shape (67)
- Location (65)
- Texture/SIFT (800)
- Color (105)
- Appearance (704)
Speaker notes:
- List the individual features and consider why so many are used. If any single feature differs greatly, a separating plane is likely to exist, so having several features should help linear separability.
- Consider the meaning and characteristics of non-parametric estimation.

Discussion – Scene Retrieval
