Alexander Kolesnikov and Christoph H. Lampert

Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation
Alexander Kolesnikov and Christoph H. Lampert
Zhongwen (Rex) Zhang, 2019/7/17

(fully) supervised image segmentation
A network is trained to map an input image to an output segmentation, using annotated ground truth. Producing the annotated ground truth is laborious and time-consuming.

Weakly supervised image segmentation
Using weakly supervised forms of annotation has become necessary. Common forms:
Bounding box: Dai, J., He, K., Sun, J.: BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: ICCV (2015)
Partial segmentation: He, X., Zemel, R.S.: Learning hybrid models for image annotation with partially labeled data. In: NIPS (2009)
Scribble: Tang, Meng, et al. "Normalized cut loss for weakly-supervised cnn segmentation." In: CVPR
Image tags

Related works based only on image tags:
Vezhnevets, A., Buhmann, J.M.: Weakly Supervised Semantic Segmentation with a Multi-Image Model. In: ICCV (2011)
Xu, J., Schwing, A.G., Urtasun, R.: Tell me what you see and I will show you where it is. In: CVPR (2014)
Some methods infer labels for segments based on their similarity within and between images. Others follow a self-training paradigm (an EM-like procedure): they treat pixel or superpixel labels as latent variables, and the optimization alternates between estimating the pixel labels and updating the parameters of the segmentation generator. Similar to the last paper, the paper presented here uses a per-image loss as well as a per-pixel loss.

Proposed method:
Seeding loss: enforces better localization of segments.
Expansion loss: penalizes the network for predicting segmentation masks with too-small or wrong objects, and enlarges the mask to a reasonable size.
… loss: makes the mask respect the color and spatial structure of the image, i.e. low-level cues.
Loss function: the three losses are summed into a single training objective.
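The equation that originally accompanied this slide is missing from the transcript. As a hedged reconstruction, the combined objective can be written as below. The symbols \theta (network parameters), D (training set of image/tag pairs) and S_c (seed locations for class c) are notation assumed here, not taken from the transcript; the third term is the constrain loss described later in these notes.

```latex
\min_{\theta} \sum_{(X,T) \in D}
  \Big[ L_{seed}\big(f(X;\theta), T, S_c\big)
      + L_{expand}\big(f(X;\theta), T\big)
      + L_{constrain}\big(X, f(X;\theta)\big) \Big]
```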

A deep image classification network can also be used to retrieve cues about object localization, but only where its score is high enough. In the activation map, only two small regions are selected as reliable localization cues for cow and person. We call them seeds. The cross-entropy is then computed only over these seed pixels. This encourages the network predictions to match the salient regions (seeds) given by the classification activation map while ignoring the other parts of the image. However, these cues are not precise enough to give full and accurate segmentation masks.
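A minimal sketch of a cross-entropy restricted to seed pixels, assuming the network output is already a per-pixel softmax and the seeds are stored as an integer map. The function name `seeding_loss`, the `-1` marker for non-seed pixels and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def seeding_loss(probs, seeds, eps=1e-12):
    """Cross-entropy averaged over seed pixels only.

    probs: (H, W, C) per-pixel class probabilities (assumed softmax output).
    seeds: (H, W) integer map holding a class index at seed pixels and -1
           everywhere else (the -1 convention is an assumption of this sketch).
    """
    rows, cols = np.nonzero(seeds >= 0)       # coordinates of seed pixels
    if rows.size == 0:
        return 0.0                            # no seeds: nothing to penalize
    picked = probs[rows, cols, seeds[rows, cols]]  # probability of the seed class
    return float(-np.log(picked + eps).mean())

# Usage: two seed pixels in a 2x2 image with three classes.
probs = np.full((2, 2, 3), 1.0 / 3.0)
seeds = np.array([[0, -1], [-1, 2]])
loss = seeding_loss(probs, seeds)
```

Pixels without a seed contribute nothing, which is how the loss ignores the parts of the image where the localization cue is unreliable.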

Expansion loss makes the segmentation results consistent with the image-level tags. Usually, the segmentation score map is aggregated into classification scores, and then the standard loss function for multi-label classification can be applied:
$$L_{expand}(f(X), T) = -\frac{1}{|T|} \sum_{c \in T} \log G_c(f(X)) - \frac{1}{|C' \setminus T|} \sum_{c \in C' \setminus T} \log\big(1 - G_c(f(X))\big)$$
Global max pooling (GMP):
$$G_c(f(X)) = \max_{u \in \{1, \dots, n\}} f_{u,c}(X)$$
Global average pooling (GAP):
$$G_c(f(X)) = \frac{1}{n} \sum_{u=1}^{n} f_{u,c}(X)$$
Global weighted rank pooling (GWRP): $d_c$ is the decay parameter for class $c$, with $0 < d_c < 1$. The expansion loss with GWRP is:
$$L_{expand}(f(X), T) = -\frac{1}{|T|} \sum_{c \in T} \log G_c(f(X); d_+) - \frac{1}{|C' \setminus T|} \sum_{c \in C' \setminus T} \log\big(1 - G_c(f(X); d_-)\big) - \log G_{c^{bg}}(f(X); d_{bg})$$
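The slide gives only the range of the decay parameter for GWRP, not the pooling formula. The sketch below assumes the usual definition: sort one class's per-pixel scores in descending order and take a weighted mean with weights d^(j-1) for rank j. Under that assumption d = 0 recovers max pooling and d = 1 recovers average pooling. The function names, the `bg` index and the `eps` guard are illustrative choices; no particular decay values are implied.

```python
import numpy as np

def gwrp(scores, d):
    """Global weighted rank pooling of one class's per-pixel scores.

    scores: array of per-pixel scores for a single class (any shape).
    d: decay parameter; d = 0 acts like max pooling, d = 1 like average pooling.
    """
    ranked = np.sort(np.ravel(scores))[::-1]      # scores in descending order
    weights = float(d) ** np.arange(ranked.size)  # d^(j-1) for rank j = 1..n
    return float((weights * ranked).sum() / weights.sum())

def expansion_loss(probs, tags, d_pos, d_neg, d_bg, bg=0, eps=1e-12):
    """Expansion loss with GWRP aggregation.

    probs: (H, W, C) per-pixel class probabilities.
    tags:  set of foreground class indices present in the image.
    bg:    index of the background class (0 is an assumption of this sketch).
    """
    n_classes = probs.shape[-1]
    present = [c for c in range(n_classes) if c != bg and c in tags]
    absent = [c for c in range(n_classes) if c != bg and c not in tags]
    loss = -np.log(gwrp(probs[..., bg], d_bg) + eps)      # background term
    if present:   # classes named by the tags should be visible somewhere
        loss -= np.mean([np.log(gwrp(probs[..., c], d_pos) + eps) for c in present])
    if absent:    # classes not in the tags should be suppressed
        loss -= np.mean([np.log(1.0 - gwrp(probs[..., c], d_neg) + eps) for c in absent])
    return float(loss)

# Usage: GWRP interpolates between max and average pooling.
s = np.array([0.2, 0.8, 0.5])
as_max, as_avg, in_between = gwrp(s, 0.0), gwrp(s, 1.0), gwrp(s, 0.5)
```

Choosing a decay strictly between 0 and 1 lets the highest-scoring pixels dominate without letting a single pixel decide the image-level score.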

The constrain loss enforces that the segmentation mask respects the color discontinuities of the image. The authors add a fully connected CRF layer. The image is downscaled to match the resolution of the segmentation mask, and the parameters of the pairwise potentials depend only on image pixels. Then, for each pixel, the KL divergence between the CRF output and the segmentation network output is computed over the pixel's label distribution. The average over all pixels is used as the constrain loss.
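A minimal sketch of the per-pixel KL term, assuming the CRF output is already available as an array of label distributions (the CRF inference itself is not shown). The direction KL(CRF output || network output), the function name `constrain_loss` and the `eps` clipping are assumptions of this sketch.

```python
import numpy as np

def constrain_loss(crf_probs, net_probs, eps=1e-12):
    """Mean per-pixel KL divergence KL(crf || net) over the label distribution.

    crf_probs, net_probs: (H, W, C) arrays whose last axis sums to 1.
    """
    q = np.clip(crf_probs, eps, 1.0)   # CRF output distribution per pixel
    p = np.clip(net_probs, eps, 1.0)   # network output distribution per pixel
    kl_per_pixel = (q * (np.log(q) - np.log(p))).sum(axis=-1)
    return float(kl_per_pixel.mean())

# Usage: a single pixel with two labels.
q = np.array([[[0.5, 0.5]]])
p = np.array([[[0.25, 0.75]]])
loss_same = constrain_loss(q, q)
loss_diff = constrain_loss(q, p)
```

The loss is zero when the network already agrees with the CRF-refined distribution and grows as the two disagree, so minimizing it pulls the prediction toward masks that follow image boundaries.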

Results: The proposed method improves the results by a large margin.

Successful cases, validation set. Failure cases, validation set.
Observations:
Some objects occur almost always with the same background, e.g. boats on water or trains on rails. Such co-occurrence may lead the network to treat object and background as one whole.
The mask may cover only parts of the object, since the weak localization cues tend to reliably detect only the discriminative parts, e.g. the face of a person.
Wrong labels occur but are quite rare, since the network has a large field of view (FOV).

Ablation study: The expand loss gives very weak localization of objects, while using only the seed loss cannot suppress predictions of classes that are not present in the image. Using only the seed loss and the constrain loss (which encourages nearby regions of similar appearance to have the same label) cannot cover the whole object, especially when an object has parts of different colors.
