Adaptive Segmentation Based on a Learned Quality Metric


1 Adaptive Segmentation Based on a Learned Quality Metric
I. Frosio1, E. Ratner2 1 NVIDIA, USA, 2 Lyrical Labs, USA

2 Motivation: good / bad segmentation
Let me start by showing some segmentation results. This is the output of SLIC superpixel segmentation for an image of the sky with some clouds. Let us have a look at the segmentation result… We can put a like here, where the sky and the cloud are well separated, but we should also put a dislike here, where cloud and sky are merged together. SLIC (Achanta, 2012)

3 Motivation: good / bad segmentation
Let us move to another segmentation algorithm: graph-cut. We put a like for the larger sky segment, which also preserves the boundary with the cloud. But what about the segment including both cloud and sky? Dislike! GRAPH-CUT (Felzenszwalb, 2004)

4 Motivation: good / bad segmentation
Finally, this slide shows the result obtained with the method proposed here. We still have one like here and one dislike here… Oops, no, two likes. ADAPTIVE GRAPH-CUT (ours)

5 Motivation: good / bad segmentation
SLIC (Achanta, 2012) GRAPH-CUT (Felzenszwalb, 2004) ADAPTIVE GRAPH-CUT (ours) We compared three different segmentations of the same image, and we decided that adaptive graph-cut is better than graph-cut, which in turn is better than SLIC. Why do we make this classification?

6 Motivation: good / bad segmentation
Achanta, 2012 (SLIC); Kaufhold, 2004: segmentation algorithms aggregate sets of perceptually similar pixels in an image. Felzenszwalb, 2004 (graph-cut): a segmentation algorithm should capture perceptually important groupings or regions, which often reflect global aspects of the image. The main issue with segmentation is that we do not have any general-purpose solution approaching human-level competence, and evaluating the quality of a segmentation algorithm remains challenging even today (I do not think things have changed much since 2001). This is also evident when we analyze the inspiring principles of different segmentation algorithms, where the main idea is to aggregate PERCEPTUALLY similar pixels.

7 Motivation: segmentation & video compression
Frame segmentation → segment motion estimation → encoding, with true block and sub-block motion vectors.

8 Aim #1: use the human factor (aka segmentation quality metric)
So the lesson we learned up to now is that segmentation quality has to be evaluated by a human observer.

9 Aim #2: automatic parameter tuning

10 1) Pick a segmentation algorithm…
Road map: 1) Pick a segmentation algorithm… 2) … learn a quality metric including the human factor (application needs)… 3) … and put them together (autotuning). The requirements of the application enter through the human evaluation of the segmentation quality.

11 Graph-cut Graph: Nodes: Edges: Weights: w(vi, vj)>>0
Let me start by introducing some basic concepts for graph-cut: we have nodes, edges and weights. The larger the color distance between two pixels, the larger the weight of the edge connecting them (w(vi, vj) = 0 for identical colors, w(vi, vj) >> 0 for very different ones).
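As a rough illustration of this construction, the edge weights of a 4-connected grid graph over an RGB image can be computed as follows (a minimal sketch; the function name and the choice of Euclidean color distance are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def grid_graph_edges(img):
    """Edges of a 4-connected grid graph over an RGB image; each edge
    weight is the Euclidean color distance between its two pixels, so
    similar pixels get w ~ 0 and very different pixels get w >> 0."""
    h, w, _ = img.shape
    idx = np.arange(h * w).reshape(h, w)
    colors = img.reshape(-1, 3).astype(float)
    edges = []
    for di, dj in ((0, 1), (1, 0)):          # right and down neighbours
        a = idx[:h - di, :w - dj].ravel()
        b = idx[di:, dj:].ravel()
        wgt = np.linalg.norm(colors[a] - colors[b], axis=1)
        edges.append(np.stack([a, b, wgt], axis=1))
    return np.concatenate(edges)             # rows: (vi, vj, weight)
```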

12 Graph-cut Cm Internal difference:
Let us define a connected component Cm in the graph, like the one represented here. Its internal difference, Int(Cm), is the largest edge weight in the minimum spanning tree of the component.

13 Graph-cut Cm Difference between components: Cn
The difference between two components, Dif(Cm, Cn), is the minimum weight among the edges connecting a node in Cm to a node in Cn.

14 Graph-cut Boundary predicate (example with edge weights 10, 15, 12 between components Ck and Cn)
The boundary predicate compares the difference between two components with their internal differences: a boundary exists between Ck and Cn when Dif(Ck, Cn) is larger than the minimum of the two (thresholded) internal differences.

15 Graph-cut Boundary predicate (example with edge weights 15, 8, 11 between components C1 and C2)
There is a last term, tau(Ck), defined as the ratio between a constant k and the number of pixels in the component. When the component is small, the denominator is small and this term is large (it can dominate Int(Ck)). As a result, the threshold for asserting a boundary is higher for smaller components. From a practical point of view, this term prevents the creation of small components; in other words, the constant k sets the scale of observation, and it is the most significant parameter of the algorithm, which has to be defined by the user.

16 Graph-cut Observation scale ~ k
Boundary predicate: Observation scale ~ k. As explained on the previous slide, the constant k in tau(Ck) = k / |Ck| sets the scale of observation: larger values of k discourage the creation of small components, and k is the main parameter the user has to define.
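The boundary predicate described above can be sketched as follows (a minimal sketch: the component representation and helper names are illustrative, not from the paper):

```python
# Minimal sketch of the graph-cut boundary predicate. A component is
# represented here simply as a dict holding its size and its internal
# difference Int(C); names are illustrative.

def tau(k, size):
    """Threshold term k/|C|: large for small components, so small
    components need stronger evidence of a boundary to stay separate."""
    return k / size

def is_boundary(dif, c1, c2, k):
    """dif: minimum weight among edges connecting components c1 and c2.
    True if there is evidence of a boundary, i.e. do NOT merge them."""
    mint = min(c1["int"] + tau(k, c1["size"]),
               c2["int"] + tau(k, c2["size"]))
    return dif > mint

# Example with the slide's numbers: internal differences 8 and 11,
# connected by an edge of weight 15.
c1 = {"int": 8, "size": 50}
c2 = {"int": 11, "size": 80}
print(is_boundary(15, c1, c2, k=100))   # → True (boundary kept)
```

Note how a larger k (e.g. 1,000 with the same components) makes tau dominate and the predicate merge the two components, which is exactly the "scale of observation" behaviour described above.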

17 Graph-cut k = 3, k = 100, k = 10,000

18 1) Pick a segmentation algorithm…
Road map: 1) Pick a segmentation algorithm… 2) … learn a quality metric including the human factor… 3) … and put them together (autotuning).

19 (Weighted) symmetric uncertainty
(Slide figure: 4 bits of shared information out of 7 bits + 5 bits = 33%; the per-channel values are combined with an entropy-based average.)
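The weighted symmetric uncertainty can be sketched as follows (a minimal sketch: the per-channel, entropy-based weighting is our reading of the slide, and the exact weights used in the paper may differ):

```python
import numpy as np

def entropy(x, bins=256):
    """Shannon entropy (bits) of an integer-valued image channel."""
    p = np.bincount(x.ravel(), minlength=bins) / x.size
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def symmetric_uncertainty(x, y, bins=256):
    """Normalized shared information between two channels:
    2*I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x, bins), entropy(y, bins)
    joint = np.histogram2d(x.ravel(), y.ravel(), bins=bins)[0] / x.size
    pj = joint[joint > 0]
    hxy = -(pj * np.log2(pj)).sum()
    i = hx + hy - hxy                      # mutual information
    return 2 * i / (hx + hy) if hx + hy > 0 else 0.0

def weighted_uncertainty(original, segmented):
    """Uw: per-channel symmetric uncertainty over R, G, B, averaged
    with entropy-based weights (an assumption, see lead-in)."""
    us, ws = [], []
    for c in range(3):
        x, y = original[..., c], segmented[..., c]
        us.append(symmetric_uncertainty(x, y))
        ws.append(entropy(x))
    ws = np.asarray(ws)
    return float((np.asarray(us) * ws).sum() / ws.sum())
```

With this definition Uw = 1 when the segmented image carries all the information of the original (including noise) and Uw = 0 when the two are statistically unrelated, matching the description on slide 20.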

20 k vs. Uw vs. quality 160 x 120 image block
Let me go into more detail. Let us consider the 160x120 rectangle highlighted in red in the image, and let us ask a human to classify this area as under-, well- or over-segmented for different values of k. For 1 ≤ k ≤ 50, over-segmentation occurs: areas that are perceptually homogeneous are divided into several segments. The segmentation looks fine for 75 ≤ k ≤ 200, whereas for 350 ≤ k ≤ 10,000 only a few segments are present (under-segmentation). Now let me also introduce the weighted uncertainty index, Uw. This index measures the percentage of information shared between the original image and the segmented one. When it is one, the two images have the same information content, i.e. even the noise is represented in the segmented image. On the other hand, when this index is zero, there is no correlation at all between the pixels in the original image and the segmented image. Notice that this index is normalized, so it lies between 0 and 1 and is independent of the absolute quantity of information contained in the image. Moreover, it is a weighted index: it computes the quantity of information in the red, green and blue channels and takes a weighted average of the symmetric uncertainty computed for each channel. But let us forget the mathematical details; we only have to remember that Uw is an index describing how much information the original and the segmented image have in common. Not surprisingly, Uw is high when we have many segments, and it decreases as k increases and the number of segments decreases.

21 visual inspection & classification
k vs. Uw vs. quality Training 160 x 120 blocks 320x240 rgb images K = [1, …, 10,000] visual inspection & classification We segmented a set of 12 images including flowers, portraits, landscapes and sport images using graph-cut and various values of k. For each 160x120 block in the images, we classified it as under-, well- or over-segmented. Here I plot the results for images whose initial resolution is 320x240 (thus these images are divided into 2x2 blocks).

22 visual inspection & classification
k vs. Uw vs. quality Training 160 x 120 blocks 640x480 rgb images K = [1, …, 10,000] visual inspection & classification It is evident that a single value of k is not sufficient to correctly segment all the images. It is also evident that, for all the blocks, we have a S-shaped curve in the log(k) / Uw space.

23 Learning the metric Uw = m log(k) + b
We want to identify a curve (a line in the (log(k), Uw) space) such that the points on the curve are associated with well-segmented blocks. Once we have defined this area, we can force the segmentation algorithm to produce results that lie close to this line, so that they are likely to be well segmented. To compute this curve, we used an SVM-like approach: we define the classification errors for over- / well-segmented blocks (delta us and delta we) and give more importance to these errors when they are far from the line separating the under-segmented and well-segmented areas. We minimize this cost function with the Nelder-Mead simplex algorithm and obtain the green line in the plots, which is associated with an area where the human observer classified the blocks as well segmented. Uw = m log(k) + b
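The fitting step can be sketched as follows (a minimal sketch: the asymmetric, distance-weighted hinge cost is a plausible stand-in for the paper's SVM-like cost, not its exact form):

```python
import numpy as np
from scipy.optimize import minimize

def fit_quality_line(log_k, uw, labels, w_far=10.0):
    """Fit Uw = m*log(k) + b separating the two block classes.
    labels: +1 for blocks that should lie above the line
    (over-segmented side), -1 for blocks that should lie below.
    Misclassified points are penalized more the farther they are
    from the line, then the cost is minimized with Nelder-Mead."""
    def cost(params):
        m, b = params
        margin = labels * (uw - (m * log_k + b))   # > 0 when correct side
        return np.where(margin < 0, w_far * np.abs(margin), 0.0).sum()
    res = minimize(cost, x0=[-0.1, 1.0], method="Nelder-Mead")
    return res.x                                   # (m, b)
```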

24 1) Pick a segmentation algorithm…
Road map: 1) Pick a segmentation algorithm… 2) … learn a quality metric including the human factor… 3) … and put them together (autotuning).

25 Automatic k selection Given a 160x120 block, we are now able to automatically select k such that the quality of the segmented image is optimal. This is an iterative process. Let us start by segmenting the image for k = 1 (very low) and k = 10,000 (very high). In both cases we can measure Uw after segmentation: for k = 1 we have over-segmentation, since the point lies above the optimal line, whereas for k = 10,000 we have under-segmentation, since the corresponding Uw is below the optimal line. Thus the optimal value of k has to lie between these two values, and we proceed with a bracketing search strategy to identify it.

26 Automatic k selection At iteration 1, we therefore try to segment the image with k = 100 (average value in the log space). We measure the weighted symmetric uncertainty Uw and we realize that this is still high (we are above the optimal line), so we have to increase k.

27 Automatic k selection Segmentation with the new value of k leads to undersegmentation. So we have to decrease k.

28 Automatic k selection At iteration 3 the point is still below the optimal line; k has to be increased…

29 Automatic k selection After 5 iterations we have already reached convergence.
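The bracketing search from slides 25–29 can be sketched as follows (a minimal sketch; the function names are illustrative, and the segment-and-measure step stands in for running graph-cut and computing Uw on the block):

```python
import math

def find_optimal_k(segment_and_measure, m, b,
                   k_lo=1.0, k_hi=10_000.0, iters=5):
    """Bracketing search for k. segment_and_measure(k) segments the
    block with scale k and returns its weighted symmetric uncertainty
    Uw; (m, b) define the learned line Uw = m*log(k) + b associated
    with well-segmented blocks."""
    for _ in range(iters):
        k = math.sqrt(k_lo * k_hi)        # midpoint in log space
        uw = segment_and_measure(k)
        target = m * math.log(k) + b
        if uw > target:                   # above the line: over-segmented
            k_lo = k                      # → increase k
        else:                             # below the line: under-segmented
            k_hi = k                      # → decrease k
    return math.sqrt(k_lo * k_hi)
```

Note that the first midpoint of [1, 10,000] in log space is k = 100, matching the value tried at iteration 1 on slide 26.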

30 … and adaptivity k = k(x,y)
To get an adaptive segmentation procedure, we compute the optimal k for each (non-overlapping) 160x120 block in the image, assign the optimal k to each pixel of the image, and smooth the resulting map to avoid abrupt transitions.
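This per-pixel map k = k(x, y) can be sketched as follows (a loose sketch: smoothing in log space with a Gaussian filter, and the sigma value, are illustrative assumptions, not details from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def k_map(block_ks, block_shape, image_shape, sigma=20.0):
    """Expand per-block optimal k values to a per-pixel map and smooth
    it to avoid abrupt transitions between neighbouring blocks.
    block_ks: 2-D array of optimal k per 160x120 block."""
    bh, bw = block_shape
    # Replicate each block's k over its pixels, then smooth in log space
    # (an assumption: log space keeps the smoothing scale-aware).
    per_pixel = np.log(np.kron(block_ks, np.ones((bh, bw))))
    per_pixel = per_pixel[:image_shape[0], :image_shape[1]]
    return np.exp(gaussian_filter(per_pixel, sigma))
```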

31 Road map

32 Adaptive graph-cut (ours) Graph-cut (Felzenszwalb, 2004) *
Results - Quality Adaptive graph-cut (ours) Graph-cut (Felzenszwalb, 2004) * SLIC (Achanta, 2012) * * Same number of segments forced for each algorithm We compared the developed adaptive segmentation algorithm numerically with the original version of graph-cut and with SLIC. Inter-class contrast measures the contrast between different segments; it should be high if the segments are really different. Intra-class uniformity measures the standard deviation of the pixels within each segment; it should be low if each segment is uniform (with the exception of textured segments). The ratio between the two is less sensitive to texture or noise in the image.

33 Results

34 Results SLIC Graph-cut Adaptive graph-cut

35 Results

36 Results SLIC Graph-cut Adaptive graph-cut

37 Results: inter-class contrast (the higher the better)
Sum of the contrasts among segments, weighted by their areas (Chabrier, 2004). Inter-class contrast measures the contrast between different segments; it should be high if the segments are really different.

38 Results: intra-class uniformity (the lower the better)
Sum of the normalized standard deviations of the regions (Chabrier, 2004). Intra-class uniformity measures the standard deviation of the pixels within each segment; it should be low if each segment is uniform (with the exception of textured segments).

39 Results: contrast - uniformity ratio (the higher the better)
The ratio between inter-class contrast and intra-class uniformity is less sensitive to texture or noise in the image.
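For illustration, area-weighted versions of the two measures can be sketched as follows (a loose sketch: Chabrier 2004 defines inter-class contrast between neighbouring regions, while this toy version contrasts each segment against the global mean; the names and normalizations here are assumptions):

```python
import numpy as np

def intra_class_uniformity(img, labels):
    """Area-weighted standard deviation within each segment,
    normalized by the image dynamic range. Lower is better.
    img: 2-D grayscale image; labels: per-pixel segment ids."""
    total = 0.0
    for s in np.unique(labels):
        pix = img[labels == s]
        total += pix.size * pix.std()
    return total / (img.size * (img.max() - img.min() + 1e-9))

def inter_class_contrast(img, labels):
    """Area-weighted contrast of each segment's mean against the
    global mean, normalized by the dynamic range. Higher is better."""
    g = img.mean()
    total = 0.0
    for s in np.unique(labels):
        pix = img[labels == s]
        total += pix.size * abs(pix.mean() - g)
    return total / (img.size * (img.max() - img.min() + 1e-9))
```

On an ideal segmentation of a piecewise-constant image the uniformity term drops to zero while the contrast term stays high, so their ratio rewards exactly the behaviour the slides describe.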

40 Discussion LEARNED segmentation quality metric including the HUMAN FACTOR Iterative method to AUTOMATICALLY and ADAPTIVELY compute the optimal scale parameter

41 A more general approach (edge thresholding segmentation in YUV)

42 A more general approach (edge thresholding segmentation in YUV)
Openbroadcast encoding (x264) Lyricallabs encoding (adaptive segmentation)

43 A more general approach (edge thresholding segmentation in YUV)
Openbroadcast encoding (x264) Lyricallabs encoding (adaptive segmentation)

44 Open issues & improvements
Resolution dependency (160x120 blocks) Learning: the Berkeley Segmentation Dataset Avoid iterations (see I. Frosio, SPIE EI 2015)

45 Questions?

