Semantics Sensitive Segmentation and Annotation of Natural Images

2008 IEEE International Conference on Signal Image Technology and Internet Based Systems
Amina Asghar, Naveed Iqbal Rao
National University of Sciences and Technology, Pakistan

Outline
Introduction
Semantic Sensitive Segmentation of Color Images
  Adaptive medoidshift clustering
  N-cut method
Semantics Sensitive Annotation of Natural Images
Experiments
Conclusion

Introduction
Indexing and retrieving images efficiently is a challenging problem. After nearly a decade of research, content-based image retrieval (CBIR) has been found to be a practical and satisfactory solution to this challenge [1]. Such retrieval systems exploit low-level features such as color, texture and shape, with or without user relevance feedback, to retrieve images. However, the performance of CBIR systems is limited by the "semantic gap" between low-level features and the high-level semantic concepts of an image.
[1] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain, "Content-based image retrieval at the end of the early years", IEEE Trans. PAMI, vol. 22, 2000.

Introduction
Several approaches have been presented to bridge the semantic gap [2], [3], [4], [5]. Most of these approaches use a segmentation method, followed by region-based feature extraction, to represent the semantic content of an image. However, all of these efforts suffer from the difficulty of producing a meaningful segmentation.
[2] L. Zhu, et al., "Keyblock: An approach for content-based image retrieval", ACM Multimedia 2000, pp. 157–166, Oct. 2000.
[3] W. Wang, Y. Song, A. Zhang, "Semantics retrieval by content and context of image regions", Proc. 15th Int. Conf. Vision Interface, pp. 17–24, May 2002.
[4] J. Li, J. Wang, "Automatic linguistic indexing of pictures by a statistical modeling approach", IEEE Trans. PAMI, vol. 25, Sept. 2003.
[5] A. Mojsilović, B.E. Rogowitz, "Semantic metric for image library exploration", IEEE Trans. MM, vol. 6, pp. 828–838, Dec. 2004.

Introduction
A significant number of region-based image annotation techniques have been proposed [6], [7], [8]. However, these techniques only annotate homogeneous regions, which represent the content of the image but have little correspondence to the relevant semantic concepts.
[6] P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary", ECCV, 2002.
[7] K. Barnard and D. Forsyth, "Learning the semantics of words and pictures", Proc. ICCV, 2001.
[8] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M.I. Jordan, "Matching words and pictures", ACM SIGIR, 2003.

Introduction
We present a new, effective and meaningful segmentation method for natural color images. This method performs clustering at two levels:
Adaptive medoidshift (AMS) clustering
N-cut method
The N-cut method performs the perceptual grouping of the initial segmented regions into final meaningful regions.

Introduction
The AMS algorithm is a modification of the recently presented medoidshift algorithm [11].
[11] Y.A. Sheikh, E.A. Khan and T. Kanade, "Mode seeking by medoidshift", IEEE ICCV, pp. 1–8, 2007.

Adaptive medoidshift clustering
In the medoidshift algorithm, the mode is sought by approximating the local gradient using a weighted estimate of medoids. Like mean shift [9], medoidshift clustering automatically computes the number of clusters, but unlike mean shift, it does not require the definition of a mean.
[9] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis", IEEE Trans. Pattern Anal. Mach. Intell., 24(5), 2002.

Given a set of N samples $\{x_i\} \in \mathbb{R}^d$, $i = 1, \dots, N$, the multivariate kernel density estimate of the unknown underlying density $f$ at a point $x$ is built from a kernel function $K$ and a bandwidth $h$, with $G(\cdot) = -K'(\cdot)$.
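The density estimate described above can be sketched as follows. This is a minimal illustration, assuming the common form $\hat{f}(x) = \frac{c}{N h^d} \sum_{i=1}^{N} K(\|(x - x_i)/h\|^2)$ with a Gaussian profile $K(u) = e^{-u/2}$. The kernel choice, the normalization constant and the function name `kde` are assumptions made for this sketch, not details taken from the slides.

```python
import math

def kde(x, samples, h):
    """Fixed-bandwidth Gaussian kernel density estimate at point x.

    Assumes the profile K(u) = exp(-u / 2); h is the global bandwidth.
    """
    d = len(x)
    norm = (2.0 * math.pi) ** (-d / 2.0)  # Gaussian normalization constant
    total = 0.0
    for xi in samples:
        # u = ||(x - x_i) / h||^2
        u = sum((a - b) ** 2 for a, b in zip(x, xi)) / (h * h)
        total += math.exp(-0.5 * u)
    return norm * total / (len(samples) * h ** d)

# Density is higher inside a cluster than far away from it.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
near = kde((0.0, 0.0), points, h=1.0)
far = kde((10.0, 10.0), points, h=1.0)
```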

Medoidshift is a non-parametric clustering method that calculates a weighted medoid $y_{k+1}$, the sample data point that minimizes a kernel-weighted cost function with $G(\cdot) = -K'(\cdot)$. The weighted medoid is recalculated for every sample data point until the mode is obtained. The medoidshift algorithm uses a fixed, predefined global bandwidth $h$ for every sample data point to complete mode seeking.
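One way to picture the mode-seeking step is the sketch below. It assumes the update $y_{k+1} = \arg\min_{y \in \{x_i\}} \sum_i \|x_i - y\|^2 \, G(\|(x_i - y_k)/h\|^2)$ with a Gaussian profile, for which $G$ is proportional to $K$. The function names and the toy data are hypothetical, and the real algorithm may differ in detail.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def medoidshift_step(y_k, samples, h):
    """Return the sample that minimizes the kernel-weighted sum of squared distances."""
    # Weights are centred on the current medoid y_k (Gaussian profile assumed).
    weights = [math.exp(-0.5 * sq_dist(x, y_k) / (h * h)) for x in samples]

    def cost(y):
        return sum(w * sq_dist(x, y) for w, x in zip(weights, samples))

    # The candidate set is restricted to the samples themselves, so no mean is needed.
    return min(samples, key=cost)

def seek_mode(start, samples, h, max_iter=100):
    """Iterate medoidshift steps from `start` until the medoid stops moving."""
    y = start
    for _ in range(max_iter):
        y_next = medoidshift_step(y, samples, h)
        if y_next == y:
            break
        y = y_next
    return y

samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
mode_a = seek_mode((0.1, 0.0), samples, h=1.0)
mode_b = seek_mode((5.1, 5.0), samples, h=1.0)
```

Samples that converge to the same mode would be assigned to the same cluster, which is how the number of clusters emerges without being specified in advance.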

We made medoidshift adaptive by associating a different bandwidth $h_i$ with each data point $x_i$; $h_i$ is calculated automatically from the neighborhood information of the data point. Each kernel thus has its own size and orientation associated with its respective data point.

To reduce the computational complexity, we use a fixed number of nearest neighbors to calculate $h_i$ for density estimation [12], [13]. Let $x_{i,k}$ be the k-th nearest neighbor of data point $x_i$; then $h_i$ is calculated from the distance between $x_i$ and $x_{i,k}$. A higher value of k yields a larger bandwidth, and a smaller k yields a smaller bandwidth.
[Figure labels: $x_i$, $h_i$, $x_{i,6}$]
[12] L. Breiman, W. Meisel and E. Purcell, "Variable kernel estimates of multivariate densities", Technometrics, 19: 135–144, 1977.
[13] B. Georgescu, I. Shimshoni and P. Meer, "Mean shift based clustering in high dimensions: A texture classification example", IEEE ICCV, 2003.
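The nearest-neighbor bandwidth can be illustrated with a short sketch. It assumes $h_i$ is the Euclidean distance from $x_i$ to its k-th nearest neighbor; the choice of norm and the function name `knn_bandwidths` are assumptions for illustration only.

```python
import math

def knn_bandwidths(samples, k):
    """Return h_i for every sample: the distance to its k-th nearest neighbor.

    Brute-force search, adequate for a small illustration.
    """
    bandwidths = []
    for i, xi in enumerate(samples):
        dists = sorted(
            math.dist(xi, xj) for j, xj in enumerate(samples) if j != i
        )
        bandwidths.append(dists[k - 1])
    return bandwidths

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
h_k1 = knn_bandwidths(points, k=1)
h_k2 = knn_bandwidths(points, k=2)
```

In this toy example every bandwidth for k = 2 is at least as large as the one for k = 1, matching the statement that a higher k yields a larger bandwidth.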

[Figure panels: AMS (k=10), AMS (k=5), AMS (k=30)]

Using k to calculate a variable bandwidth $h_i$ tunes the method to the image data for segmentation better than using a fixed global bandwidth $h$.

First-level clustering (AMS) extracts only a few representative, spatially adaptive dominant colors that can differentiate neighboring regions in the image.

N-cut method
To carry out global region clustering using graph theory, regions are treated as nodes of a graph in the image plane. A weighted undirected region adjacency graph G = (V, E, W) is constructed [10].
V is the set of nodes that represent the regions produced by AMS
E is the set of edges that denote the connectivity of regions
W is the weight matrix that represents dissimilarity between regions
[10] J. Shi and J. Malik, "Normalized cuts and image segmentation", IEEE Trans. Pattern Anal. Mach. Intell., 22(8), 2000.

N-cut method
The weight $w(R_i, R_j)$ between region $R_i$ and region $R_j$ is computed from the regions' color features, where
$F(R_i) = \{L(R_i), a(R_i), b(R_i)\}$ is the color feature vector of region $R_i$
$\alpha$ is a positive scaling factor
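As a rough illustration of how such a weight matrix might be assembled, the sketch below assumes the exponential form $w(R_i, R_j) = \exp(-\|F(R_i) - F(R_j)\|^2 / \alpha)$ for adjacent regions and 0 otherwise, as is common with normalized cuts. This form is an assumption; the exact weight used in the paper is not recoverable from the slides, and the function name and sample Lab values are hypothetical.

```python
import math

def weight_matrix(features, edges, alpha):
    """Build a symmetric weight matrix for a region adjacency graph.

    features: list of (L, a, b) mean colors, one per region
    edges:    list of (i, j) index pairs for adjacent regions
    alpha:    positive scaling factor
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        d2 = sum((p - q) ** 2 for p, q in zip(features[i], features[j]))
        # Assumed form: similar colors give a weight near 1, distant colors near 0.
        W[i][j] = W[j][i] = math.exp(-d2 / alpha)
    return W

lab = [(50.0, 10.0, 10.0), (50.0, 10.0, 10.0), (80.0, -20.0, 30.0)]
W = weight_matrix(lab, edges=[(0, 1), (1, 2)], alpha=100.0)
```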

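N-cut method
Using the graph representation, the N-cut method is applied to perceptually group the over-segmented regions into meaningful segments. The N-cut equation for partitioning the graph into two disjoint sets A and B = V − A [10] is given as
$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$$
where $\mathrm{cut}(A, B)$ is the total weight of the edges between A and B, and $\mathrm{assoc}(A, V)$ is the association of the nodes in A to all nodes V in the graph.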
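The normalized cut criterion of Shi and Malik [10], Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), can be evaluated directly on a small graph. The sketch below searches all bipartitions by brute force, which is only feasible for a handful of regions; practical implementations solve a generalized eigenvalue problem instead. It assumes every node has at least one positive-weight edge, and the function names are hypothetical.

```python
from itertools import combinations

def assoc(W, group, nodes):
    """Total weight from nodes in `group` to all nodes in `nodes`."""
    return sum(W[i][j] for i in group for j in nodes)

def ncut(W, A):
    """Normalized cut value for the partition (A, V - A)."""
    V = range(len(W))
    A = set(A)
    B = [v for v in V if v not in A]
    cut = sum(W[i][j] for i in A for j in B)
    return cut / assoc(W, A, V) + cut / assoc(W, B, V)

def best_bipartition(W):
    """Exhaustively find the bipartition with the smallest Ncut value."""
    n = len(W)
    best = None
    for size in range(1, n // 2 + 1):
        for A in combinations(range(n), size):
            value = ncut(W, A)
            if best is None or value < best[0]:
                best = (value, set(A))
    return best

# Two tightly connected pairs of regions joined by one weak edge.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
value, group = best_bipartition(W)
```

The weak edge between nodes 1 and 2 is the one that gets cut, so each tight pair ends up on its own side of the partition.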

N-cut method
Second-level clustering (N-cut) improves segmentation performance by correctly partitioning the graph into final segments of perceptual importance.

Semantics Sensitive Annotation of Natural Images
We extract the representative properties of every region by incorporating semantics-related features:
6-dimensional Lab dominant color and color variances
8-dimensional wavelet texture features
3-dimensional Tamura texture features (coarseness, directionality and contrast)
1-dimensional density ratio for coarse shape representation
6-dimensional spatial location (2 dimensions for the region centre and 4 dimensions for the region bounding box)
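The feature groups listed above add up to a 24-dimensional region descriptor (6 + 8 + 3 + 1 + 6). A small sketch of how the groups could be laid out in a single vector is shown below; the group names and their ordering are hypothetical labels chosen for illustration, not identifiers from the paper.

```python
# Feature groups and their dimensions as listed on the slide.
# The names and the ordering are assumptions made for this sketch.
FEATURE_LAYOUT = [
    ("lab_color_and_variance", 6),
    ("wavelet_texture", 8),
    ("tamura_texture", 3),
    ("density_ratio", 1),
    ("spatial_location", 6),
]

def feature_slices(layout):
    """Map each feature group to its slice in the concatenated vector."""
    slices, start = {}, 0
    for name, dim in layout:
        slices[name] = slice(start, start + dim)
        start += dim
    return slices, start

SLICES, TOTAL_DIM = feature_slices(FEATURE_LAYOUT)

# Example: pull the Tamura texture features out of a region descriptor.
descriptor = list(range(TOTAL_DIM))
tamura = descriptor[SLICES["tamura_texture"]]
```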

We use support vector machines [14] in one-versus-all (OVA) mode to classify the segments of an image. The labeled training sample set is represented by $XTrain_i = \{X_j, L_i(X_j) \mid j = 1, \dots, N\}$, containing positive samples of a semantic concept $i$ and negative samples. Each labeled training sample is a pair $(X_j, L_i(X_j))$, where $X_j$ is the feature vector of salient region $j$ and $L_i(X_j)$ is the semantic label of region $j$, which is either +1 or −1.
[14] S. Tong and E. Chang, "Support vector machine active learning for image retrieval", ACM MM, 2001.
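The one-versus-all labeling can be sketched as follows: for each concept $i$, every region gets label +1 if it carries that concept and −1 otherwise. The function name, the concept strings and the short feature vectors are hypothetical; a real pipeline would then train one binary SVM per concept on each of these sets.

```python
def one_vs_all_training_sets(features, concepts):
    """Build XTrain_i for each concept i as a list of (X_j, L_i(X_j)) pairs.

    features: list of region feature vectors X_j
    concepts: list of concept names, one per region
    """
    training_sets = {}
    for concept in sorted(set(concepts)):
        training_sets[concept] = [
            (x, 1 if c == concept else -1) for x, c in zip(features, concepts)
        ]
    return training_sets

features = [(0.1, 0.9), (0.2, 0.8), (0.7, 0.3)]
concepts = ["sky", "sky", "sand"]
xtrain = one_vs_all_training_sets(features, concepts)
```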

For positive training samples $X_j$ with semantic label $L_i(X_j) = +1$, there exist transformation parameters $w$ and $b$ such that $w \cdot X_j + b \ge +1$. Similarly, for negative training samples $X_j$ with semantic label $L_i(X_j) = -1$, the constraint is $w \cdot X_j + b \le -1$. The margin between these two supporting planes is $2 / \|w\|_2$.

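To enable the optimal separating hyperplane to generalize, the margin maximization problem is transformed into the following optimization problem:
$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{j=1}^{N} \xi_j \quad \text{subject to} \quad L_i(X_j)\,\big(w \cdot \Phi(X_j) + b\big) \ge 1 - \xi_j, \;\; \xi_j \ge 0$$
where
$\xi_j \ge 0$ represents the misclassification (training error) of sample $j$
$C > 0$ is the penalty parameter that adjusts the training error rate
$\frac{1}{2} w^T w$ is the regularization term
$\Phi(X_j)$ maps $X_j$ into a higher-dimensional space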
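To make the roles of the regularization term, the slack variables and $C$ concrete, the sketch below evaluates the standard soft-margin objective $\frac{1}{2} w^T w + C \sum_j \xi_j$ for a given $w$ and $b$, with each slack set to its smallest feasible value $\xi_j = \max(0, 1 - L_i(X_j)(w \cdot X_j + b))$. It assumes a linear kernel, so $\Phi$ is the identity, and it only evaluates the objective rather than solving the optimization; the function name is hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, C):
    """Return (objective, slacks) for a linear soft-margin SVM.

    Assumes Phi is the identity map (linear kernel).
    """
    regularization = 0.5 * sum(wi * wi for wi in w)
    slacks = []
    for x, y in zip(samples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Smallest slack that satisfies y * score >= 1 - slack.
        slacks.append(max(0.0, 1.0 - y * score))
    return regularization + C * sum(slacks), slacks

samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, 1]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, samples, labels, C=10.0)
```

Here the third sample lies inside the margin, so it contributes a slack of 0.5, and the penalty parameter scales how much that violation costs relative to the regularization term.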

Experiments
Datasets: Corel, LabelMe, Flickr
The number of neighbors k in the AMS algorithm is set to 10.

Experiments
[Figure panels: Original Image; Color clustering by AMS (first-level clustering); N-cut (second-level clustering); N-cut alone]

Experiments
[Figure panels: Beach, Sailing, Garden, Mountain]

Experiments
Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
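The two measures can be computed per concept from predicted and ground-truth labels, as in the sketch below. The label strings are illustrative only and are not results from the paper; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def precision_recall(predicted, actual, concept):
    """Return (precision, recall) for one concept.

    TP: predicted as concept and truly the concept
    FP: predicted as concept but truly something else
    FN: truly the concept but predicted as something else
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p == concept and a == concept)
    fp = sum(1 for p, a in zip(predicted, actual) if p == concept and a != concept)
    fn = sum(1 for p, a in zip(predicted, actual) if p != concept and a == concept)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

predicted = ["beach", "beach", "garden", "beach"]
actual = ["beach", "garden", "garden", "beach"]
beach_precision, beach_recall = precision_recall(predicted, actual, "beach")
```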

Conclusion
In this paper, we have presented new perceptual and effective techniques for the segmentation and annotation of natural color images that can be used for image indexing and retrieval. The segmentation method divides the image into semantics-sensitive salient regions that represent the semantic content of the image. The proposed approach combines AMS clustering and N-cut grouping to achieve a significant increase in image segmentation performance. Our method for annotating image segments and scenes achieved very good performance, which significantly reduces the semantic gap.
