
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie Mellon University ACM international conference on Multimedia 2004

Introduction
Automatic image annotation:
learn the correlation between image features and textual words from examples of annotated images
apply the learned correlation to predict words for unseen images
Problem: each annotated word for an image is predicted independently of the other words annotated for the same image

Introduction
Word-to-word correlation is important, particularly when image features are insufficient for determining an appropriate annotation word (e.g. 'sky' vs. 'ocean'): if 'grass' is very likely to be an annotated word, 'sky' will usually be preferred over 'ocean'.

Introduction
They propose a coherent language model for automatic image annotation that takes word-to-word correlation into account.
It is able to automatically determine the annotation length for a given image.
It can naturally be used for active learning to significantly reduce the required number of annotated image examples.

Related work
Machine translation models
Co-occurrence models
Latent space approaches
Graphical models
Classification approaches
Relevance language models

Relevance language model
Idea: first find annotated images that are similar to a test image, then use the words shared by the annotations of those similar images to annotate the test image.
T: collection of annotated images
J_i = {b_{i,1}, b_{i,2}, ..., b_{i,m}; w_{i,1}, w_{i,2}, ..., w_{i,n}}: the i-th annotated image
b_{i,j}: the number of times the j-th blob appears in the i-th image
w_{i,j}: a binary variable indicating whether or not the j-th word appears in the i-th image

Relevance language model
Given an image I = {b_{i,1}, b_{i,2}, ..., b_{i,m}}, estimate the likelihood of each word being annotated for I.
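The estimation formula on this slide was not transcribed. As a sketch only, in the style of the cross-media relevance model of Jeon et al. (listed in the related work), the likelihood of a word w for image I can be estimated as:

```latex
p(w \mid I) \;\propto\; \sum_{J \in T} p(J)\, p(w \mid J) \prod_{j=1}^{m} p(b_j \mid J)
```

where p(w | J) and p(b_j | J) are smoothed estimates from the annotated image J; the smoothing used on the original slide is not recoverable here.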

Coherent language model
Estimate the probability p({w} | I) of annotating image I with a set of words {w}.
Estimate the probability p(θ_w | I) of using a language model θ_w to generate annotation words for image I.
θ_w = {p_1(θ_w), p_2(θ_w), ..., p_n(θ_w)}
p_j(θ_w) = p(w_j = 1 | θ_w): how likely the j-th word is to be used for annotation
p(θ_w | I) ∝ p(I | θ_w) p(θ_w)

Coherent language model (the estimation equations on this slide were not transcribed)

Use the Expectation-Maximization algorithm to find the optimal solution (the update equations were not transcribed).
E-step: Z_I is a normalization constant that ensures the resulting distribution sums to one.
M-step: Z_w is a normalization constant that ensures the resulting distribution sums to one.

Determining Annotation Length
It is more appropriate to describe annotation words with Bernoulli distributions than with multinomial distributions, since each word is annotated at most once for an image.
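Under this Bernoulli view (a sketch, using the p_j(θ_w) notation from the earlier slide; the slide's own formula was not transcribed), the probability of a binary annotation vector (w_1, ..., w_n) factorizes as:

```latex
p(w_1, \ldots, w_n \mid \theta_w) \;=\; \prod_{j=1}^{n} p_j(\theta_w)^{w_j} \bigl(1 - p_j(\theta_w)\bigr)^{1 - w_j}
```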

Determining Annotation Length
The annotation length is no longer a constant: a word is used for annotation if and only if the corresponding probability p_j(θ_w) exceeds a given threshold.
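A minimal sketch of this rule (not the authors' code; the threshold value and the per-word probabilities below are hypothetical):

```python
def annotate(word_probs, tau=0.5):
    """Select every word whose Bernoulli probability p_j exceeds the
    threshold tau; the annotation length then varies per image."""
    return [w for w, p in word_probs.items() if p > tau]

# Hypothetical per-word probabilities for one image.
probs = {"sky": 0.82, "grass": 0.74, "ocean": 0.31, "car": 0.05}
print(annotate(probs))  # ['sky', 'grass']
```

With a different threshold the same image receives a shorter or longer annotation, which is exactly what makes the length flexible.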

Active Learning for Automatic Image Annotation
Active learning: selectively sample examples for labeling so that the uncertainty in determining the right model is reduced most significantly, i.e. choose the examples that are most informative to the statistical model.
For each un-annotated image, they apply the CLMFL (coherent language model with flexible length) model to determine its annotation words and compute its averaged word probability.
The un-annotated image with the least averaged word probability is chosen for users to annotate.
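The selection step can be sketched as follows (an illustration only, not the paper's implementation; the image ids, probabilities, and batch size are made up):

```python
def select_for_annotation(candidates, batch_size=20):
    """candidates maps an image id to the probabilities of the words
    the current model would annotate it with.  Images whose averaged
    word probability is lowest (the least confident annotations) are
    chosen for manual annotation."""
    avg = lambda probs: sum(probs) / len(probs)
    ranked = sorted(candidates, key=lambda img: avg(candidates[img]))
    return ranked[:batch_size]

pool = {"img_a": [0.9, 0.8], "img_b": [0.2, 0.3], "img_c": [0.6, 0.5]}
print(select_for_annotation(pool, batch_size=1))  # ['img_b']
```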

Active Learning for Automatic Image Annotation
Select the images that are not only poorly annotated by the current model but also similar to the test images: from the set of images for which the current annotation model cannot produce any annotations, choose those most similar to the test images.

Experiments
Data (Duygulu et al., 2002): 5,000 images from 50 Corel Stock Photo CDs.
Images are segmented with normalized cuts, and the largest 10 regions are kept for each image.
The K-means algorithm clusters all image regions into 500 distinct blobs.
Each image is annotated with 1 to 5 words, with 371 distinct words in total.
4,500 images are used as training examples and the remaining 500 images for testing.

Experiments
The quality of automatic image annotation is measured by the performance of retrieving auto-annotated images with single-word queries, evaluated by precision and recall.
There are 263 distinct words in the annotations of the test images; the evaluation focuses on the 140 words that appear at least 20 times in the training dataset.
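The per-query evaluation can be sketched as (a generic precision/recall computation; the image ids here are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """For a single-word query: retrieved is the set of images whose
    generated annotation contains the word, relevant the set whose
    ground-truth annotation contains it."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"img1", "img2", "img3"}, {"img2", "img3", "img4"})
print(p, r)  # 2/3 precision, 2/3 recall for this toy query
```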

Coherent Language Model vs. Relevance Language Model
The coherent language model performs better than the relevance language model.

Coherent Language Model vs. Relevance Language Model
Word-to-word correlation has little impact on the very top-ranked words, which are already determined by the image features with high confidence.
It is much more influential for the words that are not ranked at the very top: for those words, word-to-word correlation promotes the words that are more consistent with the very top-ranked words.

Generating Annotations with Automatically Determined Length
The average length of the generated annotations is about 3 words per annotation.
The CLMFL model performs significantly better than the CLM model with a fixed length of 3.
Two-word queries: the 100 most frequent two-word combinations from the annotations of the test images are used as two-word queries.

Generating Annotations with Automatically Determined Length
The annotations generated by the CLMFL model reflect the content of the images more accurately than those of the CLM model, which uses a fixed annotation length.

Generating Annotations with Automatically Determined Length
In the fifth example image, 'water' does not appear in the annotation generated by the CLMFL model.

Active Learning for Automatic Image Annotation
First, 1,000 annotated images are randomly selected from the training set as the initial training examples.
The system then iteratively acquires annotations for selected images.
In each iteration, at most 20 images from the training pool can be selected for manual annotation; over four iterations, at most 80 additional annotated images are acquired.
After each iteration, annotations are generated for the 500 testing images.

Active Learning for Automatic Image Annotation
The baseline model randomly selects 20 images in each iteration.

Active Learning for Automatic Image Annotation
The active learning method gives the annotation model more chances to learn new objects and new words.

Conclusion
The coherent language model takes advantage of word-to-word correlation.
The coherent language model with flexible length automatically determines the annotation length.
The active learning method (based on the CLM model) effectively reduces the required number of annotated images.

Conclusion
Future work:
Learning the threshold values from training examples, i.e. thresholds that depend on the properties of the annotation words.
Using different measurements of uncertainty for the active learning method.