Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation www.languagecomputer.com.

Exploiting Ontologies for Automatic Image Annotation Munirathnam Srikanth, Joshua Varner, Mitchell Bowden, Dan Moldovan Language Computer Corporation SIGIR.

Exploiting Ontologies for Automatic Image Annotation
  M. Srikanth, J. Varner, M. Bowden, D. Moldovan
  Language Computer Corporation
  Richardson, Texas

Contents
  Motivation
  Automatic Image Annotation Problem
  Ontologies for Defining Visual Vocabularies
  Hierarchical Models for Image Annotation
  Related Work
  Experiments & Results
  Conclusion and Future Work

Motivation: Multimedia Question Answering
  The majority of efforts in Q/A focus on textual corpora and processing
  Large amounts of information are held within multimedia sources (images, audio, video)
  Extend the power of Q/A into the realm of multimedia
  Exploit the commonality and union of text and multimedia information

Multimedia Question Answering
  Some ways in which multimedia can be used in Q/A
    Multimedia (video clip/image) as the answer
    Multimedia and lexical combination providing enhanced understanding to answer questions
  Caption: Ronaldo seals Brazil's place in the last eight with a shot through Geert de Vlieger's legs late on to eliminate Belgium
  Question: What color jersey did Brazil wear in the World Cup?

Approach
  Feature extraction
  High- and low-level features
  Object recognition
  Auto annotation of images
  Object semantics extraction
  Locative/temporal/etc.
  Build Knowledge Representation from image/video
  Merge with audio/text Knowledge Representation
  Lexical information from ASR and VOCR
  Provide multimedia Q/A using multimedia ontologies

Automatic Image Annotation
  Task of automatically assigning words to an image that describe the contents of the image
  Most models exploit the correlation between images and words
  Exploit the correlation between the annotation words themselves to
    1. Define visual vocabularies
    2. Develop hierarchical models for automatic image annotation
  Use ontological information about annotation words to improve image annotation

Prior Work: Translation Models
  Models for translating the visual representation of a concept to a textual representation (Duygulu et al., 2002)
  Based on the Brown model for Machine Translation (Brown et al., 1993)
  Image features translate to annotation words
  K-means is used to cluster image features to generate blobs
  Dependencies between blobs and words are not explicitly captured
  Use an ontology to drive the definition of blobs
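The translation idea above can be sketched with IBM Model 1 style EM, the simplest of the Brown et al. models. This is a minimal illustration, not the cited authors' implementation: the function name `train_translation_table`, the blob identifiers, and the toy training pairs are all hypothetical, and each image is assumed to be given as a list of blob ids plus a list of annotation words.

```python
from collections import defaultdict

def train_translation_table(pairs, iterations=10):
    """Estimate t(word | blob) with IBM Model 1 style EM.

    pairs: list of (blobs, words) tuples, one per training image.
    Returns a dict keyed by (word, blob) for co-occurring pairs only.
    """
    vocab = {w for _, ws in pairs for w in ws}
    # Uniform start for every (word, blob) pair that co-occurs in some image.
    t = {(w, b): 1.0 / len(vocab) for blobs, ws in pairs for w in ws for b in blobs}
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, blob) alignment counts
        total = defaultdict(float)  # expected alignment counts per blob
        for blobs, ws in pairs:
            for w in ws:
                # E-step: share each word among the blobs of its own image.
                norm = sum(t[(w, b)] for b in blobs)
                for b in blobs:
                    frac = t[(w, b)] / norm
                    count[(w, b)] += frac
                    total[b] += frac
        # M-step: renormalise so that t(. | blob) sums to one over words.
        t = {(w, b): c / total[b] for (w, b), c in count.items()}
    return t

# Hypothetical toy corpus: blob ids and annotation words per image.
pairs = [
    (["b_sky", "b_grass"], ["sky", "grass"]),
    (["b_sky", "b_water"], ["sky", "water"]),
    (["b_grass"], ["grass"]),
]
table = train_translation_table(pairs)
best_for_sky_blob = max(["sky", "grass", "water"], key=lambda w: table[(w, "b_sky")])
```

On this toy corpus the image that contains only grass anchors the grass blob, which in turn pushes the sky blob toward the word "sky".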

Prior Work: HACM Model
  Hierarchical Aspect Cluster Model (T. Hofmann, 1998)
  Induces a hierarchical structure from co-occurrence of image features
  Topology is externally defined
  Depth of the induced hierarchy is user selected
  Levels define the generality of the concept expressed in regions and words
  The hierarchies defined in ontologies have well-defined semantics
  Image feature hierarchy induced from a text ontology

Prior Work: Classification Approaches
  Estimate P(w|I) to classify an image I (represented by image features) into one of the classes (annotation word w)
  Generative Models
  Flat classification: learn one classifier per annotation word
  SVM Classifier (Cusano et al., 2004)
  Discriminative Models
  Jeon and Manmatha (2004) showed improvements over translation using Maximum Entropy Models
  Unigram (blob, word) and bigram (horizontal blob pairs, word) features
  Explore hierarchical classification using an ontology

Image Representation using Visual Vocabulary
  Pipeline (figure): Image, Image Segmentation, Feature Extraction, Image Representation
  Image Segmentation
    1. Image regions corresponding to objects in the image
    2. Grid-based image segmentation
  Feature Extraction
    Extract image features from image regions
    Color, Shape, Texture
  Image Representation
    1. Real-valued feature vectors
    2. Visual vocabulary derived by clustering feature vectors
    Cluster centers (blobs) define the vocabulary
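A minimal sketch of deriving a visual vocabulary by clustering region feature vectors, using plain K-means in standard-library Python. The two-dimensional toy vectors, the function names, and the choice of k are assumptions made for illustration; real region features would be the color, shape, and texture measurements mentioned above.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, vector):
    """Index of the closest center, i.e. the blob id of a region."""
    return min(range(len(centers)), key=lambda k: squared_distance(centers[k], vector))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means; the returned cluster centers act as the blobs."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(centers, v)].append(v)
        # New center = mean of the members; an empty cluster keeps its old center.
        centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers

# Hypothetical region feature vectors from two visually distinct groups.
regions = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]]
blobs = kmeans(regions, k=2)
image_as_blobs = [nearest(blobs, r) for r in regions]
```

Each image is then represented by the blob ids of its regions (`image_as_blobs`) rather than by raw feature vectors.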

Visual Vocabulary from Ontologies
  Image regions from images are organized in the hierarchy based on the image annotation
  Image attributes of children nodes are related to the parent node's image attributes

Using Ontologies in Translation Models for Automatic Image Annotation
  1. Ontology-induced visual vocabulary
    – Annotation word hierarchy used in selecting the initial set of blobs for K-means clustering
  2. Ontology-weighted K-means clustering
    – Weight the cluster membership of image regions in the estimation of cluster centers (blobs)
  Notation
    n(w,c): number of image regions in cluster c associated with word w
    n(c): number of image regions in cluster c
    f(r): feature vector for region r
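The weighting formula itself is not reproduced in this transcript. The sketch below therefore assumes one plausible reading, in which a region r labelled with word w contributes to the center of cluster c with weight n(w,c)/n(c). That weight, the single-word-per-region simplification, and the name `weighted_center` are assumptions, not the authors' definition.

```python
from collections import Counter

def weighted_center(regions):
    """Weighted center of one cluster c.

    regions: list of (word, feature_vector) pairs assigned to the cluster.
    ASSUMPTION: the weight of a region labelled w is n(w, c) / n(c).
    """
    n_c = len(regions)                                    # n(c)
    n_wc = Counter(word for word, _ in regions)           # n(w, c)
    weights = [n_wc[word] / n_c for word, _ in regions]   # assumed weighting
    norm = sum(weights)
    dim = len(regions[0][1])
    return [
        sum(wt * f[d] for wt, (_, f) in zip(weights, regions)) / norm
        for d in range(dim)
    ]

# Hypothetical cluster: two "tiger" regions and one "grass" region.
cluster = [("tiger", [1.0, 0.0]), ("tiger", [1.0, 0.2]), ("grass", [0.0, 1.0])]
center = weighted_center(cluster)
plain_mean = [sum(f[d] for _, f in cluster) / len(cluster) for d in range(2)]
```

Under this assumed weighting the center is pulled toward the regions whose word dominates the cluster, compared with the unweighted mean in `plain_mean`.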

Image Annotation by Hierarchical Classification
  Based on the hierarchical approach to text classification (McCallum et al., 1998)
    – Statistical back-off model induced by the hierarchy derived from the annotation word ontology
    – Given an image I with a blob sequence, the probability of word w is given by [formula not preserved in the transcript]
    – Assuming a Bernoulli model for annotations, the blob likelihood given a word is estimated as [formula not preserved in the transcript]
  Notation
    V: visual vocabulary
    T: training set of annotated images
    W: set of annotation words
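A minimal sketch of a hierarchy-induced back-off estimate in the spirit of the shrinkage approach of McCallum et al. (1998): the blob likelihood for a word is a weighted mixture of the estimates at the word's node and at its ancestors. The toy path, the per-node probabilities, the fixed mixture weights, and the use of a simple multinomial over blobs (rather than the Bernoulli model named on the slide) are all illustrative assumptions.

```python
# Hypothetical IS-A path from an annotation word up through two ancestors.
path = {"tiger": ["tiger", "cat", "feline"]}

# Hypothetical maximum-likelihood blob distributions at each node.
node_blob_prob = {
    "tiger":  {"b1": 1.0},
    "cat":    {"b1": 0.5, "b2": 0.5},
    "feline": {"b1": 0.25, "b2": 0.25, "b3": 0.5},
}

# Assumed fixed mixture weights, one per node on the path (they sum to one).
lambdas = [0.5, 0.3, 0.2]

def shrunk_likelihood(blob, word):
    """P(blob | word) as a lambda-weighted mixture over the word's path."""
    return sum(
        lam * node_blob_prob[node].get(blob, 0.0)
        for lam, node in zip(lambdas, path[word])
    )

p_b1 = shrunk_likelihood("b1", "tiger")
p_b2 = shrunk_likelihood("b2", "tiger")
p_b3 = shrunk_likelihood("b3", "tiger")
```

The leaf estimate alone gives blob `b2` zero probability for "tiger"; backing off to the ancestors gives it a non-zero likelihood, which is the practical benefit of the hierarchy when a word has few training regions.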

Image Annotation using Hierarchical Classification (contd.)
  The IS-A hierarchy among annotation words is used to estimate the blob-likelihood probability
  Example path (figure): tiger, cat, feline, animal, …, ROOT; other nodes shown: cougar, leopard, lion, lynx
  Feature weights learned using the EM algorithm
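A minimal sketch of learning mixture weights with EM, assuming the weights in question are per-node interpolation weights along one word's IS-A path and that they are fitted on held-out blobs. The node names follow the figure, but the probabilities, the held-out sample, and the function name `em_lambdas` are hypothetical.

```python
def em_lambdas(heldout_blobs, nodes, node_blob_prob, iterations=20):
    """Fit one mixture weight per node on the path by EM."""
    k = len(nodes)
    lambdas = [1.0 / k] * k
    for _ in range(iterations):
        responsibility = [0.0] * k
        for blob in heldout_blobs:
            # E-step: how much each node explains this held-out blob.
            parts = [
                lam * node_blob_prob[node].get(blob, 0.0)
                for lam, node in zip(lambdas, nodes)
            ]
            z = sum(parts)
            if z == 0.0:
                continue  # blob unseen at every node; it carries no information
            for i, part in enumerate(parts):
                responsibility[i] += part / z
        # M-step: weights are the normalised responsibilities.
        total = sum(responsibility)
        lambdas = [r / total for r in responsibility]
    return lambdas

# Hypothetical per-node blob distributions and held-out blobs for "tiger".
nodes = ["tiger", "cat", "feline"]
node_blob_prob = {
    "tiger":  {"b1": 1.0},
    "cat":    {"b1": 0.5, "b2": 0.5},
    "feline": {"b1": 0.25, "b2": 0.25, "b3": 0.5},
}
heldout = ["b1", "b1", "b1", "b2"]
weights = em_lambdas(heldout, nodes, node_blob_prob)
```

In this toy case the most general node ends up with the smallest weight, because it spends probability on a blob (`b3`) that never appears in the held-out sample.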

Experiments
  Corel Data Set
    Annotated images using pre-processed data from (Duygulu et al., 2002)
    4500 images annotated using 374 words
    4000 for training; 500 for testing
  Image Representation
    Image segmentation using N-cuts (Duygulu et al., 2002)
    36 different image features represent each image region
  Ontology: WordNet
    Hierarchy with 714 unique concepts was induced from 374 annotation words
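Inducing a concept hierarchy from a set of annotation words can be sketched as collecting every node on each word's path to the root. The hypernym map below is a hand-written toy stand-in for WordNet, and the function name `induce_concepts` is hypothetical; it only illustrates why the induced hierarchy has more concepts than there are annotation words.

```python
# Toy child -> parent (IS-A) map; a stand-in for WordNet hypernym links.
hypernym = {
    "tiger": "cat",
    "lion": "cat",
    "cat": "feline",
    "feline": "animal",
    "animal": "ROOT",
    "grass": "plant",
    "plant": "ROOT",
}

def induce_concepts(words, hypernym):
    """Collect every concept on the path from each word up to the root."""
    concepts = set()
    for word in words:
        node = word
        # Stop at the root or as soon as the path joins one already collected.
        while node is not None and node not in concepts:
            concepts.add(node)
            node = hypernym.get(node)
    return concepts

annotation_words = ["tiger", "lion", "grass"]
concepts = induce_concepts(annotation_words, hypernym)
```

Here three annotation words induce eight concepts, since shared ancestors are counted once and intermediate nodes are added along the way.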

Image Annotation Evaluation
  Annotation systems predict P(w|I)
  A cut-off or threshold is required to assign annotations
    Unnormalized: take the top 5 words
    Normalized: take the top m words, where m is the number of annotations for I
  Metrics
    Number of words with positive recall
    Mean per-word precision and recall
      All words in the dictionary
      Selected set of words
        Retrieved: words retrieved using the method
        Common: words predicted by all annotation systems
        Union: all words predicted by at least one annotation system
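A minimal sketch of these metrics: pick the top-m words per image from predicted scores, then compute mean per-word precision and recall and the number of words with positive recall. The function names, scores, and toy annotations are hypothetical.

```python
def top_words(scores, m=5):
    """Pick the m highest-scoring words for one image."""
    return set(sorted(scores, key=scores.get, reverse=True)[:m])

def evaluate(predicted, truth, vocabulary):
    """Mean per-word precision/recall and the count of words with recall > 0.

    predicted, truth: lists of word sets, one entry per test image.
    """
    per_word = {}
    for w in vocabulary:
        retrieved = sum(1 for p in predicted if w in p)
        relevant = sum(1 for t in truth if w in t)
        correct = sum(1 for p, t in zip(predicted, truth) if w in p and w in t)
        precision = correct / retrieved if retrieved else 0.0
        recall = correct / relevant if relevant else 0.0
        per_word[w] = (precision, recall)
    mean_precision = sum(p for p, _ in per_word.values()) / len(per_word)
    mean_recall = sum(r for _, r in per_word.values()) / len(per_word)
    positive_recall = sum(1 for _, r in per_word.values() if r > 0)
    return mean_precision, mean_recall, positive_recall

# Hypothetical predictions and ground truth for two test images.
vocabulary = ["tiger", "grass", "sky", "water"]
truth = [{"tiger", "grass"}, {"sky", "water"}]
predicted = [{"tiger", "sky"}, {"sky", "water"}]
mean_p, mean_r, n_pos = evaluate(predicted, truth, vocabulary)

scores = {"tiger": 0.4, "grass": 0.3, "sky": 0.2, "water": 0.1}
picked = top_words(scores, m=2)
```

Restricting `vocabulary` to the retrieved, common, or union word sets gives the "selected set of words" variants listed above.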

Results: Translation Models and Ontologies
  Table columns: Features, Description, Precision, Recall, Predicted, Positive Recall (numeric values missing from the transcript)
    KM-500: Baseline K-means clustering
    WKM-500: Weighted K-means clustering
    ONT-714: Using 714 clusters with one cluster per word in the induced ontology
    ONT-500: Reducing ONT-714 to 500 clusters by combining "close clusters"
  Precision/Recall numbers are averaged over a "pooled" set of 42 words
  Observations
    Using ontologies increases the number of words predicted with positive recall
    Hierarchy-based initial clusters attach better semantics to clusters
    Results for ontology-induced clusters are based on "one blob per concept"

Results: Classification Approaches and Ontologies
  Comparing flat classification versus hierarchical classification for image annotation
  Table columns: Features, Precision, Recall, # Ret., # Pos. Recall (numeric values missing from the transcript)
    Flat + KMeans
    Hier + KMeans
  Precision/Recall numbers correspond to using the KM-500 visual vocabulary
  Observations
    Improved Precision (10%) and Recall (14%) values
    Increase in the number of annotations with positive recall
    Hierarchy derived from the annotation ontology results in improved performance

Results: Hierarchical Classification with Ontology-induced Visual Vocabularies
  Hierarchical approach improves precision/recall values on different visual vocabularies
  ONT-714 has improved positive recall numbers
  Ontologies defined on text annotations provide a good framework for developing hierarchical models for image features
  Table columns: Measures, KM-500, WKM-500, ONT-714, ONT-500 (numeric values missing from the transcript)
    Baseline, Flat Classification Method: Precision, Recall, Predicted, Positive Recall
    Hierarchical Classification Method: Precision, Recall, Predicted, Positive Recall

Results: Comparing Translation and Classification Approaches
  Table columns: Measures, KM-500, WKM-500, ONT-714, ONT-500 (numeric values missing from the transcript)
    # Common Words
    Translation Method: Precision, Recall
    Flat Classification Method: Precision, Recall
    Hierarchical Classification Method: Precision, Recall
  Comparison based on common annotation words predicted by the different models
  Significant improvement in recall using classification approaches

Ontologies in Automatic Image Annotation
  Experimental Results
    Ontology in translation model
      19.5% increase in average precision
      13% increase in average recall
    Ontology in classification
      10% increase in average precision
      14% increase in average recall
  Using word hierarchies improves annotation results when they are used as a source for selecting initial blobs and as a framework for hierarchical classification

Summary and Future Work
  Proposed methods for using ontologies in automatic image annotation
    Translation Models: defining the visual vocabulary
    Hierarchical Classification Models: providing the hierarchy for models defined on image features
  Explore the use of ontologies in other approaches to automatic image annotation
    Discriminative models
  Exploit the dependence between annotation words in automatic image annotation
    Correlation between annotation words of an image can be exploited

Summary and Future Work (Contd.)
  Utilize hierarchical organization of concepts and language models on image blobs to develop multi-modal ontologies
  Use multi-modal ontologies in Q/A

Multimedia Ontology: Example Node
  Transportation WordNet hierarchy with multimedia data

Thank You.