Inference Network Approach to Image Retrieval
Don Metzler, R. Manmatha
Center for Intelligent Information Retrieval
University of Massachusetts, Amherst
Motivation
- Most image retrieval systems assume:
  - an implicit "AND" between query terms
  - equal weight for all query terms
  - a query made up of a single representation (keywords or an image)
- "tiger grass" => "find images of tigers AND grass, where each is equally important"
- How can we search with queries made up of both keywords and images?
- How do we perform the following queries?
  - "swimmers OR jets"
  - "tiger AND grass, with more emphasis on tigers than grass"
  - "find me images of birds that are similar to this image"
Related Work
- Inference networks
- Semantic image retrieval
- Kernel methods
Inference Networks
- Inference network framework [Turtle and Croft '89]
  - Formal information retrieval framework
  - INQUERY search engine
  - Allows structured queries: phrases, term weighting, synonyms, etc.
    - e.g. #wsum( 2.0 #phrase( image retrieval ) 1.0 model )
  - Handles multiple document representations (full text, abstracts, etc.)
- MIRROR [deVries '98]
  - General multimedia retrieval framework based on the inference network framework
  - Probabilities based on clustering of metadata + feature vectors
Image Retrieval / Annotation
- Co-occurrence model [Mori, et al.]
- Translation model [Duygulu, et al.]
- Correspondence LDA [Blei and Jordan]
- Relevance model-based approaches
  - Cross-Media Relevance Models (CMRM) [Jeon, et al.]
  - Continuous Relevance Models (CRM) [Lavrenko, et al.]
Goals
- Input
  - Set of annotated training images
  - User's information need: terms, images, "soft" Boolean operators (AND, OR, NOT), weights
  - Set of test images with no annotations
- Output
  - Ranked list of test images relevant to the user's information need
Data
- Corel data set †
  - 4500 training images (annotated)
  - 500 test images
  - 374-word vocabulary
- Each image automatically segmented using normalized cuts
- Each image represented as a set of representation vectors
  - 36 geometric, color, and texture features
  - Same features used in similar past work
† Available at:
Features
- Geometric (6): area, position (2), boundary/area, convexity, moment of inertia
- Color (18): avg. RGB x 2 (6), std. dev. of RGB (3), avg. L*a*b x 2 (6), std. dev. of L*a*b (3)
- Texture (12): mean oriented energy, 30 deg. increments (12)
Image Representation
[Figure: segmented example image annotated "cat, grass, tiger, water"]
- Annotation vector: binary, the same for each segment
- Representation vector: real-valued, 1 per image segment
Image Inference Network
[Figure: network with image node J feeding representation nodes q_r1 … q_rk and word nodes q_w1 … q_wk (the "image network", fixed based on the image), which in turn feed operator nodes q_op1, q_op2 and the information-need node I (the "query network", dynamic based on the query)]
- J    – representation vectors for the image (continuous, observed)
- q_w  – word w appears in the annotation (binary, hidden)
- q_r  – representation vector r describes the image (binary, hidden)
- q_op – query operator satisfied (binary, hidden)
- I    – user's information need is satisfied (binary, hidden)
Example Instantiation
[Figure: query network for #or( #and( tiger grass ) ) — term nodes "tiger" and "grass" feed an #and node, which feeds an #or node]
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
P(q_w | J)  [e.g. P( tiger | J )]
- Probability that term w appears in the annotation given image J
- Apply Bayes' rule and use non-parametric density estimation
- Assumes the representation vectors are conditionally independent given that term w annotates the image:

  P(q_w | J) = P(J | q_w) P(q_w) / P(J) ∝ P(q_w) ∏_i P(r_i | q_w)
How can we compute P(r_i | q_w)?
[Figure: training-set representation vectors in feature space; the representation vectors associated with images annotated by w define an area of high likelihood, while regions far from them have low likelihood]
P(q_w | J)  [final form]

  P(r_i | q_w) = (1 / |T_w|) Σ_{t ∈ T_w} exp( -(r_i - t)ᵀ Σ⁻¹ (r_i - t) / 2 ) / √( (2π)^d |Σ| )

  where T_w is the set of training representation vectors from images annotated with w
- Σ assumed to be diagonal, estimated from training data
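The estimate above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the function names, the NumPy usage, and the unnormalized score are assumptions; `sigma_diag` holds the diagonal of Σ.

```python
import numpy as np

def p_r_given_word(r, word_vectors, sigma_diag):
    """Kernel density estimate of P(r | q_w): average of Gaussian
    kernels with diagonal covariance, centered on the training
    representation vectors associated with word w."""
    word_vectors = np.asarray(word_vectors)            # shape (n, d)
    diff = word_vectors - r                            # broadcast (n, d)
    d = len(sigma_diag)
    norm = np.sqrt((2 * np.pi) ** d * np.prod(sigma_diag))
    kernels = np.exp(-0.5 * np.sum(diff ** 2 / sigma_diag, axis=1))
    return kernels.mean() / norm

def p_word_given_image(word_vectors, prior, image_vectors, sigma_diag):
    """Unnormalized P(q_w | J) ∝ P(q_w) * prod_i P(r_i | q_w),
    assuming the r_i are conditionally independent given q_w."""
    score = prior
    for r in image_vectors:
        score *= p_r_given_word(r, word_vectors, sigma_diag)
    return score
```

As the slides note, the resulting scores are only relative per image; comparing them across images motivates the regularized estimates that follow.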
Regularized estimates…
- P(q_w | J) are good, but not comparable across images

    Image 1:                Image 2:
    term    P(q_w | J)      term    P(q_w | J)
    cat     0.45            cat     0.90
    grass   0.35            grass   0.05
    tiger   0.15            tiger   0.01
    water   0.05            water   0.03

- Is the 2nd image really 2x more "cat-like"?
- Probabilities are relative per image
Regularized estimates…
- Impact transformations
  - Used in information retrieval
  - "Rank is more important than value" [Anh and Moffat]
- Idea: rank each term according to P(q_w | J); give higher probabilities to higher-ranked terms
  - P(q_w | J) ≈ 1 / rank(q_w)
- Zipfian assumption on relevant words:
  - a few words are very relevant
  - a medium number of words are somewhat relevant
  - many words are not relevant
Regularized estimates…

    Image 1:                        Image 2:
    term    P(q_w | J)  1/rank      term    P(q_w | J)  1/rank
    cat     0.45        1.00        cat     0.90        1.00
    grass   0.35        0.50        grass   0.05        0.50
    tiger   0.15        0.33        tiger   0.01        0.25
    water   0.05        0.25        water   0.03        0.33
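The impact transformation amounts to replacing each probability by the reciprocal of its rank within the image. A minimal sketch (the function name is invented for illustration):

```python
def regularize_by_rank(word_probs):
    """Impact-style transform: replace each P(q_w | J) by 1/rank of w
    under that image's probabilities, so that scores become comparable
    across images."""
    # Sort words by descending probability; rank 1 gets score 1.0,
    # rank 2 gets 0.5, rank 3 gets 1/3, and so on.
    ranked = sorted(word_probs, key=word_probs.get, reverse=True)
    return {w: 1.0 / (rank + 1) for rank, w in enumerate(ranked)}
```

Applied to the two example images, both now assign "cat" the score 1.0 regardless of whether its raw probability was 0.45 or 0.90, which is exactly the point of the transform.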
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
P(q_r | J)
- Probability that representation vector r is observed given J
- Use non-parametric density estimation again
  - Impose a density over J's representation vectors, just as in the previous case
- Estimates may be poor
  - Based on a small sample (~10 representation vectors)
- Naïve and simple, yet somewhat effective
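This estimate can be sketched the same way as P(r | q_w), but with the kernels placed on J's own segment vectors; the function name and NumPy usage are illustrative assumptions, and `sigma_diag` again holds the diagonal of Σ:

```python
import numpy as np

def p_r_given_image(r, image_vectors, sigma_diag):
    """P(q_r | J): Gaussian kernel density over J's own (~10)
    representation vectors -- a small-sample, hence possibly poor,
    estimate, as the slide notes."""
    image_vectors = np.asarray(image_vectors)          # shape (n, d)
    d = len(sigma_diag)
    norm = np.sqrt((2 * np.pi) ** d * np.prod(sigma_diag))
    kernels = np.exp(
        -0.5 * np.sum((image_vectors - r) ** 2 / sigma_diag, axis=1)
    )
    return kernels.mean() / norm
```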
Model Comparison
- Relevance model-based (CMRM, CRM)
  - General form: P(w, r_1 … r_n) = Σ_{T ∈ training} P(T) P(w | T) ∏_i P(r_i | T)
- Fully non-parametric model used here
  - General form: P(q_w | J) ∝ P(q_w) ∏_i P(r_i | q_w)
What needs to be estimated?
- P(q_w | J)
- P(q_r | J)
- P(q_op | J)
- P(I | J)
Query Operators
- "Soft" Boolean operators
  - #and / #wand (weighted and)
  - #or
  - #not
- One node added to the query network for each operator present in the query
- Many others possible
  - #max, #sum, #wsum
  - #syn, #odn, #uwn, #phrase, etc.
#or( #and( tiger grass ) )
[Figure: query network — term nodes "tiger" and "grass" feed an #and node, which feeds an #or node]
Operator Nodes
- Combine probabilities from term and image nodes
- Closed forms derived from the corresponding link matrices
  - Allows efficient inference within the network
- Par(q) = set of q's parent nodes
… but where do they come from?

    Link matrix for Q = A AND B:
    A      B      P(Q = true | a, b)
    false  false  0
    true   false  0
    false  true   0
    true   true   1
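Marginalizing over such link matrices gives the standard inference-network closed forms. A sketch, assuming the usual #and/#or/#not beliefs and a geometric weighted form for #wand (the function names are illustrative):

```python
import math

def p_and(parents):
    """#and: product of the parent beliefs."""
    out = 1.0
    for p in parents:
        out *= p
    return out

def p_or(parents):
    """#or: 1 - prod(1 - p) over the parent beliefs."""
    out = 1.0
    for p in parents:
        out *= 1.0 - p
    return 1.0 - out

def p_not(p):
    """#not: complement of the parent belief."""
    return 1.0 - p

def p_wand(parents, weights):
    """#wand: weighted AND, prod p_i^(w_i / W) with W = sum of weights."""
    total = sum(weights)
    return math.prod(p ** (w / total) for p, w in zip(parents, weights))
```

Under this sketch, the earlier example query #or( #and( tiger grass ) ) would be evaluated bottom-up as `p_or([p_and([p_tiger, p_grass])])`.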
Results – Annotation
[Table: annotation results on the full vocabulary comparing the Translation model, CMRM, CRM, and InfNet on # words with recall >= a threshold, mean per-word recall, mean per-word precision, and F-measure]
[Figure: example automatic annotations for three test images — foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6); railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5); sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)]
Results – Retrieval
[Tables: precision at 5 retrieved images and mean average precision for 1-, 2-, and 3-word queries, comparing CMRM, CRM, InfNet, and InfNet-reg]
Future Work
- Use rectangular segmentation and improved features
- Different probability estimates
  - Better methods for estimating P(q_r | J)
  - Use CRM to estimate P(q_w | J)
- Apply to documents with both text and images
- Develop a method/testbed for evaluating more "interesting" queries
Conclusions
- General, robust model based on the inference network framework
- Departure from the implied "AND" between query terms
- Unique non-parametric method for estimating network probabilities
- Pros
  - Retrieval (inference) is fast
  - Makes no assumptions about the distribution of the data
- Cons
  - Estimation of term probabilities is slow
  - Requires sufficient data to get a good estimate