Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.


1 Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute, School of Computer Science, Carnegie Mellon University International Conference on Web Intelligence (WIC '03) Presented by Chu Huei-Ming, 2005/02/24

2 Reference Relevance Models in Information Retrieval –Victor Lavrenko and W. Bruce Croft –Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts (UMass) –pp. 11-56, 2003, Kluwer Academic Publishers. Printed in the Netherlands

3 Outline Introduction Relevance Model Web image retrieval re-ranking Estimating a Relevance Model –Estimation from a set of examples –Estimation without examples Ranking Criterion Experiment Conclusion

4 Introduction (1/2) Most current large-scale web image search engines exploit text and link structure to "understand" the content of web images. This paper proposes a re-ranking method that improves web image retrieval by reordering the images retrieved from an image search engine. The re-ranking process is based on a relevance model.

5 Introduction (2/2) Web image retrieval with relevance-model re-ranking

6 Relevance Model (1/2) Mathematical formalism –V is a vocabulary in some language –C is a large collection of documents –Define the relevant class R to be the subset of documents in C which are relevant to some particular information need –Define the relevance model to be the probability distribution P(w|R) –For every word w, the relevance model gives the probability that we observe w if we randomly select some document D from the relevant class R and then pick a random word from D

7 Relevance Model (2/2) The important issue in IR is capturing the topic discussed in a sample of text, and to that end unigram models fare quite well. The choice of estimation technique has a particularly strong influence on the quality of relevance models.

8 Web image retrieval re-ranking (1/2) For each image I in the ranked list returned from a web image search engine, there is one associated HTML document D Can we estimate the probability that the image is relevant given the text of document D, i.e. Pr(R|D)? By Bayes' Theorem, Pr(R|D) = Pr(D|R) Pr(R) / Pr(D) –Pr(D) is equal for all documents if we assume every document is equally probable –Pr(D|R) is what we need to estimate if we want to know the relevance of the document

9 Web image retrieval re-ranking (2/2) Suppose the document D consists of words w 1, ..., w n Applying the common word-independence assumption, Pr(D|R) = ∏ i Pr(w i |R) Pr(w|R) can be estimated without training data
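The scoring step above can be sketched in a few lines of Python (a minimal illustration, not the authors' code; mixing Pr(w|R) with a background distribution and the weight `lam` are assumptions of this sketch, made so the log stays well-defined for words unseen in the relevance model):

```python
import math

def log_prob_doc_given_relevance(doc_words, p_w_given_R, p_background, lam=0.6):
    """Compute log Pr(D|R) = sum_i log Pr(w_i|R) under word independence.

    Pr(w|R) is interpolated with a background distribution so that a word
    unseen in the relevance model does not drive the product to zero.
    """
    score = 0.0
    for w in doc_words:
        p = lam * p_w_given_R.get(w, 0.0) + (1 - lam) * p_background.get(w, 1e-9)
        score += math.log(p)
    return score
```

Documents whose words are likely under the relevance model receive higher (less negative) log-scores, so they would rise in the re-ranked list.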

10 Estimating a Relevance Model Estimation from a set of examples –We have full information about the set R of relevant documents Estimation without examples –We have no examples from which we could estimate P(w|R) directly

11 Estimation from a set of examples (1/2) Suppose we have perfect knowledge of the entire relevant class R The probability distribution P(w|R) is the probability that a randomly picked word from a random document will be the word w Let P(D|R) denote the probability of randomly picking document D from the relevant set R. Assuming each relevant document is equally likely to be picked at random, P(D|R) = 1/|R|, where |R| is the total number of documents in R

12 Estimation from a set of examples (2/2) The probability of randomly picking a document D and then observing the word w is P(w, D|R) = P(D|R) P(w|D) It is assumed that the document model of D completely determines word probabilities: once D is fixed, the probability of observing w is independent of the relevant class R and depends only on D. Summing over documents gives P(w|R) = Σ D∈R P(D|R) P(w|D) (5) Smoothing is achieved by interpolating the maximum-likelihood probability from (5) with some background distribution P(w) over the vocabulary.
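The estimation on this slide can be sketched as follows (assuming, per the slide, uniform P(D|R) = 1/|R| and linear interpolation with a background model; the interpolation weight `lam` is illustrative):

```python
from collections import Counter

def relevance_model_from_examples(relevant_docs, background, lam=0.6):
    """Estimate P(w|R) = sum_{D in R} P(D|R) * P(w|D) with P(D|R) = 1/|R|,
    where P(w|D) is the maximum-likelihood estimate interpolated with a
    background distribution over the vocabulary."""
    counters = [Counter(doc) for doc in relevant_docs]
    vocab = set(background)
    for c in counters:
        vocab.update(c)
    model = {}
    for w in vocab:
        total = 0.0
        for doc, c in zip(relevant_docs, counters):
            p_ml = c[w] / len(doc)                       # maximum likelihood
            total += lam * p_ml + (1 - lam) * background.get(w, 0.0)
        model[w] = total / len(relevant_docs)            # uniform P(D|R)
    return model
```

Because each smoothed document model sums to one over the vocabulary, the resulting P(w|R) is itself a proper distribution.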

13 Estimation without examples (1/6) In ad-hoc information retrieval we have only a short 2-3 word query, indicative of the user's information need, and no examples of relevant documents

14 Estimation without examples (2/6) Assume that for every information need there exists an underlying relevance model R, which assigns the probabilities P(w|R) to word occurrences in the relevant documents We are given a large collection of documents C and a user query q 1, ..., q k

15 Estimation without examples (3/6) Method 1: i.i.d. (random) sampling Assume that the query words and the word w in relevant documents are sampled identically: pick a distribution D with probability p(D) and sample from it k+1 times. Then the total probability of observing w together with q 1, ..., q k is P(w, q 1, ..., q k ) = Σ D p(D) p(w|D) ∏ i p(q i |D), where w and all q i are assumed to be sampled independently and identically to each other Final estimate: P(w|R) ≈ P(w|q 1, ..., q k ) = P(w, q 1, ..., q k ) / P(q 1, ..., q k )
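Method 1 can be sketched as a toy implementation (p(D) is taken uniform over the collection and the document models are smoothed with a background distribution; both choices, and the weight `lam`, are assumptions of this sketch):

```python
from collections import Counter

def relevance_model_iid(query, collection, background, lam=0.6):
    """Method 1 (i.i.d. sampling):
        P(w, q1..qk) = sum_D p(D) * p(w|D) * prod_i p(q_i|D)
    with uniform p(D); normalizing over w divides by P(q1..qk),
    yielding P(w|R) ≈ P(w | q1..qk)."""
    counters = [(Counter(doc), len(doc)) for doc in collection]

    def p_w_given_d(w, c, n):
        # smoothed document model: ML estimate mixed with background
        return lam * (c[w] / n) + (1 - lam) * background.get(w, 0.0)

    vocab = set(background)
    for c, _ in counters:
        vocab.update(c)

    joint = {}
    for w in vocab:
        total = 0.0
        for c, n in counters:
            prob = p_w_given_d(w, c, n)
            for q in query:
                prob *= p_w_given_d(q, c, n)
            total += prob / len(collection)              # uniform p(D)
        joint[w] = total
    z = sum(joint.values())                              # = P(q1..qk)
    return {w: v / z for w, v in joint.items()}
```

Words that co-occur with the query words inside the same documents end up with high probability under the estimated relevance model.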

16 Estimation without examples (4/6) Method 2: conditional sampling Using the chain rule and the assumption that query words are independent given word w, P(w, q 1, ..., q k ) = P(w) ∏ i P(q i |w) To estimate the conditional probabilities we compute the expectation over the universe C of our unigram models: P(q i |w) = Σ Di∈C P(q i |D i ) P(D i |w)

17 Estimation without examples (5/6) With the additional assumption that q i is independent of w once we have picked a distribution D i, the final estimate for the joint probability of w and the query is P(w, q 1, ..., q k ) = P(w) ∏ i Σ Di∈C P(q i |D i ) P(D i |w)

18 Estimation without examples (6/6) The word prior probability is P(w) = Σ D∈C P(w|D) P(D) The probability of picking a distribution D i based on w follows from Bayes' rule: P(D i |w) = P(w|D i ) P(D i ) / P(w), where P(D i ) is kept uniform over all the documents in C
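Putting the last three slides together, Method 2 can be sketched as follows (again with uniform P(D i ), smoothed document models, and an illustrative `lam`; none of these constants come from the paper):

```python
from collections import Counter

def relevance_model_conditional(query, collection, background, lam=0.6):
    """Method 2 (conditional sampling):
        P(w, q1..qk) = P(w) * prod_i sum_{D in C} P(q_i|D) * P(D|w)
    with P(D|w) = P(w|D) P(D) / P(w), P(D) uniform over C, and the
    word prior P(w) = sum_{D in C} P(w|D) P(D)."""
    n_docs = len(collection)
    counters = [(Counter(doc), len(doc)) for doc in collection]

    def p_w_given_d(w, c, n):
        # smoothed document unigram model
        return lam * (c[w] / n) + (1 - lam) * background.get(w, 0.0)

    vocab = set(background)
    for c, _ in counters:
        vocab.update(c)

    # word prior P(w) = sum_D P(w|D) * P(D), with uniform P(D) = 1/|C|
    prior = {w: sum(p_w_given_d(w, c, n) for c, n in counters) / n_docs
             for w in vocab}

    joint = {}
    for w in vocab:
        prob = prior[w]
        for q in query:
            s = 0.0
            for c, n in counters:
                p_d_given_w = p_w_given_d(w, c, n) / n_docs / prior[w]  # Bayes
                s += p_w_given_d(q, c, n) * p_d_given_w
            prob *= s
        joint[w] = prob
    z = sum(joint.values())
    return {w: v / z for w, v in joint.items()}
```

On small symmetric examples Methods 1 and 2 behave similarly; they differ in how heavily a single document can dominate the estimate.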

19 Comparison of Estimation Methods 1 and 2 By the Probability Ranking Principle, documents are ranked by decreasing probability ratio; this ranking can also be expressed in terms of the cross-entropy between the relevance model and the document model

20 Ranking Criterion (1/2) Ranking by (2) favors short documents Use the Kullback-Leibler (KL) divergence to avoid this short-document bias: KL(R || D i ) = Σ w P(w|R) log [ P(w|R) / P(w|D i ) ], where P(w|D i ) is the unigram model of the document associated with the rank-i image in the list and P(w|R) is the aforementioned relevance model
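The KL re-ranking step can be sketched as follows (the document models are smoothed with a background distribution here, an assumption made so the divergence stays finite for words absent from a document):

```python
import math

def kl_divergence(p_rel, p_doc, background, lam=0.6, eps=1e-12):
    """KL(R || D) = sum_w P(w|R) * log( P(w|R) / P_lam(w|D) ).
    Lower divergence means the document model is closer to the
    relevance model, so the associated image should rank higher."""
    div = 0.0
    for w, p in p_rel.items():
        if p <= 0.0:
            continue
        q = lam * p_doc.get(w, 0.0) + (1 - lam) * background.get(w, eps)
        div += p * math.log(p / q)
    return div

def rerank(images, p_rel, background):
    """images: list of (image_id, unigram model of the associated document);
    returns the list sorted by increasing KL divergence from the
    relevance model."""
    return sorted(images, key=lambda item: kl_divergence(p_rel, item[1], background))
```

Because KL compares normalized distributions rather than raw document likelihoods, document length no longer biases the ranking.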

21 Ranking Criterion (2/2)

22 Experiment (1/4) The idea of re-ranking is tested on six text queries to a large-scale web image search engine, Google Image Search, which indexed 425 million images between July 2001 and March 2003 The six queries are chosen from image categories in the Corel Image Database Each text query is typed into Google Image Search and the top 200 entries are saved for evaluation The 1200 images for the six queries are fetched and manually labeled into three categories: relevant, ambiguous, irrelevant

23 Experiment (2/4)

24 Experiment (3/4) For each query, the same keywords are sent to Google Web Search to obtain a list of relevant documents via the Google Web APIs The top-ranked 200 web documents are stripped of all HTML tags; words appearing in the INQUERY stop-word list are filtered out, and the remaining words are stemmed with the Porter algorithm The smoothing parameter is 0.6
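A minimal version of this preprocessing pipeline might look like the following (the stop list here is a tiny illustrative subset, not the actual INQUERY list, and Porter stemming is omitted for brevity):

```python
import re

# Illustrative stop words only; the paper uses the full INQUERY stop-word list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for"}

TAG_RE = re.compile(r"<[^>]+>")

def preprocess(html):
    """Strip HTML tags, lowercase, tokenize into alphabetic words, and
    drop stop words.  A full pipeline would also apply the Porter stemmer."""
    text = TAG_RE.sub(" ", html)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]
```

The resulting token lists are what the relevance-model estimation and the KL ranking operate on.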

25 Experiment (4/4) The average precision at DCP (document cut-off point) over the six queries

26 Conclusion Re-ranking improves the average precision at the top 50 documents from the original 30-35% to 45% Internet users usually have limited time and patience; high precision among the top-ranked documents saves users a lot of effort and helps them find relevant images more easily and quickly

