Estimation of the Number of Relevant Images in Infinite Databases Presented by: Xiaoling Wang Supervisor: Prof. Clement Leung.

Introduction
Due to the increased importance of the Internet, the use of image search engines is becoming increasingly widespread. However, it is difficult for users to decide which image search engine to select. The more effective a system is, the more it satisfies its users. Retrieval effectiveness is therefore one of the most important measures of the performance of image retrieval systems.

Measures: precision and recall.
Significant challenge: the total number of relevant images is not directly observable in such a potentially infinite database.
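As a minimal sketch of why recall is the harder of the two measures, the snippet below computes both from counts. The function names and all the numbers are hypothetical and only illustrate the standard definitions; in particular, `total_relevant` is an assumed estimate, since the true value cannot be observed for a web-scale engine.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Fraction of the retrieved images that are relevant."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved: int, total_relevant: float) -> float:
    """Fraction of all relevant images that were retrieved."""
    return relevant_retrieved / total_relevant if total_relevant else 0.0


# Hypothetical counts: 20 images inspected, 14 of them judged relevant.
p = precision(14, 20)

# The denominator of recall is not observable, so an estimate
# (an invented value of 56 here) has to stand in for it.
r = recall(14, 56)
```

Precision needs only what the user has already seen, while recall depends entirely on the quality of the estimate of the total number of relevant images.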

Objective
To investigate the probabilistic behavior of the distribution of relevant images among the results returned by image search engines, under two models:
a) Independent Distribution
b) Markov Chain Distribution
From these models, we shall derive algorithms for the meaningful estimation of recall.

Independent Model
Let $p_k$ denote the probability representing the cumulative relevance of all the images in page $k$. For search engines, it is normally true that the first pages provide a larger probability, so that
$p_1 \ge p_2 \ge \dots \ge p_k \ge p_{k+1} \ge \dots$
Since the relevant outcomes of different ranked images are not mutually exclusive events, and since the search results do not feasibly terminate, we have in general [equation missing in transcript] and, as [limit missing in transcript].

Independent Model
Record the number of relevant images per page as a stochastic process $X_{i1}, X_{i2}, \dots, X_{ik}$, where $i = 1, 2, \dots, 69$ and $k = 1, 2, \dots$
Investigate the quadratic formula $P_k = \beta_1 k^2 + \beta_2 k + \alpha$, where $k = 1, 2, 3, \dots$
Determine the parameters using the least squares method.
Calculate the percentage representing the cumulative relevance of all the images in page $k$.
Obtain the mean number of relevant images for each page:
$\bar{X}_k = \frac{1}{69} \sum_{i=1}^{69} X_{ik}, \quad k = 1, 2, \dots$
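The steps of the Independent Model can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the counts, the page size of 20 images, and the names `fit_quadratic`, `counts`, and `page_size` are all hypothetical, and only three queries are used instead of 69. The quadratic is fitted by solving the least-squares normal equations directly.

```python
def fit_quadratic(ks, ps):
    """Least-squares fit of ps ~ b1*k**2 + b2*k + a; returns (b1, b2, a)."""
    rows = [[k * k, k, 1.0] for k in ks]
    # Normal equations: (R^T R) x = R^T p, stored as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * p for r, p in zip(rows, ps))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for c in range(col, 4):
                m[row][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(x)


# Hypothetical data: counts[i][k] = relevant images on page k+1 for query i.
counts = [
    [18, 16, 13, 11, 8],
    [16, 14, 13, 9, 8],
    [20, 15, 13, 10, 8],
]
page_size = 20  # assumed number of images per page

# Mean number of relevant images per page, averaged over the queries.
page_means = [sum(col) / len(counts) for col in zip(*counts)]

# Proportion of relevant images per page, then the quadratic fit over page number.
proportions = [mean / page_size for mean in page_means]
pages = list(range(1, len(proportions) + 1))
b1, b2, a = fit_quadratic(pages, proportions)
```

Evaluating `b1 * k**2 + b2 * k + a` for a later page `k` then gives the fitted proportion of relevant images on that page.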

Markov Chain Model
Since Internet image search returns results in units of pages, we focus on the integer-valued stochastic process $X_1, X_2, \dots$, where $X_J$ represents the aggregate relevance of all the images in page $J$. The sequence $X = \{X_1, X_2, \dots\}$ is modeled as a Markov chain. The conditional probability of the number of relevant images in page $J$, given the number of relevant images in page $J-1$, is taken as the transition probability:
$p_{(J-1),J} = P\{X_J = x_J \mid X_{J-1} = x_{J-1}\}$

Markov Chain Model
From this, we construct the $(n+1) \times (n+1)$ transition probability matrix $P$ over the states $0, 1, \dots, n$, where $n$ is the number of images contained in a page.
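One common way to build such a matrix from recorded sequences is to count the observed page-to-page transitions and normalise each row. The sketch below assumes that approach; the slides do not state how the matrix was estimated. The function name, the toy sequences, and the choice of a uniform row for states that were never observed are all assumptions.

```python
def estimate_transition_matrix(sequences, n):
    """Row-normalised counts of moves from state a (page J-1) to state b (page J)."""
    counts = [[0] * (n + 1) for _ in range(n + 1)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # State never observed as a starting point: fall back to a
            # uniform row (an arbitrary choice made for this sketch).
            matrix.append([1.0 / (n + 1)] * (n + 1))
    return matrix


# Hypothetical sequences of relevant-image counts per page, with n = 3.
sequences = [
    [3, 2, 2, 1],
    [3, 3, 2, 0],
    [2, 2, 1, 1],
]
P = estimate_transition_matrix(sequences, n=3)
```

Each row of `P` is a probability distribution over the number of relevant images on the next page, given the number on the current page.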

Markov Chain Model
Calculate the initial probabilities. The probabilities are placed in a vector of state probabilities:
$\pi(J)$ = vector of state probabilities for page $J$ = $(\pi_0, \pi_1, \pi_2, \pi_3, \dots, \pi_n)$
where $\pi_k$ is the probability of having $k$ relevant images. From this model, we can therefore estimate the number of relevant images page by page using the formula:
$\pi(J) = \pi(J-1) P, \quad J = 1, 2, 3, \dots, n$
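The update of the state vector by the transition matrix can be sketched in a few lines. The matrix `P`, the initial vector `pi0`, and the page size `n = 2` below are invented for illustration. Taking the expected value of the state vector as the per-page estimate is also an assumption, since the slides do not say how the estimate is read off the vector.

```python
def step(pi, P):
    """One update of the state vector: row vector pi times matrix P."""
    size = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]


def expected_relevant(pi):
    """Expected number of relevant images under the state vector pi."""
    return sum(k * prob for k, prob in enumerate(pi))


# Hypothetical transition matrix for pages holding n = 2 images (states 0, 1, 2).
P = [
    [0.8, 0.2, 0.0],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]
pi0 = [0.0, 0.2, 0.8]  # assumed initial state probabilities

# Propagate over a few pages and accumulate the expected relevant count.
pi = pi0
estimates = []
for _ in range(3):
    pi = step(pi, P)
    estimates.append(expected_relevant(pi))
total_estimate = sum(estimates)
```

Summing the per-page expectations over the pages considered gives a running estimate of the number of relevant images, which is the quantity needed for recall.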

Experiment
Image search engine selection: Google, Yahoo, Msn.
Query selection: the queries consist of one-word, two-word, and more than three-word queries, ranging from simple words such as "apple", to more specific queries such as "apple computers", and finally to still more specific queries such as "eagle catching fish".
Record the stochastic sequence $X = \{X_1, X_2, \dots\}$ for each query.
Apply the models: Independent Model and Markov Chain Model.
Test the returned results using the queries: "volcano", "tibetan girl", "desert camel shadow".

Independent Model and Testing Results for Google
Figure 1. Independent Model for Google
Figure 2. Testing Results and Independent Distribution Model for Google

Independent Model and Testing Results for Yahoo
Figure 3. Independent Model for Yahoo
Figure 4. Testing Results and Independent Distribution Model for Yahoo

Independent Model and Testing Results for Msn
Figure 5. Independent Model for Msn
Figure 6. Testing Results and Independent Distribution Model for Msn

Markov Chain Model and Testing Results for Google
Figure 7. Search Result of Testing Queries and Markov Chain Model for Google

Markov Chain Model and Testing Results for Yahoo
Figure 8. Search Result of Testing Queries and Markov Chain Model for Yahoo

Markov Chain Model and Testing Results for Msn
Figure 9. Search Result of Testing Queries and Markov Chain Model for Msn

Measure of Accuracy
One measure of accuracy is the mean absolute deviation (MAD).

MAD by image search engine (ISE), query length, and model:

ISE      Query        INDP Model   MC Model
Google   One-word     1.2          1
Google   Two-word     2.4          0.4
Google   Three-word   1.1          2.3
Yahoo    One-word     2.9          2.9
Yahoo    Two-word     4.6          0
Yahoo    Three-word   2.5          2.1
Msn      One-word     2.7          1.4
Msn      Two-word     2.6          1.7
Msn      Three-word   11.8         15.8
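For reference, MAD in its usual forecasting sense is the average absolute difference between observed and predicted values. The sketch below assumes that definition, since the slide does not give the formula. The function name and the numbers are hypothetical and are not taken from the experiment.

```python
def mean_absolute_deviation(observed, predicted):
    """Average of |observed - predicted| over paired values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one pair of values is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical per-page counts of relevant images and a model's predictions.
observed = [18, 15, 12, 10]
predicted = [17, 15, 13, 12]
mad = mean_absolute_deviation(observed, predicted)
```

A smaller MAD means the model's per-page predictions sit closer to the observed counts.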

Conclusion
In terms of MAD, we conclude that the Markov Chain Model estimates the number of relevant images for the ISEs better than the Independent Model does. Except for three-word queries on Msn, these models estimate the total number of relevant images for image search engines quite well.

Future Work
Optimal stopping rules for the different models will be established.
Time series modeling and exponential smoothing will be explored, because the previous models indicate that the situation may be modeled as a time series with the page number representing time.
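To make the time-series idea concrete, simple exponential smoothing over the page index could look like the sketch below. This is only an illustration of the standard technique and not a result from this work. The smoothing constant `alpha`, the function name, and the data are assumptions.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s_1 = x_1, s_t = alpha*x_t + (1 - alpha)*s_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical relevant-image counts on pages 1 to 4, with page number as "time".
per_page = [18, 15, 13, 10]
smoothed = exponential_smoothing(per_page, alpha=0.5)

# The last smoothed value serves as the one-step-ahead forecast for page 5.
forecast_next_page = smoothed[-1]
```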
