Modeling Score Distributions for Combining the Outputs of Search Engines
Reading Notes
Wang Ning (wangning@db.pku.edu.cn)
Lab of Database and Information Systems
Dec 3rd, 2003

Revision History
- Nov. 30th, 2003: draft
- Dec. 1st, 2003: added all pictures
- Dec. 2nd, 2003: added references

Literature Information
Title: Modeling Score Distributions for Combining the Outputs of Search Engines
Authors: R. Manmatha (manmatha@cs.umass.edu), T. Rath (trath@cs.umass.edu), F. Feng (feng@cs.umass.edu)
Institution: Center for Intelligent Information Retrieval, University of Massachusetts
Conference: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001)

Basic Idea
Meta search: combining the results from multiple search engines
Difficulties:
- No information about each engine's internal architecture and algorithms
- No score information
Previous work:
- Linear combination of document ranks
- COMBMIN, COMBMAX, COMBSUM, COMBMNZ (sketched below)
The authors' idea: model the score distributions
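These classic combination rules (due to Fox & Shaw) reduce, per document, to simple functions of the scores that document received from the engines that retrieved it. A minimal sketch in Python, with all names mine:

    def comb_min(scores): return min(scores)
    def comb_max(scores): return max(scores)
    def comb_sum(scores): return sum(scores)
    def comb_mnz(scores): return sum(scores) * len(scores)  # CombSUM times the number of engines that retrieved the document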

Test Data
TREC (Text REtrieval Conference) collections:
- TREC 3, TREC 4
- TREC 6 for Chinese documents
Search engines:
- INQUERY (probabilistic model)
- CITY (probabilistic model)
- SMART (vector space model)
- Bellcore (LSI engine)

Model Assumptions
- The scores of the non-relevant documents can be modeled with an exponential distribution
- The scores of the relevant documents can be modeled with a Gaussian distribution
Explanations and justifications come later; the assumed densities are sketched below.
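A minimal sketch of the two assumed score densities, in my notation (lambda, mu, sigma are the parameters to be estimated from the scores s):

    p(s \mid \text{non-rel}) = \lambda e^{-\lambda s}
    p(s \mid \text{rel}) = \mathcal{N}(s; \mu, \sigma^2)
                         = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(s-\mu)^2}{2\sigma^2} \right)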

Non-relevant Documents: Exponential Distribution

Relevant Documents: Gaussian Distribution

Likelihood Function
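The likelihood itself appears only as a figure in the original slides; a plausible reconstruction for the exponential fit to non-relevant scores s_1, ..., s_n (the Gaussian case is analogous):

    L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda s_i}, \qquad
    \log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} s_i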

MLE: Maximum Likelihood Estimate

Basic Idea of MLE
"Nature lets the most probable event happen first": the MLE of Θ is the parameter value under which the observed sample is most likely.
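In symbols, together with the standard closed forms for the two distributions above (not spelled out on this slide):

    \hat{\Theta}_{\text{MLE}} = \arg\max_{\Theta} \prod_{i=1}^{n} p(s_i; \Theta), \qquad
    \hat{\lambda} = \frac{n}{\sum_i s_i}, \qquad
    \hat{\mu} = \frac{1}{n} \sum_i s_i, \qquad
    \hat{\sigma}^2 = \frac{1}{n} \sum_i (s_i - \hat{\mu})^2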

Limitations of the Gaussian Fit
- Fits well: sufficiently many relevant documents (>= 60)
- Fits badly: fewer relevant documents (the usual case)
Why?
- A fault of the model, or
- Lack of samples (the authors' view)
Solutions: maybe Bayesian analysis works here

Mixture Model Fit
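At combination time there are no relevance judgments, so the two components are fit jointly as a mixture over all scores. A sketch of the mixture density, with mixing weight pi (my notation):

    p(s) = \pi\, \mathcal{N}(s; \mu, \sigma^2) + (1 - \pi)\, \lambda e^{-\lambda s}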

Mixture Model Fit (cont.)

EM: Expectation-Maximization
An important parameter estimation method for models with unobserved (latent) variables

EM Steps
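A minimal EM sketch for the exponential + Gaussian score mixture, in Python (my illustration of the E/M updates, not the authors' code; the initialization is deliberately crude):

    import numpy as np

    def em_exp_gauss(scores, n_iter=100):
        s = np.asarray(scores, dtype=float)
        # Crude initialization: exponential rate from the overall mean,
        # Gaussian seeded above it, half of the mass to each component.
        pi = 0.5
        lam = 1.0 / s.mean()
        mu, sigma = s.mean() + s.std(), s.std() + 1e-12
        for _ in range(n_iter):
            # E-step: responsibility r_i = P(relevant | s_i) under current parameters.
            gauss = pi * np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
            expo = (1.0 - pi) * lam * np.exp(-lam * s)
            r = gauss / (gauss + expo + 1e-300)
            # M-step: weighted maximum-likelihood re-estimates of all parameters.
            pi = r.mean()
            mu = (r * s).sum() / r.sum()
            sigma = np.sqrt((r * (s - mu) ** 2).sum() / r.sum()) + 1e-12
            lam = (1.0 - r).sum() / ((1.0 - r) * s).sum()
        return pi, mu, sigma, lam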

Mixture Model Fit: INQUERY

Mixture Model Fit: SMART

Posterior Probabilities
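By Bayes' rule on the fitted mixture, with pi playing the role of the prior probability of relevance (my notation, matching the mixture density above):

    P(\text{rel} \mid s) = \frac{\pi\, \mathcal{N}(s; \mu, \sigma^2)}{\pi\, \mathcal{N}(s; \mu, \sigma^2) + (1 - \pi)\, \lambda e^{-\lambda s}}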

Posterior Probabilities: SMART

Limitations of Posterior Probabilities

Problem I: Mixture Model
Model selection: why exponential and Gaussian?
- They fit the data well
- They can be recovered with the EM algorithm
EM algorithm: limitations and solutions
- Limitation: local maxima
- Solutions:
  - Try arbitrary initial conditions
  - Fit the exponential distribution first, then fit the Gaussian to the documents the exponential does not fit well (see the sketch below)
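A hypothetical sketch of that second initialization strategy (the function name and the 90% tail cutoff are my choices, not the paper's):

    import numpy as np

    def init_from_exponential(s, tail_quantile=0.9):
        # Fit the exponential to all scores first (MLE: rate = 1 / mean) ...
        lam0 = 1.0 / s.mean()
        # ... then seed the Gaussian from the scores the exponential
        # explains poorly, i.e. the high-score tail.
        tail = s[s >= np.quantile(s, tail_quantile)]
        return lam0, tail.mean(), tail.std() + 1e-12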

Problem II: Shapes of Distributions

Shapes of Poisson Distributions

Applications
Combining the outputs of search engines:
- Use the posterior probabilities instead of raw scores (a sketch follows)
Automatic engine selection:
- Distinction: larger distance between the mean and the intersection point of the two distributions
- Relevance: higher maximum of the posterior probabilities
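A hedged sketch of the combination step (averaging is one natural combination rule; `posterior_fns` maps each engine to its fitted P(rel | s), and all names here are mine):

    def combine_engines(doc_scores, posterior_fns):
        # doc_scores: {doc_id: {engine: raw_score}}
        # posterior_fns: {engine: callable mapping a raw score to P(rel | s)}
        combined = {}
        for doc, per_engine in doc_scores.items():
            probs = [posterior_fns[e](score) for e, score in per_engine.items()]
            combined[doc] = sum(probs) / len(probs)
        # Rank documents by combined probability of relevance.
        return sorted(combined, key=combined.get, reverse=True)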

Comparative Study: Combining

Comparative Study: Selecting

What Can I Learn from This Paper?
Scientific methodology:
- Clear and simple models
- Theoretical reasoning & experimental support
- Natural and simple mathematical methods
- Standard test data and a comparative study

Alternative Method
Bayes Optimal Metasearch: A Probabilistic Model for Combining the Results of Multiple Retrieval Systems
J. A. Aslam & M. Montague, Dartmouth College, SIGIR'00

Probabilistic Model

Comparisons
manmatha01modeling:
- Pros: clear and simple models
- Cons: strong model assumptions; some inherent limitations of the EM algorithm
aslam01bayes:
- Cons: requires training of prior probabilities; naive Bayes independence assumptions

My Thoughts
- Train the prior probabilities to obtain more accurate output models
- The small sample size limits the use of traditional statistics; maybe Bayesian analysis can avoid this

References
R. Manmatha, T. Rath, and F. Feng. Modeling Score Distributions for Combining the Outputs of Search Engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 267-275, 2001.
J. A. Aslam and M. Montague. Bayes Optimal Metasearch: A Probabilistic Model for Combining the Results of Multiple Retrieval Systems. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 379-381, 2000.
Yu Jiangsheng. Expectation Maximization: An Approach to Parameter Estimation. Lecture of the Machine Learning Seminar, 2003.