Presentation transcript:

Relevance Feedback Users learn how to modify queries. The response list must have at least some relevant documents. Relevance feedback, 'correcting' the ranks to the user's taste, automates the query refinement process. Rocchio's method folds user feedback into the query vector: add a weighted sum of the vectors of the relevant documents D+ and subtract a weighted sum of the vectors of the irrelevant documents D-.
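A common way to write Rocchio's update (the slide does not give the weights; alpha, beta, and gamma here are tuning parameters, with beta weighting D+ and gamma weighting D-):
q' = \alpha\, q + \frac{\beta}{|D^{+}|} \sum_{d \in D^{+}} d - \frac{\gamma}{|D^{-}|} \sum_{d \in D^{-}} d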

Relevance Feedback (contd.) Pseudo-relevance feedback: D+ and D- are generated automatically. E.g., in the Cornell SMART system, the top 10 documents reported by the first round of query execution are included in D+; the weight on D- is typically set to 0, i.e., D- is not used. It is not a commonly available feature: Web users want instant gratification, it adds system complexity, and executing the second-round query is slow and expensive for major search engines.
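A minimal sketch of the pseudo-relevance feedback step, assuming the query and documents are already term-weight vectors (numpy arrays) and that the top 10 first-round results serve as D+; the function name and the alpha/beta weights are illustrative choices, not details from the slide:
import numpy as np

def pseudo_relevance_feedback(query_vec, doc_vecs, first_round_scores, k=10, alpha=1.0, beta=0.5):
    """Rocchio-style expansion with D+ = top-k first-round results and the D- weight set to 0."""
    top_k = np.argsort(first_round_scores)[::-1][:k]   # highest-scoring documents from round one
    d_plus = doc_vecs[top_k]                           # pseudo-relevant documents D+
    return alpha * query_vec + beta * d_plus.mean(axis=0)

# The expanded vector is then used to run the (slower) second-round query.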

Basic IR Process How should text objects be represented? What similarity function should be used? How should the query be refined according to users' feedback?

A Brief Review of Probability Probability space; random variables (discrete, continuous, and mixed); probability distributions (binomial, multinomial, Gaussian); expectation and variance; independence, conditional independence, conditional probability, Bayes' theorem; …

Definition of Probability Experiment: toss a coin twice. Sample space: the possible outcomes of an experiment, S = {HH, HT, TH, TT}. Event: a subset of the possible outcomes, e.g., A = {HH}, B = {HT, TH}. Probability of an event: a number Pr(A) assigned to the event. Axiom 1: Pr(A) ≥ 0. Axiom 2: Pr(S) = 1. Axiom 3: for every sequence of disjoint events A1, A2, …, Pr(A1 ∪ A2 ∪ …) = Σi Pr(Ai). Example: Pr(A) = n(A)/N (frequentist statistics).
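A small sketch of the coin-tossing example above, enumerating the sample space of two tosses and computing event probabilities by counting (assuming all four outcomes are equally likely, as on the slide):
from itertools import product

# Sample space of two coin tosses: S = {HH, HT, TH, TT}, all outcomes equally likely.
S = [''.join(toss) for toss in product('HT', repeat=2)]

def pr(event):
    """Frequentist probability Pr(A) = n(A)/N over the finite sample space S."""
    return sum(outcome in event for outcome in S) / len(S)

A = {'HH'}               # event A: two heads
B = {'HT', 'TH'}         # event B: exactly one head
print(pr(A), pr(B), pr(set(S)))   # 0.25 0.5 1.0  (Axiom 2: Pr(S) = 1)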

Independence Two events A and B are independent in case Pr(A,B) = Pr(A)Pr(B), where Pr(A,B) is the probability that both A and B occur. Independence is not the same as disjointness: Pr(A,B) = 0 when A and B are disjoint!
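A quick numerical check of that last point, reusing the two-toss sample space (a sketch, not from the slide):
from itertools import product

S = [''.join(t) for t in product('HT', repeat=2)]      # {HH, HT, TH, TT}
pr = lambda event: sum(o in event for o in S) / len(S)

A, B = {'HH'}, {'HT', 'TH'}           # the disjoint events from the slide
print(pr(A & B))                      # 0.0, because A and B never happen together
print(pr(A & B) == pr(A) * pr(B))     # False: these disjoint events are NOT independent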

Conditional Probability If A and B are events with Pr(A) > 0, the conditional probability of B given A is Pr(B|A) = Pr(A,B) / Pr(A). Example: Pr(Drug1 = Succ | Women) = ? Pr(Drug2 = Succ | Women) = ? Pr(Drug1 = Succ | Men) = ? Pr(Drug2 = Succ | Men) = ?
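A tiny illustration of the definition Pr(B|A) = Pr(A,B)/Pr(A), using the two-toss sample space from the earlier sketch rather than the drug example (whose counts are not given here):
from itertools import product

S = [''.join(t) for t in product('HT', repeat=2)]
pr = lambda event: sum(o in event for o in S) / len(S)

A = {'HH', 'HT'}          # first toss is heads, Pr(A) = 0.5
B = {'HH', 'TH'}          # second toss is heads
print(pr(A & B) / pr(A))  # 0.5 = Pr(B|A) = Pr(A,B) / Pr(A) = 0.25 / 0.5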

Conditional Independence Events A and B are conditionally independent given C in case Pr(A,B|C) = Pr(A|C)Pr(B|C). Example: A = success, B = women, C = Drug I. Are A and B conditionally independent given C?
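A sketch of how the definition can be checked numerically; the joint probabilities below are made-up placeholders (built so that conditional independence holds by construction), not the drug-trial counts behind the slide:
import itertools

# Hypothetical joint distribution over (A, B, C), constructed as Pr(C) * Pr(A|C) * Pr(B|C).
pC  = {0: 0.5, 1: 0.5}              # Pr(C = c)
pAC = {0: 0.2, 1: 0.7}              # Pr(A = 1 | C = c)
pBC = {0: 0.4, 1: 0.9}              # Pr(B = 1 | C = c)
joint = {(a, b, c): pC[c] * (pAC[c] if a else 1 - pAC[c]) * (pBC[c] if b else 1 - pBC[c])
         for a, b, c in itertools.product((0, 1), repeat=3)}

# Check the definition Pr(A=1, B=1 | C=c) == Pr(A=1 | C=c) * Pr(B=1 | C=c) for each c.
for c in (0, 1):
    p_c = sum(p for (a, b, cc), p in joint.items() if cc == c)
    lhs = joint[(1, 1, c)] / p_c
    rhs = (sum(p for (a, b, cc), p in joint.items() if a == 1 and cc == c) / p_c) * \
          (sum(p for (a, b, cc), p in joint.items() if b == 1 and cc == c) / p_c)
    print(c, abs(lhs - rhs) < 1e-9)   # True for both values of C, by construction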

Bayes' Rule Suppose that B1, B2, …, Bn form a partition of the sample space S and that Pr(A) > 0. Then Pr(Bi|A) = Pr(A|Bi) Pr(Bi) / Σj Pr(A|Bj) Pr(Bj). This is derived from the definition of conditional probability, Pr(Bi|A) = Pr(A, Bi) / Pr(A): reversing the definition gives the numerator Pr(A, Bi) = Pr(A|Bi) Pr(Bi), and using the partition property gives the denominator, the normalization factor, Pr(A) = Σj Pr(A, Bj) = Σj Pr(A|Bj) Pr(Bj). In short, Pr(Bi|A) ~ Pr(Bi) * Pr(A|Bi).

Bayes' Rule: Example q: t, h, t, h, t, t. C1: h, h, h, t, h, h → bias b1 = 5/6. C2: t, t, h, t, h, h → bias b2 = 1/2. C3: t, h, t, t, t, h → bias b3 = 1/3. Priors: p(C1) = p(C2) = p(C3) = 1/3. Likelihoods: p(q|C1) ≈ 5.2×10^-4, p(q|C2) ≈ 0.015, p(q|C3) ≈ 0.022. p(C1|q) = ?, p(C2|q) = ?, p(C3|q) = ?
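A minimal sketch of the computation behind this slide, modelling each coin as a Bernoulli source with the stated bias (probability of heads) and applying Bayes' rule; the variable names are mine:
# The observed sequence and the three candidate coins (bias = Pr(heads)).
q = "ththtt"                                   # observed tosses: t, h, t, h, t, t
biases = {"C1": 5/6, "C2": 1/2, "C3": 1/3}
priors = {c: 1/3 for c in biases}

def likelihood(seq, bias):
    """p(q | C): product of per-toss probabilities under the coin's bias."""
    p = 1.0
    for toss in seq:
        p *= bias if toss == "h" else 1 - bias
    return p

likelihoods = {c: likelihood(q, b) for c, b in biases.items()}

# Bayes' rule: p(C|q) = p(C) * p(q|C) / sum over C' of p(C') * p(q|C').
evidence = sum(priors[c] * likelihoods[c] for c in biases)
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in biases}
print(likelihoods)   # p(q|C1) is tiny (~5e-4); p(q|C2) ~ 0.016; p(q|C3) ~ 0.022
print(posteriors)    # the mostly-tails coins C2 and C3 get nearly all the posterior mass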

Why Bayes' Rule is Important C1: bias = 5/6; C2: bias = 1/2; C3: bias = 1/3; O: t, h, t, h, t, t. Which coin produced the observations? We want to go from the observations (O) to a conclusion (C), i.e., Pr(C|O), using Pr(O|C) and Pr(C). It is easy to compute Pr(O|Ci); Bayes' rule helps us convert the computation of Pr(C|O) to the computation of Pr(O|C) and Pr(C).

Bayesian Learning Pr(C|O) ~ Pr(C) * Pr(O|C), i.e., posterior ~ prior * likelihood. First, you have prior knowledge about the conclusions, i.e., the prior Pr(C). Then, based on your observation O, you estimate the likelihood Pr(O|C) for each possible conclusion C. Finally, your expectation of conclusion C, the posterior Pr(C|O), is shaped by the product of the prior and the likelihood.