© CvR1 The Geometry of IR, Keith van Rijsbergen, Tampere, 15th August, 2002 (lost in Hilbert space!)

© CvR2 Unscripted comments I States Observables Measurement => Reality? Projection Postulates Cognitive State Changes

© CvR3 Unscripted comments II (quoting John von Neumann) However, all quantum mechanical probabilities are defined by inner products of vectors. Essentially if a state of a system is given by one vector, the transition probability in another state is the inner product of the two which is the square of the angle between them. In other words, probability corresponds precisely to introducing the angles geometrically. Furthermore, there is only one way to introduce it. The more so because in the quantum mechanical machinery the negation of a statement, so the negation of a statement which is represented by a linear set of vectors, corresponds to the orthogonal complement of this linear space. Unsolved problems in mathematics, typescript, September, 1954

© CvR4 What is this talk about?
Not about quantum computation: see Nielsen and Chuang, CUP, 2000
Not about Logic: see Engesser and Gabbay, AI, 2002
History (von Neumann, Dirac, Schrödinger)
Motivation (complementarity)
Duality (Syntax/Semantics)
Measurement (Incompatibility)
Projections (subspaces)
Probability (inner products)
IR application (feedback, clusters, ostension)

© CvR5

© CvR6

© CvR7 Images not Text: how might that make a difference?
no visual keywords (yet) - tf/idf issue
aboutness revisable (e.g. Maron)
relevance revisable (e.g. Goffman)
feedback requires salience
aboutness -> relevance -> aboutness

© CvR8 This is not new! Goffman, 1969: …that the relevance of the information from one document depends upon what is already known about the subject, and in turn affects the relevance of other documents subsequently examined. Maron: Just because a document is about the subject sought by a patron, that fact does not imply that he would judge it relevant.

© CvR9 Maron's theory of indexing: …in the case where the query consists of a single term, call it B, the probability that a given document will be judged relevant by a patron submitting B is simply the ratio of the number of patrons who submit B as their query and judge that document as relevant, to the number of patrons who submit B as their search query.
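
A minimal sketch of this ratio, assuming a hypothetical log of (query, relevance judgement) pairs; all names and data below are invented for illustration.

```python
# Maron's ratio for a single-term query B:
# P(document judged relevant | patron submits B)
#   = #(patrons who submit B and judge the document relevant) / #(patrons who submit B).
# The patron log below is invented toy data.
patron_log = [
    {"query": "B", "judged_relevant": True},
    {"query": "B", "judged_relevant": False},
    {"query": "B", "judged_relevant": True},
    {"query": "C", "judged_relevant": True},
]

submitted_B = [p for p in patron_log if p["query"] == "B"]
relevant_B = [p for p in submitted_B if p["judged_relevant"]]

print(len(relevant_B) / len(submitted_B))   # 2/3 in this toy log
```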

© CvR10 In 1949 D. M. MacKay wrote a paper, Quantal aspects of scientific information (SER, vol. 41, no. 314), in which he alluded to using the quantum mechanics paradigm for IR.

© CvR11 Expectation Catalogue: It (the ψ-function) is now the means for predicting probability of measurement results. In it is embodied the momentarily-attained sum of theoretically based future expectation, somewhat as laid down in a catalogue. It is the relation-and-determinacy-bridge between measurements and measurements… It is, in principle, determined by a finite number of suitably chosen measurements on the object… Thus the catalogue of expectations is initially compiled. Schrödinger, 1935 & 1980

© CvR12 Hypotheses
Cluster Hypothesis: closely associated documents tend to be relevant to the same requests. (1971) [co-ordination is positively correlated with external relevance, Jackson, 1969]
Association Hypothesis: If an index term is good at discriminating relevant from non-relevant documents then any closely associated index term is also likely to be good at this. (1979) [co-occurrence of terms within documents is a suitable measure of similarity between terms, Jackson, 1971]
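
A toy sketch of the Association Hypothesis side of this, assuming term-term association is measured by within-document co-occurrence; the term-document matrix below is invented.

```python
import numpy as np

# Toy binary term-document matrix: rows are terms, columns are documents (invented data).
terms = ["quantum", "hilbert", "retrieval"]
A = np.array([[1, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 1]])

# Co-occurrence counts: C[i, j] = number of documents containing both term i and term j.
C = A @ A.T

# Normalise to a simple Dice association coefficient between terms.
diag = np.diag(C).astype(float)
dice = 2 * C / (diag[:, None] + diag[None, :])

print(terms[0], terms[1], round(float(dice[0, 1]), 2))   # a closely associated pair
```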

© CvR13 Navigation - Browsing T-space D-space

© CvR14 DUALITY
Direct file / Inverted file
State space / Space of Projections
d1 = (x, y, z, u, v, w), d2 = (u, v, w, k, l, m)
[[u]] = {d1, d2}; [[x]] = {d1}; [[m]] = {d2}
Boolean Logic: [[u ∧ x]] = {d1}; [[x ∨ m]] = {d1, d2}
Quantum Logic: [[u ∧ x]] = same; [[x ∨ m]] = different
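
A sketch of the contrast, assuming two invented one-term rays in a real plane: the Boolean join of [[x]] and [[m]] is a set union, while the quantum-logic join is the span of the two subspaces and can contain states lying on neither ray.

```python
import numpy as np

# Boolean view: index terms denote sets of documents.
docs_with_x = {"d1"}
docs_with_m = {"d2"}
print(docs_with_x | docs_with_m)          # Boolean [[x OR m]] = {'d1', 'd2'}

# Quantum-logic view: terms denote rays (1-d subspaces); the join is their span.
x = np.array([1.0, 0.0])                  # ray for term x
m = np.array([0.0, 1.0])                  # ray for term m
v = np.array([1.0, 1.0]) / np.sqrt(2)     # a state on neither ray

P_join = np.eye(2)                        # projector onto span{x, m} = the whole plane
print(np.allclose(P_join @ v, v))         # True: v lies in the quantum join
print(np.isclose(abs(v @ x), 1.0))        # False: v is not on ray x
print(np.isclose(abs(v @ m), 1.0))        # False: v is not on ray m
```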

© CvR15 The mathematics you need
Hilbert space (complex!!!)
inner product
norms: ||x||^2 = <x|x>
operator (linear)
Hermitian: A* = A
trace: tr(A) = Σ_i a_ii
eigenvalues: Ax = λx

© CvR16 Crash course on Dirac notation
|x> : vector (called ket)
<x| : functional (bra)
<x|y> = (row vector)(column vector) = Σ_i x_i* y_i
|x><y| : linear operator
|x><x| : a projector onto ray x
tr(|x><y|) = <y|x>
I = Σ_i |i><i| : universal projector
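
The same dictionary in numpy, restricted to real vectors for simplicity; the particular kets are invented.

```python
import numpy as np

x = np.array([3.0, 4.0]) / 5.0            # a unit ket |x>
y = np.array([1.0, 0.0])                  # another ket |y>

inner = np.vdot(x, y)                     # <x|y> = sum_i x_i* y_i
outer = np.outer(x, y)                    # |x><y| : a linear operator
P_x = np.outer(x, x)                      # |x><x| : projector onto ray x

print(np.isclose(np.trace(outer), np.vdot(y, x)))   # tr(|x><y|) = <y|x>
print(np.allclose(P_x @ P_x, P_x))                  # a projector is idempotent

# Resolution of the identity: I = sum_i |i><i| over an orthonormal basis.
I = sum(np.outer(e, e) for e in np.eye(2))
print(np.allclose(I, np.eye(2)))
```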

© CvR17 Hierarchy of Projectors
P_0 = 0, P_n = I
P_1 = |1><1|
P_2 = |1><1| + |2><2|
...
P_n = |1><1| + ... + |n><n|

© CvR18 Summary: Relevance/Aboutness, Documents, Queries; Observables, Operators, State function. Operators can be applied to the state function, and operators can be decomposed into projectors: A = Σ_i a_i P_i
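
A numerical check of the spectral decomposition A = Σ_i a_i P_i, with an invented real symmetric operator standing in for an observable.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                # an invented Hermitian (real symmetric) operator

eigvals, eigvecs = np.linalg.eigh(A)
projectors = [np.outer(v, v) for v in eigvecs.T]   # P_i = projector onto i-th eigenvector

A_rebuilt = sum(a * P for a, P in zip(eigvals, projectors))
print(np.allclose(A, A_rebuilt))                          # A = sum_i a_i P_i
print([bool(np.allclose(P @ P, P)) for P in projectors])  # each P_i is a projector
```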

© CvR19 That is, the relevance or irrelevance of a given retrieved document may affect the user's current state of knowledge resulting in a change of the user's information need, which may lead to a change of the user's perception/interpretation of the subsequent retrieved documents…. Borlund, 2000

© CvR20 [Diagram: term (T) and relevance (R) judgements with Yes/No branches] Relevance/Aboutness is Interaction/User dependent

© CvR21 Probability as inner product
<x|t><t|x> = tr(|x><x| |t><t|) = |<x|t>|^2 = cos^2 θ (in real Hilbert space)
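
A small numerical illustration of this identity, with an invented angle in a real two-dimensional space.

```python
import numpy as np

t = np.array([1.0, 0.0])                       # basis vector |t>
theta = np.pi / 6                              # invented angle between x and |t>
x = np.array([np.cos(theta), np.sin(theta)])   # unit state vector x

prob = abs(np.vdot(t, x)) ** 2                 # |<x|t>|^2
print(np.isclose(prob, np.cos(theta) ** 2))    # True: equals cos^2(theta)

# The same number via projectors and the trace: tr(|x><x| |t><t|).
print(np.isclose(np.trace(np.outer(x, x) @ np.outer(t, t)), prob))
```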

© CvR22 [Diagram: a state vector x in the real plane, drawn against the two orthonormal bases {|t=0>, |t=1>} and {|r=0>, |r=1>}]

© CvR23 An operator T is of trace class provided that T is positive (<x|T|x> ≥ 0 for all x) and the trace of T is finite (tr(T) < ∞).
T is a density operator if T is trace class and tr(T) = 1.
T = Σ_i a_i P_i is a density operator if 0 ≤ a_i and Σ_i a_i = 1.
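
Checking these conditions numerically for a small mixture; the weights 0.7 and 0.3 and the rays are invented.

```python
import numpy as np

e1, e2 = np.eye(2)                                     # two orthonormal rays
T = 0.7 * np.outer(e1, e1) + 0.3 * np.outer(e2, e2)    # T = sum_i a_i P_i, a_i >= 0, sum = 1

print(np.isclose(np.trace(T), 1.0))                    # tr(T) = 1
print(bool(np.all(np.linalg.eigvalsh(T) >= 0)))        # positive, so <x|T|x> >= 0 for all x
```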

© CvR24 Theorem: Let μ be any measure on the closed subspaces of a separable (real or complex) Hilbert space H of dimension at least 3. There exists a positive self-adjoint operator T of trace class such that, for all closed subspaces L of H, μ(L) = tr(T P_L). If μ is to be a probability measure, thus requiring that μ(H) = 1, then tr(T) = 1, that is, T is a density operator.
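
The induced measure in code, assuming a small invented density operator on a three-dimensional real space: μ(L) = tr(T P_L) and μ(H) = 1.

```python
import numpy as np

e = np.eye(3)
T = 0.5 * np.outer(e[0], e[0]) + 0.3 * np.outer(e[1], e[1]) + 0.2 * np.outer(e[2], e[2])

def mu(P_L):
    """Measure of the closed subspace with projector P_L: mu(L) = tr(T P_L)."""
    return np.trace(T @ P_L)

rays = [np.outer(v, v) for v in e]               # three orthogonal coordinate rays
print([round(float(mu(P)), 2) for P in rays])    # [0.5, 0.3, 0.2]
print(np.isclose(mu(np.eye(3)), 1.0))            # mu(H) = tr(T) = 1
```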

© CvR25 Conditional Probability
P(L_A | L_B) = tr(P_B D P_B P_A) / tr(D P_B)
Note that P_A could be E -> F
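
A direct transcription of the conditioning rule, with an invented pure state D and two invented projectors.

```python
import numpy as np

def conditional_prob(D, P_A, P_B):
    """P(L_A | L_B) = tr(P_B D P_B P_A) / tr(D P_B)."""
    return np.trace(P_B @ D @ P_B @ P_A) / np.trace(D @ P_B)

x = np.array([np.cos(0.4), np.sin(0.4)])       # an invented unit state vector
D = np.outer(x, x)                             # pure state as a density operator
P_A = np.outer([1.0, 0.0], [1.0, 0.0])         # projector for proposition A
P_B = np.outer([0.6, 0.8], [0.6, 0.8])         # projector for proposition B

print(conditional_prob(D, P_A, P_B))           # a number in [0, 1]
```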

© CvR26 What is T? – without blinding you with science
- Relevance Feedback (a mixture with log weights)
- Pseudo relevance feedback (a mixture with similarity weights)
- Clustering (superposition of members?)
- Ostension (a history)
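
One reading of the "mixture" options as code: T assembled as a convex combination of projectors onto judged-relevant document vectors. The vectors and raw weights below are invented placeholders; log or similarity weighting would simply change the raw weights.

```python
import numpy as np

def density_from_feedback(doc_vectors, raw_weights):
    """T = sum_i a_i |d_i><d_i| with a_i the normalised weights and d_i unit vectors."""
    a = np.asarray(raw_weights, dtype=float)
    a = a / a.sum()
    return sum(w * np.outer(d, d) for w, d in zip(a, doc_vectors))

d1 = np.array([1.0, 0.0, 0.0])                 # invented judged-relevant documents
d2 = np.array([0.0, 0.6, 0.8])
T = density_from_feedback([d1, d2], [2.0, 1.0])

print(np.isclose(np.trace(T), 1.0))            # T is a density operator

# Rank a new query ray q by its probability under T: tr(T |q><q|).
q = np.array([0.0, 1.0, 0.0])
print(round(float(np.trace(T @ np.outer(q, q))), 2))
```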

© CvR27 Conclusions? Is it worth it? Does it matter? - images - logic/probability/information/vectors - language

© CvR28 Useful References
Readings in Information Retrieval, Morgan Kaufmann, edited by Sparck Jones and Willett.
Advances in Information Retrieval: Recent Research from CIIR, edited by Bruce Croft.
Information Retrieval: Uncertainty and Logics. Advanced Models for the Representation and Retrieval of Information, edited by Crestani, Lalmas and Van Rijsbergen.
Finding Out About, Richard Belew.