
Models for Information Retrieval


1 Models for Information Retrieval
- Mainly used in science and research, (probably?) less often in real systems
- But: research results have significance for practice, e.g. because they increase our understanding, allow more fact-based statements, etc.
General advantages of theoretical models:
- Behavior can be clearly understood and reconstructed, characteristics can be proven, etc.
- Plug-and-play, i.e. it is easy to build on previous work; strong theoretical background and framework, etc.

2 Models for IR – Taxonomy
Classic models:
- Boolean model (based on set theory)
  - Fuzzy set model
  - Extended Boolean model
- Vector space model (based on algebra)
  - Generalized vector model
  - Latent semantic indexing
  - Neural networks
- Probabilistic models (based on probability theory)
  - Inference networks
  - Belief network
Further models:
- Structured models
- Models for browsing
- Filtering
Source: R. Baeza-Yates [1], pages 20-21

3 Formal Specification of the Task
Definition: An information retrieval model is a quadruple [D, Q, F, R(q_i, d_j)] where
- D is a set composed of logical views (or representations) for the documents in the collection.
- Q is a set composed of logical views (or representations) for the user information needs. Such representations are called queries.
- F is a framework for modeling document representations, queries, and their relationships.
- R(q_i, d_j) is a ranking function which associates a real number with a query q_i in Q and a document representation d_j in D. Such a ranking defines an ordering among the documents with regard to the query q_i.
Source: R. Baeza-Yates [1], page 23
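The role of the ranking function R(q_i, d_j) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not part of the definition: logical views are modeled as plain term sets, and `overlap_rank` (a hypothetical toy function) simply counts shared terms.

```python
from typing import Callable, Dict, FrozenSet, List

# Assumption for this sketch: a logical view of a document or query
# is just a set of terms. The definition itself leaves this open.
View = FrozenSet[str]


def overlap_rank(query: View, doc: View) -> float:
    """Toy ranking function R(q_i, d_j): the number of shared terms."""
    return float(len(query & doc))


def order_documents(
    query: View,
    docs: Dict[str, View],
    rank: Callable[[View, View], float],
) -> List[str]:
    """Order document ids by descending R(q_i, d_j); ties are broken by id."""
    return sorted(docs, key=lambda name: (-rank(query, docs[name]), name))


# D: logical views of three hypothetical documents.
D: Dict[str, View] = {
    "d1": frozenset({"information", "retrieval", "model"}),
    "d2": frozenset({"boolean", "model"}),
    "d3": frozenset({"web", "search"}),
}
# A query q_i from Q.
q: View = frozenset({"retrieval", "model"})

ranking = order_documents(q, D, overlap_rank)
print(ranking)
```

Any function that maps a (query, document) pair to a real number could be passed as `rank`; the ordering of the documents follows from those numbers alone.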

4 Formal Specification of the Task (Cont.)
- Generally, we represent the query and documents through a set of terms T = {t_1, …, t_k}, where k is the number of all unique index terms in the system.
- We assume w_{i,j} to be a weight for term t_i in document d_j, with w_{i,j} = 0 if t_i is not in d_j.
- Document d_j can be represented as an index term vector d_j = (w_{1,j}, w_{2,j}, …, w_{k,j}).
- g_i represents a function for which g_i(d_j) = w_{i,j} (i.e. given a document d_j, g_i delivers the weight of term t_i in d_j).
Cf. R. Baeza-Yates [1], page 25
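The index term vector representation can be sketched as follows. The slide does not say how the weights w_{i,j} are computed, so this sketch assumes raw term frequency; the two documents and the helper names `term_vector` and `g` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Two hypothetical documents, already reduced to whitespace-separated terms.
docs: Dict[str, str] = {
    "d1": "information retrieval models",
    "d2": "boolean retrieval retrieval",
}

# T = {t_1, ..., t_k}: all unique index terms, kept in a fixed (sorted) order.
T: List[str] = sorted({term for text in docs.values() for term in text.split()})


def term_vector(text: str) -> List[int]:
    """Build d_j = (w_{1,j}, ..., w_{k,j}).

    Assumption: w_{i,j} is the raw frequency of t_i in d_j.
    A Counter returns 0 for absent terms, so w_{i,j} = 0 if t_i is not in d_j.
    """
    counts = Counter(text.split())
    return [counts[t] for t in T]


vectors: Dict[str, List[int]] = {name: term_vector(text) for name, text in docs.items()}


def g(i: int, doc_name: str) -> int:
    """g_i(d_j) = w_{i,j}; i is 1-based, matching the notation t_1, ..., t_k."""
    return vectors[doc_name][i - 1]


print(T)
print(vectors)
```

Other weighting schemes (binary weights, or TF*IDF as listed in the schedule) would only change the body of `term_vector`; the vector layout and the accessor `g` stay the same.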

5 Boolean Retrieval Model – Queries
- Based on set theory and Boolean algebra
- Queries: terms combined with AND, OR, NOT
- Boolean expression in disjunctive normal form
- Example: (not preserved in this transcript)
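As a stand-in illustration (not the example from the original slide), the following sketch evaluates a Boolean query in disjunctive normal form against documents modeled as term sets. The query encoding, the documents, and the function names are assumptions chosen for this sketch.

```python
from typing import Dict, List, Set, Tuple

# A literal is (term, is_positive); is_positive=False encodes NOT term.
Literal = Tuple[str, bool]
# A conjunction is a list of literals joined by AND.
Conjunction = List[Literal]
# A DNF query is a list of conjunctions joined by OR.
DNF = List[Conjunction]


def matches(doc_terms: Set[str], query: DNF) -> bool:
    """True if at least one conjunction is fully satisfied by the document."""
    return any(
        all((term in doc_terms) == is_positive for term, is_positive in conj)
        for conj in query
    )


def boolean_retrieve(docs: Dict[str, Set[str]], query: DNF) -> Set[str]:
    """Return the ids of all matching documents (a set: no ranking)."""
    return {name for name, terms in docs.items() if matches(terms, query)}


docs: Dict[str, Set[str]] = {
    "d1": {"information", "retrieval", "model"},
    "d2": {"boolean", "model"},
    "d3": {"boolean", "web", "search"},
    "d4": {"web", "search"},
}

# (information AND retrieval) OR (boolean AND NOT web)
query: DNF = [
    [("information", True), ("retrieval", True)],
    [("boolean", True), ("web", False)],
]

result = boolean_retrieve(docs, query)
print(sorted(result))
```

The result is an unordered set: in this sketch a document either satisfies the expression or it does not, with no degree of match in between.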

6 References & Recommended Reading
[1] R. Baeza-Yates, B. Ribeiro-Neto: Modern Information Retrieval, Addison Wesley, 1999
    - Chapter 2-2.5 (IR models)
    - Chapter 3 (Evaluation)
[2] N. Fuhr: Skriptum zur Vorlesung Information Retrieval, SS 2006 (partly in German, partly in English)
    - Chapter 3 (Evaluation)
    - Chapter 5.1-5.3, 5.5 (Models)
    - Available online at the course home page http://www.is.informatik.uni-duisburg.de/courses/ir_ss06/index.html or directly at http://www.is.informatik.uni-duisburg.de/courses/ir_ss06/folien/irskall.pdf

7 Schedule
- Introduction
- IR-Basics (Lectures)
  - Overview, terms and definitions
  - Index (inverted files)
  - Term processing
  - Query processing
  - Ranking (TF*IDF, …)
  - Evaluation
  - IR-Models (Boolean, vector, probab.)
- IR-Basics (Exercises)
- Web Search (Lectures and exercises)

