Lecture 1: Overview of IR
Maya Ramanath

Who hasn’t used Google?
Why did Google return these results first? Can we improve on it?
Is this a good result for the query “maya ramanath”?
Or: how good is Google?

Lectures
– Overview (this lecture)
– Retrieval Models
– Retrieval Evaluation
– Why DB and IR?

Information Retrieval
“An information retrieval system does not inform (i.e. change the knowledge of) the user on the subject of his inquiry. It merely informs on the existence (or non-existence) and whereabouts of documents relating to his request.”
“Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).”

Basic Terms
– Document: a sequence/set of terms expressing ideas about one or more topics, usually in natural language
– Corpus/Collection: a set of documents
– Information need: corresponds to an innate idea of the information/knowledge that the user is currently looking for
– Term/Keyword/Phrase: a semantic unit; a word, a phrase, or potentially the root of a word
– Query: the expression of the information need by the user
– Relevance: a measure of how well the retrieved documents satisfy the user’s information need

What is a retrieval system?
(Figure source: Hiemstra, D. (2009) Information Retrieval Models, in Information Retrieval: Searching in the 21st Century (eds A. Göker and J. Davies), John Wiley & Sons, Ltd, Chichester, UK.)

Retrieval Models
Source and further reading: Hiemstra, D. (2009) Information Retrieval Models, in Information Retrieval: Searching in the 21st Century (eds A. Göker and J. Davies), John Wiley & Sons, Ltd, Chichester, UK.

Two kinds of models
No ranking
– Boolean models
– Region models
Ranking
– Vector space model
– Probabilistic models
– Language models

Boolean Model
– Based on set theory
– Simple query language
– Ex: information AND (retrieval OR management)
(Figure: Venn diagram of the term sets retrieval, management, information)
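
Because the Boolean model is plain set theory, the example query can be evaluated with set intersection (AND) and union (OR) over an inverted index. The sketch below is illustrative only: the toy collection, whitespace tokenisation, and the helper names build_index and postings are hypothetical, not part of the lecture.

```python
# Hypothetical toy collection: document id -> text
DOCS = {
    1: "information retrieval systems rank documents",
    2: "database management and information storage",
    3: "retrieval of images",
    4: "information theory basics",
}

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map every term to the set of document ids containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenisation
            index.setdefault(term, set()).add(doc_id)
    return index

def postings(index: dict[str, set[int]], term: str) -> set[int]:
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term, set())

index = build_index(DOCS)
# information AND (retrieval OR management): AND -> intersection, OR -> union
result = postings(index, "information") & (
    postings(index, "retrieval") | postings(index, "management")
)
print(sorted(result))
```

The answer is a set of matching documents with no order among them, which is why the Boolean model sits under "No ranking".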

Vector Space Model (1/2)
Based on the notion of “similarity” between query and document
– The query is a representation of the document that you want to retrieve
– Compare the similarity between query and document
Luhn’s formulation: “The more two representations agreed in given elements and their distribution, the higher would be the probability of their representing similar information.”

Vector Space Model (2/2)
(Figure: document and query vectors)
We will study this in more detail in the next lecture.
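
To make “similarity between query and document” concrete, here is a minimal sketch that represents texts as raw term-frequency vectors and compares them with cosine similarity. Raw tf weighting (no idf), whitespace tokenisation, and the toy documents are assumptions for illustration; the next lecture covers the actual weighting schemes.

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector (an assumed, deliberately simple weighting)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = {
    "d1": "information retrieval ranks documents",
    "d2": "database management systems",
    "d3": "retrieval models and retrieval evaluation",
}
query = tf_vector("information retrieval")
# Rank documents by decreasing similarity to the query
ranking = sorted(docs, key=lambda d: cosine(query, tf_vector(docs[d])), reverse=True)
print(ranking)
```

Unlike the Boolean model, every document receives a score, so d2 still appears (last, with similarity 0).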

Probabilistic IR (1/2)
Based on probability theory
– Specifically, we would like to estimate the probability of relevance
The Probability Ranking Principle
“If a reference retrieval system’s response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data has been made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data.”

Probabilistic IR (2/2)
Ranking of documents based on odds
We will study this in more detail in the next lecture.
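
The odds of relevance, P(R|d) / (1 - P(R|d)), grow monotonically with P(R|d), so ranking by odds gives the same order as the Probability Ranking Principle. One well-known concrete instance, the Binary Independence Model, is sketched below; the lecture does not name it, so treat this as an assumed example. The log-odds reduce to a sum of per-term weights. With no relevance information we assume p = 0.5 (term present in a relevant document) and estimate u (term present in a non-relevant document) from document frequency with +0.5 smoothing. Both estimates, the helper names, and the toy data are assumptions.

```python
import math

def term_weight(df: int, n_docs: int, p: float = 0.5) -> float:
    """Log-odds weight of a term under the Binary Independence Model.

    p: assumed probability that the term occurs in a relevant document.
    u: probability that it occurs in a non-relevant document, estimated
       from document frequency with +0.5 smoothing (an assumption).
    """
    u = (df + 0.5) / (n_docs + 1)
    return math.log((p * (1 - u)) / (u * (1 - p)))

def rsv(query: list[str], doc: set[str], df: dict[str, int], n_docs: int) -> float:
    """Retrieval status value: summed weights of query terms present in the doc."""
    return sum(term_weight(df[t], n_docs) for t in query if t in doc)

# Hypothetical collection: each document as a set of terms (binary model)
docs = {
    "d1": {"information", "retrieval", "models"},
    "d2": {"information", "systems"},
    "d3": {"database", "systems"},
    "d4": {"evaluation", "metrics"},
    "d5": {"database", "design"},
    "d6": {"operating", "systems"},
}
n = len(docs)
df: dict[str, int] = {}
for terms in docs.values():
    for t in terms:
        df[t] = df.get(t, 0) + 1

query = ["information", "retrieval"]
ranking = sorted(docs, key=lambda d: rsv(query, docs[d], df, n), reverse=True)
print(ranking)
```

The rarer term (retrieval) receives a larger weight than the more common one (information), so d1, which contains both, ranks first.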

Language Models (1/3)
Based on generative models for documents and queries
– Documents, query: samples of an underlying probabilistic process
– Estimate the parameters of this process
– Measure how close the distributions are (KL-divergence)
  – “Closeness” gives a measure of relevance
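
A minimal sketch of the “closeness” measure: KL(query || document) = sum over terms t of P(t|q) log(P(t|q) / P(t|d)), where a lower divergence means the document model is closer to the query model and therefore ranks higher. The probability tables below are made up, and the code assumes the document models are already smoothed so that P(t|d) > 0 for every query term.

```python
import math

def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) = sum_t p(t) * log(p(t) / q(t)).

    Assumes q(t) > 0 wherever p(t) > 0 (e.g. q is a smoothed document model).
    """
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)

# Hypothetical query and document language models
query_model = {"information": 0.5, "retrieval": 0.5}
doc_models = {
    "d1": {"information": 0.4, "retrieval": 0.4, "systems": 0.2},
    "d2": {"information": 0.1, "retrieval": 0.1, "systems": 0.8},
}
# Smaller divergence = closer distributions = more relevant
ranking = sorted(doc_models, key=lambda d: kl_divergence(query_model, doc_models[d]))
print(ranking)
```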

Language Models (2/3)
(Figure: documents d1, d2 and query q)

Language Models (3/3)
The maximum likelihood estimator + smoothing
We will study this in more detail in the next lecture.
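
A sketch of why the maximum likelihood estimator needs smoothing. The MLE sets P(t|d) = count(t, d) / |d|, so any query term missing from a document gives that document probability zero. The example assumes Jelinek-Mercer smoothing (interpolating with the collection model, weight lam = 0.5) and scores by query log-likelihood; the slide does not say which smoothing method is meant, and the data and lam value are illustrative assumptions.

```python
import math
from collections import Counter

def mle(tokens: list[str]) -> dict[str, float]:
    """Maximum likelihood estimate: P(t | M) = count(t) / number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()}

def query_log_likelihood(query: list[str], doc_tokens: list[str],
                         coll_model: dict[str, float], lam: float = 0.5) -> float:
    """log P(q | d) with Jelinek-Mercer smoothing:
    P(t | d) = (1 - lam) * P_mle(t | d) + lam * P_mle(t | collection).
    Returns -inf if some query term gets probability zero."""
    doc_model = mle(doc_tokens)
    score = 0.0
    for t in query:
        p = (1 - lam) * doc_model.get(t, 0.0) + lam * coll_model.get(t, 0.0)
        if p == 0:
            return float("-inf")
        score += math.log(p)
    return score

docs = {
    "d1": "information retrieval and information systems".split(),
    "d2": "database systems".split(),
}
coll_model = mle([t for tokens in docs.values() for t in tokens])
query = ["information", "retrieval"]
scores = {d: query_log_likelihood(query, tokens, coll_model) for d, tokens in docs.items()}
print(scores)
```

d2 contains neither query term, yet it still gets a finite (low) score because of smoothing. Calling query_log_likelihood(query, docs["d2"], coll_model, lam=0.0), which is the unsmoothed MLE, returns -inf.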

Evaluation (Which system is best?)

Benchmarking IR Systems (1/2)
Why do we need to benchmark?
To benchmark an IR system:
– Efficiency
– Quality
  – Results
  – Power of interface
  – Ease of use, etc.

Benchmarking IR Systems (2/2)
Result quality
– Data collection
  – Ex: archives of the NYTimes
– Query set
  – Provided by experts, identified from real search logs, etc.
– Relevance judgements
  – For a given query, is the document relevant?

Precision, Recall, F-Measure
– Precision
– Recall
– F-Measure: weighted harmonic mean of precision and recall
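
The standard set-based definitions are: precision = |relevant ∩ retrieved| / |retrieved|, recall = |relevant ∩ retrieved| / |relevant|, and F_beta = (1 + beta²) · P · R / (beta² · P + R), the weighted harmonic mean, where beta = 1 gives F1 = 2PR / (P + R). The sketch below computes them for one query. The retrieved list and the relevance judgements are made up for illustration, and the function name is hypothetical.

```python
def precision_recall_f(retrieved: set, relevant: set, beta: float = 1.0):
    """Set-based precision, recall and F-beta for a single query."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0 and recall == 0:
        return precision, recall, 0.0  # avoid division by zero
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

retrieved = {"d1", "d2", "d3", "d4"}  # what the system returned
relevant = {"d1", "d3", "d5"}         # hypothetical relevance judgements
p, r, f1 = precision_recall_f(retrieved, relevant)
print(p, r, round(f1, 3))
```

Choosing beta > 1 weights recall more heavily; choosing beta < 1 favours precision.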

That’s it for today!