Modern Information Retrieval Computer engineering department Fall 2005.

Slides:



Advertisements
Similar presentations
Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction Alexander Gelbukh
Advertisements

Special Topics in Computer Science The Art of Information Retrieval Chapter 1: Introduction Alexander Gelbukh
Chapter 5: Introduction to Information Retrieval
Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Multimedia Database Systems
Modern Information Retrieval Chapter 1: Introduction
An Introduction to Information Retrieval and Applications J. H. Wang Feb. 19, 2008.
UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
Web Search – Summer Term 2006 I. General Introduction (c) Wolfgang Hürst, Albert-Ludwigs-University.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Search Engines and Information Retrieval
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
ISP 433/533 Week 2 IR Models.
Modern Information Retrieval Chapter 1: Introduction
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
INFORMATION RETRIEVAL WEEK 1 AND 2
Evaluating the Performance of IR Sytems
What is a document? Information need: From where did the metaphor, doing X is like “herding cats”, arise? quotation? “Managing senior programmers is like.
Modern Information Retrieval Chapter 1 Introduction.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Srihari-CSE535-Spring2008 CSE 535 Information Retrieval Chapter 1: Introduction to IR.
CS580: Building Web Based Information Systems Roger Alexander & Adele Howe The purpose of the course is to teach theory and practice underlying the construction.
Recuperação de Informação. IR: representation, storage, organization of, and access to information items Emphasis is on the retrieval of information (not.
 IR: representation, storage, organization of, and access to information items  Focus is on the user information need  User information need:  Find.
Query Expansion.
Search Engines and Information Retrieval Chapter 1.
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
Information Retrieval and Knowledge Organisation Knut Hinkelmann.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
INTRODUCTION TO RESEARCH. Learning to become a researcher By the time you get to college, you will be expected to advance from: Information retrieval–
Information Retrieval Introduction/Overview Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Autumn Web Information retrieval (Web IR) Handout #0: Introduction Ali Mohammad Zareh Bidoki ECE Department, Yazd University
Chapter 6: Information Retrieval and Web Search
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Search Engine Architecture
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Modern Information Retrieval Presented by Miss Prattana Chanpolto Faculty of Information Technology.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Conceptual structures in modern information retrieval Claudio Carpineto Fondazione Ugo Bordoni
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Recuperação de Informação Cap. 01: Introdução 21 de Fevereiro de 1999 Berthier Ribeiro-Neto.
Information Retrieval
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
G. Marchionini, Univ. of Maryland Electronic Environments Cost Trends: Hardware cost < Software cost < Information cost < People time Virtuality (transcend.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Automatic vs manual indexing Focus on subject indexing Not a relevant question? –Wherever full text is available, automatic methods predominate Simple.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
Information Retrieval and Web Search Vasile Rus, PhD websearch/
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Visual Information Retrieval
Information Retrieval (in Practice)
Modern Information Retrieval
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Search Engine Architecture
Multimedia Information Retrieval
Information Retrieval
Thanks to Bill Arms, Marti Hearst
Evaluation of IR Performance
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Search Engine Architecture
Information Retrieval and Extraction
Information Retrieval and Web Design
Recuperação de Informação
Presentation transcript:

Modern Information Retrieval Computer engineering department Fall 2005

Subjects 1. Introduction 2. Models 3. Retrieval evaluation 4. Query languages 5. Query reformulation 6. Text properties 7. Text languages 8. Text processing 9. Information retrieval from the Web

Sources 1.Modern Information RetrievalR. Baeza-Yates and B. Ribeiro-Neto; Addison-Wesley Information Retrieval Systems: Theory and ImplementationG. Kowalski; Kluwer, Automatic Text ProcessingG. Salton; Addison- Wesley, Database System Concepts, 4thEditionA. Silberschatz, H. Korth and S. Sudarshan; McGraw-Hill, Various Web sites. 6.Original material.

Motivation IR: representation, storage, organization of, and access to information items IR: representation, storage, organization of, and access to information items Focus is on the user information need Focus is on the user information need Information item: Usually text, but possibly also image, audio, video, etc. Information item: Usually text, but possibly also image, audio, video, etc. Presently, most retrieval of non-text items is based on searching their textual descriptions. Presently, most retrieval of non-text items is based on searching their textual descriptions. Text items are often referred to as documents, and may be of different scope (book, article, paragraph, etc.). Text items are often referred to as documents, and may be of different scope (book, article, paragraph, etc.).

Motivation (cont.) User information need: User information need: –Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament. Emphasis is on the retrieval of information (not data) Emphasis is on the retrieval of information (not data)

Motivation (cont.) Data retrieval  which docs contain a set of keywords?  Well defined semantics  a single erroneous object implies failure! Information retrieval  information about a subject or topic  semantics is frequently loose  small errors are tolerated IR system:  interpret contents of information items  generate a ranking which reflects relevance  notion of relevance is most important

Motivation (cont.) IR at the center of the stage  IR in the last 20 years:  classification and categorization  systems and languages  user interfaces and visualization  Still, area was seen as of narrow interest  Advent of the Web changed this perception once and for all  universal repository of knowledge  free (low cost) universal access  no central editorial board  many problems though: IR seen as key to finding the solutions!

Objecttives Overall objective:  Minimize search overhead Measurement of success:  Precision and recall Facilitate the overall objective:  Good search tools  Helpful presentation of results

Minimize search overhead Minimize overheadof a user who is locating needed information. Overhead: Time spent in all steps leading to the reading of items containing the needed information (query generation, query execution, scanning results, reading non-relevant items, etc.). Needed information: Either  Sufficient information in the system to complete a task.  All information in the system relevant to the user needs.  Example –shopping:  Looking for an item to purchase.  Looking for an item to purchase at minimal cost.  Example –researching:  Looking for a bibliographic citation that explains a particular term.  Building a comprehensive bibliography on a particular subject.

Measurement of success Two dual measures:  Precision: Proportion of items retrieved that are relevant.  Precision = relevant retrieved / total retrieved  =  Answer  Relevant  /  Answer   Recall: Proportion of relevant items that are retrieved.  Recall = relevant retrieved / relevant exist  =  Answer  Relevant  /  Relevant   Most popular measures, but others exist.

Measurement of success (cont.)

Support user search Support user search, providing tools to overcome obstacles such as: Ambiguities inherent in languages.  Homographs: Words with identical spelling but with multiple meanings.  Example: Chinon—Japanese electronics, French chateau. Limits to user's ability to express needs.  Lack of system experience or aptitude. Lack of expertise in the area being searched.  Initially only vague concept of information sought.  Differences between user's vocabulary and authors' vocabulary: different words with similar meanings.

Presentation of results Present search results in format that helps user determine relevant items: Arbitrary (physical) order Relevance order Clustered (e.g., conceptual similarity) Graphical (visual) representation

Basic Concepts The User Task  Retrieval  information or data  purposeful  Browsing  glancing around  F1; cars, Le Mans, France, tourism Retrieval Browsing Database

Querying(retrieval) vs. Browsing Two complementary forms of information or data retrieval: Querying:  Information need (retrieval goal) is focused and crystallized.  Contents of repository are well-known.  Often, user is sophisticated.

Querying(retrieval) vs. Browsing (cont.) Browsing:  Information need (retrieval goal) is vague and imprecise.  Contents of repository are not well-known.  Often, user is naive. Querying and browsing are often interleaved (in the same session).  Example: present a query to a search engine, browse in the results, restate the original query, etc.

Pulling vs. Pushing information Querying and browsing are both initiated by users (information is “pulled” from the sources). Alternatively, information may be “pushed” to users.  Dynamically compare newly received items against standing statements of interests of users (profiles) and deliver matching items to user mail files.  Asynchronous (background) process.  Profile defines all areas of interest (whereas an individual query focuses on specific question).  Each item compared against many profiles (whereas each query is compared against many items).

Basic Concepts Logical view of the documents Document representation viewed as a continuum: logical view of docs might shift structure Accents spacing stopwords Noun groups stemming Manual indexing Docs structure Full text Index terms

User Interface Text Operations Query Operations Indexing Searching Ranking Index Text query user need user feedback ranked docs retrieved docs logical view inverted file DB Manager Module Text Database Text The Retrieval Process