Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Multimedia Database Systems
Modern Information Retrieval Chapter 1: Introduction
An Introduction to Information Retrieval and Applications J. H. Wang Feb. 19, 2008.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
PrasadL1IntroIR1 Information Retrieval Adapted from Lectures by Berthier Ribeiro-Neto (Brazil), Prabhakar Raghavan (Yahoo and Stanford) and Christopher.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
Modern Information Retrieval Chapter 1: Introduction
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
INFORMATION RETRIEVAL WEEK 1 AND 2
1 Information Retrieval and Web Search Introduction.
What is a document? Information need: From where did the metaphor, doing X is like “herding cats”, arise? quotation? “Managing senior programmers is like.
Modern Information Retrieval Chapter 1 Introduction.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Srihari-CSE535-Spring2008 CSE 535 Information Retrieval Chapter 1: Introduction to IR.
Searching and Researching the World Wide: Emphasis on Christian Websites Developed from the book: Searching and Researching on the Internet and World Wide.
1 Web Information Retrieval Web Science Course. 2.
Recuperação de Informação. IR: representation, storage, organization of, and access to information items Emphasis is on the retrieval of information (not.
Chapter 5: Information Retrieval and Web Search
Overview of Search Engines
 IR: representation, storage, organization of, and access to information items  Focus is on the user information need  User information need:  Find.
IR Lecture 1. Information Retrieval  Information retrieval is concerned with representing, searching, and manipulating large collections of electronic.
Web Technologies Search Engines
Query Expansion.
Information Retrieval, Search, and Mining
Information Retrieval and Knowledge Organisation Knut Hinkelmann.
Modern Information Retrieval Computer engineering department Fall 2005.
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Search Engine Interfaces search engine modus operandi.
Thanks to Bill Arms, Marti Hearst Documents. Last time Size of information –Continues to grow IR an old field, goes back to the ‘40s IR iterative process.
1 Information Retrieval, Search, and Mining Introduction.
Information Retrieval Introduction/Overview Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Chapter 6: Information Retrieval and Web Search
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Model Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
CSCE 5300 Information Retrieval and Web Search Introduction to IR models and methods Instructor: Rada Mihalcea Class web page:
Search Engine Architecture
IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Recuperação de Informação Cap. 01: Introdução 21 de Fevereiro de 1999 Berthier Ribeiro-Neto.
Information Retrieval
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
Digital Video Library Network Supervisor: Prof. Michael Lyu Student: Ma Chak Kei, Jacky.
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught.
ITEC547 Text Mining Web Technologies Search Engines.
I NFORMATION R ETRIEVAL AND W EB S EARCH Jianping Fan Department of Computer Science UNC-Charlotte 1.
CENG 776 Information Retrieval Nihan Kesim Çiçekli URL: 1/60.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Information Retrieval and Web Search Vasile Rus, PhD websearch/
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Visual Information Retrieval
Search Engine Architecture
Modern Information Retrieval
Information Retrieval and Web Search
Search Engine Architecture
Information Retrieval and Web Search
Information Retrieval and Web Search
Information Retrieval
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Search Engine Architecture
Information Retrieval and Extraction
Information Retrieval and Web Design
Information Retrieval and Web Design
Recuperação de Informação
Information Retrieval and Web Search
Presentation transcript:

Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL

2 What is Information Retrieval? Google ? Yahoo ? MSN search? What goes on behind search engine??? How do they work??????

3..Many Question What makes a system like Google search trick? How does it gather information? What trick does it use? Extending beyond the Web How can those approaches be made better? Natural language understanding? User interactions?

4..Many Question How can we do to make things work quickly? Faster computer? Caching? Compression? How do we decide it works well? For all queries? For special types of queries? On every collection of information?

5 Motivation IR: representation, storage, organization of, and access to information items Focus is on the user information need User information need: Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament. Emphasis is on the retrieval of information (not data)

6 Motivation Data retrieval which documents contain a set of keywords? Well defined semantics a single erroneous object implies failure! Information retrieval information about a subject or topic semantics is frequently loose small errors are tolerated IR system: interpret contents of information items generate a ranking which reflects relevance notion of relevance is most important

7 Motivation IR at the center of the stage IR in the last 20 years: classification and categorization systems and languages user interfaces and visualization Advent of the Web changed this perception once and for all universal repository of knowledge free (low cost) universal access no central editorial board many problems though: IR seen as key to finding the solutions!

8 Information Retrieval The indexing and retrieval of textual documents. Concerned firstly with retrieving relevant documents to a query. Concerned secondly with retrieving from large sets of documents efficiently.

9 Typical IR Task Given: A corpus of textual natural-language documents. A user query in the form of a textual string. Find: A ranked set of documents that are relevant to the query.

10 Relevance Relevance is a subjective judgment and may include: Being on the proper subject. Being timely (recent information). Being authoritative (from a trusted source). Satisfying the goals of the user and his/her intended use of the information (information need).

11 Relevance Much of IR depends upon idea that Similar vocabulary -> relevant to same queries Usually look for documents matching query words “Similar” can be measured in many ways String matching/comparison Same vocabulary used Probability that documents arise from same model Same meaning of text

12 Keyword Search Simplest notion of relevance is that the query string appears verbatim in the document. Slightly less strict notion is that the words in the query appear frequently in the document, in any order (bag of words).

13 Problems with Keywords May not retrieve relevant documents that include synonymous terms. “restaurant” vs. “café” May retrieve irrelevant documents that include ambiguous terms. “bat” (baseball vs. mammal) “Apple” (company vs. fruit) “bit” (unit of data vs. act of eating)

14 Intelligent IR Taking into account the meaning of the words used. Taking into account the order of words in the query. Adapting to the user based on direct or indirect feedback. Taking into account the authority of the source.

15 IR Basic Concepts The User Task Retrieval information or data purposeful Browsing glancing around F1; cars, Le Mans, France, tourism Retrieval Browsing Database

16 IR Basic Concepts Logical view of documents Document representation viewed as a continuum: logical view of documents might shift Docs structure Accents spacing stopwords Noun groups stemming Manual indexing structureFull textIndex terms

17 IR System Architecture User Interface Text Operations Query Operations Indexing Searching Ranking Index Text query user need user feedback ranked docs retrieved docs logical view inverted file DB Manager Module Text Database Text

18 IR System Components Text: Operations forms index words (tokens). Stopword removal Stemming Indexing: constructs an inverted index of word to document pointers. Searching: retrieves documents that contain a given query token from the inverted index. Ranking: scores all retrieved documents according to a relevance metric.

19 IR System Components User Interface: manages interaction with the user: Query input and document output. Relevance feedback. Visualization of results. Query Operations: transform the query to improve retrieval: Query expansion using a thesaurus. Query transformation using relevance feedback.

20 Web Search Application of IR to HTML documents on the World Wide Web. Differences : Must assemble document corpus by spidering the web. Can exploit the structural layout information in HTML (XML). Documents change uncontrollably. Can exploit the link structure of the web.

21 Web Search System Query String IR System Ranked Documents 1. Page1 2. Page2 3. Page3. Document corpus Web Spider

22 History of IR ’s: Initial exploration of text retrieval systems for “small” corpora of scientific abstracts, and law and business documents. Development of the basic Boolean and vector-space models of retrieval. Prof. Salton and his students at Cornell University are the leading researchers in the area.

23 History of IR 1980’s: Large document database systems, many run by companies: Lexis-Nexis Dialog MEDLINE

24 History of IR 1990’s: Searching FTP able documents on the Internet Archie WAIS Searching the World Wide Web Yahoo Altavista

25 History of IR 2000’s & continued : Link analysis for Web Search Google Multimedia IR Image Video Audio and music Document Summarization