1 CS 430: Information Discovery Lecture 20 The User in the Loop.


2 Course Administration Final examination: Date: Tuesday, 15-MAY Start Time: 3:00 PM Finish Time: 5:30 PM Room: KL B11

3 Course Administration Assignment 3 Not acceptable: "I recommend that the company uses XYZ commercial software package." A common question: What file structure is suitable for fielded searching?

4 Inverted File (Basic) Inverted file: a list of the words in a set of documents and the documents in which they appear. Table columns: Word, Document. Words shown: abacus, actor, aspen (document 5), atoll. Stop words are removed before building the index. From Lecture 3
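A minimal sketch of the basic inverted file described on this slide, assuming whitespace tokenization and a small illustrative stop list. The sample documents, the `STOP_WORDS` set, and the function name `build_inverted_file` are hypothetical and do not come from the lecture.

```python
from collections import defaultdict

# Assumed, illustrative stop list; the lecture does not specify one.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the"}


def build_inverted_file(documents):
    """Map each non-stop word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:  # stop words are dropped before indexing
                index[word].add(doc_id)
    # Sort words and document ids so the result reads like the slide's table.
    return {word: sorted(ids) for word, ids in sorted(index.items())}


# Hypothetical toy collection, keyed by document number.
documents = {
    5: "the aspen in the snow",
    11: "an atoll and a lagoon",
    34: "atoll of coral",
}
inverted_file = build_inverted_file(documents)
```

Looking up a word is then a dictionary access, for example `inverted_file["atoll"]`.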

5 Inverted File (Enhanced) Table columns: Word, Postings, Document, Location. Words shown: abacus, actor, aspen, atoll. From Lecture 3

6 Inverted File (Enhanced)
Word     Postings   Document   Location   Field
abacus   4          3          94         normal
                    19         7          title
                    ...        ...        normal
                    22         56         subject
actor    3          2          66         title
                    ...        ...        normal
                    29         45         normal
aspen    1          5          43         list
atoll    3          11         3          normal
                    11         70         normal
                    34         40         footnote
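One possible way to hold the enhanced postings in memory and answer a fielded search is sketched below. The postings for "aspen" and "atoll" are the ones legible on the slide; the `Posting` type and the `search` function are hypothetical names introduced for illustration, not part of the lecture.

```python
from typing import NamedTuple


class Posting(NamedTuple):
    document: int   # document number
    location: int   # word position within the document
    field: str      # field in which the word occurs


# Postings for two of the slide's words (values as legible in the slide).
index = {
    "aspen": [Posting(5, 43, "list")],
    "atoll": [
        Posting(11, 3, "normal"),
        Posting(11, 70, "normal"),
        Posting(34, 40, "footnote"),
    ],
}


def search(index, word, field=None):
    """Return sorted document ids containing `word`, optionally only in `field`."""
    postings = index.get(word.lower(), [])
    return sorted({p.document for p in postings
                   if field is None or p.field == field})
```

Storing the field with each posting lets one index serve both unrestricted and fielded queries, e.g. `search(index, "atoll", field="footnote")`.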

7 The Human in the Loop (Diagram labels: Search index; Return hits; Browse repository; Return objects)

8 Evaluation of Usability Observing users (user protocols); focus groups; measurements (effectiveness in carrying out tasks, speed); expert review; competitive analysis.

9 See paper by Croft, Cook and Wilder in the CS 430 readings

10 THOMAS
The documents: Full text of all legislation introduced in Congresses, since ... Text of the Congressional Record.
Indexes: Bills are indexed by title, bill number, and the text of the bill. The Congressional Record is indexed by title, document identifier, date, speaker, and page number.
Search system: InQuery, developed by the University of Massachusetts. Available commercially from Sovereign Hill Software.

11 Weighting Single-word Query The more instances of the word in a document, the more relevant the document is considered. Occurrences of the term in the title are considered most relevant (weight x 20).
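The single-word rule can be sketched as a simple additive score. This is an assumption-laden illustration: the slide gives only the title weight of 20, so the additive form, the whitespace tokenization, and the name `single_word_score` are hypothetical and not InQuery's actual formula.

```python
TITLE_WEIGHT = 20  # from the slide: title occurrences carry weight x 20


def single_word_score(term, title, body):
    """Count occurrences of `term`, weighting each title occurrence by 20.

    Assumed scoring form: weighted title count plus plain body count.
    """
    term = term.lower()
    title_hits = title.lower().split().count(term)
    body_hits = body.lower().split().count(term)
    return TITLE_WEIGHT * title_hits + body_hits
```

For example, `single_word_score("tax", "Tax Reform Act", "a bill to amend the tax code and tax tables")` counts one title occurrence and two body occurrences.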

12 Weighting Multiple-word Queries In decreasing order of weight: 1. Documents containing the search terms as a phrase, i.e., adjacent to each other. 2. Documents in which the search terms occur near, but not next to, each other, and not necessarily in the order entered. 3. Documents in which all search terms appear singly, not in proximity to each other. 4. Documents containing fewer than all of the words.
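The four cases can be sketched as a function that assigns a document to a tier, assuming the cases are ordered from most to least relevant. The window size that defines "near", the whitespace tokenization (punctuation is not handled), and the name `match_tier` are assumptions for illustration; the slide does not say how InQuery measures proximity.

```python
PROXIMITY_WINDOW = 8  # assumed: terms within 8 consecutive tokens count as "near"


def match_tier(query, text, window=PROXIMITY_WINDOW):
    """Return 1-4 for the slide's four cases, or None if no query word occurs."""
    terms = query.lower().split()
    tokens = text.lower().split()
    n = len(terms)
    present = [t for t in terms if t in tokens]
    if not present:
        return None                      # no search word occurs at all
    if len(present) < n:
        return 4                         # fewer than all of the words
    if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
        return 1                         # exact phrase, terms adjacent and in order
    if any(all(t in tokens[i:i + window] for t in terms)
           for i in range(len(tokens))):
        return 2                         # all terms near each other, any order
    return 3                             # all terms present, but far apart
```

Sorting documents by the returned tier (with `None` last) reproduces the ordering described on the slide; a real system would combine this with term weights rather than use tiers alone.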

13 Language Problems InQuery assigns no relevance to documents that contain no instance of any form of the search words: a search for "capital punishment" does not find legislation about "death penalty". If there are no highly relevant documents, InQuery returns poorly relevant ones: a search for "elderly black Americans" returned a bill on "black bears" as most relevant, followed by bills relating to "black colleges and universities". (There were no bills in any way related to "elderly black Americans".)

14 Advanced Features
Ranked output: Combines evidence in the text of the document and the corpus as a whole.
Passage-based retrieval: The probability of relevance is based both on the entire content of a document and the best matching passage in the document.
Simple and complex queries: e.g., simple word-based queries, Boolean queries, phrase-based queries or a combination.
Field-based retrieval: e.g., bill number and type.
Flexible and efficient indexing: Incorporates a variety of document structures (e.g., HTML, MARC, etc.).
Tools for query processing and query expansion.
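To illustrate the idea behind passage-based retrieval, the sketch below blends a whole-document score with the score of the best fixed-length passage. Everything specific here is an assumption: the term-density measure, the sliding window of `passage_len` tokens, the mixing weight `alpha`, and the function names are hypothetical and are not InQuery's probabilistic model.

```python
def term_density(terms, tokens):
    """Fraction of tokens that are query terms (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok in terms) / len(tokens)


def passage_aware_score(query, text, passage_len=4, alpha=0.5):
    """Blend whole-document evidence with the best-matching passage.

    Assumed form: alpha * document density + (1 - alpha) * best passage density.
    """
    terms = set(query.lower().split())
    tokens = text.lower().split()
    whole = term_density(terms, tokens)
    # Slide a fixed-length window over the document to find the best passage.
    starts = range(max(1, len(tokens) - passage_len + 1))
    best = max(term_density(terms, tokens[i:i + passage_len]) for i in starts)
    return alpha * whole + (1 - alpha) * best


clustered = "tax tax tax a b c d e f g h i"
spread = "tax a b c tax d e f tax g h i"
```

Both sample texts contain the query word three times in twelve tokens, so a whole-document measure alone cannot separate them; the passage component favours the text where the occurrences cluster.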

15 Queries Table showing number of words in queries. Columns: Words, Unique Queries. Total: 25,321 unique queries.
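A table of this kind can be produced from a query log roughly as follows. The log contents, the normalization (case folding and whitespace collapsing before de-duplication), and the name `query_length_table` are assumptions for illustration; the lecture does not describe how the unique queries were counted.

```python
from collections import Counter


def query_length_table(query_log):
    """Count unique queries by number of words.

    Assumed normalization: lowercase and collapse whitespace before de-duplicating.
    """
    unique_queries = {" ".join(q.lower().split()) for q in query_log if q.strip()}
    return Counter(len(q.split()) for q in unique_queries)


# Hypothetical log; the second entry duplicates the first after normalization.
log = [
    "capital punishment",
    "Capital  Punishment",
    "taxes",
    "elderly black Americans",
    "medicare",
]
table = query_length_table(log)
```

Here `table[k]` is the number of unique queries with `k` words, and `sum(table.values())` gives the total row.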

16 D-Lib Working Group on Metrics DARPA-funded attempt to develop a TREC-like approach to digital libraries (1997). "This Working Group is aimed at developing a consensus on an appropriate set of metrics to evaluate and compare the effectiveness of digital libraries and component technologies in a distributed environment. Initial emphasis will be on (a) information discovery with a human in the loop, and (b) retrieval in a heterogeneous world." Very little progress made. See:

17 MIRA: Evaluation Frameworks for Interactive Multimedia Information Retrieval Applications. European study. Chair: Keith Van Rijsbergen, Glasgow University. Expertise: Multi Media Information Retrieval; Information Retrieval; Human Computer Interaction; Case Based Reasoning; Natural Language Processing.

18 MIRA Starting Point Information Retrieval techniques are beginning to be used in complex goal- and task-oriented systems whose main objectives are not just the retrieval of information. New original research in IR is being blocked or hampered by the lack of a broader framework for evaluation.

19 MIRA Aims
Bring the user back into the evaluation process.
Understand the changing nature of IR tasks and their evaluation.
'Evaluate' traditional evaluation methodologies.
Consider how evaluation can be prescriptive of IR design.
Move towards a balanced approach (system versus user).
Understand how interaction affects evaluation.
Support the move from static to dynamic evaluation.
Understand how new media affects evaluation.
Make evaluation methods more practical for smaller groups.
Spawn new projects to develop new evaluation frameworks.

20 MIRA Approaches
Developing methods and tools for evaluating interactive IR. Possibly the most important activity of all.
User tasks: Studying real users, and their overall goals.
To improve user interfaces is to widen the set of users.
Develop a design for a multimedia test collection.
Get together collaborative projects. (TREC was organized as a competition.)
Pool tools and data.