WIRED Week 3
Syllabus Update (next week)
Readings Overview
- Quick Review of Last Week's IR Models (if time)
- Evaluating IR Systems
- Understanding Queries
Assignment Overview & Scheduling
- Leading WIRED Topic Discussions
- Web Information Retrieval System Evaluation & Presentation
Projects and/or Papers Discussion
- Initial Ideas
- Evaluation
- Revise & Present

Evaluating IR Systems
Recall and Precision
Alternative Measures
Reference Collections
- What
- Why
Trends

Why Evaluate IR Systems?
Leave it to the developers?
- No bugs
- Fully functional
Let the market (users) decide?
- Speed
- (Perceived) accuracy
Relevance is relevant
- Different types of searches, data, and users
"How precise is the answer set?" (p. 73)

Retrieval Performance Evaluation
Task
- Batch or interactive
- Each needs a specific interface
Setting / Context
- New search
- Monitoring
Usability
- Lab tests
- Real-world (search log) analysis

Recall and Precision
Basic evaluation measures of IR system performance
Recall: the fraction of the relevant documents that is retrieved
- 100% is perfect recall
- Every document that is relevant is found
Precision: the fraction of the retrieved documents that is relevant
- 100% relevant results is perfect precision
- How good the answer set is, not how complete it is
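A minimal sketch of the two definitions as set ratios (the document-ID sets here are made up for illustration):

```python
# Hypothetical document-ID sets for one query.
relevant = {1, 2, 3, 5, 8}    # documents judged relevant to the query
retrieved = {2, 3, 4, 8, 9}   # documents the system returned

hits = relevant & retrieved   # relevant documents that were actually found

recall = len(hits) / len(relevant)       # fraction of relevant docs retrieved
precision = len(hits) / len(retrieved)   # fraction of retrieved docs that are relevant

print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.60, precision=0.60
```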

Recall and Precision: Goals
Everything is found (recall)
The right set of documents is pulled from the found set (precision)
What about ranking?
- Is ranking an absolute measure of relevance for the query?
- Ranking is ordinal in almost all cases

Recall and Precision Considered
100 documents have been analyzed
10 documents in the set are relevant to the query
- 4 documents are found and all are relevant: ??% recall, ??% precision
- 8 documents are found, but 4 are relevant: ??% recall, ??% precision
Which is more important?
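A quick worked check of the two cases above, using recall = relevant found / total relevant and precision = relevant found / total found:

```python
# Worked check of the two cases on the slide (10 relevant documents exist in total).
total_relevant = 10

cases = {
    "4 found, all 4 relevant": (4, 4),   # (documents found, relevant among them)
    "8 found, 4 relevant":     (8, 4),
}

for label, (found, relevant_found) in cases.items():
    recall = relevant_found / total_relevant
    precision = relevant_found / found
    print(f"{label}: recall={recall:.0%}, precision={precision:.0%}")

# 4 found, all 4 relevant: recall=40%, precision=100%
# 8 found, 4 relevant: recall=40%, precision=50%
```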

Are Recall and Precision Appropriate?
Disagreements over the "perfect" answer set
User errors in using results
Redundancy of results
- Result diversity
- Metadata
Dynamic data
- Indexable?
- Recency of information may be key
A single measure may be better
- Combined measures
- User evaluation

Back to the User
User evaluation
Is one answer good enough?
Rankings
Satisficing
Studies of relevance are key

Other Evaluation Measures
Harmonic Mean
- Single, combined measure
- Between 0 (none) and 1 (all)
- Only high when both P & R are high
- Still a percentage
E measure
- User determines the (parameter) value of R & P
- Different tasks (legal, academic)
- An interactive search?
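A small sketch of both measures using the usual textbook formulas: F is the harmonic mean of precision P and recall R, and the E measure's parameter b is the user-set weight mentioned above, with b = 1 reducing E to 1 - F.

```python
def harmonic_mean_f(precision: float, recall: float) -> float:
    """Harmonic mean F = 2PR / (P + R); high only when both P and R are high."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) * P * R / (b^2 * P + R); b expresses the user's
    relative interest in precision vs. recall (b = 1 gives E = 1 - F)."""
    denominator = b * b * precision + recall
    if denominator == 0:
        return 1.0
    return 1 - (1 + b * b) * precision * recall / denominator

print(harmonic_mean_f(0.5, 0.4))   # 0.444...
print(e_measure(0.5, 0.4, b=1.0))  # 0.555... (= 1 - F)
```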

Coverage and Novelty
System effects
- Relative recall
- Relative effort
More natural, user-understandable measures
User already knows some % of the relevant documents
Coverage = % of those known relevant documents that the system retrieves
Novelty = % of the relevant documents retrieved that the user didn't know of
- Content of document
- Document itself
- Author of document
- Purpose of document
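A minimal sketch of the two ratios under those definitions (the document-ID sets are illustrative, not from the slides):

```python
# Illustrative document-ID sets for one query.
known_relevant = {1, 2, 3, 4}        # relevant documents the user already knew about
retrieved_relevant = {2, 3, 7, 9}    # relevant documents the system actually returned

coverage = len(known_relevant & retrieved_relevant) / len(known_relevant)
novelty = len(retrieved_relevant - known_relevant) / len(retrieved_relevant)

print(f"coverage={coverage:.0%}, novelty={novelty:.0%}")  # coverage=50%, novelty=50%
```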

Reference Collections
Testbeds for IR evaluation
TREC (Text REtrieval Conference) set
- Industry focus
- Topic-based or general
- Summary tables for tasks (queries)
- R & P averages
- Document analysis
- Measures for each topic
CACM (general CS)
ISI (academic, indexed, industrial)

Trends in IR Evaluation
Personalization
Dynamic data
Multimedia
User modeling
Machine learning (CPU/$)

Understanding Queries
Types of queries:
- Keyword
- Context
- Boolean
- Natural language
Pattern matching
- More like this…
- Metadata
Structural environments

Boolean
AND, OR, NOT
- In combination or individually
Decision-tree parsing for the system
Not so easy for the user with advanced queries
Hard to backtrack and see differences in results
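A toy sketch of how a system can evaluate Boolean combinations as set operations over an inverted index (the index contents and queries are made up for illustration):

```python
# Toy inverted index: term -> set of document IDs containing it (illustrative data).
index = {
    "web":        {1, 2, 4},
    "retrieval":  {1, 3, 4},
    "evaluation": {2, 4},
}

print(index["web"] & index["retrieval"])         # web AND retrieval            -> {1, 4}
print(index["web"] | index["evaluation"])        # web OR evaluation            -> {1, 2, 4}
print(index["retrieval"] - index["evaluation"])  # retrieval AND NOT evaluation -> {1, 3}
```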

Keyword
Single word (most common)
- Sets
- "Phrases"
Context
- "Phrases"
- Near (# value in characters, words, document links)
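One way a NEAR operator can be checked, here with the distance value measured in words (a sketch; the window size, text, and function name are illustrative):

```python
def near(text: str, term_a: str, term_b: str, window: int = 5) -> bool:
    """True if term_a and term_b occur within `window` words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= window for a in positions_a for b in positions_b)

sentence = "evaluating web information retrieval systems"
print(near(sentence, "web", "retrieval"))            # True (2 words apart)
print(near(sentence, "web", "retrieval", window=1))  # False
```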

Natural Language
Asking
Quoting
Fuzzy matches
Different evaluation methods might be needed
Dynamic data “indexing” problematic
Multimedia challenges

Pattern Matching
Words
Prefixes: “comput*”
Suffixes: “*ology”
Substrings: “*exas*”
Ranges: “four ?? years ago”
Regular Expressions (GREP)
Error threshold
User errors
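The wildcard patterns on this slide map naturally onto regular expressions; a minimal sketch (the sample strings are made up):

```python
import re

# Hypothetical strings to match against.
samples = ["computer", "computing", "biology", "Texas", "texas"]

checks = {
    "comput* (prefix)":   r"^comput",
    "*ology (suffix)":    r"ology$",
    "*exas* (substring)": r"exas",
}

for pattern, regex in checks.items():
    print(pattern, "->", [s for s in samples if re.search(regex, s)])
# comput* (prefix) -> ['computer', 'computing']
# *ology (suffix) -> ['biology']
# *exas* (substring) -> ['Texas', 'texas']

# The range pattern "four ?? years ago", with '?' as a one-character wildcard:
print(bool(re.fullmatch(r"four .. years ago", "four ty years ago")))       # True
print(bool(re.fullmatch(r"four .. years ago", "four hundred years ago")))  # False
```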

Query Protocols
HTTP
Z39.50
- Client-server API
WAIS
- Information / database connection
ODBC
JDBC
P2P

Assignment Overview & Scheduling
Leading WIRED Topic Discussions
- # in class = # of weeks left?
Web Information Retrieval System Evaluation & Presentation
- 5-page written evaluation of a Web IR system
- Technology overview (how it works)
- A brief history of the development of this type of system (why it works better)
- Intended uses for the system (who, when, why)
- (Your) examples or case studies of the system in use and its overall effectiveness

How can (Web) IR be better?
- Better IR models
- Better user interfaces
More to find vs. easier to find
Scriptable applications
New interfaces for applications
New datasets for applications
Projects and/or Papers Overview

Project Idea #1: Simple HTML Graphical Google
What kind of document?
When was the document created?

Project Ideas
- Google History: keeps track of what I’ve seen and not seen
- Searching when it counts: Financial and Health information requires guided, quality search