
Evaluating IR (Web) Systems
- Study of Information Seeking & IR
- Pragmatics of IR experimentation
- The dynamic Web
- Cataloging & understanding Web docs
- Web site characteristics

Study of Info Seeking & Retrieval
- Well-known authors (useful for research papers)
- Real-life studies (not TREC)
- User context of questions
- Questions (structure & classification)
- Searcher (cognitive traits & decision making)
- Information items
- Different searches with the same question
- Relevant items
- "models, measures, methods, procedures and statistical analyses" (p. 175)
- Beyond common sense and anecdotes

Study 2
- Is there ever enough user research?
- A good set of elements to include in an IR system evaluation
- How do you test for real-life situations?
  - Questions the users actually have
  - Expertise in subject (or not)
  - Intent
  - Users' computers, desks & materials
- What's a search strategy?
  - Tactics, habits, previous knowledge
- How do you collect search data?

Study 3
- How do you ask questions?
  - General knowledge test
  - Specific search terms
- Learning Style Inventory
  - NOT the best way to understand users
  - Better than nothing
  - Choose your questions like your users
- Let users choose their questions?
- Let users work together on searches
- Effectiveness measures: recall, precision, relevance (see the sketch below)
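
A minimal sketch of the set-based effectiveness measures above (precision and recall), assuming relevance judgments are already available; the function and document ids are illustrative, not from the readings:

    def precision_recall(retrieved, relevant):
        """Set-based precision and recall for one query.
        retrieved: list of doc ids the system returned
        relevant:  set of doc ids judged relevant"""
        hits = len(set(retrieved) & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # Example: 2 of 4 retrieved docs are relevant, out of 3 relevant overall.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d2", "d4", "d9"})
    print(p, r)  # 0.5 0.666...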

Study 4
- Measuring efficiency
  - Time on tasks
  - Task completion: correct answer, or any answer?
  - Worthwhile? Counting correct answers
- Statistics (see the sketch below)
  - Clicks, commands, pages, results
  - Not just computer time, but the overall process
  - Start with the basics, then get advanced
  - Regression analysis (dependencies for large studies)
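
As a hedged illustration of the efficiency counts above, here is how clicks, time, and task completion might be aggregated from a session log; the log fields are invented for the example:

    # Hypothetical session log: one record per completed search task.
    sessions = [
        {"user": "u1", "seconds": 140, "clicks": 12, "correct": True},
        {"user": "u2", "seconds": 305, "clicks": 31, "correct": False},
        {"user": "u3", "seconds": 95,  "clicks": 7,  "correct": True},
    ]

    n = len(sessions)
    completion_rate = sum(s["correct"] for s in sessions) / n
    mean_time = sum(s["seconds"] for s in sessions) / n
    mean_clicks = sum(s["clicks"] for s in sessions) / n
    print(f"completed: {completion_rate:.0%}, mean time: {mean_time:.0f}s, "
          f"mean clicks: {mean_clicks:.1f}")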

Let's Design an Experiment
- User selection
  - Searcher (cognitive traits & decision making)
  - User context of questions
- Environment
- Questions (structure & classification)
- Information items
  - Successful answers
  - Successful/worthwhile sessions
- Measurement

Pragmatics of IR Experimentation
- The entire IR evaluation must be planned
- Controls are essential
- Working with what you can get
  - Expert-defined questions & answers
  - Specific systems
- Fast, cheap, informal tests
  - Not always, but could be pre-tests
  - Quick results for broad findings

Pragmatic Decision 1: Testing at all?
- Purpose of the test
- Pull data from previous tests
- Repeat an old test
  - Old test with a new system
  - Old test with a new database
- Same test, many users
  - Same system
  - Same questions (data)

Pragmatic Decision 2: What kind of test?
- Everything at once?
  - System (help, no help?)
  - Users (types of)
  - Questions (open-ended?)
- Facts
  - Answers with numbers
  - Words the user knows
- General knowledge
  - Found more easily
  - Ambiguity goes both ways

Pragmatic Decision 3: Understanding the Data
- What are your variables? (p. 207)
- Working with the initial goals of the study
- Study size determines measurement methods
  - Lots of users
  - Many questions
  - All system features, or competing system features
- What is acceptable/passable performance?
  - Time, correct answers, clicks?
  - Which are controlled?

Pragmatic Decision 4: What database?
- The Web (no control)
- Smaller dataset (useful to the user?)
- Very similar questions, small dataset
  - Web site search vs. whole-Web search
  - Prior knowledge of the subject
  - Comprehensive survey of possible results beforehand
- Differences other than content?

Pragmatic Decision 5: Where do queries/questions come from?
- The content itself
- User pre-interviews (pre-tests)
- Other studies
- What are the search terms (used or given)?
  - Single terms
  - Advanced searching
  - Results quantity

Pragmatic Decisions 6, 7, 8
- Analyzing queries
  - Scoring system
  - Logging use
- What's a winning query? (treatment of units)
  - User success, expert answer
  - Time, performance
  - Different queries with the same answer?
- Collecting the data
  - Logging and asking users
  - Consistency (software, questionnaires, scripts)

Pragmatic Decisions 9 & 10
- Analyzing data
  - Dependent on the dataset
  - Compare to other studies
  - Basic statistics first
- Presenting results
  - Work from the plan: purpose, measurement, models, users
  - Matching other studies

Keeping Up with the Changing Web
- Building indices is difficult enough in theory
- What about a continuously changing, huge volume of information?
- Is old information good? What does up-to-date mean anymore?
- Is knowledge a depreciating commodity?
  - Correctness + value over time
- Different information changes at different rates
  - Really it's new information
- How do you update an index with constantly changing information?

Changing Web Properties
- Known distributions for information change
- Sites and pages may have easily identifiable patterns of update
  - 4% change on every observation
  - Some don't ever change (links too)
- If you check and a page hasn't changed, what is the probability it will ever change? (see the sketch below)
- Rate of change is related to rate of attention
  - Machines vs. users
  - Measures can be compared along with the information
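
The "will it ever change?" question is often modeled by treating page changes as a Poisson process, as in the recrawl-scheduling literature; the sketch below is a simplified illustration under that assumption, not a method from the readings:

    import math

    def change_probability(changes_seen, visits, interval_days, horizon_days):
        """P(page changes within horizon_days of the last check), assuming
        changes arrive as a Poisson process with a constant daily rate."""
        # Crude rate estimate: observed changes over total observed time.
        lam = changes_seen / (visits * interval_days) if visits else 0.0
        return 1.0 - math.exp(-lam * horizon_days)

    # A page that changed on 4 of 10 weekly checks:
    print(change_probability(4, 10, 7, 30))   # fairly likely within a month
    # A page never seen to change gets rate 0; it looks permanent:
    print(change_probability(0, 10, 7, 30))   # 0.0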

Dynamic Maintenance of Indexes with Landmarks
- Web crawlers do the work of gathering pages
- Incremental crawling means incremental indices (see the sketch below)
  - Rebuild the whole index more frequently
  - Devise a scheme for updates (and deletions)
  - Use supplementary indices (e.g., date)
- New documents, changed documents, 404 documents
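
One common scheme for the update problem above is a small delta index plus a deletion list, periodically merged into the main index; a toy sketch with invented structures:

    main_index = {"web": {1, 2}, "search": {2}}    # term -> set of doc ids
    delta_index = {}                               # new/changed docs land here
    deleted = set()                                # 404s and removals

    def index_document(doc_id, terms):
        # Changed documents are handled as delete + re-add.
        deleted.discard(doc_id)
        for t in terms:
            delta_index.setdefault(t, set()).add(doc_id)

    def lookup(term):
        docs = main_index.get(term, set()) | delta_index.get(term, set())
        return docs - deleted                      # filter deletions at query time

    def merge():
        # Periodically fold the delta into the main index and drop deletions.
        for t, docs in delta_index.items():
            main_index.setdefault(t, set()).update(docs)
        for t in list(main_index):
            main_index[t] -= deleted
            if not main_index[t]:
                del main_index[t]
        delta_index.clear()
        deleted.clear()

    index_document(3, ["web", "crawler"])
    deleted.add(2)                                 # document 2 went 404
    print(lookup("web"))                           # {1, 3}
    merge()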

Landmarks for Indexing
- Difference-based method (see the sketch below)
- Documents that don't change are landmarks
  - Relative addressing
  - Clarke: block-based
  - Glimpse: chunking
- Only update pointers to pages
- Tags and document properties are landmarked
- Broader pointers mean fewer updates
- Faster indexing, but faster access?
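
A rough sketch of the landmark idea, under the assumption that postings store positions relative to stable blocks ("landmarks"), so an edit updates a few landmark offsets instead of rewriting every later posting; the names and numbers are illustrative:

    # Per-document landmark table: landmark id -> absolute offset in the doc.
    landmarks = {"L0": 0, "L1": 400, "L2": 1050}

    # Postings hold (landmark id, relative position), not absolute positions.
    postings = {"retrieval": [("L1", 23), ("L2", 7)]}

    def absolute_positions(term):
        return [landmarks[lm] + rel for lm, rel in postings.get(term, [])]

    print(absolute_positions("retrieval"))   # [423, 1057]

    # An edit adds 120 characters late in L1's block: only the offsets of
    # landmarks after the edit move; no posting entry needs rewriting.
    landmarks["L2"] += 120
    print(absolute_positions("retrieval"))   # [423, 1177]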

Yahoo! Cataloging the Web
- How do information professionals build an "index" of the Web?
- Cataloging applies to the Web
- Indexing with synonyms
- Browsing indexes vs. searching them
- A comprehensive index is not the goal
  - Quality
  - Information density
- Yahoo's own ontology points to the site for full info
- Subject trees with aliases to other locations (see the sketch below)
- "More like this" comparisons as checksums
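
The "subject tree with aliases" idea can be pictured as a hierarchy in which some entries are pointers into other branches rather than copies; a toy sketch (the categories are invented for illustration):

    tree = {
        "Science": {
            "Computer Science": {
                "Information Retrieval": {},
            },
        },
        "Reference": {
            # An alias: this entry points at a category elsewhere in the tree.
            "Searching the Web": "@Science/Computer Science/Information Retrieval",
        },
    }

    def resolve(path):
        node = tree
        for part in path.split("/"):
            node = node[part]
            if isinstance(node, str) and node.startswith("@"):
                return resolve(node[1:])          # follow the alias
        return node

    # Both paths reach the same category node:
    print(resolve("Reference/Searching the Web") is
          resolve("Science/Computer Science/Information Retrieval"))  # True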

Yahoo! uses tools for indexing

Investigation of Documents from the WWW
- What properties do Web documents have? What structure and formats do they use?
  - Size: ~4 KB average
  - Tags: ratio and popular tags
  - MIME types (file extensions)
  - URL properties and formats
  - Links: internal and external
  - Graphics
  - Readability

WWW Documents Investigation
- How do you collect data like this? (see the sketch below)
  - Web crawler: URL identifier, link follower
  - Index-like processing: markup parser, keyword identifier
  - Domain name translation (and caching)
- How do these facts help with indexing?
- Have general characteristics changed? (This would be a great project to update.)
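
A minimal, assumption-heavy sketch of collecting a few of these document properties (size, tag counts, link counts) with the Python standard library; a real study would add politeness delays, robots.txt handling, and DNS caching:

    from collections import Counter
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class PageStats(HTMLParser):
        def __init__(self):
            super().__init__()
            self.tags = Counter()
            self.links = []

        def handle_starttag(self, tag, attrs):
            self.tags[tag] += 1                    # tag ratio / popular tags
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.links.append(href)        # internal + external links

    html = urlopen("http://example.com/").read().decode("utf-8", "replace")
    stats = PageStats()
    stats.feed(html)
    print("size:", len(html), "bytes")
    print("top tags:", stats.tags.most_common(5))
    print("links:", len(stats.links))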

Properties of Highly-Rated Web Sites
- What about whole Web sites? What is a Web site?
  - Sub-sites?
  - Specific contextual, subject-based parts of a Web site?
  - Links from other Web pages: on the site and off
  - Web site navigation effects
- Will experts (like Yahoo catalogers) like a site?

Properties: Links & Formatting
- Graphics: one, but not too many
- Text formatting: 9 pt. with normal style
- Page (layout) formatting: minimal colors
- Page performance (size and access)
- Site architecture (pages, nav elements)
  - More links, internal and external
  - Interactive elements (search boxes, menus)
- Consistency within a site is key
- How would a user or index builder make use of these?

Extra Discussion
- Little Words, Big Difference
  - The difference that makes a difference
  - Singular and plural noun identification can change indices and retrieval results
  - Language use differences
- Decay and Failures (see the sketch below)
  - Dead links
  - Types of errors
  - Huge numbers of dead links (PageRank still effective)
    - 28% in Computer & CACM; 41% in 2002 articles
  - Better than the average Web page?
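
A small sketch of how dead links and error types might be tallied with HEAD requests; the URLs are placeholders, and a real survey would tune timeouts and consider redirects:

    from collections import Counter
    from urllib.error import HTTPError, URLError
    from urllib.request import Request, urlopen

    def check(urls):
        outcomes = Counter()
        for url in urls:
            try:
                with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
                    outcomes[resp.status] += 1     # e.g. 200
            except HTTPError as e:
                outcomes[e.code] += 1              # e.g. 404, 410, 500
            except URLError:
                outcomes["no-response"] += 1       # DNS failure, timeout
        return outcomes

    counts = check(["http://example.com/", "http://example.com/gone"])
    dead = sum(v for k, v in counts.items() if k != 200)
    print(counts, f"dead or unreachable: {dead}/{sum(counts.values())}")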

Break!

Topic Discussions Set
- Leading WIRED topic discussions
  - About 20 minutes reviewing issues from the week's readings
    - Key ideas from the readings
    - Questions you have about the readings
    - Concepts from the readings to expand on
  - PowerPoint slides
  - Handouts
  - Extra readings (at least a few days before class); send to the WIRED listserv

Web IR Evaluation
- 5-page written evaluation of a Web IR system
  - Technology overview (how it works)
  - Not an evaluation of a standard search engine whose only main determinable difference is content
  - A brief overview of the development of this type of system (why it works better)
  - Intended uses for the system (who, when, why)
  - (Your) examples or case studies of the system in use and its overall effectiveness

How Can (Web) IR Be Better?
- Better IR models
- Better user interfaces
  - More to find vs. easier to find
- Web document sampling
- Web cataloging work
  - Metadata & IR
  - Who watches the catalogers?
- Scriptable applications
  - Using existing IR systems in new ways
  - RSS & IR
- Projects and/or papers overview

Project Ideas
- Searchable personal digital library
- Browser hacks for searching
  - Mozilla keeps all the pages you surf so you can search through them later (a Mozilla hack)
  - Local search engines
- Keeping track of searches
- Monitoring searches

Paper Ideas
- New datasets for IR
- Search on the desktop: issues, previous research, and ideas
- Collaborative searching: advantages and potential, but what about privacy?
- Collaborative filtering literature review
- Open source and IR systems: history & discussion