Stuart Card PARC (since ’74) Area Manager of the User Interface Research Center Ph.D. in Psychology from Carnegie Mellon Co-authored “The Psychology of.

Stuart Card
• PARC (since ’74)
• Area Manager of the User Interface Research Center
• Ph.D. in Psychology from Carnegie Mellon
• Co-authored “The Psychology of Human-Computer Interaction”

Why Develop ScentIndex?

Why ScentIndex? We are converting large amounts of existing paper documents into electronic books. It makes sense to make use of their (usually carefully made) subject indexes.

Why not just keyword searches? The purpose of a keyword search is to find content related to the concept behind the keyword. The keyword of interest may occur too frequently or too rarely in a given document. In the former case, a more precise, conceptually related keyword may be needed. In the latter case, the reader may need a better keyword.

Identifying Conceptual Relatedness

Chi, Hong, Heiser, Card (2006). Flow chart of the ScentIndex algorithm, describing how the word semantic association matrix is used.

Word Co-Occurrence Matrix (M)

M =
          Word 1   Word 2   Word 3   ...   Word n
Word 1    -        M 2,1    M 3,1    ...   M n,1
Word 2    M 1,2    -        M 3,2    ...   M n,2
Word 3    M 1,3    M 2,3    -        ...   M n,3
...       ...      ...      ...      ...   ...
Word n    M 1,n    M 2,n    M 3,n    ...   -

M i,j = the number of times word j occurs within a +/- 20 word span of each instance of word i.
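The counting rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not the authors' implementation: the function name `cooccurrence_matrix`, the nested-dictionary layout, and the toy sentence are assumptions, and same-word pairs are skipped here only because the slide leaves the diagonal empty.

```python
from collections import defaultdict

def cooccurrence_matrix(tokens, span=20):
    """Return counts[i][j]: how often word j appears within +/- span
    tokens of an instance of word i (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(int))
    for pos, word_i in enumerate(tokens):
        lo = max(0, pos - span)
        hi = min(len(tokens), pos + span + 1)
        for other in range(lo, hi):
            word_j = tokens[other]
            # Assumption: skip the word itself and other instances of it,
            # mirroring the empty diagonal in the slide's matrix.
            if word_j != word_i:
                counts[word_i][word_j] += 1
    return counts

# Toy input with a small span so the result is easy to check by hand.
tokens = "the virus spread and the virus mutated".split()
M = cooccurrence_matrix(tokens, span=2)
print(M["virus"]["the"])     # "the" near each instance of "virus"
print(M["virus"]["spread"])
```

A dictionary of dictionaries is used instead of a dense n-by-n array because most word pairs in a book never co-occur, so the matrix is sparse.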

From Q to Q’: Expanding the user’s key terms. ScentIndex takes the user’s keyword query (vector Q) and uses spreading activation over the word co-occurrence matrix to identify other conceptually related terms. Q’ is the expanded set of keywords relevant to the user’s original query.
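The slides do not give the spreading activation equation, so the sketch below uses one common textbook form as an assumption: at each step, every active word passes a decayed share of its activation to its neighbors in proportion to their co-occurrence counts, while the original query keeps feeding in. The function name, the `decay` and `steps` parameters, and the toy counts are all hypothetical.

```python
def spread_activation(query, cooc, decay=0.5, steps=2):
    """Expand a query {word: weight} over co-occurrence counts
    {word: {neighbor: count}}. Assumed update rule, not the paper's."""
    activation = dict(query)
    for _ in range(steps):
        nxt = dict(query)  # the original query is re-injected each step
        for word, level in activation.items():
            neighbors = cooc.get(word, {})
            total = sum(neighbors.values())
            if total == 0:
                continue
            for other, count in neighbors.items():
                # Each neighbor receives a share proportional to its count.
                share = decay * level * count / total
                nxt[other] = nxt.get(other, 0.0) + share
        activation = nxt
    return activation

# Hypothetical co-occurrence counts for illustration only.
cooc = {
    "kgb": {"chairman": 3, "moscow": 1},
    "chairman": {"kgb": 3},
    "moscow": {"kgb": 1},
}
expanded = spread_activation({"kgb": 1.0}, cooc, decay=0.5, steps=1)
print(expanded)
```

With one step, the query word keeps its full weight and related words pick up smaller weights, which is the role Q’ plays in the description above.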

From E(k) to E(k)’: Expanding the subject index E is a vector of all subject entries. E(k) is a single entry in the subject index. Using the same spreading activation equation, keywords that are conceptually related to E(k) can be identified. E(k)’ is the expanded set of keywords relevant to subject index entries.

Putting it together ScentIndex takes the two expanded concept vectors (concepts/words including and related to the user’s query; concepts/words including and related to index entries) and does a cosine similarity comparison to output the most relevant index items.
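The final comparison can be sketched as follows, assuming the expanded query and each expanded index entry are stored as sparse term-weight dictionaries. The function names, the weights, and the three index entries are made up for illustration; only the use of cosine similarity comes from the slides.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors {term: weight}."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_entries(expanded_query, expanded_entries):
    """Return index entry names, most similar to the query first."""
    scored = [(cosine(expanded_query, vec), name)
              for name, vec in expanded_entries.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical expanded vectors: q_prime stands in for Q',
# entries_prime for the E(k)' vectors.
q_prime = {"kgb": 1.0, "chairman": 0.4}
entries_prime = {
    "Kryuchkov, Vladimir": {"kgb": 0.9, "chairman": 0.5},
    "fermentation vessels": {"iraq": 1.0, "vessels": 0.8},
    "smallpox": {"virus": 1.0, "kgb": 0.1},
}
ranking = rank_entries(q_prime, entries_prime)
print(ranking)
```

Cosine similarity depends only on the direction of the two vectors, so an index entry with many expanded terms is not favored merely for being long.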

Customized Subject Index

Usage Scenario (Overview)
General user scenario:
1) Enter keywords
2) System narrows down to relevant entries and displays them for the user

Usage Scenario Based on “Biohazard”
“What year did Russia open negotiations with Iraq for large fermentation vessels? What year did Vladimir Kryuchkov become chairman of the KGB? Which occurred first?”
Steps:
1. Search within the index, e.g. “kryuchkov chairman kgb”
2. A new single-screen index view is created that organizes entries (most relevant on top, exact matches in red) and limits the amount the user has to search through
3. Click a relevant page; words are highlighted

User Study: Comparing ScentIndex and Paper Index
Task Type
• Retrieving (2 w/ScentIndex, 2 w/Paper Index), 2 min max
  The last naturally occurring case of WHICH virus occurred in Somalia in
• Comparing (2 w/ScentIndex, 2 w/Paper Index), 4 min max
  What is the death rate of smallpox and tularemia? Which virus has a higher death rate?
• Comprehending information (2 w/ScentIndex, 2 w/Paper Index), 6 min max
  Diseases caused by different agents have different symptoms. Connect the items on the agent list to the symptoms on the right.

Measured
• Speed
• Accuracy
Participant Types
• (8) Experts
• (8) Novices

Results
Participants were faster when using ScentIndex (M=145) than when using the Paper Subject Index (M=160). There were no interaction effects, so:
• This was true for both experts and novices.
• This was true across all task types (retrieving, comparing, and comprehending).
Participants were more accurate using ScentIndex (only marginally significant).

Questions?
Alternate eBook search techniques?
• Keyword search engines (Google, AltaVista)
• Cross-referencing table
• Natural Language Processing
User Study Questions
• Things good?
• Is Paper Index v. ScentIndex a valid comparison? What is an alternative?
• Generalizability of results? Expert v. novice? Other variables?