SIMS 296a-3: Aids for Source Selection Carol Butler Fall ‘98.

Outline
- IA Interfaces
  - Design Principles
- Aids for Source Selection
  - SavvySearch
  - HITS
  - Kohonen maps
- Implications for New Research

IA Interface should help User:
- Express information needs and/or formulate queries.
- Select among available sources.
- Understand search results.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

IA Interface should allow User to:
- Reassess goals and adjust search strategy.
- Follow trails with unanticipated results.
- Monitor the progress of a search strategy.
- Use output of one action as input to the next.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Role of Visualization:
- Communicate more rapidly and effectively.
- Techniques
  - icons and color highlighting
  - brushing and linking
  - panning and zooming
  - focus-plus-context
  - animation
- Interactivity
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

“Visualization of inherently abstract information is more difficult, and visualization of textually represented information is especially challenging.”
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

Starting Points for Search
- Lists of sources (Lexis-Nexis)
- Overviews
  - Clusters
  - Category Hierarchies/Subject Codes
  - Co-citation Links
- Examples
- Automatic source selection

Last Week’s Readings
- Overviews via Category Hierarchies
  - HIBROWSE (Pollitt 97)
  - Cat-A-Cone (Hearst 97)

Today’s Readings
- Automatic Source Selection
  - SavvySearch (Howe & Dreilinger 97)
- Overviews via co-citation hyperlinks
  - HITS (Kleinberg et al. 97)
- Overviews via clusters
  - Kohonen maps (Chen et al. 97)

SavvySearch
- Addresses problems with meta-search engines.
  - reduce burden on user … but
  - may waste computational and Web resources
- Carefully selects search engines likely to return useful results.

Options provided by interface
- Sources and types of information.
- Treatment of query terms.
- Display of results.
- Interface language.
- View interface.

Query Processing
- Reasoning about available resources
  - modify concurrency (number of search engines queried in parallel)
  - network load estimates (lookup table, time)
  - local CPU load (UNIX uptime command)
- Ranking search engines
  - learned associations between search engines and query terms (stored in a meta-index)
  - recent data on performance
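The resource-reasoning step above can be illustrated with a small sketch. The slide does not say how SavvySearch maps load estimates to a concurrency level, so the function name `choose_concurrency`, the normalized load inputs, and the linear scaling rule are all assumptions made for illustration only.

```python
def choose_concurrency(cpu_load, network_load, max_engines=5):
    """Return how many search engines to query in parallel.

    cpu_load and network_load are assumed to be normalized to [0, 1]
    (hypothetical inputs; the slide only names the signals, e.g. the
    UNIX uptime command and a network lookup table).
    """
    # Let the more heavily loaded resource drive the decision.
    busiest = max(cpu_load, network_load)
    busiest = min(max(busiest, 0.0), 1.0)
    # Illustrative rule: scale linearly from max_engines down to 1.
    return max(1, int(max_engines * (1.0 - busiest)))
```

With both loads at zero this sketch queries all `max_engines` engines; as either load approaches 1 it falls back to a single engine.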

Meta-Index
- No Results
  - search engine failed to return links
  - reduces confidence that this engine is appropriate for particular query
  - effectiveness values are reduced
- Visits
  - number of links explored by user
  - indicates user found some links to be interesting and increases confidence
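The two feedback signals above, together with the ranking step from the previous slide, might be sketched as follows. This is a minimal illustration, not SavvySearch's actual data structure: the class name `MetaIndex`, its method names, and the unit penalty and reward values are hypothetical.

```python
from collections import defaultdict


class MetaIndex:
    """Toy meta-index: effectiveness[term][engine] -> learned score."""

    def __init__(self):
        self.effectiveness = defaultdict(lambda: defaultdict(float))

    def record_no_results(self, engine, terms, penalty=1.0):
        # "No Results": the engine returned no links, so reduce confidence
        # that it suits these query terms (penalty size is an assumption).
        for term in terms:
            self.effectiveness[term][engine] -= penalty

    def record_visits(self, engine, terms, visits, reward=1.0):
        # "Visits": each link the user explored increases confidence.
        for term in terms:
            self.effectiveness[term][engine] += reward * visits

    def rank(self, terms, engines):
        # Rank engines by summed effectiveness over the query terms.
        def score(engine):
            return sum(self.effectiveness.get(t, {}).get(engine, 0.0)
                       for t in terms)
        return sorted(engines, key=score, reverse=True)
```

For example, after `record_visits("A", ["python"], 3)` and `record_no_results("B", ["python"])`, calling `rank(["python"], ["B", "C", "A"])` places A first, the unseen engine C second, and B last.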

Future Development
- Meta-search will need to be personalized and embedded in other systems.
- Experimental version divides search into categories, with separate sets of rules for creating a search plan.
  - Web Indexes
  - Web Directories
  - Usenet News
  - Software
  - People
  - Reference
  - Entertainment
  - Technical Reports

Hyperlink-Induced Topic Search (HITS)
- System for locating authoritative web sources
- Two premises:
  - Implicit annotation provided by creators of hyperlinks contains sufficient information to infer a notion of “authority.”
  - Sufficiently broad topics contain embedded communities of hyperlinked pages.

HITS
- Two types of pages
  - Authorities
    - highly referenced pages on the topic
  - Hubs
    - pages that “point” to many of the authorities
- Mutually reinforcing relationships
- Starts from a user-supplied query

HITS method
- Base set of pages returned by search engine
- Add pages that point to, or are pointed to by, any page in base set
- Assign each page a hub weight h(p) and authority weight a(p) (initialize to 1)
- For each page:
  - Replace a(p) by the sum of the h()’s of all pages pointing to it
  - Replace h(p) by the sum of the a()’s of all pages pointed to by it
- Repeat
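The update loop above can be sketched in a few lines. The function name `hits`, the `links` dictionary format, and the fixed iteration count are illustrative assumptions. The sketch also normalizes the weights after each pass, a step the slide does not list but which Kleinberg's algorithm includes so the values stay bounded.

```python
import math


def _normalize(weights):
    # Scale so the squared weights sum to 1 (no-op if all weights are zero).
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: v / norm for page, v in weights.items()}


def hits(links, iterations=20):
    """links maps each page to the pages it points to (hypothetical format)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Initialize every hub weight h(p) and authority weight a(p) to 1.
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # a(p) <- sum of h(q) over all pages q pointing to p.
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # h(p) <- sum of a(q) over all pages q that p points to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ()))
                   for p in pages}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

On a toy graph such as `{"h1": ["a1", "a2"], "h2": ["a1"]}`, page `a1` ends up with the highest authority weight and `h1` with the highest hub weight, reflecting the mutually reinforcing relationship.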

HITS results
- Broad topics tend to have robust structure
  - astrophysics
  - Michael Jordan
- Generalizes topics not sufficiently broad
  - Dennis Ritchie
- Density of linkage on a topic influences authority/hub structure
  - English literature vs. German literature
- Web-centric topics
  - cryptography
- Commercialization
  - tennis

Future Development
- Study temporal evolution of communities on the Web.
- Combining text and the structure of hyperlinks.
  - text within
  - text near hyperlink
- CLEVER project at IBM Almaden Research Center

Automatically Generated Concept Space (Kohonen map and ET-Space Thesaurus)
- IR users need:
  - Working knowledge of the system where the information is stored
    - how to navigate
    - how info is categorized or organized
  - Knowledge of the subject of interest
    - particularly the vocabulary of the subject domain

Browsing vs. Searching
- Browsing
  - users rely on mental models
  - embedded digression problem
- Searching
  - content-based
  - two basic approaches
    - keyword search
    - combined keyword search and categorization
  - vocabulary differences problem

User Aids for Browsing
- Directories
  - categories limited in granularity
  - categories limited in timeliness
  - creating categories is manual, slow, and cumbersome
- Kohonen self-organizing map (SOM)
  - generates clusters of important concepts
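To make the SOM idea concrete, here is a minimal sketch of Kohonen training on a one-dimensional map. It is not the configuration used by Chen et al.: the function names, the linear decay schedules, and the Gaussian neighborhood are assumptions, and real document maps would use high-dimensional term vectors and a two-dimensional grid.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, sigma0=1.0,
              weights=None, seed=0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if weights is None:
        # Random initial weights in [0, 1) for each node.
        weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    else:
        weights = [list(w) for w in weights]

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighborhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Nodes near the winner on the map are pulled toward x too.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights
```

After training, inputs that are similar tend to share a best-matching node or land on neighboring nodes, which is what lets the map be read as clusters of related concepts.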

Concept “Landscapes”
[Figure: concept landscape built using Kohonen Feature Maps, with regions labeled Pharmacology, Anatomy, Legal, Disease, Hospitals. Xia Lin, H.C. Chen; slide by Marti Hearst.]

User Aids for Searching
- Query expansion
- Relevance feedback
- Multidimensional scaling
  - metric similarity modeling
  - latent semantic indexing
- Thesauri use
  - incorporating existing thesauri
  - automatic thesaurus generation

Automatic Thesaurus Generation
- Statistical co-occurrence
- Cluster analysis further groups terms
- Chen et al.
  - document collection
  - automatic indexing
  - co-occurrence analysis
  - associative retrieval
- Et-Space Webpage
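The co-occurrence step can be illustrated with a small sketch. It stands in for, and is much simpler than, the weighting used by Chen et al.: whitespace tokenization replaces automatic indexing, and the asymmetric score (the fraction of a term's documents that also contain the other term) is an assumption chosen for brevity. The function name `build_cooccurrence_thesaurus` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_thesaurus(documents, top_k=3):
    """Map each term to the terms that most often co-occur with it."""
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for doc in documents:
        terms = set(doc.lower().split())  # crude stand-in for indexing
        doc_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1

    related = defaultdict(list)
    for (a, b), n in pair_freq.items():
        # Asymmetric weight: share of a's documents that also mention b.
        related[a].append((n / doc_freq[a], b))
        related[b].append((n / doc_freq[b], a))

    # Keep the top_k strongest associations, ties broken alphabetically.
    return {
        term: [t for _, t in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_k]]
        for term, pairs in related.items()
    }
```

For the three toy documents `"gene protein"`, `"gene protein cell"`, and `"cell membrane"`, the entry for `gene` lists `protein` before `cell`, since `protein` appears in every document that mentions `gene`. Suggested terms of this kind are what a searcher could use to refine or expand a query.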

Experiment with Yahoo
- Browsing tested with Kohonen SOM
  - subjects who started with Yahoo were less successful in repeating the task with the SOM than vice versa
  - useful more for broad exploring than for searching
- Searching tested with AGT
  - suggested terms came from web pages
  - most useful in further refining an initially too broad search

Future Development
- Effects of different information sources
  - cohesion
  - consistent with user’s mental model
- User Interface design
  - flexibility
  - spelling errors and typos
  - pan-zoom
  - help screens or instructions (or more intuitive design, or both)

Review and Discussion
- Overviews
  - Category Labels
    - when docs stored “inside” categories, users cannot create queries based on combinations of categories
    - display of hierarchies takes up large amounts of screen space
    - tightly coupled with queries?
- Other starting points

Overviews in the User Interface
- Unsupervised Groupings
  - Clustering
  - Kohonen Feature Maps
- Supervised Categories
  - Yahoo!
  - Superbook
  - HiBrowse
  - Cat-a-Cone
- Combinations
  - DynaCat
  - SONIA

Category Labels (from Hearst slide)
- Advantages:
  - Interpretable
  - Capture summary information
  - Describe multiple facets of content
  - Domain dependent, and so descriptive
- Disadvantages:
  - Do not scale well (for organizing documents)
  - Domain dependent, so costly to acquire
  - May mis-match users’ interests

Other Starting Points Approaches
- Co-citation Links
- Examples, Guided Tours

Review and Discussion (cont.)
- Interface Design
  - Visualization
    - textual vs. 2D spatial representation
- Search Strategies
  - integration with non-search parts of process (reading, annotating, analysis)
- Evaluation Methodology