SIMS 296a-3: Aids for Source Selection, Carol Butler, Fall ’98.


1 SIMS 296a-3: Aids for Source Selection
Carol Butler, Fall ’98

2 Outline
- IA Interfaces
- Design Principles
- Aids for Source Selection
  - SavvySearch
  - HITS
  - Kohonen maps
- Implications for New Research

3 An IA interface should help the user:
- Express information needs and/or formulate queries.
- Select among available sources.
- Understand search results.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

4 An IA interface should allow the user to:
- Reassess goals and adjust search strategy.
- Follow trails with unanticipated results.
- Monitor the progress of a search strategy.
- Use the output of one action as input to the next.
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

5 Role of Visualization:
- Communicate more rapidly and effectively.
- Techniques:
  - icons and color highlighting
  - brushing and linking
  - panning and zooming
  - focus-plus-context
  - animation
- Interactivity
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

6 “Visualization of inherently abstract information is more difficult, and visualization of textually represented information is especially challenging.”
From: User Interfaces and Visualization, by Marti A. Hearst, 1998.

7 Starting Points for Search
- Lists of sources (Lexis-Nexis)
- Overviews
  - Clusters
  - Category Hierarchies/Subject Codes
- Co-citation Links
- Examples
- Automatic source selection

8 Last Week’s Readings
- Overviews via Category Hierarchies
  - HIBROWSE (Pollitt 97)
  - Cat-A-Cone (Hearst 97)

9 Today’s Readings
- Automatic Source Selection
  - SavvySearch (Howe & Dreilinger 97)
- Overviews via co-citation hyperlinks
  - HITS (Kleinberg et al. 97)
- Overviews via clusters
  - Kohonen maps (Chen et al. 97)

10 SavvySearch
- Addresses problems with meta-search engines:
  - they reduce the burden on the user, but
  - they may waste computational and Web resources.
- Carefully selects the search engines likely to return useful results.

11 Options provided by the interface
- Sources and types of information.
- Treatment of query terms.
- Display of results.
- Interface language.

12 Query Processing
- Reasoning about available resources
  - modify concurrency (number of search engines queried in parallel)
  - network load estimates (lookup table, time)
  - local CPU load (UNIX uptime command)
- Ranking search engines
  - learned associations between search engines and query terms (stored in a meta-index)
  - recent data on performance
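A minimal Python sketch of the two decisions on this slide, assuming the meta-index is a dictionary of per-term, per-engine effectiveness values and that load figures are already normalized to the range 0 to 1. The function names, the `recent_penalty` dictionary, the engine names, and the linear concurrency rule are all hypothetical illustrations, not SavvySearch's actual formulas.

```python
def rank_engines(query_terms, meta_index, recent_penalty):
    """Rank search engines for a query (hypothetical scoring rule).

    meta_index: dict term -> {engine: learned effectiveness value}
    recent_penalty: dict engine -> non-negative penalty for recent poor
        performance (e.g. slow or failed responses)
    """
    engines = set(recent_penalty)
    for term in query_terms:
        engines.update(meta_index.get(term, {}))
    scores = {}
    for engine in engines:
        # Learned association: sum the engine's value over all query terms.
        learned = sum(meta_index.get(t, {}).get(engine, 0.0) for t in query_terms)
        # Recent performance: subtract the engine's penalty.
        scores[engine] = learned - recent_penalty.get(engine, 0.0)
    # Best score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], e))


def choose_concurrency(cpu_load, network_load, max_engines=5):
    """Number of engines to query in parallel (hypothetical rule).

    Loads are assumed normalized to [0, 1]; the busier resource decides,
    and at least one engine is always queried.
    """
    busiest = max(cpu_load, network_load)
    return max(1, round(max_engines * (1.0 - busiest)))


meta_index = {"jaguar": {"EngineA": 2.0, "EngineB": 0.5},
              "car": {"EngineB": 1.0, "EngineC": 0.8}}
penalty = {"EngineA": 0.0, "EngineB": 0.5, "EngineC": 0.0}
ranked = rank_engines(["jaguar", "car"], meta_index, penalty)
k = choose_concurrency(cpu_load=0.2, network_load=0.6)
print(ranked[:k])  # ['EngineA', 'EngineB']
```

Keeping ranking and concurrency separate mirrors the slide: the ranking says which engines look promising, and the load estimate only decides how far down that list to go.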

13 Meta-Index
- No Results
  - search engine failed to return links
  - reduces confidence that this engine is appropriate for this particular query
  - effectiveness values are reduced
- Visits
  - number of links explored by the user
  - indicates the user found some links interesting, which increases confidence
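The two kinds of feedback on this slide can be sketched as simple updates to the meta-index. This minimal illustration assumes the meta-index maps each query term to a per-engine effectiveness value; the uniform `penalty` and per-visit `reward` amounts are hypothetical, since the slide does not give SavvySearch's actual increments.

```python
from collections import defaultdict


def make_meta_index():
    """meta_index[term][engine] -> effectiveness value (defaults to 0.0)."""
    return defaultdict(lambda: defaultdict(float))


def record_no_results(meta_index, query_terms, engine, penalty=1.0):
    """The engine returned no links: lower its value for every query term."""
    for term in query_terms:
        meta_index[term][engine] -= penalty


def record_visits(meta_index, query_terms, engine, visits, reward=1.0):
    """The user followed `visits` links from this engine: raise its value."""
    if visits <= 0:
        return  # no visits carries no evidence either way
    for term in query_terms:
        meta_index[term][engine] += reward * visits


mi = make_meta_index()
record_visits(mi, ["jaguar", "car"], "EngineA", visits=3)
record_no_results(mi, ["jaguar"], "EngineB")
print(mi["jaguar"]["EngineA"], mi["jaguar"]["EngineB"])  # 3.0 -1.0
```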

14 Future Development
- Meta-search will need to be personalized and embedded in other systems.
- The experimental version divides searches into categories, with a separate set of rules for creating a search plan in each.
- Experimental version categories: Web Indexes, Web Directories, Usenet News, Software, People, Reference, Entertainment, Technical Reports

15 Hyperlink-Induced Topic Search (HITS)
- System for locating authoritative Web sources
- Two premises:
  - The implicit annotation provided by the creators of hyperlinks contains sufficient information to infer a notion of “authority.”
  - Sufficiently broad topics contain embedded communities of hyperlinked pages.

16 HITS
- Two types of pages:
  - Authorities: highly referenced pages on the topic
  - Hubs: pages that “point” to many of the authorities
- Mutually reinforcing relationship: good hubs point to good authorities, and good authorities are pointed to by good hubs
- Starts from a user-supplied query

17 HITS method
- Start with the base set of pages returned by a search engine for the query.
- Add pages that point to, or are pointed to by, any page in the base set.
- Assign each page p a hub weight h(p) and an authority weight a(p), both initialized to 1.
- For each page p:
  - replace a(p) by the sum of h(q) over all pages q pointing to p;
  - replace h(p) by the sum of a(q) over all pages q pointed to by p.
- Repeat, normalizing the weights after each round, until they converge.
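The iteration above translates almost line for line into code. This minimal sketch assumes the expanded set of pages is already available as an adjacency dictionary (page to the pages it links to) and normalizes after each round so that the squared weights sum to 1, as in Kleinberg's formulation; the fixed iteration count and the page names in the example are illustrative assumptions.

```python
import math


def hits(links, iterations=50):
    """Hub and authority weights for a link graph.

    links: dict page -> iterable of pages it points to (the base set
        after adding in- and out-neighbours)
    """
    out = {p: set(targets) for p, targets in links.items()}
    pages = set(out)
    for targets in out.values():
        pages |= targets
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a(p) <- sum of h(q) over all pages q that point to p
        new_auth = {p: 0.0 for p in pages}
        for q, targets in out.items():
            for p in targets:
                new_auth[p] += hub[q]
        # h(p) <- sum of a(q) over all pages q that p points to
        new_hub = {p: sum(new_auth[q] for q in out.get(p, ())) for p in pages}
        # Normalize so the squares of each set of weights sum to 1.
        for weights in (new_auth, new_hub):
            norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
            for p in weights:
                weights[p] /= norm
        auth, hub = new_auth, new_hub
    return hub, auth


links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # a1
```

In the example, a1 is pointed to by all three hubs, so it ends up the strongest authority, and h1 and h2, which point to both authorities, end up the strongest hubs.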

18 HITS results
- Broad topics tend to have robust structure
  - astrophysics
  - Michael Jordan
- Topics that are not sufficiently broad get generalized
  - Dennis Ritchie
- Density of linkage on a topic influences the authority/hub structure
  - English literature vs. German literature
- Web-centric topics
  - cryptography
- Commercialization
  - tennis

19 Future Development
- Study the temporal evolution of communities on the Web.
- Combine text and the structure of hyperlinks:
  - text within the hyperlink
  - text near the hyperlink
- CLEVER project at IBM Almaden Research Center

20 Automatically Generated Concept Space (Kohonen map and ET-Space Thesaurus)
- IR users need:
  - Working knowledge of the system where the information is stored
    - how to navigate
    - how information is categorized or organized
  - Knowledge of the subject of interest
    - particularly the vocabulary of the subject domain

21 Browsing vs. Searching
- Browsing
  - users rely on mental models
  - embedded digression problem
- Searching
  - content-based
  - two basic approaches:
    - keyword search
    - combined keyword search and categorization
  - vocabulary differences problem

22 User Aids for Browsing
- Directories
  - categories limited in granularity
  - categories limited in timeliness
  - creating categories is manual, slow, and cumbersome
- Kohonen self-organizing map (SOM)
  - generates clusters of important concepts
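To make “generates clusters of important concepts” concrete, here is a minimal Kohonen SOM sketch in plain Python. It assumes documents are already represented as equal-length term-weight vectors over a small vocabulary, and it labels each map cell with its highest-weighted term, a simple labeling scheme assumed here for illustration. The linear decay schedules, grid size, and example vocabulary are hypothetical choices, not those used by Chen et al.

```python
import math
import random


def best_node(grid, v):
    """The map cell whose weight vector is closest (Euclidean) to v."""
    return min(grid, key=lambda node: math.dist(grid[node], v))


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map; returns {(r, c): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    steps = epochs * len(vectors)
    t = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over data
            frac = t / steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            winner = best_node(grid, v)
            for node, w in grid.items():
                d = math.dist(node, winner)  # distance on the 2-D map grid
                if d <= radius:
                    # Cells near the winner move toward v, the winner most.
                    influence = math.exp(-(d * d) / (2 * radius * radius))
                    for i in range(dim):
                        w[i] += lr * influence * (v[i] - w[i])
            t += 1
    return grid


def label_cells(grid, vocabulary):
    """Label each cell with the term carrying the largest weight."""
    return {node: vocabulary[max(range(len(w)), key=w.__getitem__)]
            for node, w in grid.items()}


vocabulary = ["drug", "dose", "court", "law"]
docs = [[1.0, 0.8, 0.0, 0.0], [0.9, 1.0, 0.0, 0.1],
        [0.0, 0.0, 1.0, 0.9], [0.1, 0.0, 0.8, 1.0]]
grid = train_som(docs, rows=2, cols=2)
labels = label_cells(grid, vocabulary)
for doc in docs:
    print(doc, "->", labels[best_node(grid, doc)])
```

Neighbouring cells are pulled toward the same inputs early in training, which is what makes related concepts land in adjacent regions of the map.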

23 Concept “Landscapes”
[Figure: concept map built using Kohonen feature maps, with regions labeled Pharmacology, Anatomy, Legal, Disease, and Hospitals. Xia Lin, H.C. Chen; slide by Marti Hearst.]

24 User Aids for Searching
- Query expansion
- Relevance feedback
- Multidimensional scaling
  - metric similarity modeling
  - latent semantic indexing
- Thesaurus use
  - incorporating existing thesauri
  - automatic thesaurus generation

25 Automatic Thesaurus Generation
- Statistical co-occurrence
- Cluster analysis further groups terms
- Chen et al.:
  - document collection
  - automatic indexing
  - co-occurrence analysis
  - associative retrieval
- ET-Space webpage
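The co-occurrence and associative-retrieval steps can be sketched as below. This is a simplified stand-in, assuming each document has already been reduced to a set of index terms by automatic indexing. It weights the association from term a to term b as the fraction of documents containing a that also contain b. That weighting is asymmetric (a to b can differ from b to a), but it is not Chen et al.'s exact formula. The example terms are invented.

```python
from collections import Counter
from itertools import combinations


def build_thesaurus(documents):
    """documents: list of sets of index terms.

    Returns weights[(a, b)] = fraction of documents containing a that also
    contain b. Terms that never co-occur get no entry.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in documents:
        terms = set(terms)
        doc_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1
    weights = {}
    for (a, b), both in pair_freq.items():
        weights[(a, b)] = both / doc_freq[a]
        weights[(b, a)] = both / doc_freq[b]
    return weights


def suggest_terms(weights, term, k=3):
    """Associative retrieval: the k terms most strongly linked to `term`."""
    related = [(w, b) for (a, b), w in weights.items() if a == term]
    related.sort(key=lambda pair: (-pair[0], pair[1]))
    return [b for _, b in related[:k]]


docs = [{"virus", "dna", "protein"}, {"virus", "dna"},
        {"virus", "vaccine"}, {"dna", "protein"}]
weights = build_thesaurus(docs)
print(suggest_terms(weights, "virus"))  # ['dna', 'protein', 'vaccine']
```

The asymmetry matters for query suggestion: every document mentioning "protein" also mentions "dna" (weight 1.0), but only two of the three "dna" documents mention "protein" (weight 2/3).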

26 Experiment with Yahoo
- Browsing tested with Kohonen SOM
  - subjects who started with Yahoo were less successful in repeating the task with the SOM than vice versa
  - more useful for broad exploration than for searching
- Searching tested with AGT (automatically generated thesaurus)
  - suggested terms came from Web pages
  - most useful for refining an initially too-broad search

27 Future Development
- Effects of different information sources
  - cohesion
  - consistency with the user’s mental model
- User interface design
  - flexibility
  - spelling errors and typos
  - pan-zoom
  - help screens or instructions (or a more intuitive design, or both)

28 Review and Discussion
- Overviews
  - Category Labels
    - when docs are stored “inside” categories, users cannot create queries based on combinations of categories
    - display of hierarchies takes up large amounts of screen space
    - tightly coupled with queries?
  - Other starting points

29 Overviews in the User Interface
- Unsupervised Groupings
  - Clustering
  - Kohonen Feature Maps
- Supervised Categories
  - Yahoo!
  - Superbook
  - HiBrowse
  - Cat-a-Cone
- Combinations
  - DynaCat
  - SONIA

30 Category Labels (from Hearst slide)
- Advantages:
  - Interpretable
  - Capture summary information
  - Describe multiple facets of content
  - Domain dependent, and so descriptive
- Disadvantages:
  - Do not scale well (for organizing documents)
  - Domain dependent, so costly to acquire
  - May mismatch users’ interests

31 Other Starting-Point Approaches
- Co-citation Links
- Examples, Guided Tours

32 Review and Discussion (cont.)
- Interface Design
- Visualization
  - textual vs. 2D spatial representation
- Search Strategies
  - integration with non-search parts of the process (reading, annotating, analysis)
- Evaluation Methodology

