SIMS 202 Information Organization and Retrieval Prof. Marti Hearst and Prof. Ray Larson UC Berkeley SIMS Tues/Thurs 9:30-11:00am Fall 2000.


Last Time
• Starting Points for Search
  – Lists
  – Overviews
    » Categories

Today and Next Time
• Starting points (cont.)
  – Clusters
  – Examples as starting points
  – Automated source selection
• UIs for query specification
• UIs for putting results in context
• UIs to support the search process

Starting Points for Search
• Faced with a prompt or an empty entry form … how to start?
  – Lists of sources
  – Overviews
    » Clusters
    » Category hierarchies/subject codes
    » Co-citation links
  – Examples, wizards, and guided tours
  – Automatic source selection

Category Combinations
• HiBrowse problem:
  – Search is not integrated with browsing of categories
  – Only see the subset of categories selected (and the corresponding number of documents)

Cat-a-Cone: Multiple Simultaneous Categories
• Key ideas:
  – Separate documents from category labels
  – Show both simultaneously
• Link the two for iterative feedback
• Distinguish between:
  – Searching for documents vs.
  – Searching for categories

Cat-a-Cone Interface

Cat-a-Cone
• Catacomb (definition 2b, online Webster’s): “A complex set of interrelated things”
• Makes use of earlier PARC work on 3D + animation:
  – Rooms (Henderson and Card 86)
  – IV: Cone Tree (Robertson, Card, Mackinlay 93)
  – Web Book (Card, Robertson, York 96)

[Diagram, built up over four slides, relating the Collection, the Retrieved Documents, and the Category Hierarchy through query terms, search, and browse]

ConeTree for Category Labels
• Browse/explore category hierarchy
  – by search on label names
  – by growing/shrinking subtrees
  – by spinning subtrees
• Affordances
  – learn meaning via ancestors, siblings
  – disambiguate meanings
  – all categories simultaneously viewable

Virtual Book for Result Sets
  – Categories on page (retrieved document) linked to categories in tree
  – Flipping through book pages causes some subtrees to expand and contract
  – Most subtrees remain unchanged
  – Book can be stored for later re-use

Improvements over Standard Category Interfaces
• Integrate category selection with viewing of categories
• Show all categories + context
• Show relationship of retrieved documents to the category structure

Text Clustering
• Finds overall similarities among groups of documents
• Finds overall similarities among groups of tokens
• Picks out some themes, ignores others
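
To make “finds overall similarities among groups of documents” concrete, here is a minimal sketch of one common way to do it: k-means over bag-of-words vectors with cosine similarity. The slides do not say which clustering algorithm a given system uses; the function names (`tf_vector`, `cosine`, `centroid`, `kmeans`) and the example texts are illustrative assumptions.

```python
import math
import random
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(docs, k, iterations=10, seed=0):
    """Assign each document to one of k clusters; returns a cluster index per document."""
    vecs = [tf_vector(d) for d in docs]
    # Start from k randomly chosen documents as the initial centers.
    centers = [dict(v) for v in random.Random(seed).sample(vecs, k)]
    labels = [0] * len(vecs)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar center.
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vecs]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels
```

For example, `kmeans(["star galaxy telescope", "galaxy star orbit telescope", "film actor director", "film movie actor"], 2)` puts the two astronomy texts in one cluster and the two film texts in the other; which group gets index 0 depends on the seed. Note that each document lands in exactly one cluster, so only some of the possible themes surface.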

Scatter/Gather (S/G) Example: query on “star”, encyclopedia text
Clusters (number of documents, topic):
  14 sports
  8 symbols
  47 film, tv
  68 film, tv (p)
  7 music
  97 astrophysics
  67 astronomy (p)
  12 stellar phenomena
  10 flora/fauna
  49 galaxies, stars
  29 constellations
  7 miscellaneous
• Clustering and re-clustering is entirely automated

Using Clustering in Document Ranking
• Cluster entire collection
• Find cluster centroid that best matches the query
• This has been explored extensively
  – it is expensive
  – it doesn’t work well
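
A minimal sketch of the cluster-based ranking described on this slide, assuming the clusters have already been computed offline (that whole-collection clustering step is where most of the expense lies). The query is compared with each cluster centroid, and only documents in the best-matching cluster are returned and ranked. The function names and the `{cluster id: [document text, ...]}` input format are hypothetical.

```python
import math
from collections import Counter


def vec(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of several sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_search(query, clusters):
    """clusters: {cluster id: [document text, ...]}, precomputed offline (assumed)."""
    q = vec(query)
    centroids = {cid: centroid([vec(d) for d in docs]) for cid, docs in clusters.items()}
    # Pick the cluster whose centroid best matches the query.
    best = max(centroids, key=lambda cid: cosine(q, centroids[cid]))
    # Rank only the documents inside the winning cluster.
    ranked = sorted(clusters[best], key=lambda d: cosine(q, vec(d)), reverse=True)
    return best, ranked
```

A document that matches the query well but sits in a different cluster is never returned by this sketch, which illustrates one way this approach can fall short.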

Two Queries: Two Clusterings

AUTO, CAR, ELECTRIC                      AUTO, CAR, SAFETY
8 control drive accident …               6 control inventory integrate …
25 battery california technology …       10 investigation washington …
48 import j. rate honda toyota …         12 study fuel death bag air …
16 export international unit japan       61 sale domestic truck import …
3 service employee automatic …           11 japan export defect unite …

• The main differences are the clusters that are central to the query

Another Use of Clustering
• Use clustering to map the entire huge multidimensional document space into a huge number of small clusters
• “Project” these onto a 2D graphical representation
  – Group by doc: SPIRE/Kohonen maps
  – Group by words: Galaxy of News/HotSauce/Semio
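
A sketch of the self-organizing (Kohonen) map idea behind map-style displays: each cell of a small 2D grid holds a weight vector, and training pulls the best-matching cell and its grid neighbours toward each input, so similar documents tend to land on nearby cells. This is the generic textbook algorithm, not the configuration used in any system named here; the class name, the decay schedules, and the assumption that documents are already numeric term-weight vectors are all illustrative.

```python
import math
import random


class KohonenMap:
    """A small self-organizing (Kohonen) feature map on a rows x cols grid."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # One randomly initialised weight vector per grid cell.
        self.weights = {(r, c): [rng.random() for _ in range(dim)]
                        for r in range(rows) for c in range(cols)}

    def bmu(self, x):
        """Best-matching unit: the cell whose weight vector is closest to x (squared Euclidean)."""
        return min(self.weights,
                   key=lambda cell: sum((w - xi) ** 2 for w, xi in zip(self.weights[cell], x)))

    def update(self, x, lr, sigma):
        """Pull the winning cell and its grid neighbours toward x; influence falls off with grid distance."""
        br, bc = self.bmu(x)
        for (r, c), w in self.weights.items():
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])

    def train(self, data, epochs=50, lr0=0.5, sigma0=None):
        """Repeated passes with a shrinking learning rate and neighbourhood."""
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2
        for e in range(epochs):
            frac = e / epochs
            for x in data:
                self.update(x, lr=lr0 * (1 - frac) + 0.01, sigma=sigma0 * (1 - frac) + 0.1)

    def project(self, data):
        """Place each input vector on its best-matching cell: the 2D 'projection'."""
        return [self.bmu(x) for x in data]
```

Region sizes on such a map come from how many inputs fall on each cell, which is one way a display can tie region size to number of documents.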

Clustering Multi-Dimensional Document Space (image from Wise et al 95)

Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))

UWMS Data Mining Workshop Study of Kohonen Feature Maps
• H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)
• Comparison: Kohonen map and Yahoo
• Task:
  – “Window shop” for an interesting home page
  – Repeat with the other interface
• Results:
  – Starting with the map, could repeat in Yahoo (8/11)
  – Starting with Yahoo, unable to repeat in the map (2/14)

UWMS Data Mining Workshop Study (cont.)
• Participants liked:
  – Correspondence of region size to # documents
  – Overview (but also wanted zoom)
  – Ease of jumping from one topic to another
  – Multiple routes to topics
  – Use of category and subcategory labels

UWMS Data Mining Workshop Study (cont.)
• Participants wanted:
  – hierarchical organization
  – other ordering of concepts (alphabetical)
  – integration of browsing and search
  – correspondence of color to meaning
  – more meaningful labels
  – labels at the same level of abstraction
  – more labels fit into the given space
  – combined keyword and category search
  – multiple category assignment (sports + entertainment)

Visualization of Clusters
  – Huge 2D maps may be an inappropriate focus for information retrieval
    » Can’t see what documents are about
    » Documents forced into one position in semantic space
    » Space is difficult to use for IR purposes
    » Hard to view titles
  – Perhaps more suited for pattern discovery
    » Problem: often only one view on the space

Summary: Clustering
• Advantages:
  – Get an overview of main themes
  – Domain independent
• Disadvantages:
  – Many of the ways documents could group together are not shown
  – Not always easy to understand what they mean
  – Different levels of granularity

Automated Source Selection
• Compare the query against summaries of what is contained in each collection
  – GLOSS (Tomasic et al. 97)
    » Predict which of several sources is most likely to be useful for the query
    » Based on how many instances of each query term occur in the collection
  – SavvySearch (Howe & Dreilinger 97, in reader)
    » Predict which of several search engines is likely to produce a good answer to a given query
    » Based on number of pages returned and amount of time users spend on retrieved pages
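
GLOSS-style source selection can be sketched with one simple estimator: assume query terms occur independently, so the expected number of documents in a source that contain all the query terms is the source size times the product of each term’s document fraction. This follows the idea on the slide (per-collection term statistics) but is only one possible estimator; the summary format and the function names are hypothetical.

```python
from math import prod


def estimate_matches(query_terms, summary):
    """Estimated number of documents in one source containing all query terms,
    assuming the terms occur independently of each other.
    summary: {'size': number of docs, 'df': {term: number of docs containing it}} (hypothetical format)."""
    size = summary["size"]
    if size == 0:
        return 0.0
    return size * prod(summary["df"].get(t, 0) / size for t in query_terms)


def rank_sources(query_terms, summaries):
    """Order sources by estimated number of matching documents, best first."""
    scores = {name: estimate_matches(query_terms, s) for name, s in summaries.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

A term missing from a source’s summary drives that source’s estimate to zero, so the query is never sent there.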

Query Specification

• Interaction styles (Shneiderman 97)
  – Command language
  – Form fill-in
  – Menu selection
  – Direct manipulation
  – Natural language
• Example:
  – How does each apply to Boolean queries?

Command-Based Query Specification
• command attribute value connector …
  – find pa shneiderman and tw user#
• What are the attribute names?
• What are the command names?
• What are allowable values?
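
The command syntax above can be illustrated with a tiny parser and matcher. The meanings assigned here are assumptions for illustration: `pa` is read as personal author, `tw` as title word, and a trailing `#` as truncation; a real catalog defines its own codes. The sketch makes the slide’s point visible: unless the user already knows the command, the attribute names, and the allowable values, the query is rejected.

```python
# Hypothetical attribute codes; a real catalog defines its own list.
ATTRIBUTES = {"pa": "author", "tw": "title"}
CONNECTORS = {"and", "or"}


def parse(line):
    """Split 'find <attr> <value> [<connector> <attr> <value> ...]' into clauses and connectors."""
    tokens = line.lower().split()
    if len(tokens) < 3 or tokens[0] != "find":
        raise ValueError("expected: find <attribute> <value> ...")
    rest = tokens[1:]
    clauses, connectors = [], []
    while rest:
        if clauses:
            if rest[0] not in CONNECTORS:
                raise ValueError(f"unknown connector: {rest[0]}")
            connectors.append(rest[0])
            rest = rest[1:]
        if len(rest) < 2 or rest[0] not in ATTRIBUTES:
            raise ValueError(f"expected an attribute and a value, got: {rest}")
        clauses.append((ATTRIBUTES[rest[0]], rest[1]))
        rest = rest[2:]
    return clauses, connectors


def term_matches(field_text, value):
    """Whole-word match; a trailing '#' is treated as right truncation."""
    words = field_text.lower().split()
    if value.endswith("#"):
        return any(w.startswith(value[:-1]) for w in words)
    return value in words


def evaluate(line, record):
    """Apply the parsed query to one record, combining clauses left to right."""
    clauses, connectors = parse(line)
    field, value = clauses[0]
    result = term_matches(record.get(field, ""), value)
    for conn, (field, value) in zip(connectors, clauses[1:]):
        hit = term_matches(record.get(field, ""), value)
        result = (result and hit) if conn == "and" else (result or hit)
    return result
```

For example, `evaluate("find pa shneiderman and tw user#", {"author": "Shneiderman Ben", "title": "Designing the user interface"})` returns `True`, while an unknown code such as `zz` raises `ValueError`.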

Form-Based Query Specification (Altavista)

Form-Based Query Specification (Melvyl)

Form-Based Query Specification (Infoseek)

Direct Manipulation Spec. VQUERY (Jones 98)

Menu-based Query Specification (Young & Shneiderman 93)

Context

Putting Results in Context
• Visualizations of query term distribution
  – KWIC, TileBars, SeeSoft
• Visualizing shared subsets of query terms
  – InfoCrystal, VIBE, Lattice Views
• Table of contents as context
  – Superbook, Cha-Cha, DynaCat
• Organizing results with tables
  – Envision, SenseMaker
• Using hyperlinks
  – WebCutter

Putting Results in Context
• Interfaces should
  – give hints about the roles terms play in the collection
  – give hints about what will happen if various terms are combined
  – show explicitly why documents are retrieved in response to the query
  – summarize compactly the subset of interest

KWIC (Keyword in Context)
• An old standard, ignored by internet search engines
  – used in some intranet engines, e.g., Cha-Cha
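
A minimal KWIC sketch: every occurrence of the keyword is listed with a few words of context on each side, and the keywords are aligned in a single column. The function names, the context width, and the padding are illustrative choices.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    words = text.split()
    target = keyword.lower()
    rows = []
    for i, word in enumerate(words):
        if re.sub(r"\W", "", word.lower()) == target:  # ignore surrounding punctuation
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows


def format_kwic(rows, pad=25):
    """Classic KWIC layout: keywords aligned in one column, context on either side."""
    return [f"{left:>{pad}} [{kw}] {right}" for left, kw, right in rows]
```

For example, `kwic("The star rose. A film star smiled at the star chart.", "star", width=2)` yields three rows, the second being `("A film", "star", "smiled at")`, which already hints that “star” plays more than one role in this text.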

Display of Retrieval Results
• Goal: minimize time/effort for deciding which documents to examine in detail
• Idea: show the roles of the query terms in the retrieved documents, making use of document structure

TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  – relative document length
  – query term frequencies
  – query term distributions
  – query term overlap
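
A text-only approximation of a TileBar, to show what the display encodes. Each row is one query term set, each column one segment of the document, and each cell’s character stands in for a gray level proportional to the number of hits in that segment, normalized within the document. TileBars segment documents with TextTiling; the fixed-size word windows, the shading characters, and the function names here are simplifying assumptions.

```python
SHADES = " .:#"  # text stand-in for gray levels: blank = no hits, '#' = most hits


def tilebar_counts(text, term_sets, segment_words=50):
    """Rows = query term sets, columns = consecutive text segments, cells = hit counts."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    segments = [words[i:i + segment_words] for i in range(0, len(words), segment_words)]
    return [[sum(1 for w in seg if w in terms) for seg in segments] for terms in term_sets]


def render(rows, labels):
    """One bar per term set; bar length reflects document length, shading reflects frequency."""
    peak = max((c for row in rows for c in row), default=0) or 1
    top = len(SHADES) - 1
    return [f"{label:<12}|" + "".join(SHADES[round(c / peak * top)] for c in row) + "|"
            for label, row in zip(labels, rows)]
```

For example, with `segment_words=4`, the text `"dbms " * 4 + "filler " * 4 + "reliability dbms " * 2` and term sets `{"dbms", "database"}` and `{"reliability"}` give counts `[[4, 0, 2], [0, 0, 2]]`: the bar is three segments long, DBMS is dense at the start, and the two terms overlap only in the last segment.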

Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
  – Mainly about both DBMS & reliability
  – Mainly about DBMS, discusses reliability
  – Mainly about, say, banking, with a subtopic discussion on DBMS/reliability
  – Mainly about high-tech layoffs

Exploiting Visual Properties
  – Variation in gray scale saturation imposes a universal, perceptual order (Bertin et al. ’83)
  – Varying shades of gray show varying quantities better than color (Tufte ’83)
  – Differences in shading should align with the values being presented (Kosslyn et al. ’83)

Key Aspect: Faceted Queries
• Conjunct of disjuncts
• Each disjunct is a concept
  – osteoporosis, bone loss
  – prevention, cure
  – research, Mayo clinic, study
• User does not have to specify which are main topics, which are subtopics
• Ranking algorithm gives higher weight to overlap of topics
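
Faceted (conjunct-of-disjuncts) ranking can be sketched as follows. The slide says only that the ranking gives higher weight to overlap of topics; this sketch assumes a simple rule of its own: rank first by how many facets a document covers, then by total hits. Function names and the input format are hypothetical.

```python
def count_phrase(words, phrase):
    """Occurrences of a (possibly multi-word) phrase in a token list."""
    p = phrase.split()
    return sum(1 for i in range(len(words) - len(p) + 1) if words[i:i + len(p)] == p)


def rank_faceted(docs, facets):
    """docs: {id: text}; facets: list of disjunctions (lists of synonymous terms/phrases).
    Documents covering more facets rank first; total hits break ties."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        hits = [sum(count_phrase(words, term) for term in facet) for facet in facets]
        covered = sum(1 for h in hits if h > 0)
        scored.append((-covered, -sum(hits), doc_id))
    return [(doc_id, -neg_covered) for neg_covered, _, doc_id in sorted(scored)]
```

With the facets from the slide, a document that mentions bone loss many times but nothing about prevention or research ranks below documents touching all three concepts, even though it has as many raw hits.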

Main Topic Context
• Potential problem with TileBars: given retrieved documents in which no query terms are well distributed, the user does not know the context in which the query terms are used
• Solution: accompany with a main topic display

TileBars Summary
• Compact, graphical representation of term distribution for full-text retrieval results
  – simultaneously display term frequency, distribution, overlap, and doc length
  – allow for simple user-determined ordering strategies
• Part of a larger effort: user-centric, content-sensitive information access

TileBars Summary
• Preliminary user studies
  – users understand them
  – find them helpful in some situations
  – sometimes terms need to be disambiguated

SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)

Query Term Subsets
• Show which subsets of query terms occur in which subsets of retrieved documents
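
The information that such displays visualize can be computed by grouping documents according to the exact subset of query terms each one contains. The sketch below produces that grouping, plus a plain-text summary in place of a graphical layout; the function names are illustrative.

```python
from collections import defaultdict


def term_subsets(docs, query_terms):
    """Group document ids by the exact subset of query terms each document contains."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        groups[frozenset(t for t in query_terms if t in words)].append(doc_id)
    return dict(groups)


def describe(groups):
    """Text summary, largest term combinations first (stand-in for the graphical display)."""
    lines = []
    for subset in sorted(groups, key=lambda s: (-len(s), sorted(s))):
        label = " AND ".join(sorted(subset)) or "(no query terms)"
        lines.append(f"{label}: {', '.join(groups[subset])}")
    return lines
```

With n query terms there are up to 2^n such subsets, which is one reason displays of this kind become hard to read beyond a handful of terms.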

Other Approaches
• Show how often each query term occurs in retrieved documents
  – VIBE (Korfhage ’91)
  – InfoCrystal (Spoerri ’94)
  – Problems:
    » can’t see overlap of terms within docs
    » quantities not represented graphically
    » more than 4 terms hard to handle
    » no help in selecting terms to begin with

InfoCrystal (Spoerri 94)

VIBE (Olson et al. 93, Korfhage 93)

Superbook (Remde et al. 87)
• Next-generation hypermedia book
• Functions:
  – Word lookup:
    » Show a list of query words, stems, and word combinations
  – Table of contents: dynamic fisheye view of the hierarchical topics list
    » Search words can be highlighted here too
  – Page of text: show selected page and highlighted search terms
• Hypertext features linking through search words rather than page links

Superbook (http://superbook.bellcore.com/SB)

DynaCat (Pratt 97)
• Decide on important question types in advance
  – What are the adverse effects of drug D?
  – What is the prognosis for treatment T?
• Make use of MeSH categories
• Retain only those types of categories known to be useful for this type of query
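
A sketch of the DynaCat idea of keeping only the categories that suit the question type. The mapping from question types to category types, the type labels, and the input format are all hypothetical; the slide says only that MeSH categories are used and filtered by what is known to be useful for each type of query.

```python
# Hypothetical table: which kinds of category are worth showing for each question type.
USEFUL_CATEGORY_TYPES = {
    "adverse_effects": {"sign_or_symptom", "disease"},
    "prognosis": {"disease", "outcome"},
}


def categorize(results, question_type):
    """results: list of (doc_id, [(category label, category type), ...]).
    Keep only categories of the types useful for this question type;
    a document may appear under several categories."""
    keep = USEFUL_CATEGORY_TYPES[question_type]
    groups = {}
    for doc_id, categories in results:
        for label, cat_type in categories:
            if cat_type in keep:
                groups.setdefault(label, []).append(doc_id)
    return groups
```

The same result set is organized differently for an adverse-effects question than for a prognosis question, because different category types survive the filter.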

DynaCat (Pratt, Hearst, & Fagan 99)

DynaCat Study
• Design
  – Three queries
  – 24 cancer patients
  – Compared three interfaces
    » ranked list, clusters, categories
• Results
  – Participants strongly preferred categories
  – Participants found more answers using categories
  – Participants took the same amount of time with all three interfaces

Cha-Cha (Chen & Hearst 98)
• Shows a “table-of-contents”-like view, like Superbook
• Takes advantage of human-created structure within hyperlinks to create the TOC
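
One way to turn link structure into a table of contents, sketched under an assumption: each search hit is placed in the outline along a shortest hyperlink path from a chosen root page, found by breadth-first search, and hits that share path prefixes are merged into one tree. The link data and the function names are hypothetical.

```python
from collections import deque


def shortest_path_parents(links, root):
    """Breadth-first search from root; maps each reachable page to its parent on a shortest path."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in parent:
                parent[nxt] = page
                queue.append(nxt)
    return parent


def path_from_root(parent, page):
    """Walk parent pointers back to the root, then reverse."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]


def toc_for_hits(links, root, hits):
    """Nested-dict outline containing every hit reachable from root, merged on shared prefixes."""
    parent = shortest_path_parents(links, root)
    tree = {}
    for hit in hits:
        if hit not in parent:
            continue  # unreachable from this root; a real system would need a fallback
        node = tree
        for page in path_from_root(parent, hit):
            node = node.setdefault(page, {})
    return tree
```

For example, with links from `home` to `research` and `people`, from `research` to `ir` and `hci`, and from both `people` and `hci` to `hearst`, the hits `ir` and `hearst` produce the outline `home > research > ir` and `home > people > hearst`.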

Supporting the Process
• Interfaces to support the process of information seeking
  – Standard model
    » Infogrid
    » Superbook
  – Berry-picking model
    » SketchTrieve
    » DLITE
  – Retaining search history

How to Present the Search Process?
• What sequence of operations is allowed?
• Which GUI layout style is used?
  – One window
  – Overlapping windows
  – Tiled windows
  – Monolithic layout
    » One big window containing specialized internal windows that always occupy the same position and function

InfoGrid/Protofoil (Rao et al. 92)
Slide by Shankar Raman
• A general search interface architecture
  – Item stash: retrieved docs
  – Search event: current query
  – History: history of queries
  – Result item: view selected docs + metadata

Infogrid (design mockup) (Rao et al. 92)

Infogrid Design Mockups (Rao et al. 92)

Protofoil (Rao et al. 94)

Monolithic Layouts
[Figure panels: Protofoil layout (hypothetical); Superbook layout]

SuperBook (Egan et al. 89)
• Experimented with many variations of the layout and interaction sequence
  – Several studies have shown that too many different options are worse than an interface that is too restrictive
• Considered different screen sizes
  – Monolithic layout favored, however...
  – Sequence of interactions is what matters
  – Smaller screen can force designers to consider the interaction sequence carefully

Supporting the Information Seeking Process
• Two recent, similar approaches that focus on supporting the process
  – SketchTrieve (Hendry & Harper 97)
  – DLITE (Cousins 97)

Informal Interface
• Informal does not necessarily mean less useful
• Show how the search is
  – unfolding or evolving
  – expanding or contracting
• Prompt the user to
  – reformulate and abandon plans
  – backtrack to points of task deferral
  – make side-by-side comparisons
  – define and discuss problems

DLITE
Slide by Shankar Raman
• UI to a digital library
• Direct manipulation interface to a distributed information system
  – must show network, remote server status
• Workcenter approach
  – lots of handy tools for one task
  – experts create workcenters
  – contents persistent
  – concurrently shareable across sites
• Web browser used to display document or collection metadata

DLITE (Cousins 97)
Slide by Shankar Raman
• Drag-and-drop interface
• Reify queries, sources, retrieval results
• Animation to keep track of activity

Components/Tools in DLITE
Slide by Shankar Raman
• Documents (search results, or local documents)
• Collections of components (e.g., result sets)
• Queries: translator used to apply the same query to many sources
• Services: search services, summarization, OCR, translation …
• People (for access control, payment …)

Interaction
Slide by Shankar Raman
• Pointing at an object brings up a tooltip with its metadata
• Activating an object triggers a component-specific action
  – 5 types for the result set component
• Drag-and-drop data onto a program
• Animation used to show what happens with drag-and-drop (e.g., “waggling”)

Comments
Slide by Shankar Raman
• Users seem to have lots of problems with flexibility (result set icon activation)
• Workcenter: customization, acts as reminder
• Animation used to track progress, (partial) results

Keeping Track of History
• Examples
  – List of prior queries and results (standard)
  – Graphical hierarchy for web browsing
  – “Slide sorter” view, snapshots of earlier interactions

PadPrints (Hightower et al. 98)
Slide by Shankar Raman
• Tree-based history of recently visited web pages
  – history map placed to the left of the browser window
• Zoomable, can shrink sub-hierarchies
• Node = title + thumbnail
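
A tree-structured history, as opposed to a linear back list, can be sketched as follows: following a link adds the new page as a child of the current page, and going back does not discard that branch. The class and method names are hypothetical, and zooming and thumbnails are omitted.

```python
class HistoryTree:
    """Browsing history kept as a tree instead of a linear back list."""

    def __init__(self, start):
        self.root = start
        self.parent = {start: None}
        self.children = {start: []}
        self.current = start

    def follow(self, url):
        """Following a link adds url as a child of the current page (revisits just move there)."""
        if url not in self.parent:
            self.parent[url] = self.current
            self.children[url] = []
            self.children[self.current].append(url)
        self.current = url

    def back(self):
        """Move to the parent page; earlier branches stay in the tree."""
        if self.parent[self.current] is not None:
            self.current = self.parent[self.current]
        return self.current

    def outline(self, node=None, depth=0):
        """Indented text rendering of the history tree."""
        node = self.root if node is None else node
        lines = ["  " * depth + node]
        for child in self.children[node]:
            lines.extend(self.outline(child, depth + 1))
        return lines
```

In a conventional back/forward stack, going back from `a1` to `home` and then following a link to `b` would drop `a` and `a1` from the forward history; here `outline()` still shows both branches under `home`.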

PadPrints (Hightower et al. 98)

Initial User Study of PadPrints
Slide by Shankar Raman
• 13.4% unable to find recently visited pages
• Only 0.1% use the History button; 42% use Back
• Problems with the history list (according to the authors)
  – incomplete: loses out on every branch
  – textual (not necessarily a problem!)
  – pull-down menu cumbersome: cannot see history along with the current document

Second User Study of PadPrints
Slide by Shankar Raman
• Changed the task to involve revisiting web pages
  – CHI database, National Park Service website
• Only correctly answered questions considered
  – 20-30% fewer pages accessed
  – faster response time for tasks that involve revisiting pages
  – slightly better user satisfaction ratings

Summary: UIs for Information Access
• The part of the system that the user sees and interacts with
• Better interfaces in the future should produce better search experiences
• UIs for search should
  – Help users keep track of what they have done
  – Suggest next choices
  – Support the process of search
• It is very difficult to design good UIs
• It is very difficult to evaluate search UIs
