Document Collections cs5984: Information Visualization Chris North.


1 Document Collections cs5984: Information Visualization Chris North

2 Where are we? Multi-D, 1D, 2D, Hierarchies/Trees, Networks/Graphs, Document collections, 3D, Design Principles, Empirical Evaluation, Java Development, Visual Overviews, Multiple Views, Peripheral Views

3 Structured Document Collections: Multi-dimensional: author, title, date, journal, … Trees: Dewey Decimal. Networks: web, citations.

4 Envision (Ed Fox et al.): multi-dimensional, similar to Spotfire

5 Unstructured Document Collections: Focus on full text. Examples: digital libraries, encyclopedias, the Web, homepages, photo collections. Tasks: search (keyword); browse (themes, subjects, topics, library coverage, size, distributions).

6 Visualization Strategies: Cluster maps, Keyword query, Relationships, Reduced representation, User-controlled layout (today)

7 Cluster Map: Create a “map” of the document collection. Similar documents near, dissimilar documents far. “Grocery store” concept.

8 Document Vectors
              Doc1   Doc2   Doc3   …
“aardvark”     1      2      0
“banana”       2      1      0
“chris”        0      0      3
…
Similarity between a pair of docs = …
Lay out documents in a 2-D map by similarity, similar to the spring model for graph layout.
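The similarity formula from this slide did not survive the transcript. A minimal sketch, assuming cosine similarity (a common choice for term-count vectors, not necessarily the one on the slide), using the example counts from the table. The spring step is likewise an assumption: each pair of documents is joined by a spring whose rest length is 1 − similarity, so similar documents settle close together. The names `DOC_VECTORS`, `cosine_similarity`, and `spring_layout` are hypothetical.

```python
import math
import random

# Term counts from the slide's example (vector order: aardvark, banana, chris).
DOC_VECTORS = {
    "Doc1": [1, 2, 0],
    "Doc2": [2, 1, 0],
    "Doc3": [0, 0, 3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def spring_layout(vectors, iterations=500, step=0.05, seed=0):
    """Hypothetical spring layout: map distance is driven toward 1 - similarity."""
    rng = random.Random(seed)
    names = list(vectors)
    pos = {n: [rng.random(), rng.random()] for n in names}
    for _ in range(iterations):
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # Rest length: short for similar docs, long for dissimilar ones.
                target = 1.0 - cosine_similarity(vectors[a], vectors[b])
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Stretched spring (dist > target) pulls the pair together;
                # compressed spring pushes it apart.
                force = step * (dist - target) / dist
                pos[a][0] += force * dx
                pos[a][1] += force * dy
                pos[b][0] -= force * dx
                pos[b][1] -= force * dy
    return pos
```

With these counts Doc1 and Doc2 have similarity 4/5 = 0.8 while Doc3 shares no terms with either, so the layout places Doc1 and Doc2 close together and Doc3 far from both: the "grocery store" effect in miniature.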

9 Cluster Algorithms: Partition clustering: partition into k subsets; pick k seeds; iteratively attract nearest neighbors. Hierarchical clustering: dendrogram; group the nearest-neighbor pair; iterate.
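Both strategies on this slide fit in a few lines. A minimal sketch, assuming Euclidean distance between document vectors; the partition version is k-means style (seeds are simply the first k points here, an illustrative choice), and the hierarchical version assumes single linkage. The function names `kmeans` and `agglomerative` are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Partition clustering: pick k seeds, then repeatedly attract nearest points."""
    # Seeds: the first k points (deterministic, for illustration only).
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def agglomerative(points):
    """Hierarchical clustering: repeatedly merge the closest pair of clusters.

    Returns the merge history (the dendrogram) as (cluster_a, cluster_b, distance)
    tuples. Cluster distance is single linkage (closest member pair), an assumption.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges
```

For points `[(0, 0), (0, 1), (10, 10), (10, 11)]`, `kmeans(points, 2)` separates the two tight pairs, and `agglomerative(points)` merges each pair at distance 1 before joining them at the top of the dendrogram.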

10 Kohonen Maps: Xia Lin, “Document Space” » samal, ying http://faculty.cis.drexel.edu/sitemap/index.html
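A Kohonen map (self-organizing map, SOM) is a grid of nodes, each holding a weight vector in document space; training pulls the best-matching node and its grid neighbors toward each document. A minimal sketch of a generic SOM, not Lin's exact configuration; the linear decay schedules, Gaussian neighborhood, and the names `train_som` and `best_matching_unit` are assumptions.

```python
import math
import random


def best_matching_unit(weights, v):
    """Grid node whose weight vector is closest to document vector v."""
    return min(weights, key=lambda node: math.dist(weights[node], v))


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Train a small Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    # One weight vector per grid node, randomly initialized.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        # Learning rate and neighborhood radius shrink linearly over training.
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 0.5)
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    # Gaussian falloff: nodes nearer the winner move more.
                    h = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights
```

After training on the document vectors, `best_matching_unit(weights, doc)` gives each document's cell on the map; documents with similar term profiles tend to land in the same or adjacent cells, which is what makes the grid readable as a "document space".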

11

12 Themescapes, Cartia (PNL): mountain height = cluster size
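One plausible way to get "mountain height = cluster size" from a 2-D document map is to let every document add a Gaussian bump to a height field, so bumps stack wherever many documents cluster. This is a hypothetical sketch of that idea, not Cartia's actual algorithm; the function name `terrain` and the `sigma` width are assumptions.

```python
import math


def terrain(doc_positions, width, height, sigma=1.0):
    """Height map in which each document adds a Gaussian bump at its (x, y).

    Overlapping bumps sum, so a peak's height grows with the number of
    documents in that cluster (hypothetical reading of the slide).
    """
    grid = [[0.0] * width for _ in range(height)]
    for (x, y) in doc_positions:
        for row in range(height):
            for col in range(width):
                d2 = (col - x) ** 2 + (row - y) ** 2
                grid[row][col] += math.exp(-d2 / (2 * sigma ** 2))
    return grid
```

With five documents at (2, 2) and one at (7, 7) on a 10 × 10 grid, the peak at (2, 2) is about five times as tall as the one at (7, 7). The bump width `sigma` controls whether nearby clusters merge into one mountain or stay separate peaks.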

13 WebSOM http://websom.hut.fi/websom/

14 Map.net http://maps.map.net/start

15 Cluster Map. Good: map of the collection; major themes and sizes; relationships between themes; scales up. Bad: Where to locate documents with multiple themes? » On both mountains, between mountains, …? Relationships between documents, within documents? Algorithm becomes (too) critical.

16 Keyword Query: keyword query, search engine; rank-ordered list; “Information Retrieval”
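The slide does not say how the rank order is computed. A minimal sketch, assuming a plain TF-IDF score (term count times log inverse document frequency, summed over the query terms), which is one common information-retrieval scheme; whitespace tokenization and the name `rank` are simplifications for illustration.

```python
import math
from collections import Counter


def rank(query, docs):
    """Return doc ids sorted by a simple TF-IDF score for the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(docs)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue  # term occurs in no document: contributes nothing
            idf = math.log(n / df)  # rarer terms weigh more
            score += counts[term] * idf
        scores[doc_id] = score
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For the query "banana split", a document containing "banana banana split" outranks one containing only "aardvark banana". A query with no matching terms scores every document 0, which is the zero-hit problem seen from the ranking side.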

17 TileBars: Hearst, “TileBars” » reenal, xueqi http://elib.cs.berkeley.edu/tilebars/
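TileBars draws, for each retrieved document, a bar divided into segments, with one row per query term and darker tiles where that term occurs more often in that segment. A minimal sketch, assuming fixed-size word windows as segments (the original uses topic-based segments from Hearst's TextTiling) and an ASCII shade ramp in place of gray levels; `tilebar` and `render` are hypothetical names.

```python
def tilebar(text, terms, segment_words=5):
    """Per-segment hit counts: one row per query term, one column per segment.

    Assumption: segments are fixed-size word windows rather than topic segments.
    """
    words = text.lower().split()
    segments = [words[i:i + segment_words] for i in range(0, len(words), segment_words)]
    return [[seg.count(term.lower()) for seg in segments] for term in terms]


def render(rows, shades=" .:#"):
    """Draw each count as a character: darker character = more hits."""
    return ["".join(shades[min(c, len(shades) - 1)] for c in row) for row in rows]
```

For the text "apple apple x y z a b c d e apple pear q r s" and terms `["apple", "pear"]`, the counts are `[[2, 0, 1], [0, 0, 1]]`, rendered as `": ."` and `"  ."`: the reader sees at a glance that both terms co-occur only in the last segment.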

18 VIBE: Korfhage, http://www.pitt.edu/~korfhage/interfaces.html Documents are located between query keywords using a spring model.
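In VIBE each query keyword is a point of interest (POI) on the display, and a document is tied to every POI by a spring whose stiffness is the document's weight for that keyword. Setting the net spring force to zero gives the rest position as the weight-averaged POI position. A minimal sketch under that linear-spring assumption; the function name `vibe_position` and the example POI coordinates are hypothetical.

```python
def vibe_position(poi_positions, weights):
    """Rest point of a document tied by springs to keyword POIs.

    poi_positions: {keyword: (x, y)}; weights: {keyword: document's weight}.
    Linear springs with stiffness = weight balance at the weighted mean.
    """
    total = sum(weights.get(k, 0.0) for k in poi_positions)
    if total == 0:
        return None  # document matches none of the keywords
    x = sum(poi_positions[k][0] * weights.get(k, 0.0) for k in poi_positions) / total
    y = sum(poi_positions[k][1] * weights.get(k, 0.0) for k in poi_positions) / total
    return (x, y)
```

With POIs "banana" at (0, 0), "aardvark" at (10, 0), and "chris" at (5, 10), Doc1 (banana 2, aardvark 1) lands a third of the way from banana toward aardvark, while Doc3 (chris only) sits on the "chris" POI. A consequence of this model is that documents with proportional weights (e.g. 1:2 and 2:4) land on the same spot.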

19 VR-VIBE

20 Keyword Query. Good: reduces the browsing space; maps according to the user’s interests. Bad: What keywords do I use? What about other related documents that don’t use these keywords? No initial overview. Mega-hit, zero-hit problem.

21 Assignment. Thurs: Document Collections: Bederson, “Image Browsing” » Rui, anusha; Card, “Web Book and Web Forager” » mrinmayee, ming. Demo your hw3: Tues or Thurs.

22 Next Week. Tues: 3-D data: Kniss, “Interactive Volume Rendering with Direct Manip” » xueqi, mahesh. Thurs: Workspaces: Robertson, “Task Gallery” » supriya, varun; Upson, “AVS” » christa, jun. Thanksgiving break. Tues 27: Debates: Kobsa, “Empirical comparison of comm infovis systems” » kunal, zhiping.

23 Upcoming Schedule: Tues: 3-D data; Thurs: Workspaces; Thanksgiving break; Tues 27: Debates; Thurs 29: How (not) to lie with visualization; Dec: project presentations; Dec 7: CHI 2-pagers due, student posters due.

