Visualization Taxonomies and Techniques Text: Words, phrases, sentences, … University of Texas – Pan American CSCI 6361, Spring 2014.


1 Visualization Taxonomies and Techniques Text: Words, phrases, sentences, … University of Texas – Pan American CSCI 6361, Spring 2014

2 Introduction Text is ubiquitous –Documents, and more generally text, are a primary information source (Verbal has its place!) –Access to documents and text has grown exponentially in recent years due to networking infrastructure WWW Digital libraries Social media Visualization to aid users in understanding and gathering information from text and document collections

3 Introduction Visualization can aid in performing tasks For example: –Which documents contain text on topic XYZ? –Which documents are of interest to me? –Are there other documents that are similar to this one (so they are worthwhile)? –How are different words used in a document or a document collection? –What are the main themes and ideas in a document or a collection? –Which documents have an angry tone? –How are certain words or themes distributed through a document? –Identify “hidden” messages or stories in this document collection. –How does one set of documents differ from another set? –Quickly gain an understanding of a document or collection in order to subsequently do XYZ. –Understand the history of changes in a document. –Find connections between documents. From Stasko, 2013

4 Introduction Challenges of Text Visualization Text is unlike other data types seen so far, for example Context and Semantics –Context relevant to understanding and meaning –Indeed, natural language understanding is a challenge of the (n+1)th century Dimensionality –Inherently “not dimensional”, so must create a “visually realizable” visual encoding –Often, first step is n-D, then 2- or 3-D Modeling Abstraction –Consider level of “understanding” required for task –Match analysis task with appropriate tools and models

5 Introduction Related topics Information Retrieval –Active search process that brings back particular/specific items (will discuss that some today, but it is not always the focus) –InfoVis and HCI can help some… Visualization may be most useful when not sure precisely what you’re looking for when retrieving information –More of a browsing paradigm than a search one –But, this is part of the information retrieval task Define information need, formulate “query”, examine/evaluate results, … repeat Sensemaking –Gaining better understanding of facts at hand in order to take some next steps A principal focus in visual analytics –Visualization can help make a large document collection more understandable more rapidly Which is good: “Overview, zoom and filter, details on demand”

6 Recall, Visualization Pipeline: Visualization Stages Data transformations: –Map raw data (idiosyncratic form) into data tables (relational descriptions including metatags) Text is nominal data –A word, or any text unit, does not map easily to any quantitative representation! –The “Raw data --> Data Table” mapping is a principal element of creating any visual representation How do you get numbers from words, sentences, …?? –Will see several solutions [Figure: visualization pipeline. Raw Information, Dataset, Visual Form, Views; stages linked by Data Transformations, Visual Mappings, View Transformations; User - Task, Interaction, Visual Perception]
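One common answer to "how do you get numbers from words" is simply to count them. The sketch below is an illustration rather than anything taken from the slides: it turns raw strings into a small data table with one row per document and one column per term. The function names `tokenize` and `term_table` and the sample sentences are made up for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_table(documents):
    """Return (vocabulary, rows): one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical two-document collection.
docs = ["Text is nominal data.", "Data tables describe data."]
vocab, rows = term_table(docs)
print(vocab)   # ['data', 'describe', 'is', 'nominal', 'tables', 'text']
for row in rows:
    print(row)  # [1, 0, 1, 1, 0, 1] then [2, 1, 0, 0, 1, 0]
```

Each row is now quantitative, so the later pipeline stages (visual mappings, view transformations) can treat it like any other data table. Word order and sentence structure are lost in this mapping.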

7 Recall, Visualization Pipeline: Visualization Stages Visual Mappings: –Transform data tables into visual structures that combine spatial substrates, marks, and graphical properties And … visual mappings, as well, require at least “the usual level” of creativity [Figure: visualization pipeline. Raw Information, Dataset, Visual Form, Views; stages linked by Data Transformations, Visual Mappings, View Transformations; User - Task, Interaction, Visual Perception]

8 Understanding Text Content

9 Visual representations of words, phrases, and sentences –Main goal of understanding, versus search Visual presentation always part of text presentation –Standard typography uses layout, font, style, color … –Electronic media, especially – pick a web page –“Single text content”

10 Single Text Content Word Counts 2012 National Conventions NY Times: http://www.nytimes.com/interactive/2012/08/28/us/politics/convention-word-counts.html

11 Tag / Word Clouds

12 Lots of popular interest –E.g., on web Idea is to show word/concept importance through visual means –Tags: User-specified metadata (descriptors) about something –Sometimes generalized to just reflect word frequencies Not a new technique –Milgram’s ‘76 experiment to have people label landmarks in Paris –Flanagan’s ‘97 “Search referral Zeitgeist” –Fortune’s ‘01 Money Makes the World Go Round

13 Tag / Word Clouds Example: US State of the Union Speeches Guardian http://www.guardian.co.uk/news/datablog/2011/jan/25/state-of-the-union-text-obama http://image.guardian.co.uk/sys-files/Guardian/documents/2011/01/26/State_of_the_union_2011.pdf?guni=Graphic:in%20body%20link

14 Flickr Tag Cloud

15 delicious Tag Cloud

16 Alternate Order

17 Many Eyes Tag Cloud Word pairs

18 Wordle

19 Wordle “Beautiful Word Clouds”, http://www.wordle.net/ Tightly packed words –Horizontal, vertical or diagonal Size correlated with frequency Multiple color palettes User gets some control Layout Algorithm –Details not published –Sort words by weight, in decreasing order –For each word: initial position chosen randomly according to a distribution for the target shape –Position is then updated by moving out radially
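The layout steps listed above can be sketched as a greedy placement loop. This is a minimal sketch under stated assumptions, not Wordle's actual implementation (the slide notes the details are not published): words are approximated by axis-aligned boxes, the starting position is drawn from a Gaussian, the move outward follows a spiral, and a word stops moving as soon as its box no longer overlaps an already placed box. The names `overlaps` and `layout`, the font-size rule, and the sample weights are all hypothetical.

```python
import math
import random

def overlaps(a, b):
    """Boxes are (cx, cy, width, height) with (cx, cy) the centre."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2]
            and abs(a[1] - b[1]) * 2 < a[3] + b[3])

def layout(weights, seed=0, step=0.5):
    """Place words heaviest first; return {word: box}."""
    rng = random.Random(seed)
    placed = {}
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        size = 10 + 4 * weight                 # assumed font-size rule
        w, h = 0.6 * size * len(word), size    # crude bounding box
        x0, y0 = rng.gauss(0, 20), rng.gauss(0, 20)  # random start
        t = 0.0
        while True:
            # Spiral outward from the start position as t grows.
            box = (x0 + t * math.cos(t), y0 + t * math.sin(t), w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                break
            t += step
        placed[word] = box
    return placed

boxes = layout({"text": 5, "word": 3, "cloud": 3, "tag": 1})
for word, (x, y, w, h) in boxes.items():
    print(f"{word:6s} centre=({x:7.1f}, {y:7.1f}) size={w:.0f}x{h:.0f}")
```

Placing heavy words first gives them the positions closest to their starting points; lighter words fill the gaps farther out. A real layout would test glyph outlines rather than boxes, which is what allows tight packing.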

20 Wordle “Beautiful Word Clouds”, http://www.wordle.net/ Course schedule, table of topics, and assignments

21 Wordle “Beautiful Word Clouds”, http://www.wordle.net/ Course schedule, table of topics, and assignments

22 Wordle “Beautiful Word Clouds”, http://www.wordle.net/ Course schedule, table of topics, and assignments

23 Can be many variations … A bit more order Order the words more by frequency

24 Mani-Wordle User control Mani-Wordle –Start with nice default algorithm –Give user more control over design Alter color (within a palette) Pin words, redo the rest Move and rotate words –http://www.cg.tuwien.ac.at/courses/InfoVis/HallOfFame/2012/Gruppe03/Homepage/index.html –Koh et al TVCG (InfoVis) ‘10

25 Tag / Word Clouds Conclusions Weaknesses –Sub-optimal visual encoding (size vs. position) –Inaccurate size encoding (long words are bigger) –Font sizes are hard to compare –May not facilitate comparison (unstable layout) –Word frequency may not be meaningful Most use words vs. stems –Does not show structure of the text –Studies have even shown they underperform (Gruen et al CHI ’06) Why so popular? –OK for “quick look” –Serve as social signifiers that provide a friendly atmosphere and a point of entry into a complex site –Act as individual and group mirrors –Fun, not business-like
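The "long words are bigger" weakness is easy to see with a little arithmetic. The sketch below assumes a linear frequency-to-font-size rule and a crude estimate of the screen area a word covers; both the rule and the 0.6 aspect ratio are illustrative assumptions, and the function names are hypothetical.

```python
def font_size(count, min_count, max_count, min_pt=10, max_pt=40):
    """Linearly map a word count onto a font size in points."""
    if max_count == min_count:
        return float(max_pt)
    share = (count - min_count) / (max_count - min_count)
    return min_pt + share * (max_pt - min_pt)

def approx_area(word, size, aspect=0.6):
    """Rough area of a rendered word: characters x glyph width x height."""
    return len(word) * aspect * size * size

# Two words with the same count get the same font size ...
size = font_size(3, min_count=1, max_count=5)
print(size)  # 25.0

# ... but the longer word covers far more area on screen.
for word in ("text", "visualization"):
    print(word, approx_area(word, size))
```

Both words encode the same frequency, yet the longer one occupies more than three times the area, so a reader judging by area overestimates its importance.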

26 BTW - Text Analysis Tools voyeur: http://voyeurtools.org/ Book + tools for text analysis and visualization

27 BTW - Text Analysis Tools voyeur: http://voyeurtools.org/

28

29 Visualization and Information Retrieval Examples so far have focused on representing a single document –…, or, really, a set of words, as there is no consideration of even word order, let alone sentence structure, etc. Principal question is how visual representations might aid text, or document, search –I.e., how to find the proverbial needle in a haystack, where the haystack is all the documents on the WWW or in a digital library –The term information retrieval refers to this search, and its history antedates computers IR entails: –Determine information need –Query formulation –Retrieval –Assessment of results –Reformulation of query or even information need –Repeat (until information need met)

30 Visualization and Information Retrieval … IR entails: –Determine information need –Query formulation –Retrieval –Assessment of results –Reformulation of query or even information need –Repeat (until information need met) Provide visual representations that help during this process –Show document collection visually, support browsing, … Even for determining information need! –Show query results visually –Show how query terms relate to results –… any aspect

31 Visualization and Information Retrieval Provide visual representations that help during this process –Show document collection visually, support browsing, … Even for determining information need! –Show query results visually –Show how query terms relate to results –… any aspect From Stasko, 2013

32 Evaluating Query Results TileBars, Hearst, 1996 Hearst points out that query responses do not include: –How strong the match is –How frequent each term is –How each term is distributed in the document –Overlap between terms –Length of document Document ranking is opaque Inability to compare between results Input limits term relationships

33 TileBars Overview Goal : Minimize time and effort for deciding which documents to view in detail Show the role of the query terms in the retrieved documents, making use of document structure Graphical representation of term distribution and overlap Simultaneously indicate: –Relative document length –Frequency of term sets in document –Distribution of term sets with respect to the document and each other From Stasko, 2013
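The data behind a TileBars-style display can be sketched as a small grid: one row per query term set, one column per document segment, each cell holding the number of hits. The sketch below is an approximation under stated assumptions. It cuts the document into fixed-length token windows, which is a simplification chosen here and not necessarily how TileBars segments text; the function name `tilebar` and the sample text are hypothetical.

```python
import re

def tilebar(text, term_sets, segment_len=5):
    """Return one row per term set with a hit count per segment."""
    tokens = re.findall(r"[a-z]+", text.lower())
    segments = [tokens[i:i + segment_len]
                for i in range(0, len(tokens), segment_len)]
    return [[sum(tok in terms for tok in seg) for seg in segments]
            for terms in term_sets]

# Hypothetical document and a query with two term sets.
doc = ("Tag clouds show word frequency. Word clouds are popular. "
       "Search engines rank documents. Search results hide term distribution.")
grid = tilebar(doc, [{"word", "clouds"}, {"search"}])
for row in grid:
    print(row)
# [2, 2, 0, 0]
# [0, 1, 1, 0]
```

The number of columns reflects relative document length, the cell values reflect term-set frequency (drawn as tile darkness), and columns where both rows are non-zero show where the term sets overlap.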

34 TileBars Screen TileBars screen: From Stasko, 2013

35 TileBars Document representation Visual representation of retrieved documents Video: TileBars-80mb-chi96_05_m1.mpeg From Stasko, 2013

36 TileBars Video

37 TileBars Conclusions Clearly visually provides the information intended about each document Ease/effort/time of comparison? –Surely would improve with use … ?

38 Evaluating Query Results Sparkler Abstract result documents more –Havre et al InfoVis ‘01 Show “distance” from query in order to give user better feel for quality of match(es) Also shows documents in response to multiple queries Visualizing One Query –Triangle – query –Square – document Distance between query and documents represents their relevance From Stasko, 2013
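To place documents at a distance from a query, some relevance score has to be turned into a length. The slides do not say which score Sparkler uses, so the sketch below swaps in a standard stand-in: cosine similarity between word-count vectors, with distance taken as one minus similarity. The function names and the sample strings are hypothetical.

```python
import math
import re
from collections import Counter

def vector(text):
    """Bag-of-words count vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def distances(query, documents):
    """Distance of each document from the query; smaller means more relevant."""
    q = vector(query)
    return [1.0 - cosine(q, vector(d)) for d in documents]

docs = ["japan trade", "weather report", "japan trade policy and tariffs"]
for doc, dist in zip(docs, distances("japan trade", docs)):
    print(f"{dist:.3f}  {doc}")
```

Documents sharing no terms with the query land at distance 1.0, the outer edge of the display; close matches cluster near the query marker.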

39 Sparkler Visualizing Multiple Queries Six queries here Bullseye allows viewer to select quality results From Stasko, 2013

40 Sparkler Test Example Text Retrieval Conference (TREC-3) test document collection AP news stories from June 24–30, 1990 TREC topic: Japan Protectionist Measures Sparkler found 16 of 17 relevant documents From Stasko, 2013

41 Evaluating Query Results RankSpiral Compare search results from different search engines –Spoerri InfoVis ’04 poster From Stasko, 2013

42 RankSpiral Color represents different search engines Compare search results from different search engines From Stasko, 2013

43 RankSpiral Color represents different search engines Compare search results from different search engines

44 Evaluating Query Results ResultMaps Treemap-style vis for showing query results in a digital library –Clarkson, Desai & Foley TVCG (InfoVis) ‘09 From Stasko, 2013

45 Representing Multiple Documents

46 Previously, have seen various techniques for comparing multiple documents that are results of query, i.e., a subset of all documents Also, may want to just show everything, and then let user do “manual search”, or user-directed search Such displays of all documents also support the type of search common in visual analytics –Query, browse, connect, drill-down Will see: –Parallel word clouds –Tree layout of synonyms –…

47 Multiple Documents Parallel Tag Clouds Tag clouds increase size of word as f(frequency) Showing multiple documents as tag clouds allows visual inspection –Automated and user directed, visual analytics Parallel Tag Clouds - name says it all –Video - Collins et al VAST ‘09 – different circuit courts –http://www.youtube.com/watch?v=rL3Ga6xBgLw

48 Multiple Documents Do different district courts differ in cases they handle?

49 .

50 Multiple Documents Counting Words: Overview & Timeline Ex., across speeches can count words State of the Union Addresses http://www.nytimes.com/ref/washington/20070123_STATEOFUNION.html?initialWord=iraq NY Times demo

51 Multiple Documents Counting Words: Overview & Timeline State of the Union Addresses http://www.nytimes.com/ref/washington/20070123_STATEOFUNION.html?initialWord=iraq NY Times demo

52 Multiple Document Word Use DocuBurst Sets of synonyms grouped together –Uses WordNet –Show words from a document in terms of their hypernym (ISA) links –Size – # of leaves in subtree –Hue – diff synsets of word –Shade – frequency of use Demo, etc. –http://vialab.science.uoit.ca/portfolio/docuburst-visualizing-document-content-using-language-structure
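The "size = # of leaves in subtree" rule can be illustrated on a toy hypernym tree. The tree below is hand-written for this example and is not taken from WordNet; the function name `leaf_count` is likewise hypothetical.

```python
def leaf_count(tree, node):
    """Number of leaves under node; a node without children counts as one leaf."""
    children = tree.get(node, [])
    if not children:
        return 1
    return sum(leaf_count(tree, child) for child in children)

# Toy ISA hierarchy: parent -> list of more specific terms.
toy = {
    "animal": ["dog", "cat", "bird"],
    "bird": ["sparrow", "owl"],
}

for node in ("animal", "bird", "dog"):
    print(node, leaf_count(toy, node))
# animal 4
# bird 2
# dog 1
```

In a radial layout each node's angular width would be proportional to this count, so broad concepts with many specific terms beneath them take up more of the ring.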

53 FeatureLens Show patterns of words or n-grams –Don et al. CIKM ‘07 Video

54 FeatureLens Show patterns of words or n-grams –Don et al. CIKM ‘07 Check Video

55 Combinations of words, phrases, and sentences

56 Multiple Sentences SeeSoft Display Originally for software visualization One line of text on each horizontal line Color highlight for attributes –E.g., for software, how often modified, days since modification –E.g., for text where a particular word appears in a sentence, Conversations might be revealed Detail view in pop up window

57 Multiple Sentences TextArc - Simple Single Document Visualization Visualize an entire book –Word appearances –Sentences –… –http://textarc.org Sentences laid out on circumference in order of appearance in spiral Frequently occurring words inside spiral Selecting word draws lines to sentences with word –A kind of “visual concordance” Significant interaction

58 TextArc

59 Concordances and Word Frequencies From field of literary analysis Concordance –An alphabetical index of the principal words in a book or the works of an author with their immediate context Word of interest in center, with text in which it appears to left and right Also called KWIC –Key word in context
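A key-word-in-context listing is short to build. The sketch below is a minimal illustration, with a hypothetical function name `kwic` and a whitespace tokenizer chosen for brevity: for every occurrence of the keyword it returns the few words to the left, the keyword as it appears, and the few words to the right.

```python
def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = text.split()
    rows = []
    for i, tok in enumerate(tokens):
        # Ignore surrounding punctuation and case when matching.
        if tok.strip('.,;:!?"').lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

text = ("In the beginning God created the heaven and the earth. "
        "And the earth was without form.")
for left, word, right in kwic(text, "earth", width=2):
    print(f"{left:>10} | {word} | {right}")
```

Right-aligning the left context lines the keyword up in a central column, which is what makes repeated contexts easy to scan.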

60 Word Tree Shows context of a word or words –Follow word with all the phrases that follow it Wattenberg & Viégas TVCG (InfoVis) ‘08 Font size shows frequency of appearance Continue branch until hitting unique phrase Clicking on phrase makes it the focus Ordered alphabetically, by frequency, or by first appearance
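The structure behind a word tree is a prefix tree of the phrases that follow the chosen word. The sketch below is a simplified illustration: it follows each occurrence for a fixed number of words (`depth`) instead of continuing until the phrase becomes unique, and the function name `word_tree` and the sample sentences are hypothetical.

```python
def word_tree(sentences, root, depth=3):
    """Nested dict of continuations after root: {word: {count, children}}."""
    tree = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != root:
                continue
            node = tree
            # Walk the words that follow this occurrence of root.
            for nxt in tokens[i + 1:i + 1 + depth]:
                entry = node.setdefault(nxt, {"count": 0, "children": {}})
                entry["count"] += 1
                node = entry["children"]
    return tree

def show(tree, indent=0):
    """Print branches, most frequent first; count would drive font size."""
    for word, entry in sorted(tree.items(), key=lambda kv: -kv[1]["count"]):
        print(" " * indent + f"{word} ({entry['count']})")
        show(entry["children"], indent + 2)

sentences = ["if love be rough", "if love be blind", "if music be the food"]
show(word_tree(sentences, "if"))
```

Shared prefixes ("love be") are stored once with a count, which is why the rendered tree can show them once in a larger font and branch only where the phrases diverge.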

61 Word Tree Interaction

62 Word Tree From King James Bible

63 WordTree Many Eyes

64 Finding Structure: Phrase Nets

65 Find Structure: Phrase Nets Concordances show local, repeated structure of word context Phrase Nets In Many Eyes, van Ham et al. Other types of patterns –Lexical: at, and, at, (is|are|was|were) –Syntactic: Visualize extracted patterns in a node-link view –Occurrences -> Node size –Pattern position -> Edge direction
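A lexical pattern such as "X and Y" can be extracted with a simple token scan. The sketch below is a rough illustration, not the Many Eyes implementation: it ignores sentence boundaries and punctuation, treats every "left connector right" triple as a directed edge, and sums edge counts per word as a stand-in for node size. The function name `phrase_net` and the sample text are hypothetical.

```python
import re
from collections import Counter

def phrase_net(text, connector="and"):
    """Return (nodes, edges) for the pattern '<left> connector <right>'."""
    tokens = re.findall(r"\w+", text.lower())
    edges = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == connector:
            # Direction of the edge follows position in the pattern.
            edges[(tokens[i - 1], tokens[i + 1])] += 1
    nodes = Counter()
    for (left, right), n in edges.items():
        nodes[left] += n    # occurrences drive node size
        nodes[right] += n
    return nodes, edges

text = "bread and butter, salt and pepper, bread and butter and jam"
nodes, edges = phrase_net(text)
for (left, right), n in edges.items():
    print(f"{left} -> {right}  x{n}")
print(nodes.most_common(2))  # [('butter', 3), ('bread', 2)]
```

Changing `connector` to another word (for example "at" or "of") gives a different network over the same text, which is the point of the technique: the choice of pattern decides which structure is revealed.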

66 Phrase Net (larger next slide) Portrait of the Artist as a Young Man and

67 Phrase Net Portrait of the Artist as a Young Man and

68 Phrase Nets The Bible: begat

69 Phrase Nets Old and New Testaments: of

70 Phrase Nets ( and ) and ( at )

71 End.

72 References F. Viegas, M. Wattenberg, "Tag Clouds and the Case for Vernacular Visualization", interactions, Vol. 15, No. 4, Jul-Aug 2008, pp. 49-52.

