Wise, Thomas, Pennock, Lantrip, Pottier, Schur, and Crow


1 Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents
Wise, Thomas, Pennock, Lantrip, Pottier, Schur, and Crow
Presented By: Cyntrica Eaton

2

3 Presentation of documents where similar ones cluster together
Clicking on a star in Galaxies displays the words that occur most often in the selected document(s).
To explore further, selecting a “docustar” retrieves the document header or complete text.
Entering a query on terms of interest makes the “docustars” of interest light up like “novas” on the screen.

4 Exhibits in the landscape reveal:
The intricate interconnection of themes.
The transformation of themes across the whole of the document corpus.

5 Presentation Overview
Paper Description
Contributions
Current State
Critique
References

6 Paper Description
Motivation
Approach
Visualization Paradigms: Galaxies, Themescapes

7 MVAB: Multidimensional Visualization and Advanced Browsing Project
Researchers at the Pacific Northwest National Laboratory were interested in solving the problem of information overload for intelligence analysts.
Large corpora: digital libraries, regulations and procedures, archived reports.

8 Motivation
Modern information technologies have contributed to an increased availability of information.
Accompanying the increasing quantity of available information is a correspondingly decreasing amount of time to locate and absorb it.
The ability to overview large document corpora and get information without the heavy cognitive processes involved in language processing will improve the search process.
Need to read and assess large amounts of text.
Peruse large amounts of text to detect and recognize informational patterns and pattern irregularities across various sources: market analysis, environmental assessment, law enforcement, intelligence for national security.
Enhance visual browsing and analysis.

9 Approach
The problem of processing large amounts of text can be solved if text is spatialized in a manner that takes advantage of human perceptual abilities.
Visual processing takes place in parallel at the retinal level and is relatively effortless, exceptionally fast, and not additive to cognitive workload.
Prewired elements are used to quickly build up components of complex visual images.

10 Approach
Transform text into visualizations that:
Communicate through images instead of prose.
Preserve information characteristics from documents.
Represent textual content and meaning without the need to read it in the normal manner.
Reveal thematic patterns and relationships between documents in ways similar to how the natural world is perceived.
Enhance visual browsing and analysis.
Combat information overload.
Large corpora: digital libraries, regulations and procedures, archived reports.
For the purpose of perusing text, it is better to transform text information into a spatial representation that can be explored and processed by visual processes alone.

11 SPIRE: Spatial Paradigm for Information Retrieval and Exploration
Developed to facilitate the browsing and selection of documents from large corpora.
Two major approaches: Galaxies and Themescapes (20,000 documents).

12 Galaxies and Themescapes
Display metaphor rationale: Each paradigm offers a rich variety of cognitive spatial affordances that naturally address the problems of text visualization. Spatial perceptual mechanisms that operate on the real world will respond analogously to synthetic cues.

13 Paradigm Overviews
Galaxies: point clusters suggest patterns of interest.
Themescapes: topographies of peaks and valleys that can easily be detected based on contour patterns.

14 Paradigm Overviews Both allow for overview + detail without a change of view. Each view offers a different perspective of the same information.

15 Galaxies
Two-dimensional scatterplot of ‘docupoints’ that appear like stars in the night sky.
Computes word similarities and patterns in documents and communicates similarity via proximity.
Provides a first cut at sifting through information and determining how the contents of a document base are related, in both context and content.
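The idea that proximity communicates similarity can be sketched in a few lines. The slides do not say how Galaxies computes its layout, so the following is only an illustration under stated assumptions: documents are reduced to word counts, compared with cosine similarity, and placed at coordinates given by their similarity to two hypothetical "anchor" texts. The names `term_counts`, `cosine`, and `project`, and the anchor approach itself, are inventions for this example, not the SPIRE method.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project(docs, anchor_x, anchor_y):
    """Place each document at (similarity to anchor_x, similarity to anchor_y).

    This anchor scheme is an assumption made for illustration only.
    """
    ax, ay = term_counts(anchor_x), term_counts(anchor_y)
    return {
        name: (cosine(term_counts(text), ax), cosine(term_counts(text), ay))
        for name, text in docs.items()
    }


docs = {
    "d1": "oil prices rise as oil exports fall",
    "d2": "oil exports fall and prices rise again",
    "d3": "election results surprise voters in the capital",
}
points = project(docs, anchor_x="oil prices exports",
                 anchor_y="election voters results")

for name, (x, y) in points.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

In this toy corpus the two oil documents land near each other and far from the election document, which is the visual cue a scatterplot of docupoints relies on.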

16 Presentation of documents where similar ones cluster together
Clicking on a star in Galaxies displays the words that occur most often in the selected document(s).
To explore further, selecting a “docustar” retrieves the document header or complete text.
Entering a query on terms of interest makes the “docustars” of interest light up like “novas” on the screen.
The spatial representation reveals patterns, trends, and the fundamental topics found within the corpus; clicking on a cluster of interest reveals the documents within.
Temporal slicer: when viewed in terms of known historical events and trends, these cluster patterns can provide insight into external causal relationships mirrored in the corpus.
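Two of the interactions described here, showing the most frequent words for a selection and lighting up documents that match a query, can be approximated as follows. This is a minimal sketch, not the Galaxies implementation: the function names `top_terms` and `matching_docs`, the tiny stopword list, and the sample documents are all assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = frozenset({"the", "a", "of", "and", "in", "to", "as"})


def tokens(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(texts, k=3):
    """Return the k most frequent non-stopword terms across the selected texts."""
    counts = Counter(
        t for text in texts for t in tokens(text) if t not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(k)]


def matching_docs(docs, query):
    """Return the names of documents containing at least one query term."""
    wanted = set(tokens(query))
    return sorted(
        name for name, text in docs.items() if wanted & set(tokens(text))
    )


docs = {
    "d1": "oil prices rise as oil exports fall",
    "d2": "oil exports fall and prices rise again",
    "d3": "election results surprise voters in the capital",
}

print(top_terms([docs["d1"]], k=1))   # words shown when one star is clicked
print(matching_docs(docs, "oil"))     # stars that would "light up" for a query
```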

18

19 Themescapes
Three-dimensional relief map of the themes within the document corpus.
Complex surfaces convey information about topics or themes found within the corpus without the cognitive load of reading.
The terrain simultaneously communicates the primary themes of an arbitrarily large collection of documents, a measure of their relevance in the corpus, and the similarity of themes.
Themescapes reads large collections of documents and organizes the content by topic as a topographical map.

20 Themescapes
A glance provides a visual thematic summary of the entire corpus.
Elevation: theme strength
Shapes: information distribution
Proximity: content similarity
The mountains in Themescapes indicate where themes are dominant; valleys indicate weak themes. Their shapes (a broad butte or a high pinnacle) reflect how the thematic information is distributed and related across documents.
Themes close in content will be close visually based on the many relationships within the text spaces.
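One way to picture how elevation can encode theme strength is to treat each document as a small bump on a grid and add the bumps together. The slides do not describe how Themescapes builds its terrain, so the Gaussian bumps, the `height_field` function, and the sample coordinates below are assumptions chosen only to illustrate the mapping from strength and proximity to height.

```python
import math


def height_field(points, size=5, spread=1.0):
    """Build a size x size grid of heights.

    points: list of (x, y, strength) tuples in grid coordinates.
    Each point adds a Gaussian bump scaled by its theme strength, so
    nearby strong points merge into a taller, broader peak.
    """
    grid = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            for x, y, strength in points:
                dist_sq = (col - x) ** 2 + (row - y) ** 2
                grid[row][col] += strength * math.exp(-dist_sq / (2 * spread ** 2))
    return grid


# Two strong, closely related documents and one weak, distant one.
points = [(1, 1, 3.0), (1, 2, 2.0), (4, 4, 0.5)]
grid = height_field(points)

for row in grid:
    print(" ".join(f"{h:4.1f}" for h in row))
```

The two nearby points produce a single dominant mountain, while the isolated weak point barely rises above the valley floor.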

21 Themescapes
Utilizes innate human abilities for pattern recognition and spatial reasoning.
Employs communicative invariance across levels of textual scale: entire document corpus, cluster of documents, individual documents.
Greatly expands the bandwidth of communication between tool and user.

22

23 Summarization
Reading is a slow, serial process of mentally encoding a document.
Text visualizations can overcome many of the user limitations that result from accessing and trying to read from large document bases.
One very interesting result has been the discovery of a limited set of visual properties that are processed preattentively, without the need for focused attention. Typically, tasks that can be performed on large multi-element displays in less than 200 to 250 msec are considered preattentive. Eye movements take at least 200 msec to initiate, and random locations of the elements in the display ensure that attention cannot be prefocused on any particular location, yet subjects report that these tasks can be completed with very little effort. This suggests that certain information in the display is processed in parallel by the low-level visual system.
A simple example of a preattentive task is the detection of a red circle in a group of blue circles (see Figure 1 below).

24 Summarization Visual cues can offer readers a way to employ their primarily preattentive, parallel processing powers of visual perception. Galaxy and landscape metaphors allow the cognitive and visual processes that enable our spatial interactions with the natural world to be applied to the search process.

25 Contributions
Prior visualization approaches offered methods for the visualization of structured, hierarchical text, such as organization charts and directories.
Free text visualization was relatively unexamined: for open text fields or raw prose, the candidates for visualization are not obvious.
The MVAB Project produced novel methods for interaction with large amounts of text.

26 Current Project Status
Correlation Tool
WebTheme
ThemeRiver
Rainbow

27 Tool deployed as part of the SPIRE (Spatial Paradigm for Information Retrieval and Exploration) Info Viz suite.
Researchers can refine their search by using several built-in support functions, including a document characterization or gisting tool, a word search tool, a time analyzer, and an annotation tool.
The correlation tool gives users the power to examine relationships between user-defined variables.

28 Web-enabled version of SPIRE provides a new way to investigate and understand large volumes of textual information.
Harvests data from the WWW using search terms, or by following links derived from user-specified URLs.
Currently under development.
Initially created for the U.S. intelligence community.

29 Helps users identify time-related patterns, trends, and relationships across a large collection of documents.
Themes are represented by a river that flows from left to right through time.
The river widens or narrows to show changes in the collective strength of themes in the documents.
Individual themes appear as colored currents flowing in the river.
Theme currents narrow or widen to indicate changes in individual theme strength at any point in time.
Proof-of-concept prototype constructed.
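The river metaphor boils down to two numbers per time slice: the strength of each theme (the width of its current) and their sum (the width of the whole river). The sketch below computes both by simple word counting. The slides do not say how theme strength is measured, so counting occurrences, along with the names `theme_currents` and `river_width` and the sample data, is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def theme_currents(docs, themes):
    """Count theme-word occurrences per time period.

    docs: iterable of (period, text) pairs.
    Returns {period: Counter(theme -> occurrences)}.
    """
    currents = defaultdict(Counter)
    for period, text in docs:
        words = text.lower().split()
        for theme in themes:
            currents[period][theme] += words.count(theme)
    return dict(currents)


def river_width(currents):
    """Total strength of all themes in each period, in chronological order."""
    return {period: sum(c.values()) for period, c in sorted(currents.items())}


docs = [
    (1990, "oil oil embargo"),
    (1990, "oil war"),
    (1991, "war war war oil"),
    (1991, "war ceasefire"),
]
currents = theme_currents(docs, ["oil", "war"])

for period, total in river_width(currents).items():
    print(period, dict(currents[period]), "total width:", total)
```

Plotting each theme's count as a stacked band over the periods would give the widening and narrowing currents the slide describes.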

30 [Figure labels: Love, Tybalt, Caesar, Romeo]
Portrays three different classes of relationships.
Entities are dots.
Location represents a summarization of one kind of relationship among them.
Arcs above the plane: a white arc between two planes indicates that such a relationship exists between entities within the clusters; the white arc can be expanded to show multiple colors in the spectrum, each indicating the existence of a particular kind of relationship within this class.
User wants to see if relationships between to
Arcs below the plane
Interactive proof-of-concept prototype developed.

31 Critique
The visualization paradigms were discussed in a straightforward manner. The example figures, however, were not sufficiently explained.

32 My Favorite Sentence [The] perceptual processes involved are the results of millions of years of selective mammalian and primate evolution, and have become biologically tuned to seeing in the natural world.

33 References Information Retrieval Information Visualization

34 Visualizing the Non-Visual Spatial Analysis and Interaction with Information from Text Documents
Questions?

35 Technical Considerations
Clear definition of text: how can text be distinguished from everything else? Text is the written alphabetical form of natural languages (no diagrams, tables, or other symbolic representations of language).
A way to transform text into a different visual form that retains the high-dimensional invariants of natural language yet better enables visual exploration and analysis.
Suitable mathematical procedures and analytical measures must be defined as the foundation of the visualizations. Written text stored in digital form can be treated statistically to extract information about its content, context, and semantic meaning.
A database management system must be designed to store and manage text: a document database primarily comprised of textual material, and a text processing engine which transforms natural language from the document database into spatial data.
Other components: GUI, display software, applications interface, auxiliary tools.

36 Technical Considerations
A way to transform text into a different visual form that retains the high-dimensional invariants of natural language yet better enables visual exploration and analysis.
Text has statistical and semantic attributes such as the frequency, context, and combination of words in themes and topics.
Differences between texts' statistical and semantic compositions provide much of the opportunity for the text visualizations described in this paper.
How can text be distinguished from everything else?

37 Approach
A set of measures that characterize the text in meaningful ways provides for multiple perspectives of documents and their relationships to one another.
One measure is similarity: based on the occurrences and context of key words or other extracted features, measures of similarity can be computed that reflect relatedness between documents.
In a visualization, similarity can be shown as proximity or congruity to form.
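A similarity measure built from key word occurrences can be sketched with a standard information retrieval recipe: weight each word by how often it occurs in a document and how rare it is across the corpus (TF-IDF), then compare the weighted vectors with cosine similarity. The slides do not name the measure used in the paper, so TF-IDF and cosine are swapped in here as a common, well-understood stand-in, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Weight each term by its count times log(N / document frequency).

    Words that appear in every document get weight 0, so ubiquitous
    words such as "the" do not contribute to similarity.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weighted vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "d1": "the oil market and oil prices",
    "d2": "the oil exports and prices",
    "d3": "the election and the voters",
}
vectors = tfidf_vectors(docs)
sim_12 = cosine(vectors["d1"], vectors["d2"])
sim_13 = cosine(vectors["d1"], vectors["d3"])
print(f"d1 vs d2: {sim_12:.3f}")
print(f"d1 vs d3: {sim_13:.3f}")
```

The resulting pairwise values are the kind of relatedness scores that a layout could then turn into proximity on screen.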

