Interactive Visualization of Large Graphs and Networks

Tamara Munzner Stanford University Computer Science Department

Contributions analysis of three software systems
relating intended tasks to spatial layout, visual encoding choices; two novel layout/drawing algorithms: scalable, targeted
This thesis has contributions in both analysis and algorithms. We present three case studies, and analyze them by relating the intended tasks to our choices for spatial layout and visual encoding. We also present two novel algorithms for layout and drawing. One is aimed at scalability, the other is highly targeted to be effective for a very specific task.

Three Visualization Systems
axis from general to domain specific: graph drawing, infovis, H3, PM, Const
H3: web hyperlinks, quasi-hierarchical; Planet Multicast: MBone tunnels, find poorly placed; Constellation: parsed dictionaries, refine algorithms
The three case studies of interactive graph visualization systems fall onto a range of domain specificity, but all three are more targeted than most traditional graph drawing systems. The H3 system is the most general. It began as an effort to show the hyperlink structure of a web site, and it is well-suited for an entire class of graphs that we call “quasi-hierarchical”, which I’ll define later in the talk. The Planet Multicast system shows the tunnel structure of the Internet’s multicast backbone, or MBone, and was intended to help maintainers of the MBone find badly placed tunnels that were potentially wasting scarce bandwidth resources. It is more targeted than H3, but geographic network display is useful in many more contexts than this one. Constellation is the most targeted system of the three. The datasets that it displays are paths through a large semantic network created by parsing entire online dictionaries. The goal of the project was to help a small group of computational linguists refine their algorithms for creating and querying this large network by creating a specialized visualization system that incorporated a great deal of domain-specific information into its design. I present a detailed analysis of interactive graph visualization systems aimed at three different domains.

Talk Outline graph drawing, information visualization background
software systems goal previous work video discussion evaluation general discussion conclusion I’ll begin this talk with some background on graph drawing. I’ll then present each case study in turn, starting with the goal, showing the previous work for context, playing a video of the system, discussing some of the design issues, and evaluating its success. I’ll then discuss some general design issues that pertain to all three systems, and end with concluding remarks.

Graph Drawing automatic layout and drawing of node-link graphs
Hofstadter. Godel, Escher, Bach. Gansner and North. Improved force-directed layouts. The field of graph drawing is about the automatic layout and drawing of node-link graphs. Drawing graphs by hand on paper is difficult and time-consuming. The figure on the left from Godel, Escher, Bach is a heroic effort which we can regard as an upper bound on what’s possible to do by hand. The picture on the right was generated totally automatically by North and Gansner, using the program neato followed by a Voronoi-diagram based refinement pass. GEB: 150? nodes, 500+ edges?

Goal: help humans understand
aesthetic criteria: minimize crossings; expose structure: hierarchy, symmetry, circular
The purpose of drawing a graph in this context is to help humans understand it, as opposed to some other goal like VLSI layout. In order to do this, researchers have proposed aesthetic criteria like minimizing the number of edge-edge and edge-node crossings, and built programs that attempt to expose underlying structure. The figures below show the Tom Sawyer Toolkits in action for hierarchical, symmetric, and circular layouts. Tom Sawyer Software. Hierarchical Toolkit. Tom Sawyer Software. Symmetric Toolkit. Tom Sawyer Software. Circular Toolkit.

System Scalability, Data Set Size
[Figure: node count on a log-scale axis from 10 to 1B, comparing data sets (manual GEB figure, my site, MBone tunnels, mid-size web sites, Stanford graphics site, Net routers, dictionary, Net hosts, Web pages) with previous systems (most GD systems; exceptional GD systems: dot, Gem3D) and my systems (Planet Multicast, Constellation, H3)]
The Achilles heel of all of these systems is severely limited scalability. Very few of the graph drawing systems in the literature can handle more than one hundred nodes. A few exceptions like dot and Gem3D scale to several hundred nodes. But most real-world datasets are much larger than this, some by several orders of magnitude. The horizontal axis of node count on this figure is log scale. The hand-drawn GEB figure shown earlier is a few hundred nodes, my own personal web site has a few thousand individual URLs, and the mid-range Stanford graphics web site has well over 100,000 documents. The size of network examples ranges from several thousand MBone tunnels in 1996, to somewhere around 90,000 core Internet routers, to estimates of 70 million hosts on the whole Net. A natural language dictionary can contain millions of words, all defined in terms of each other. Finally, recent estimates of the Web put its size at over a billion pages. My three software systems are all attempts to handle large real-world datasets: Planet Multicast at several hundred nodes, Constellation up to a thousand or so, and H3, which is specifically designed for scalability and can handle over 100,000 nodes.

Fundamental Idea extend reach of graph drawing with information visualization approach techniques interactivity incorporate domain-specific information
The premise underlying my work is that we can extend the capabilities of graph drawing by taking an information visualization approach to the problem. I exploit sophisticated interaction techniques and incorporate domain-specific information.

Information Visualization
external visual representation of data, exploits perceptual system to reduce human cognitive load; find appropriate visual metaphor for data that is not implicitly spatial
The field of information visualization is built around the idea that an external visual representation of data can exploit the perceptual system to reduce the cognitive load on a human. The key problem is usually posed as finding an appropriate visual metaphor for data that is not inherently spatial.

Interactivity mimic reality beyond 2D paper: pan, zoom
3D object: rotate, translate, scale; beyond: semantics impossible in real world; distortion, multi-scale
Interactivity is the great challenge and opportunity of computer-based visualization. There’s a long and successful historical tradition of exposition using static paper and physical objects. There have been more recent efforts with the dynamic media of film and video. But the advent of computers brings interactivity to the table, and that offers unprecedented power and flexibility. A very basic way to interact with a computer display is to mimic reality: we can program virtual paper that we can zoom and pan like real paper, and virtual 3D objects that can rigidly rotate, translate, and scale. We can go beyond the simple imitations of reality, and tie user input to the visual display to get semantics impossible in the real world. Examples include distortion-based methods for seeing a large context around an area of focus, and multiscale methods where the visual appearance of an object changes radically depending on distance.

Domain/Task Focus user-centered design, ethnography
understand high level goals: maintain web site; break down into lower level tasks: minimize user navigation to important pages, find and fix broken links; design visual encoding; evaluate effectiveness
A hallmark of many infovis systems is a focus on the domains and tasks of a group of intended users. This involves methods from user-centered design and ethnography, working with people to understand their high level goals. For example, the goal of webmasters would be to create and maintain a web site. But these goals are too high level to address with software. We have to break them down into tasks at a lower level, like minimizing the number of hops users have to make to reach important pages, or finding and fixing broken links. That’s specific enough that we can design a visual encoding to help support it, and gives us a handle on evaluating the effectiveness of the resulting system.

Evaluating Visualization Systems
quantitative algorithmic improvements; conceptual framework analysis; impact/adoption; user studies; anecdotal evidence
Evaluating a visualization system is much more difficult than evaluating most graphics systems, because it’s hard to judge whether some piece of software really helped somebody get something done better. Something like a rendering system can be quantitatively evaluated based on whether it’s faster or more photorealistic than previous work. The goals are clear because low-level psychophysics are reasonably well understood. Part of visualization evaluation is similarly quantitative: algorithmic improvements to show that some technique is faster or scales to larger datasets. But that’s only a small part of the picture. Conceptual frameworks provide a very powerful way to analyze the design choices in a system. The impact that a system has in terms of the number of people that choose to use it is another way to judge its worth. User studies can be more rigorous, since they can test not only for whether people liked it, but whether it actually improved their performance. Finally, anecdotal evidence of discoveries that the user attributes to insights from the system is important in cases where user studies are infeasible because the target audience is small and/or the task is something like scientific discovery.

System 1: H3 time: 1996-8 data: web hyperlinks goal: scalability
quasi-hierarchical graphs: can find reasonable spanning tree using domain-specific information; method: 3D hyperbolic
I created the H3 system between 1996 and 1998. The initial dataset was web hyperlinks. It’s intended for what I call quasi-hierarchical graphs: graphs that are considerably more dense than trees, but where we can use domain-specific information to find a reasonable spanning tree that’s close to the user’s mental model of the structure. My fundamental choice was to trade off generality for scalability. Both layout and drawing occur in 3D hyperbolic space.
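The idea of picking a spanning tree with domain-specific information can be illustrated with a minimal Python sketch. It is not the H3 implementation: the URL-prefix heuristic, the function names, and the tiny example site are all assumptions made for illustration. Each page keeps one parent chosen among the pages that link to it from a shallower breadth-first level, preferring the candidate whose URL shares the longest prefix with the child; every other link becomes a non-tree link.

```python
from collections import deque
import os.path


def bfs_levels(links, root):
    """Breadth-first depth of every page reachable from root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in level:
                level[target] = level[page] + 1
                queue.append(target)
    return level


def shared_prefix_len(a, b):
    """Assumed domain cue: length of the common URL prefix."""
    return len(os.path.commonprefix([a, b]))


def spanning_tree(links, root):
    """Return (parent, non_tree) for the pages reachable from root.

    links maps a page URL to the list of URLs it links to.
    Only candidates from a strictly shallower level may become the
    parent, which keeps the result acyclic and connected to root.
    """
    level = bfs_levels(links, root)
    parent = {root: None}
    for page, targets in links.items():
        if page not in level:
            continue  # unreachable from root
        for target in targets:
            if level[page] >= level[target]:
                continue  # not a shallower candidate
            best = parent.get(target)
            if best is None or (
                shared_prefix_len(page, target) > shared_prefix_len(best, target)
            ):
                parent[target] = page
    non_tree = [
        (page, target)
        for page, targets in links.items()
        if page in level
        for target in targets
        if parent.get(target) != page
    ]
    return parent, non_tree


# Hypothetical three-page site with cross links.
site = {
    "/": ["/a/", "/b/"],
    "/b/": ["/a/x.html", "/"],
    "/a/": ["/a/x.html", "/b/"],
}
parent, non_tree = spanning_tree(site, "/")
print(parent)
print(non_tree)
```

In this example `/a/x.html` is linked from both `/a/` and `/b/`, and the prefix heuristic attaches it under `/a/`, which matches the directory structure a site maintainer would probably expect.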

Background: Hyperbolic Space
Focus+Context distortion: project from infinite hyperbolic to finite euclidean; pick best model for useful distortion; conformal: geodesics warped; projective: angles warped
I’ll begin with a little bit of background about the two important properties of hyperbolic geometry that I exploit. There are known methods for projecting infinite hyperbolic space to a finite portion of euclidean space, which provides a view with a large amount of context around a particular focus point. In any projection from one metric space to a different one, distortion is inevitable. The distortion of a 2D map created from a 3D globe is a familiar example. There are multiple cartographic projections: those suited for ocean navigation distort areas. If you have a projection that minimizes area distortion, you can’t draw straight lines on it that work for navigation. The same holds for hyperbolic projections: we want to choose the distortions that are the best for information visualization. The projection that we want (they’re called models by mathematicians) is one where 4x4 matrix

Background: Hyperbolic Space
exponential room in space, exponential number of tree nodes
2D hyperbolic plane
hyperbolic hemisphere area exponential: $2\pi \sinh^2 r$
euclidean hemisphere area geometric: $2\pi r^2$
Thurston and Weeks, The Mathematics of Three Dimensional Manifolds, Scientific American

Previous Work: Hierarchies
Cone Trees [Robertson, Mackinlay, Card 91] Tree Maps [Johnson, Shneiderman 91]
There’s been a fair amount of work in visualizing hierarchies. The most influential paper was on the Cone Tree system from Robertson, Mackinlay, and Card at Xerox PARC. My H3 layout algorithm is one of many extensions. Treemaps are a very different approach to showing hierarchies that are much better suited for spotting outliers than understanding structure. Area-based methods like the treemap-based one on the right are not suitable for scaling to really big datasets.
Robertson, Mackinlay, and Card. Cone Trees: Animated 3D Visualizations of Hierarchical Information. Johnson and Shneiderman. Treemaps: A Space-filling Approach to the Visualization of Hierarchical Information. distortion: [Furnas, Brown, Carpendale, Keahey]

Previous Work: Distortion & Hierarchy
2D Hyperbolic Tree [Lamping, Rao, Pirolli 94,95]: scalability analysis later; Fractal [Koike, Yoshihara 93], SHriMP [Storey, Muller 95]: don’t scale; taxonomy [Noik 94]
The systems most relevant for H3 are those that combine distortion views with hierarchy or graph drawing. I’ll compare the 2D hyperbolic tree browser and the 3D hyperbolic webviz system to H3 in detail later in the talk. The paper on fractal approaches for visualizing huge hierarchies has no explicit scaling claims, but no figure has more than nodes at most. The SHriMP multiscale viewer had only extremely small datasets in the example figures. Noik’s taxonomy covers work before 1994.
Lamping, Rao, and Pirolli. A Focus+Context Technique Based on Hyperbolic Geometry for Viewing Large Hierarchies.

Concurrent Work: Nicheworks
Nicheworks [Wills 97] layout scales to 1M nodes linked views multiple layout approaches very different visual metaphor There are only two recent systems that have a similar approach to merging graph drawing and information visualization, both of which were published during the time range of my H3 papers. The Nicheworks system scales to huge graphs, one order of magnitude larger than what H3 claims. The user can choose one of multiple layout algorithms, and the system supports linked views. The visual metaphor is quite different from H3. Wills et al. Nicheworks.

Concurrent Work: Skeletonization
Skeletonization [Herman 98] abstractions for tree structure
Herman’s skeletonization work is also a mixture of infovis and graph drawing, and addresses a complementary problem to H3. Skeletonization provides an abstracted global overview, while H3 strives to create the largest possible local view. Herman et al. Skeletonization

H3 Layout novel layout algorithm detailed in thesis
hemisphere surface instead of linear circumference bottom-up pass: compute hemisphere sizes top-down pass: place child on parent surface
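The two passes named on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm detailed in the thesis: `LEAF_RADIUS` and `PACKING_SLACK` are made-up constants, the footprint of each child is approximated by a hyperbolic disc of area 2π(cosh r - 1), and children are spread over the hemisphere with a golden-angle spiral instead of the placement scheme H3 actually uses.

```python
import math

LEAF_RADIUS = 0.1    # assumed radius for a childless node
PACKING_SLACK = 2.0  # assumed extra room, since discs do not pack perfectly


class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.radius = 0.0  # radius of the hemisphere holding the children
        self.theta = 0.0   # azimuth on the parent hemisphere
        self.phi = 0.0     # angle from the pole of the parent hemisphere


def disc_area(r):
    """Area of a hyperbolic disc of radius r (curvature -1)."""
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


def compute_radii(node):
    """Bottom-up pass: size each hemisphere to hold its children."""
    if not node.children:
        node.radius = LEAF_RADIUS
        return node.radius
    footprint = sum(disc_area(compute_radii(c)) for c in node.children)
    # Invert the hemisphere area formula 2*pi*sinh(r)**2 = slack * footprint.
    node.radius = math.asinh(
        math.sqrt(PACKING_SLACK * footprint / (2.0 * math.pi))
    )
    return node.radius


def place_children(node):
    """Top-down pass: put each child on the surface of its parent hemisphere."""
    count = len(node.children)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    # Largest subtrees go nearest the pole.
    ordered = sorted(node.children, key=lambda c: c.radius, reverse=True)
    for i, child in enumerate(ordered):
        # Equal-area spacing between the pole (phi = 0) and the rim (phi = pi/2).
        child.phi = math.acos(1.0 - (i + 0.5) / count)
        child.theta = (i * golden_angle) % (2.0 * math.pi)
        place_children(child)


# Hypothetical tree: a root with one three-leaf subtree and two leaves.
subtree = Node("docs", [Node("a"), Node("b"), Node("c")])
root = Node("root", [subtree, Node("d"), Node("e")])
compute_radii(root)
place_children(root)
for child in root.children:
    print(child.name, round(child.radius, 4), round(child.phi, 4), round(child.theta, 4))
```

The key contrast with a cone tree is in `compute_radii`: the space budget for children is a hemisphere surface rather than a circle circumference, so the radius needed grows far more slowly with the number of descendants.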

Information Density: Scale
Lamping, Rao, and Pirolli. A Focus+Context Technique Based on Hyperbolic Geometry for Viewing Large Hierarchies.

Information Density: Codimension
want balance between clutter and void; topological approach to describing density; difference between structure and surrounding space; sparse, dense
Carpendale, Cowperthwaite, and Fracchia. Extending Distortion Viewing from 2D to 3D.

Evaluation: Scalability
drawing: constant incremental exception: precision layout: linear in |E| 110,000 edges in 12 seconds given DFS input limits: computational: global layout in main memory cognitive: disorientation past ~100K nodes large neighborhood not global overview future: landmarks, LOD, abstraction

Evaluation: Impact product from SGI research use of library viewer use
Site Manager aimed at web content creators bundled starting with Irix 6.3 research use of library interface for Skitter Internet tomography data analysis of Autonomous System data viewer use 6 researchers converted data to use viewer image use 6 reprint requests 6 or so: function call graphs, co-citation graphs, biodiversity taxonomies, medical informatics knowledge base, ASes

Evaluation: User Study
[Risden, Czerwinski, Munzner, Cook 00] compared 3 browsers for adding content to collection of web pages snap portal (Yahoo style) XML3D: H3 + lists collapsible tree

User Study Results reliably faster for existing category task
no decline in quality for new category task; differences statistically significant; differences statistically insignificant
We could use more studies to tease apart the influence of H3: how and why it’s effective, and which view components can be usefully linked. Nevertheless, it is gratifying to have some real data about tasks for which it’s effective. Augment other views? More info than other views: snap: siblings not displayed; collapsible shows parent/child/sibling, but only for one parent.

System 2: Planet Multicast
time: 1996 joint work: Hoffman, Claffy, Fenner data: MBone tunnels task: find badly placed tunnels goal: simple baseline method: 3D geographic We built the Planet Multicast system in 1995 and 1996 to help the maintainers of the Internet’s multicast backbone find badly placed tunnels that were potentially wasting scarce network resources. We used a known 3D geographic metaphor of arcs on a globe. We wanted to try the obvious thing and see how well it worked as a simple baseline. This project was joint work with Eric Hoffman, K. Claffy, and Bill Fenner.

Previous Work: Geographic Network
SeeNet3D [Cox, Eick 95] arcs on globe layout. Cox and Eick. 3D Displays of Network Traffic. SeeNet [Becker, Eick, Wilks 95]
The most relevant previous work was the 1995 SeeNet3D paper by Kenneth Cox and Steven Eick, which introduced the arcs-on-globe visual metaphor that we used. The previous SeeNet system was a 2D geographic approach. The NSFNet visualization by Donna Cox and Robert Patterson was highly visible at Siggraph 92.
NSFNet [Cox, Patterson 92] Becker, Eick, and Wilks. Visualizing Network Data. Cox and Patterson. Visualization Study of the NSFNet.

Geographic Layout distance as stand-in for resource usage
partially correlated; geographical determination arduous: major scalability problem; immediate comprehension; evocative, many image reprints: Wired, National Geographic; still picture captures much of function
The idea behind the geographic layout is that badly placed long distance tunnels are more likely than short ones to waste bandwidth. This is partially true: distance is partially correlated with resource usage, but the match is imperfect. We used distance as a stand-in because the true resource usage data was not available. It’s very hard to gather data about traffic and congestion for the unicast hops underneath a tunnel, or even to know the route taken through the underlying unicast topology. We might have chosen a different visual metaphor if that data was readily available. One implication of the arcs-on-globe visual metaphor is that tunnels that begin and end in the same city are not visible. The scale of the globe acts as a hardcoded filter, so we see only 700 of the 4400 tunnels. The imperfect correlation between geographic distance and resource bottlenecks is a disadvantage. The main roadblock for the system was a data issue, not a visualization issue: the geographical determination is both arduous and imperfect. Gathering this sort of data for the public Internet with ad-hoc methods simply wouldn’t scale. Claffy’s group at CAIDA is working on the problem.
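Using distance as a stand-in for resource usage can be sketched with the standard haversine great-circle formula. The tunnel names, coordinates, and the 1 km visibility threshold below are hypothetical and only illustrate how same-city tunnels drop out of an arcs-on-globe view; none of this is taken from the Planet Multicast code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def visible_tunnels(tunnels, min_km=1.0):
    """Return (distance, name) pairs for drawable tunnels, longest first.

    tunnels is a list of (name, (lat, lon), (lat, lon)) entries.
    Tunnels shorter than min_km would collapse to a point on the globe.
    """
    scored = [(great_circle_km(*a, *b), name) for name, a, b in tunnels]
    return sorted(((d, n) for d, n in scored if d >= min_km), reverse=True)


# Hypothetical endpoints, for illustration only.
tunnels = [
    ("west-coast--east-coast", (37.44, -122.14), (38.90, -77.04)),
    ("same-city", (38.90, -77.04), (38.90, -77.04)),
    ("east-coast--europe", (38.90, -77.04), (51.51, -0.13)),
]
for distance, name in visible_tunnels(tunnels):
    print(f"{name}: {distance:.0f} km")
```

Sorting by length puts the longest arcs first, which mirrors the way the globe view makes long-distance tunnels the most visually salient.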

Evaluation: Anecdotal Insights
…
> pen-mbone-1.sprintlink.net( ) dc-mbone-1.sprintlink.net( ) [1/64/tunnel]
> elm.can.net( ) dc-mbone-1.sprintlink.net( ) [1/64/tunnel]
> boston.terra.net( ) dc-mbone-1.sprintlink.net( ) [1/0/tunnel/querier]
> NS.FLSIG.ORG( ) dc-mbone-1.sprintlink.net( ) [1/64/tunnel]
> ace.mid.net( ) dc-mbone-1.sprintlink.net( ) [1/64/tunnel]
> fw-mbone-1.sprintlink.net( ) dc-mbone-1.sprintlink.net( ) [1/16/tunnel]
> gateway10.crawford.com( ) dc-mbone-1.sprintlink.net( ) [1/32/tunnel]
> csce-2--rngm-nb-f-1.net.tamu.edu( ) dc-mbone-1.sprintlink.net( ) [1/64/tunnel]
...

System 3: Constellation
time: 1998-9 joint work: Guimbretière data: MindNet query results task: plausibility checking for linguists method: 2D custom goal: targeted
The Constellation system was created between 1998 and 1999, and was aimed at the task of plausibility checking, done by a group of computational linguists who wanted to refine the algorithms used to create and query MindNet, a very large semantic network. Our custom spatial layout was two dimensional, and the goal in this project was to create a highly targeted visualization system that was maximally effective for the task. The second phase of the Constellation system was joint work with François Guimbretière.

Definition Graph dictionary entry sentence nodes: word senses
links: relation types

Semantic Network definition graphs as building blocks
unify shared words large network millions of nodes grammar checking now, translation future global structure known: dense probes return local info

Path Query best N paths between two words words on path itself
definition graphs used in computation
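A path query of this kind can be illustrated with a best-first enumeration of simple paths. This is a generic sketch, not MindNet's algorithm: the slides do not describe how plausibility is computed, so the edge costs below are invented and the convention that a lower total cost means a more plausible path is an assumption.

```python
import heapq


def best_paths(graph, source, target, n):
    """Return up to n cheapest simple paths as (cost, path) pairs.

    graph maps a word to a dict of neighbor -> non-negative cost.
    Partial paths are expanded in order of cost, so complete paths
    come off the heap cheapest first.
    """
    results = []
    heap = [(0.0, [source])]
    while heap and len(results) < n:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            results.append((cost, path))
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated words)
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return results


# Hypothetical word graph and costs, for illustration only.
words = {
    "kangaroo": {"tail": 4.0, "marsupial": 1.0, "animal": 1.5},
    "marsupial": {"animal": 1.0, "tail": 5.0},
    "animal": {"tail": 1.0},
}
for cost, path in best_paths(words, "kangaroo", "tail", 3):
    print(cost, " -> ".join(path))
```

Enumerating partial paths this way can take exponential time on dense graphs, so it is only practical for small N and short paths.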

Task: Plausibility Checking
paths ordered by computed plausibility researcher hand-checks results high-ranking paths believable? believable paths high-ranked? stop words

Top 10 Paths: kangaroo - tail

Goal create unified view of relationships between paths and definition graphs shared words are key thousands of words (not millions) special-purpose algorithm debugging tool not understand the structure of English

Previous Work: Semantic Networks
SemNet [Fairchild, Poltrock, Furnas 88]: multiple 3D layouts; Visual Thesaurus [Thinkmap applet]: casual browsing, constant motion, < 20 nodes
Fairchild, Poltrock, and Furnas. SemNet: Three-Dimensional Graphic Representations of Large Knowledge Bases.
There has not been as much work on visualizing semantic networks as some of the other case study domains. The SemNet system from Fairchild, Poltrock, and Furnas used a few different 3D layouts, and had algorithms that tried to avoid edge crossings. It’s relatively old, dating back to 1988, so it has fairly limited scalability. The Visual Thesaurus is one of several lightweight applets for casual browsing that have appeared on the web lately. The layout algorithm is quite simple and only scales to a few dozen nodes at best, and the constant motion makes it unsuitable for any real analytical work.
Thinkmap applet. cited 3/09/00.

Traditional Layout avoid crossings reason: avoid false attachments
B B C D C artifact salience ambiguity

Information Visualization Approach
spatial position is strongest perceptual cue encode domain specific attribute plausibility gradient

Constellation Semantic Layout
novel layout algorithm detailed in thesis paths as backbone, definition graphs attached curvilinear grid iterative design for maximum semantics with reasonable information density allow crossings for long-distance proxy links

Selective Emphasis highlight sets of boxes and edges
interaction additional perceptual channels avoid perception of false attachments

Evaluation: Layout Effectiveness

Evaluation: Layout Comparison
dot H3

Talk Outline graph drawing background software systems
goal, previous work, video, discussion, evaluation; general discussion; conclusion
I’ll begin this talk with some background and previous work in the areas of graph drawing, information visualization, and the domains of each case study. I will summarize my research contributions, and move on to the case studies for the main body of the talk. I’ll start each case study by showing a video, and then discuss some of the interesting design decisions. I’ll then make comparisons across all three systems, and end with concluding remarks.

Visual Salience
Planet Multicast: long-distance tunnels
H3: distant points of possible interest; fringe: aggregate information
Constellation: selective emphasis; word size tied to importance

Canonical Word Size

Hidden State Constellation avoids hidden state closed world assumption
change salience instead of toggle drawing closed world assumption if not visible, doesn’t exist easy to forget previous actions false negative conclusions H3, PM do have hidden state non-tree links sometimes drawn intra-city tunnels never drawn

Graph Functions structure discovery contextual backdrop linked view
pure spatial layout implicit in traditional graph drawing contextual backdrop linked view

additional visual encoding color, linewidth, shape, enclosure combination more than sum of parts linked view

Contextual Backdrop

brushing [Becker and Cleveland 88] invoke other software components

Linked View

Contributions detailed analysis of three software systems
interactive, range of domain specificity; relate intended tasks to spatial layout, visual encoding; two novel layout/drawing algorithms; Constellation: targeted design; H3: scales 100x beyond previous work, product, user study
These are interactive systems for graph drawing using infovis techniques, spread along a spectrum of domain specificity. All incorporate domain-specific information, some more targeted than others: H3 the least, Constellation the most. The user study showed a performance advantage for certain tasks.
