Visualization Techniques for Multivariate Discrete and Continuous Data March 4, 2005 Rachael Brady.


1 Visualization Techniques for Multivariate Discrete and Continuous Data March 4, 2005 Rachael Brady

2 Multivariate Data Types In general, each point has many attributes and/or measurements –Type 1: measurements are continuous in nature, and combining dimensions might make sense Weather data - for each x, y, z location we have water density (scalar), temperature (scalar), wind velocity (vector), air pressure (scalar) –Type 2: data is discrete, more like an attribute list, and cannot in general be combined Baseball statistics - for each player we have at bats, walks, hits, doubles, home runs, RBIs. Populations - eye color of residents in NC, income level, voting record

3 Approaches Dimensional Reduction –Principal Component Analysis –Independent Component Analysis –Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/ Dimensional Subsetting Dimensional Organization Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000

4 Dimensional Subsetting - Scatter Plots Invoke the concept of small multiples Show all pairwise dimensions in a matrix Easily see clusters, trends and correlations Problem: How do you see a trend that requires 2 or more dependent variables? Source: Matt Ward, Multivariate Vis talk Sept 2000
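The pairwise matrix described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `scatter_matrix_panels` and the sample player records are invented for the example, and it only builds the point lists for each panel rather than drawing them.

```python
from itertools import product

def scatter_matrix_panels(records, dims):
    """Return {(row_dim, col_dim): [(x, y), ...]} for every ordered pair of dimensions."""
    panels = {}
    for row_dim, col_dim in product(dims, repeat=2):
        # The panel in row `row_dim`, column `col_dim` plots col_dim on x and row_dim on y.
        panels[(row_dim, col_dim)] = [(r[col_dim], r[row_dim]) for r in records]
    return panels

# Hypothetical baseball-style records, invented for illustration only.
players = [
    {"at_bats": 500, "hits": 150, "walks": 40},
    {"at_bats": 420, "hits": 105, "walks": 60},
    {"at_bats": 610, "hits": 190, "walks": 35},
]

panels = scatter_matrix_panels(players, ["at_bats", "hits", "walks"])
print(len(panels))                    # one panel per ordered pair of dimensions
print(panels[("hits", "at_bats")])    # at_bats on x, hits on y
```

Each panel only ever sees two dimensions at once, which is why a trend that depends on three or more variables together is hard to spot in this layout.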

5 Dimensional Organization Show each variable with an explicit visual representation Spatial Shape Color Size Orientation Texture The combination of these visual variables can produce information that “pops out”, but it is not additive Images: Chris Healey

6 Dimensional Organization - Glyphs (show star glyph demo) Image: Matt Ward, Multivariate Vis talk Sept 2000

7 Dimensional Organization - Parallel Coords Parallel Coordinates creates parallel, rather than orthogonal, dimensions. Data point corresponds to polyline across axes Clusters, trends, and anomalies discernible as groupings or outliers, based on intercepts and slopes Source: Matt Ward, Multivariate Vis talk Sept 2000 Show Parallel Coords Demo
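A rough sketch of how one data point becomes a polyline: each dimension gets its own vertical axis at x = 0, 1, 2, ..., and the point's value on that axis is rescaled to the range 0 to 1. The function name `to_polylines`, the min-max scaling choice, and the sample rows are assumptions made for this illustration.

```python
def to_polylines(rows):
    """Map each N-D row to a list of (axis_index, normalized_height) vertices."""
    n_dims = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n_dims)]
    highs = [max(r[d] for r in rows) for d in range(n_dims)]
    polylines = []
    for r in rows:
        line = []
        for d in range(n_dims):
            span = highs[d] - lows[d]
            # Min-max scale onto the axis; a constant dimension sits at mid-height.
            height = (r[d] - lows[d]) / span if span else 0.5
            line.append((d, height))
        polylines.append(line)
    return polylines

# Invented sample data: three points in three dimensions.
rows = [(10.0, 200.0, 1.0), (20.0, 100.0, 3.0), (15.0, 150.0, 2.0)]
polylines = to_polylines(rows)
for line in polylines:
    print(line)
```

In this sample the first two polylines cross between every pair of adjacent axes, which is the visual signature of a negative relationship between neighboring dimensions.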

8 Parallel Coords - Useful? Source: http://www.ccs.neu.edu/home/mattsp/

9 Parallel Coords - Useful?

10 Parallel Coords - Extended Visualizing Hierarchical clusters, Fua et al. 1999

11 Approaches Dimensional Reduction –Principal Component Analysis –Independent Component Analysis –Kohonen Self Organizing Map http://davis.wpi.edu/~matt/courses/soms/ Dimensional Subsetting Dimensional Organization Dimensional Embedding Source: Matt Ward, Multivariate Vis talk Sept 2000

12 Dimensional Embedding Dimensional stacking divides data space into bins Each N-D bin has a unique 2-D screen bin Screen space recursively divided based on bin count for each dimension Clusters and trends manifested as repeated patterns Source: Matt Ward, Multivariate Vis talk Sept 2000
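One way to realize the "unique 2-D screen bin" idea is to alternate dimensions between the horizontal and vertical screen directions and treat each direction as a mixed-radix number. This is a sketch under that assumption; the function name `stack_bin` and the alternating assignment are illustrative choices, not necessarily the scheme used in the talk's source.

```python
from itertools import product

def stack_bin(bin_index, bin_counts):
    """Map an N-D bin index to a unique (x, y) screen bin.

    Dimensions at even positions nest horizontally, odd positions nest
    vertically; earlier dimensions are the outer (slowest-varying) levels.
    """
    x = y = 0
    for pos, (b, count) in enumerate(zip(bin_index, bin_counts)):
        if not 0 <= b < count:
            raise ValueError(f"bin {b} out of range for dimension {pos}")
        if pos % 2 == 0:
            x = x * count + b   # subdivide the current horizontal cell
        else:
            y = y * count + b   # subdivide the current vertical cell
    return x, y

# Hypothetical 4-D data space with 2, 3, 4 and 5 bins per dimension.
counts = (2, 3, 4, 5)
cells = {stack_bin(idx, counts) for idx in product(*(range(c) for c in counts))}
print(len(cells))                       # number of distinct screen bins
print(stack_bin((1, 2, 3, 4), counts))  # screen bin of the last N-D bin
```

Because the outer dimensions vary slowest, a cluster in the inner dimensions shows up as the same small pattern repeated across the outer cells.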

13 Dimensional Embedding - not so easy What dimensions do you choose at what hierarchy? How do you keep coordinates consistent? How do you lay out tiles on a page with consistency? Can we do this automatically? Producing a good plot is hard Trellis - an attempt by Rick Becker and Bill Cleveland Incorporated into the S/S-PLUS statistical package

14 A Digression into Plot design…

15 Effective use of space Which graph is better? Government payrolls in 1931 [How to Lie with Statistics, Huff 93] Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005

16 Aspect Ratio - fill space with data Yearly CO2 concentrations [Cleveland 85] Don’t worry about showing zero Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005

17 Banking to 45 Degrees http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005
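Banking means choosing the plot's aspect ratio so that line segments are drawn at roughly 45 degrees on screen. The sketch below uses a median-absolute-slope rule, which is one of several possible banking criteria and is an assumption here, since the slide does not say which rule it has in mind. The function name `bank_to_45` and the zigzag data are invented for the example.

```python
from statistics import median

def bank_to_45(xs, ys):
    """Return the height/width aspect ratio at which the median absolute
    slope of consecutive segments is drawn at 45 degrees."""
    points = list(zip(xs, ys))
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # Absolute data-space slopes of consecutive segments (skip flat or vertical ones).
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0 and y1 != y0
    ]
    m = median(slopes)
    # A data slope s is drawn with screen slope s * (h / w) * (x_range / y_range);
    # setting the median screen slope to 1 and solving for h / w gives:
    return y_range / (x_range * m)

# Invented zigzag series: four segments, each with absolute slope 10.
xs = [0, 1, 2, 3, 4]
ys = [0, 10, 0, 10, 0]
aspect = bank_to_45(xs, ys)
print(aspect)  # height / width
```

For this zigzag the result says the plot should be four times as wide as it is tall, so that each of the four segments spans a square region and appears at 45 degrees.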

18 Clearly mark scale breaks Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005

19 Scale break vs. Log scale Both increase visual resolution Log scale allows easy comparisons of all data Scale break is more difficult to compare across the break Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005
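A small numeric illustration of why a log scale keeps all values comparable: equal ratios map to equal distances along the axis, whereas on a linear axis the small values are squeezed together. The helper names and the sample values are assumptions made for this sketch.

```python
import math

def linear_position(value, low, high):
    """Fractional position of value along a linear axis spanning [low, high]."""
    return (value - low) / (high - low)

def log_position(value, low, high):
    """Fractional position of value along a log axis spanning [low, high]."""
    return (math.log10(value) - math.log10(low)) / (math.log10(high) - math.log10(low))

# Invented values spanning three orders of magnitude.
values = [1, 10, 100, 1000]
linear = [linear_position(v, 1, 1000) for v in values]
logged = [log_position(v, 1, 1000) for v in values]
for v, lin, lg in zip(values, linear, logged):
    print(f"{v:>5}  linear={lin:.3f}  log={lg:.3f}")
```

On the linear axis the first three values occupy about a tenth of the axis length, while on the log axis they are evenly spaced, with no break to read across.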

20 Transforming Data for Graphing How well does the curve fit the data? Plot vertical distance from best fit curve Residual graph shows accuracy of fit Slide Source: Maneesh Agrawal, Lecture Notes Fall 2005
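The residual idea can be sketched as follows. The slide speaks of a best fit curve in general; this example assumes the simplest case, a straight line fitted by ordinary least squares, and the function names and data are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Invented sample data, roughly linear.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 4.5, 7.0]
slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
print(slope, intercept)
print(res)  # plot these against xs to get the residual graph
```

Plotting `res` against `xs` uses the full vertical range for the deviations alone, so any systematic pattern left over from the fit is much easier to see than in the original plot.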

21 A Trellis Example Lead Concentration vs. Setback Distance Given Day-of-the-Week, Week, and Height On the next slide is a trellis display of lead concentration against setback distance given day-of-the-week (thu-wed), week (1-3), and height (3 values). There are 63 panels arranged into 21 columns and 3 rows. Each row conditions on a different value of height; as we go from bottom to top, the heights increase. The panels in each row are in time order because the panels first cycle through the days of the week and then through the weeks. The display reveals much about the structure of the data. There is a strong interaction between height and setback distance. For the lowest height, lead decreases with setback. But for the middle value of height, lead typically first increases with setback and then decreases. For the highest height, lead occasionally has the increase-decrease pattern for about 1/3 of the days, most of them days with large concentrations, and is relatively stable for the remaining days. This behavior is consistent with air transport mechanisms. Lead is emitted at ground level from automobile tail pipes. The closest of the 9 monitors, the one with the lowest height and the closest setback, has the largest concentrations because it is close to the pollution source. From the source, the lead is carried laterally by the wind, spreading upward as it moves. This plume-like behavior can cause the concentrations to be relatively small at the higher monitors at the closest setback. Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html
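The panel arrangement described here (one row per height, days cycling fastest within each row, then weeks) can be sketched as a mapping from conditioning values to grid positions. The function name `trellis_layout` and the height labels are hypothetical; only the day range, week range and number of heights come from the slide.

```python
from itertools import product

DAYS = ["thu", "fri", "sat", "sun", "mon", "tue", "wed"]
WEEKS = [1, 2, 3]
HEIGHTS = ["low", "mid", "high"]  # hypothetical labels for the 3 height values

def trellis_layout(days, weeks, heights):
    """Map each (day, week, height) condition to a (column, row) panel position.

    Row 0 is the bottom row (lowest height). Within a row, panels are in
    time order: days cycle fastest, then weeks.
    """
    layout = {}
    for row, height in enumerate(heights):
        for col, (week, day) in enumerate(product(weeks, days)):
            layout[(day, week, height)] = (col, row)
    return layout

layout = trellis_layout(DAYS, WEEKS, HEIGHTS)
n_cols = 1 + max(col for col, _ in layout.values())
n_rows = 1 + max(row for _, row in layout.values())
print(len(layout), n_cols, n_rows)
print(layout[("thu", 2, "mid")])  # first day of week 2, middle row
```

Seven days times three weeks gives the columns in each row, and the three heights give the rows, which accounts for all 63 panels.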

22 A Trellis Example Source: http://cm.bell-labs.com/cm/ms/departments/sia/project/trellis/wwww.html

23 Tensor Visualization High Dimensional Scientific Data Visualization Not Today

24 Some Interesting Web Sites The best and worst of statistical graphs –http://www.math.yorku.ca/SCS/Gallery/ Chris Healey’s Preattentive Vision Applet –http://www.csc.ncsu.edu/faculty/healey/PP/index.html#Preattentive OpenDX Gallery –http://www.opendx.org/highlights.php IVTK: An Information Visualization Toolkit –Ivtk.sourceforge.net Information Visualization Repository –http://www.cs.umd.edu/hcil/InfovisRepository/index.shtml

25 Resources Great sources for theory behind multivariate display and perception are –Bertin 1983 –Cleveland 1993 –Tufte 1983, 1990 –Colin Ware, 2000 A couple of good papers are –Shneiderman, “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations” –Marc Green, “Toward a Perceptual Science of Multidimensional Data Visualization: Bertin and Beyond”

