13.1 Vis_04 Data Visualization, Lecture 13: Information Visualization, Part 1
13.2 What is Visualization?
- Generally:
  - "The use of computer-supported, interactive, visual representations of data to amplify cognition" (Card, Mackinlay and Shneiderman)
- Two branches:
  - Scientific Visualization
  - Information Visualization
But first… an experiment.
13.3 The Experiment
- You need a watch with a second hand.
- Without using pencil and paper (or a calculator!), multiply 72 by 34.
- How long did it take?
- Now you need pencil and paper as well as the watch.
- Multiply 47 by 54.
- How long did it take?
- Conclusion?
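For checking answers afterwards, both products can be worked out by splitting one factor into tens and units. On paper the two partial products are written down rather than held in memory, which is one plausible reading of the point of the experiment:

```latex
\begin{aligned}
72 \times 34 &= 72 \times 30 + 72 \times 4 = 2160 + 288 = 2448\\
47 \times 54 &= 47 \times 50 + 47 \times 4 = 2350 + 188 = 2538
\end{aligned}
```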
13.4 Visualization: Twin Subjects
- Scientific Visualization: visualization of physical data
- Information Visualization: visualization of abstract data
[Images: ozone layer around the earth; automobile web site, visualizing links]
… but this is only one characterisation.
13.5 Scientific Visualization: Another Characterisation
- Focus is on visualizing an entity measured in a multi-dimensional space:
  - 1D
  - 2D
  - 3D
  - occasionally nD
- The underlying field is recreated from the sampled data
- Relationships between variables are well understood: some variables are independent, some dependent
(Image from D. Bartz and M. Meissner)
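As a minimal illustration of recreating the underlying field from sampled data, the sketch below assumes the simplest case: a 1D field sampled at increasing positions and reconstructed by piecewise-linear interpolation. Real systems may use higher-order or multi-dimensional interpolants; the function name `reconstruct` and the sample values are hypothetical.

```python
from bisect import bisect_right


def reconstruct(xs, values, x):
    """Estimate a 1D field at position x from samples (xs[i], values[i]).

    Assumes xs is strictly increasing. Uses piecewise-linear interpolation
    between the two samples that bracket x, clamped at both ends.
    """
    if x <= xs[0]:
        return values[0]
    if x >= xs[-1]:
        return values[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    v0, v1 = values[i - 1], values[i]
    t = (x - x0) / (x1 - x0)  # fractional position between the two samples
    return v0 + t * (v1 - v0)


# Hypothetical samples of some scalar field at four positions
xs = [0.0, 1.0, 2.0, 4.0]
vals = [10.0, 12.0, 11.0, 15.0]
print(reconstruct(xs, vals, 3.0))  # 13.0
```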
13.6 Scientific Visualization Model
- Visualization represented as a pipeline:
  - read in data
  - build a model of the underlying entity
  - construct a visualization in terms of geometry
  - render the geometry as an image
- Realised as a modular visualization environment:
  - IRIS Explorer
  - IBM Open Visualization Data Explorer (DX)
  - AVS
[Figure: pipeline diagram, data → model → visualize → render]
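The four-stage pipeline can be sketched as a chain of functions, each consuming the previous stage's output. This is not how IRIS Explorer, DX or AVS are programmed (they connect modules visually); the stage functions, the trivial model (samples sorted by position) and the geometry (a polyline rendered as SVG, with no scaling or axis flipping) are all hypothetical choices for illustration.

```python
# Hypothetical stage functions mirroring the four pipeline steps on the slide.

def read_data(text):
    """Read in data: parse lines of 'x,value' into (x, value) samples."""
    samples = []
    for line in text.strip().splitlines():
        x, v = line.split(",")
        samples.append((float(x), float(v)))
    return samples


def build_model(samples):
    """Build a model of the underlying entity: here, samples ordered by position."""
    return sorted(samples)


def visualize(model):
    """Construct geometry: here, the vertices of a polyline through the model."""
    return [(x, v) for x, v in model]


def render(vertices):
    """Render the geometry as an image: here, a minimal SVG document string."""
    coords = " ".join(f"{x:g},{y:g}" for x, y in vertices)
    return ('<svg xmlns="http://www.w3.org/2000/svg">'
            f'<polyline points="{coords}" fill="none" stroke="black"/></svg>')


def pipeline(text):
    """Run data -> model -> visualize -> render as one dataflow."""
    return render(visualize(build_model(read_data(text))))


svg = pipeline("2,11\n0,10\n1,12")
```

Keeping each stage a separate function mirrors the modular idea: any one stage (say, a different geometry in `visualize`) can be swapped without touching the others.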
13.7 Information Visualization
- Focus is on visualizing a set of multivariate observations
- Example: the iris data set
  - 150 observations of 4 variables (length and width of petal and sepal)
  - Techniques aim to display relationships between the variables
13.8 Dataflow for Information Visualization
- Again we can express this as a dataflow, but the emphasis is now on the data itself rather than on an underlying entity
- The first step is to form the data into a table of observations, each observation being a set of values of the variables
- Then we apply a visualization technique as before
[Figure: dataflow diagram, data → data table → visualize → render; the table has variables A, B, C as columns and observations 1, 2, … as rows]
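Forming the data into a table can be sketched as follows, assuming the raw data arrives as records (dicts) whose fields may be missing or in any order. The column names follow the iris variables on the previous slide, but the record values are illustrative only, and `to_table` and `column` are hypothetical helpers.

```python
VARIABLES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def to_table(records, variables):
    """Form raw records (dicts) into a data table.

    One row (tuple) per observation, one column per variable, in a fixed
    variable order; a variable missing from a record becomes None.
    """
    return [tuple(rec.get(v) for v in variables) for rec in records]


def column(table, variables, name):
    """All observed values of one variable, i.e. one column of the table."""
    j = variables.index(name)
    return [row[j] for row in table]


# Illustrative records only; the second has two missing measurements
# and its fields arrive in a different order.
records = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"petal_width": 1.5, "sepal_length": 6.0},
]
table = to_table(records, VARIABLES)
```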
13.9 Applications of Information Visualization
- Data collections:
  - census data
  - astronomical data
  - bioinformatics data
  - supermarket checkout data
  - and so on
  - Can relationships be discovered amongst the variables?
- Networks of information:
  - e-mail traffic
  - web documents
  - hierarchies of information (e.g. filestores)
We shall see that all of these can be described as data tables.
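As one illustration of describing a hierarchy as a data table, the sketch below flattens a nested filestore into one row per file or directory. The tree is modelled as a nested dict (directory name to children, file name to size in bytes), and the chosen columns (path, depth, is_directory, size) are assumptions; `flatten` is a hypothetical helper, and a directory's size is taken to be the total size of all files beneath it.

```python
def flatten(tree, path="", depth=0):
    """Turn a nested dict into data-table rows (path, depth, is_directory, size).

    A dict value is a directory; any other value is a file size in bytes.
    A directory's size is the total size of all files beneath it.
    """
    rows = []
    for name, child in sorted(tree.items()):
        p = f"{path}/{name}"
        if isinstance(child, dict):
            sub = flatten(child, p, depth + 1)  # all descendants, at every level
            total = sum(size for _, _, is_dir, size in sub if not is_dir)
            rows.append((p, depth, True, total))
            rows.extend(sub)
        else:
            rows.append((p, depth, False, child))
    return rows


# Hypothetical filestore
tree = {"docs": {"a.txt": 100, "img": {"b.png": 300}}, "c.py": 50}
rows = flatten(tree)
```

Once flattened, each row is an observation and depth, type and size are variables, so the multivariate techniques that follow apply unchanged.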
13.10 Multivariate Visualization
- Software:
  - XmdvTool (Matthew Ward)
- Techniques designed for any number of variables:
  - scatter plot matrices
  - parallel coordinates
  - glyph techniques
Acknowledgement: many of the images in the following slides are taken from Ward's work… and also from IRIS Explorer!
13.11 Scatter Plot
- A simple technique for 2 variables is the scatter plot
This example from NIST shows a linear correlation between the variables: www.itl.nist.gov/div898/handbook/eda/section3/scatterp.htm
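The linear correlation that a scatter plot shows visually can be quantified with Pearson's correlation coefficient r. This is a standard statistic added here as a complement to the plot, not something the slide computes; the sketch assumes neither variable is constant (otherwise the denominator is zero).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences.

    r near +1 or -1 means the scatter plot points lie close to a straight
    line (rising or falling); r near 0 means no linear relationship.
    Assumes neither sequence is constant.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```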
13.12 3D Scatter Plots
- There has been some success in extending the concept to 3D for visualizing 3 variables
(XRT/3d: http://www.ist.co.uk/XRT/xrt3d.html)
13.13 Extending to Higher Numbers of Variables
- Additional variables can be visualized by colour and shape coding
- IRIS Explorer used to visualize data from BMW:
  - five variables displayed, using spatial arrangement for three, and colour and object type for the other two
  - notice the clusters…
(Kraus & Ertl)
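A minimal sketch of the encoding described above: three variables go to position, a numeric fourth to colour and a categorical fifth to object type. The blue-to-red ramp, the category names and the shape table are hypothetical; this is not the mapping used by Kraus & Ertl, whose BMW variables are not given on the slide.

```python
def encode(obs, c_range, shapes):
    """Map a five-variable observation (x, y, z, c, t) to a glyph description.

    x, y, z -> spatial position; c (numeric) -> colour on a linear
    blue-to-red ramp over c_range; t (categorical) -> object type,
    looked up in the `shapes` table.
    """
    x, y, z, c, t = obs
    lo, hi = c_range
    f = (c - lo) / (hi - lo) if hi > lo else 0.0
    f = min(max(f, 0.0), 1.0)  # clamp values outside the range
    red = round(255 * f)
    return {"position": (x, y, z), "colour": (red, 0, 255 - red), "shape": shapes[t]}


SHAPES = {"A": "sphere", "B": "cube"}  # hypothetical category -> object type
glyph = encode((1.0, 2.0, 3.0, 5.0, "B"), (0.0, 20.0), SHAPES)
```

Position carries the most precisely perceived variables, which is why the remaining ones are pushed to colour and shape; clusters then show up as groups of similar glyphs close together in space.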
13.14 IRIS Explorer 3D Scatter Plots
- Try this…
Thanks to: http://www.mpa-garching.mpg.de/MPA-GRAPHICS/scatter3d.html
13.15 Scatter Plots for M Variables
- For table data of M variables, we can look at pairs of variables in 2D scatter plots
- The pairs can be juxtaposed in a matrix of plots
[Figure: scatter plot matrix with rows A, B, C and columns C, B, A]
With luck, you may spot correlations between pairs as linear structures.
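The layout of a scatter plot matrix can be sketched as below. It assumes the same variable order along rows and columns (the slide's figure reverses the column order) and that diagonal cells simply carry the variable name. Each off-diagonal pair appears twice, transposed, so only M(M-1)/2 plots are distinct. `scatter_matrix_cells` is a hypothetical helper.

```python
from itertools import combinations


def scatter_matrix_cells(variables):
    """Lay out an M x M scatter plot matrix.

    Cell (row, col) plots variables[col] on the horizontal axis against
    variables[row] on the vertical axis; diagonal cells carry the name.
    """
    cells = {}
    for r, vy in enumerate(variables):
        for c, vx in enumerate(variables):
            cells[(r, c)] = vy if r == c else (vx, vy)
    return cells


variables = ["A", "B", "C"]
cells = scatter_matrix_cells(variables)
distinct_pairs = list(combinations(variables, 2))
print(len(cells), distinct_pairs)  # 9 [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The quadratic growth in cells is one reason the technique runs out of screen space as M grows.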
13.16 Scatter Plot
The data represents 7 aspects of cars: what relationships can we notice? For example, what correlates with high MPG?
Pictures from XmdvTool, developed by Matthew Ward: davis.wpi.edu/~xmdv
13.17 Parallel Coordinates: Visualizing M Variables on One Chart
[Figure: parallel vertical axes labelled A to F]
- create M equidistant vertical axes, each corresponding to a variable
- scale each axis to the [min, max] range of its variable
- each observation corresponds to a line drawn through the point on each axis that corresponds to the observation's value of that variable
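The three construction rules translate directly into code that turns a data table into polylines. Two details are assumptions not on the slide: screen coordinates with y increasing downwards (so each variable's minimum sits at the bottom), and placing every value of a constant variable at mid-height. The table needs at least two variables.

```python
def parallel_coords(table, width=600.0, height=400.0):
    """Convert observations into parallel-coordinate polylines.

    Each of the M variables gets a vertical axis at an equally spaced x
    position. Each axis is scaled to its variable's [min, max] range, with
    min at the bottom (y = height) and max at the top (y = 0).
    Returns one list of (x, y) vertices per observation.
    """
    m = len(table[0])  # number of variables; must be at least 2
    cols = list(zip(*table))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    xs = [width * j / (m - 1) for j in range(m)]  # equidistant axes
    lines = []
    for row in table:
        pts = []
        for j, v in enumerate(row):
            span = highs[j] - lows[j]
            f = (v - lows[j]) / span if span else 0.5  # constant variable: mid-height
            pts.append((xs[j], height * (1 - f)))
        lines.append(pts)
    return lines


lines = parallel_coords([(0, 10, 5), (10, 0, 5)])
```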
13.18 Parallel Coordinates
[Figure: parallel axes labelled A to F with observations plotted]
- correlations may start to appear as the observations are plotted on the chart
- here, for example, there appears to be a negative correlation between the values of A and B
- the technique has been used for applications with thousands of data items
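Why a negative correlation stands out: the segments between two adjacent axes cross one another. Two observations' segments cross exactly when their order on axis A is the reverse of their order on axis B, and min-max scaling (for a non-constant variable) preserves order, so raw values can be compared. The sketch counts such crossings; a count near n(n-1)/2 suggests strong negative association, a count near zero strong positive association. This is the discordant-pair count behind Kendall's tau, used here as an illustration rather than as the slide's method.

```python
from itertools import combinations


def crossings(a_values, b_values):
    """Count pairs of parallel-coordinate lines that cross between axes A and B.

    Two observations' segments cross exactly when their order on axis A is
    the reverse of their order on axis B; ties are not counted as crossings.
    """
    count = 0
    for i, j in combinations(range(len(a_values)), 2):
        if (a_values[i] - a_values[j]) * (b_values[i] - b_values[j]) < 0:
            count += 1
    return count
```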
13.19 Parallel Coordinates Example
Detroit homicide data: 7 variables, 13 observations (1961-1973)
13.20 The Screen Space Problem
- All techniques, sooner or later, run out of screen space
- Parallel coordinates:
  - usable for up to 150 variates
  - unworkable beyond 250 variates
(Remote sensing example: 5 variates, 16,384 observations)
13.21 Brushing as a Solution
- Brushing selects a restricted range of one or more variables
- The selection is then highlighted
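A range brush can be sketched as a filter over the data table. The column layout and values below are illustrative only, not taken from the car data on the slides, and `brush` is a hypothetical helper that ANDs the ranges of all brushed variables.

```python
VARIABLES = ["mpg", "cylinders", "weight"]
# Illustrative values only, not taken from the data set on the slides.
CARS = [(35, 4, 2000), (18, 8, 3500), (30, 6, 2800), (40, 4, 1900)]


def brush(table, variables, ranges):
    """Return indices of observations selected by a brush.

    `ranges` maps a variable name to an inclusive (low, high) interval; an
    observation is selected when every brushed variable lies in its interval.
    """
    idx = {name: j for j, name in enumerate(variables)}
    return [
        i
        for i, row in enumerate(table)
        if all(lo <= row[idx[name]] <= hi for name, (lo, hi) in ranges.items())
    ]


selected = brush(CARS, VARIABLES, {"mpg": (30, 50)})
```

A renderer would then draw the rows in `selected` in a highlight colour in every linked view (scatter plots, parallel coordinates), which is what lets the same subset be compared across techniques.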
13.22 Scatter Plot
Using a brushing tool can highlight subsets of the data… now we can see what correlates with high MPG.
13.23 Parallel Coordinates
Brushing picks out the high-MPG data. Can you observe the same relationships as with the scatter plots? Is this easier or harder?
13.24 Parallel Coordinates
Here we highlight the cars with high MPG and not 4 cylinders.
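Combining brushes with AND and NOT, as in "high MPG and not 4 cylinders", amounts to composing Boolean predicates over rows. The sketch assumes a hypothetical (mpg, cylinders) table with illustrative values and an MPG threshold of 30 chosen only for illustration; how XmdvTool itself composes brushes is not described on the slide.

```python
# Hypothetical car table: (mpg, cylinders); values are illustrative only.
cars = [(35, 4), (18, 8), (30, 6), (40, 4), (22, 6)]
MPG, CYL = 0, 1  # column positions in this hypothetical layout


def high_mpg(row):
    return row[MPG] >= 30  # threshold chosen for illustration


def four_cylinders(row):
    return row[CYL] == 4


def select(table, predicate):
    """Indices of the rows for which the Boolean predicate holds."""
    return [i for i, row in enumerate(table) if predicate(row)]


# Composite brush: high MPG AND NOT 4 cylinders.
highlighted = select(cars, lambda row: high_mpg(row) and not four_cylinders(row))
```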
13.25 Clustering as a Solution
- Success has been achieved through clustering of observations
- Hierarchical parallel coordinates:
  - cluster by similarity
  - display using translucency and proximity-based colour
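The slide does not specify a clustering algorithm, so the sketch below uses naive single-linkage agglomerative clustering as a stand-in: every observation starts in its own cluster and the two nearest clusters are merged until k remain. Recording the merge order would give the hierarchy; drawing each cluster as a translucent band is not shown. The approach is only practical for small tables, and the points are hypothetical.

```python
from math import dist


def agglomerate(points, k):
    """Naive agglomerative (bottom-up) clustering with single linkage.

    Start with every observation in its own cluster and repeatedly merge
    the two clusters whose closest members are nearest (Euclidean distance)
    until k clusters remain. Returns clusters as sorted lists of indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 50)]
groups = agglomerate(points, 3)
```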
13.28 Reduction of Dimensionality of Variable Space
- Reduce the number of variables while preserving information
- Principal Component Analysis:
  - transform to a new coordinate system
  - hard to interpret
- Hierarchical reduction of variable space:
  - cluster variables where the distance between observations is typically small
  - choose a representative for each cluster
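A minimal sketch of the first step of Principal Component Analysis: centre the data, form the covariance matrix and find its dominant eigenvector by power iteration, which gives the first axis of the new coordinate system. A production system would use a linear-algebra library; the power-iteration shortcut assumes the largest eigenvalue is well separated, that the all-ones start vector is not orthogonal to its eigenvector, and that the data is not constant.

```python
def first_principal_component(table, iters=200):
    """Direction of greatest variance (first principal component) and the means.

    Centres each variable, forms the sample covariance matrix, then applies
    power iteration. Returns (unit vector with one weight per variable, means).
    """
    n, m = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in table]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration: multiply by cov and renormalise until the direction settles.
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v, means


def project(table, v, means):
    """Coordinate of each observation along the new axis v (after centring)."""
    return [sum((row[j] - means[j]) * v[j] for j in range(len(v))) for row in table]


data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
v, means = first_principal_component(data)
scores = project(data, v, means)
```

Here the component weights are about 0.447 and 0.894: the new axis mixes both original variables, which illustrates why principal components can be hard to interpret.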
13.29 Further Reading
- Information Visualization, Robert Spence, published 2000 by Addison-Wesley
- See also the resources section of the module web site