Critical Design in Statistical Visualization


Critical Design in Statistical Visualization Alan Blackwell Professor of Interdisciplinary Design University of Cambridge

Overview
  Brief background
  Principles of visualisation design
  Four critical case studies
    The Automated Statistician
    ICUMAP
    Gatherminer
    Self-Raising Data

Background: Metaphor in Diagrams

Some Principles of visualization

Typography and text

Maps and graphs

Schematic drawings

Node-and-link diagrams

Icons and symbols

Visual metaphor

Microsoft “Bob” (1995)

Microsoft “Task Gallery” (2000)

MIUI “Warm Space MiHome Desktop” (2015)

Pictures

Unified theories of visual representation

Marks
  Graphic resources: Shape; Orientation; Size; Texture; Saturation; Colour; Line
  Correspondence: Literal (visual imitation of physical features); Mapping (quantity, relative scale); Conventional (arbitrary)
  Design uses: Mark position, identify category (shape, texture, colour); Indicate direction (orientation, line); Express magnitude (saturation, size, length); Simple symbols and colour codes

Symbols
  Graphic resources: Geometric elements; Letter forms; Logos and icons; Picture elements; Connective elements
  Correspondence: Topological (linking); Depictive (pictorial conventions); Figurative (metonym, visual puns); Connotative (professional and cultural association); Acquired (specialist literacies)
  Design uses: Texts and symbolic calculi; Diagram elements; Branding; Visual rhetoric; Definition of regions

Regions
  Graphic resources: Alignment grids; Borders and frames; Area fills; White space; Gestalt integration
  Correspondence: Containment; Separation; Framing (composition, photography); Layering
  Design uses: Identifying shared membership; Segregating or nesting multiple surface conventions in panels; Accommodating labels, captions or legends

Surfaces
  Graphic resources: The plane; Material object on which the marks are imposed (paper, stone); Mounting, orientation and display context; Display medium
  Correspondence: Literal (map); Euclidean (scale and angle); Metrical (quantitative axes); Juxtaposed or ordered (regions, catalogues); Image-schematic; Embodied/situated
  Design uses: Typographic layouts; Graphs and charts; Relational diagrams; Visual interfaces; Secondary notations; Signs and displays

Analysis examples

Marks
  Graphic resources: Size; Colour
  Correspondence: Mapping (quantity, relative scale)
  Design uses: Mark position, identify category (colour); Express magnitude (size)

Symbols
  Graphic resources: Geometric elements; Connective elements
  Correspondence: Topological (linking)
  Design uses: Diagram elements; Visual rhetoric

Regions
  Graphic resources: Alignment grids
  Correspondence: Containment; Separation; Framing (composition)
  Design uses: Segregating or nesting multiple surface conventions in panels; Accommodating labels, captions or legends

Surfaces
  Graphic resources: Display medium (web browser)
  Correspondence: Metrical (quantitative axes); Image-schematic?
  Design uses: Graphs and charts

Marks
  Graphic resources: Shape
  Correspondence: Conventional (arbitrary)
  Design uses: Mark position, identify category (shape)

Symbols
  Graphic resources: Geometric elements; Letter forms; Connective elements
  Correspondence: Topological (linking); Acquired (specialist literacies)
  Design uses: Texts; Definition of regions

Regions
  Graphic resources: Alignment grids; White space
  Correspondence: Containment; Separation
  Design uses: Segregating and nesting multiple surface conventions in panels; Accommodating labels

Surfaces
  Graphic resources: Material object on which the marks are imposed (paper)
  Correspondence: Metrical (quantitative axes); Juxtaposed and ordered (regions)
  Design uses: Musical score

“Big data” = too much data to see

Automated statistician https://www.automaticstatistician.com/static/assets/auto-report-affairs.pdf

Automated statistician output
  3. Model description
  In this section I have described the model I have constructed to explain the data. A quick summary is below, followed by quantification of the model with accompanying plots of model fit and residuals.
  3.1. Summary. The output affairs:
    decreases linearly with input religiousness
    increases linearly with input yearsmarried
    decreases linearly with input age
    increases linearly with input occupation
    decreases linearly with input education

Automated statistician output
  Decrease with religiousness
  The correlation between the data and the input religiousness is -0.22 (see figure 2a). This correlation does not change when accounting for the rest of the model (see figure 2b).
  Increase with yearsmarried
  The correlation between the data and the input yearsmarried is 0.16 (see figure 3a). Accounting for the rest of the model, this changes slightly to a part correlation of 0.24 (see figure 3b).
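The difference between a plain correlation and a "part correlation" in the report text above can be illustrated with a small sketch. It assumes that "part correlation" means the semipartial correlation (the output correlated with the part of one input that the other inputs do not linearly explain). The slides do not confirm that this is the Automatic Statistician's definition. The data are synthetic, the variable names are borrowed from the report only for readability, and `part_correlation` is a hypothetical helper.

```python
import numpy as np

def part_correlation(y, x, others):
    """Correlate y with the part of x not linearly explained by `others`.

    Assumption: this is the semipartial correlation, which may differ
    from the statistic the report actually computes.
    """
    # Design matrix for the other inputs, with an intercept column.
    design = np.column_stack([np.ones(len(x)), others])
    # Least-squares fit of x on the other inputs, then keep the residual.
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    x_resid = x - design @ coef
    return np.corrcoef(y, x_resid)[0, 1]

# Synthetic data only (NOT the dataset used in the report).
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(size=n)
yearsmarried = 0.8 * age + 0.6 * rng.normal(size=n)  # correlated with age
affairs = 0.5 * yearsmarried - 0.4 * age + rng.normal(size=n)

simple = np.corrcoef(affairs, yearsmarried)[0, 1]
part = part_correlation(affairs, yearsmarried, age)
print(f"simple correlation: {simple:.2f}, part correlation: {part:.2f}")
```

In this synthetic setup the part correlation comes out larger than the simple one, because the opposing effect of `age` masks some of the association until it is accounted for.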

Automated statistician output
  Low negative deviation between quantiles of test residuals and model
  There is an unexpectedly low negative deviation from equality between the quantiles of the residuals and the assumed noise model (see figure 8a). The minimum of this deviation occurs at the 99th percentile indicating that the test residuals have unexpectedly light positive tails. The minimum value of this deviation is -6.7 which is moderately lower than its median value under the proposed model of -1.9 (see figure 8c). To demonstrate the discrepancy in simpler terms I have plotted histograms of test set residuals and those expected under the model in figure 8b.
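A rough sketch of the kind of check described in this report text: compare the quantiles of the residuals with the quantiles of an assumed Gaussian noise model and find where the deviation is most negative. The scaling behind the report's figures (-6.7 and -1.9) is not given in the slides, so the numbers here are not comparable to them. `quantile_deviation` is a hypothetical helper and the residuals are synthetic.

```python
import numpy as np
from statistics import NormalDist

def quantile_deviation(residuals, sigma):
    """Empirical minus model quantiles at the 1st..99th percentiles.

    Assumption: the noise model is a zero-mean Gaussian with standard
    deviation `sigma`; the report's own statistic may be defined differently.
    """
    percents = np.arange(1, 100)
    empirical = np.percentile(residuals, percents)
    noise_model = NormalDist(0.0, sigma)
    model = np.array([noise_model.inv_cdf(float(p) / 100) for p in percents])
    return percents, empirical - model

# Synthetic residuals: uniform noise with unit variance, which has
# lighter tails than the Gaussian noise model it is compared against.
rng = np.random.default_rng(1)
residuals = rng.uniform(-np.sqrt(3), np.sqrt(3), size=20000)

percents, deviation = quantile_deviation(residuals, sigma=1.0)
worst = percents[np.argmin(deviation)]
print(f"most negative deviation {deviation.min():.2f} at percentile {worst}")
```

A negative deviation in the upper percentiles means the largest residuals are smaller than the model expects, which is the "light positive tails" reading given in the report.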

Automated statistician – critical notes
  Inspired by Turing Test (and Lenat’s AM)
  Actual report text is clearly written by a human
  Key ability is scanning for interest (what is interesting on average? blue?)
  Repair, Attribution and all That
  The reader scans for interest
  Names of the variables are essential
  How to pass the Turing Test?
  Lower the threshold!
  Reduce expectations?
  Or offer mixed initiative

Mixed initiative visualization

ICUMAP http://www.cl.cam.ac.uk/~as2388/patients/web_content/

ICUMAP – critical notes
  Big data measures too many things to visualise
  Provide ‘dimensionality reduction’
  From ‘clusters’ to abstract ‘landscape’
  Journey metaphor
  Use colour classes for a gestalt view
  Encourage guided browsing
  Support human expert judgment
  Show values within distributions
  Reveal broad spread of ‘similarity’
  Use overlays to compare original variables
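The ‘dimensionality reduction’ step in these notes can be illustrated with a minimal sketch. The slides do not say which method ICUMAP uses, so principal component analysis (PCA) is swapped in here purely as a familiar stand-in. The records are synthetic and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(data):
    """Project rows of `data` onto their first two principal components.

    Assumption: PCA is a stand-in; ICUMAP's actual method is not stated.
    """
    centred = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

# Synthetic stand-in for patient records: 200 rows, 12 measured variables
# driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 12))
records = latent @ mixing + 0.1 * rng.normal(size=(200, 12))

coords = pca_2d(records)  # one (x, y) position per record
print(coords.shape)
```

Each record then has a position on a two-dimensional ‘landscape’, while the original variables remain available to overlay on it for comparison.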

Gatherminer http://advaitsarkar.github.io/Gatherminer/

Gatherminer – critical notes
  Use visual rhythm to think about time
  Mundane comparison is automated
  Collect for pattern
  A human is the judge of interest
  Diagnosis through density
  But related to individual cases

Self-Raising Data http://www.cl.cam.ac.uk/~mcm79/srd/self-raising-data.html

Self-Raising Data – critical notes
  How can we do big data without data?
  Real data is produced by questions
  Offer data as imagination
  Rehearse statistical reasoning
  Structured in a way statisticians can understand
  Subverts the ‘automated statistician’ fantasy
  Are statistics objective?
  AI is always (normatively) subjective
  Uncertainty in information is seen as noise

Summary
  Visualisation is a design discipline
  The marks we make communicate and persuade
  When statistics becomes artificial intelligence …
    What are we trying to prove?
    Where does the judgment take place?
  Mixed initiative interaction
    Respects expertise
    Revealing, not concealing, the models used