1 How to interpret scientific & statistical graphs Theresa A Scott, MS Department of Biostatistics



2 A brief introduction Graphics: –One of the most important aspects of presentation and analysis of data; help reveal structure and patterns. Graphical perception (ie, interpretation of a graph): –The visual decoding of the quantitative and qualitative information encoded on graphs. Objective: –To discuss how to interpret some common graphs.

3 Sidebar: Types of variables Continuous (quantitative data): –Have any number of possible values (eg, weight). –Discrete numeric – set of possible values is a finite (ordered) sequence of numbers (eg, a pain scale of 1, 2, …, 10). Categorical (qualitative data): –Have only certain possible values (eg, race); often not numeric. –Binary (dichotomous) – a categorical variable with only two possible values (eg, gender). –Ordinal – a categorical variable for which there is a definite ordering of the categories (eg, severity of lower back pain as none, mild, moderate, and severe).

4 Graphs for a single variable’s distribution

5 Histograms Continuous variable. Values are divided into a series of intervals, usually of equal length. Data are displayed as a series of vertical bars whose heights indicate the number (count) or proportion (percentage) of values in each interval. What is the overall shape? Is it symmetric? Is it skewed? –Affected by the size of the interval. Is there more than one peak? What is the range of the intervals? Is the shape wide or tight (ie, what’s the variability?) Look for concentration of points and/or outliers, which can distort the graph.
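The effect of interval size on a histogram's shape can be sketched with a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `histogram_counts` and the weights are made up, and intervals that contain no values are simply omitted from the result.

```python
def histogram_counts(values, bin_width, start=None):
    """Count how many values fall in each equal-width interval [lo, lo + bin_width)."""
    if start is None:
        start = min(values)
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which interval the value falls in
        lo = start + index * bin_width          # lower edge of that interval
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical weights (kg), invented for illustration only.
weights = [52, 55, 58, 61, 61, 63, 64, 66, 67, 70, 71, 74, 79, 95]

wide = histogram_counts(weights, bin_width=20, start=50)    # 3 coarse intervals
narrow = histogram_counts(weights, bin_width=5, start=50)   # finer intervals

# Text version of the narrow histogram: one '#' per value in the interval.
for lo, n in narrow.items():
    print(f"{lo:>3}-{lo + 5:<3} {'#' * n}")
```

With the wide intervals the data look like a single decreasing block; with the narrow intervals a gap appears before the value 95, which is the kind of possible outlier the slide says to look for.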

6 Boxplots Continuous variable. Displays a numerical summary of the distribution. –Most include the 25th, 50th (median), and 75th percentiles. –Optionally includes the mean (average). –May extend to the min & max or may use a rule to indicate outliers. –Graphed either horizontally or vertically. Interpretation: –What statistics are displayed? –Most often, the central box includes the middle 50% of the values. –Whiskers (& outliers) show the “range”. –Symmetry is indicated by box & whiskers and by location of the median (and mean).
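As a rough sketch of what a boxplot summarizes, the following computes the quartiles, the mean, and outliers. The slide does not say which outlier rule is used; the 1.5 x IQR rule below is one common choice and is an assumption here, as are the function name `boxplot_summary` and the data. Software packages also differ in how they define percentiles, so values from a real plot may not match exactly.

```python
import statistics

def boxplot_summary(values):
    """Return statistics a typical boxplot displays (1.5 x IQR outlier rule assumed)."""
    data = sorted(values)
    # 25th, 50th and 75th percentiles; "inclusive" treats the data as the whole population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                       # width of the central box (middle 50%)
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "mean": statistics.mean(data),
        "whisker_low": min(inside),     # whiskers stop at the most extreme non-outliers
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical weights (kg), invented for illustration only.
weights = [52, 55, 58, 61, 61, 63, 64, 66, 67, 70, 71, 74, 79, 95]
summary = boxplot_summary(weights)
print(summary)
```

Here the mean lies above the median and the only outlier is on the high side, which is how a right-skewed distribution typically shows up in a boxplot.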

7 Boxplot with raw data Going one step beyond just a boxplot. –Boxplot is overlaid with the raw values of the continuous variable. –Therefore, displays both a numerical summary as well as the actual data. –Gives a better idea of the number of values the numerical summary (ie, boxplot) is based on and where they occur. Raw values are often “jittered” – that is, in order to visually depict multiple occurrences of the same value, a random amount of noise is added in the horizontal direction (if boxplot is vertical; in the vertical direction if the boxplot is horizontal). Look for concentration of points and (as before) outliers.
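Jittering can be sketched as follows for a vertical boxplot drawn at x = 1. The function name `jitter`, the noise width of 0.2, and the data are illustrative assumptions; plotting software chooses its own defaults.

```python
import random

def jitter(n_points, width=0.2, seed=0):
    """Return n_points random horizontal offsets in [-width, width]."""
    rng = random.Random(seed)           # fixed seed so the sketch is reproducible
    return [rng.uniform(-width, width) for _ in range(n_points)]

# Hypothetical raw values with repeats (3 occurs three times, 5 twice).
values = [3, 3, 3, 5, 5, 8]

# Noise is added only to the horizontal position; the data values are untouched.
x_positions = [1 + dx for dx in jitter(len(values))]
points = list(zip(x_positions, values))
print(points)
```

Because only the x-coordinate changes, repeated values become visible as separate points without distorting what is read from the value axis.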

8 Barplots (aka, bar charts) Categorical variable. Data are displayed as a series of vertical (or horizontal) bars whose heights indicate the number (count) or proportion (percentage) of values in each category. –Visual representation of a table. –How do the heights of the bars compare? Which is largest? Smallest?
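Since a barplot is a visual representation of a table, the bar heights can be sketched as a frequency table. The severity ratings below are invented for illustration.

```python
from collections import Counter

# Hypothetical pain-severity ratings, invented for illustration only.
severity = ["none", "mild", "mild", "moderate", "none", "mild", "severe", "mild"]

counts = Counter(severity)                        # bar heights as counts
total = sum(counts.values())
proportions = {cat: n / total for cat, n in counts.items()}   # bar heights as proportions

largest = max(counts, key=counts.get)             # category with the tallest bar
for cat, n in counts.items():
    print(f"{cat:<9} {'#' * n} ({proportions[cat]:.0%})")
```

Whether the bars show counts or proportions changes the axis labels but not the relative heights, so the same comparisons (largest, smallest) apply to both.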

9 Dot plots (aka, dotcharts) Categorical variable. Alternative to a barplot (bar chart). Heights of the (vertical) bars are indicated with a dot (or some other character) on a (often horizontal dotted) line. –Line represents the counts or percentages. Same interpretation as barplot (bar chart).

10 Graphs for the association/relation between two variables

11 Side-by-side boxplots A continuous variable and a categorical variable. Displays the distribution of the continuous variable within each category of the categorical variable. Width of the boxes can also be made proportional to the number of values in each category. Here, side-by-side boxplots are overlaid with the raw values. How does the symmetry of each boxplot differ across categories? How do they compare to the boxplot of the continuous variable ignoring the categorical variable? Is there a concentration of points and/or outliers in one particular category? Is the number of values in each category fairly consistent?

12 Barplots Two categorical variables. –Visual representation of a two-way table. Bars are most often “nested”. –The count/proportion of the 2nd variable’s categories is displayed within each of the 1st variable’s categories. –Allows you to compare the 2nd variable’s categories (1) within each of the 1st variable’s categories, and (2) across the 1st variable’s categories. Bars can also be “stacked”. –A single bar is constructed for each category of the 1st variable & divided into segments, which are proportional to the count/percentage of values in each category of the 2nd variable. –Counts should sum to the no. of values in the dataset; percentages should sum to 100%. –Unlike “nested” (side-by-side) bars, segments do not have a common axis – makes it difficult to compare segment sizes across bars.
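The two-way table behind a nested or stacked barplot can be sketched like this. The treatment/outcome records and the helper name `within_percentages` are hypothetical.

```python
from collections import Counter

# Hypothetical (1st variable, 2nd variable) pairs, invented for illustration only.
records = (
    [("drug", "improved")] * 6 + [("drug", "not improved")] * 4
    + [("placebo", "improved")] * 3 + [("placebo", "not improved")] * 7
)
table = Counter(records)   # two-way table of counts

def within_percentages(table, first):
    """Percentages of the 2nd variable's categories within one category of the 1st variable."""
    row = {second: n for (f, second), n in table.items() if f == first}
    row_total = sum(row.values())
    return {second: 100 * n / row_total for second, n in row.items()}

drug = within_percentages(table, "drug")        # segments of the stacked "drug" bar
placebo = within_percentages(table, "placebo")  # segments of the stacked "placebo" bar
print(drug, placebo)
```

The counts sum to the number of records and each bar's percentages sum to 100%, which are the two checks the slide recommends.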

13 Dot plots Two categorical variables. –Alternative visual representation of a two-way table. Like barplots, can be “nested”. –Have different lines for each category of the 2nd variable grouped for each category of the 1st variable. Can also be “stacked”. –Each category of the 2nd variable is shown on a single line (one line per category); the 1st variable’s categories are distinguished with different symbols. –Unlike “stacked” barplots, do have a common axis for comparisons. Same interpretation as barplot (bar chart). –Same comparisons – within and across categories.

14 Scatterplots Two continuous variables. Usually, the “response” variable (ie, outcome) is plotted along the vertical (y) axis and the explanatory variable (ie, predictor; risk factor) is plotted along the horizontal (x) axis. –Doesn’t matter if there is no distinction between the two variables. Each “subject” is represented by a point. Often include lines depicting an estimate of the linear/non-linear relation/association, and/or confidence “bands”. What to look for: –Overall pattern: Positive association/relation? Negative association/relation? No association/relation? –Form of the association/relation: Linear? Non-linear (ie, a curve)? –Strength of the relation/association: How tightly clustered are the points (ie, how variable is the relation/association)? –Outliers –“Lurking” variables: A 3rd (continuous or categorical) variable that is related to both continuous variables and may confound the association/relation. Often incorporated into graph – see “Graphs for multivariate data” slides.
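Direction and strength of a linear association, and the straight line often drawn through a scatterplot, can be sketched numerically. The slides do not name a specific measure; Pearson's correlation and an ordinary least-squares line are assumed here as the most common choices, and the data are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives linear strength."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical explanatory (x) and response (y) values.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
print(f"r = {r:.3f}, line: y = {intercept:.2f} + {slope:.2f} x")
```

A correlation near +1 or -1 corresponds to points tightly clustered around a line; it says nothing about non-linear forms or outliers, which still have to be judged from the plot itself.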

15 Example Scatterplots

16 Graphs for multivariate data (ie, more than two variables)

17 (More complex) Scatterplots Two continuous variables and a categorical variable. Often, categorical variable is a confounder – the association/relation between the two continuous variables is (possibly) different between the categories of the categorical variable. Categorical variable incorporated using different symbols and/or line types for each category. What to look for: –Same as mentioned for general scatterplot. –Does the association/relation between the two continuous variables differ between the categories of the categorical variable? If so, how?
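To see how an association can differ between categories, one can fit a separate line within each category and compare it with the line fitted to all points together. This is a sketch under assumptions: least-squares slopes are used as the measure, and the group labels and data are invented.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical (x, y, category) observations.
data = [
    (1, 2, "A"), (2, 4, "A"), (3, 6, "A"), (4, 8, "A"),
    (1, 9, "B"), (2, 8, "B"), (3, 7, "B"), (4, 6, "B"),
]

pooled_slope = slope([(x, y) for x, y, _ in data])   # ignores the categorical variable
group_slopes = {
    g: slope([(x, y) for x, y, cat in data if cat == g])
    for g in sorted({cat for _, _, cat in data})
}
print(pooled_slope, group_slopes)
```

In this made-up example the pooled line suggests a weak positive association, while the within-category lines run in opposite directions, which is exactly the situation that different symbols or line types per category are meant to reveal.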

18 Examples of other graphs you might encounter

19 Modified “side-by-side boxplot” (great alternative to a “dynamite plot” – next slide)

20 “Dynamite plot” (often, height of bar = mean; error bar = standard deviation) IMPORTANT Even though commonly seen, not a good graph to generate. –Interested in the height of the bar (rest of the bar is just unnecessary ink). –Have no idea how many values the mean and standard deviation are based on (often quite small) or how the raw values are distributed. –Both affect the values of the mean and standard deviation. –Bars can also be “hanging”, which may represent negative values – very confusing.
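The point that a mean and standard deviation hide the number of values and their distribution can be checked with two invented samples that would produce identical dynamite plots.

```python
import statistics

# Two hypothetical samples, constructed to share the same mean and standard deviation.
sample_a = [1, 5, 9]              # 3 evenly spread values
sample_b = [2, 3, 4, 4, 12]       # 5 values, one of them far from the rest

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(
        f"{name}: n = {len(sample)}, "
        f"mean = {statistics.mean(sample)}, "
        f"SD = {statistics.stdev(sample):.2f}, "
        f"values = {sample}"
    )
```

Both samples have mean 5 and sample standard deviation 4, so the bar and error bar would be the same, yet the sample sizes differ and sample B contains a value (12) that a boxplot with raw data would expose.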

21 Survival & Hazard plots Survival plot: Each step down represents one or more “deaths”; “+” signs represent censoring. Hazard plot: Each step up represents one or more “deaths”; “+” signs represent censoring.
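Where the steps in these plots come from can be sketched as follows. The slides do not say which estimators were used; this sketch assumes the Kaplan-Meier estimate for the survival curve and the Nelson-Aalen estimate for the cumulative hazard, which are the usual sources of such step plots. The function name `step_curves` and the follow-up data are hypothetical.

```python
def step_curves(times, died):
    """Return (time, survival, cumulative hazard) at each time where a death occurs.

    died[i] is True for a death and False for a censored observation.
    """
    at_risk = len(times)
    survival, hazard = 1.0, 0.0
    steps = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, d in zip(times, died) if ti == t and d)
        censored = sum(1 for ti, d in zip(times, died) if ti == t and not d)
        if deaths:
            survival *= 1 - deaths / at_risk   # Kaplan-Meier: curve steps down
            hazard += deaths / at_risk         # Nelson-Aalen: curve steps up
            steps.append((t, survival, hazard))
        at_risk -= deaths + censored           # censored subjects leave without a step
    return steps

# Hypothetical follow-up times (months) and whether each subject died.
times = [2, 3, 3, 5, 8]
died = [True, True, False, True, False]

steps = step_curves(times, died)
for t, s, h in steps:
    print(f"t = {t}: survival = {s:.2f}, cumulative hazard = {h:.2f}")
```

Censored subjects (the “+” signs) never cause a step, but they shrink the number at risk, so later deaths produce larger steps.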

22 “Spaghetti” & Line plots “Spaghetti” plot: Each line plots the raw data points of a single “subject”. Line plot: Each line plots summary measures (eg, mean) from a group of subjects.
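The difference between the two kinds of lines can be sketched with a small repeated-measures dataset. The subject identifiers and values are invented, and the mean is used as the summary measure only because the slide gives it as an example.

```python
import statistics

# Hypothetical repeated measurements: subject -> values at visits 0, 1 and 2.
subjects = {
    "s1": [10, 12, 15],
    "s2": [8, 9, 13],
    "s3": [12, 15, 14],
}

# Spaghetti plot: one line per subject, drawn through that subject's raw values.
spaghetti_lines = list(subjects.values())

# Line plot: a single line through a summary measure (here the mean) at each visit.
mean_line = [statistics.mean(visit) for visit in zip(*subjects.values())]

print(spaghetti_lines)
print(mean_line)
```

The mean line rises steadily, while the individual lines show that one subject actually decreased at the last visit, a detail only the spaghetti plot preserves.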

23 WARNING: Very easy for a graph to lie What are the limits of the axis/axes? Is the scale consistent? How do the height and width of the graph compare to each other? Is the graph a square? A rectangle (ie, short & wide; tall & skinny)? If two or more graphs are shown together (eg, side-by-side, or in a 2x2 matrix), do all of the axes have the same limits? Same scale? Do they have the same relative dimensions? Are there two x- or y-axes in the same graph? If so, do they have the same scale? Can you get a feel for the raw data? The number of data points? Does a graph of a continuous variable show outliers? Does the data look too “pretty”?
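One way axis limits can mislead is a bar chart whose axis does not start at zero. The following sketch, with a hypothetical helper `apparent_ratio` and invented values, compares the true ratio of two values with the ratio of the bar heights as drawn.

```python
def apparent_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn heights of bars for b and a when the axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

# Hypothetical group values.
a, b = 50, 55

honest = apparent_ratio(a, b)                  # axis starts at 0
truncated = apparent_ratio(a, b, axis_min=45)  # axis starts at 45

print(f"axis from 0: bar b looks {honest:.1f}x as tall as bar a")
print(f"axis from 45: bar b looks {truncated:.1f}x as tall as bar a")
```

A 10% difference is drawn as a doubling once the axis is truncated, which is why checking the axis limits is the first question on the slide.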

24 General steps Do I understand this graph? –If NO: (1) it might be a really bad graph; or (2) it might be a type of graph you don’t know about. Carefully examine the axes and legends, noting any oddities. Scan over the whole graph, to see what it is saying, generally. If necessary, look at each portion of the graph. Re-ask “Do I understand this graph?” –If YES, what is it saying? –If NO, why not? “Overview of Statistical Graphs”, Peter Flom