Lecture 3 Chapter 1. Displaying data with graphs.


1 Lecture 3 Chapter 1. Displaying data with graphs

2 Objectives (PSLS Chapter 1, plus suppl.) Picturing Distributions with Graphs
• Individuals and variables (foundation for D+ awards)
• Two types of data: categorical and quantitative (foundation for C+ awards)
• Ways to chart categorical data: bar graphs, pie charts (univariate graph award)
• Ways to chart quantitative data: histograms and dot density plots (univariate award)
• Interpreting histograms (univariate award)
• Graphing time series: time plots (univariate award)

3 Individuals and variables
Individuals are the objects described in a set of data. Individuals may be people, animals, plants or things. These are often the units of measure.
• Student, shell, trial participant, tomato plant
A variable is any property that characterizes an individual. A variable can take different values for different individuals.
• Age, gender, hair color, head circumference, leaf length, flower color

4 Two types of variables
A variable can be either
• quantitative: Some quantity assessed or measured for each individual. We can then report the average over all individuals. Quantitative variables can be continuous or discrete.
  Continuous variables are measured on a scale that can be finely subdivided.
  • Age (in seconds), blood pressure (in mm Hg), leaf length (in cm)
  Discrete variables provide counts.
  • Number of fingers, number of leaves, number of units
• categorical: Some characteristic describing each individual. We can then report the count or proportion of individuals with that characteristic.
  • Gender (male, female), blood type (A, B, AB, O), flower color (white, red)

5 How do you decide if a variable is categorical or quantitative?
Ask:
• What are the n individuals examined (in the sample or population)?
• What is being recorded about those n individuals?
• Is that a number (→ quantitative) or a statement (→ categorical)?

Individuals studied   Diagnosis       Age at death
Patient A             Heart disease   56
Patient B             Stroke          70
Patient C             Stroke          75
Patient D             Lung cancer     60
Patient E             Heart disease   80
Patient F             Accident        73
Patient G             Diabetes        69

Diagnosis: each individual is given a description.
Age at death: each individual is given a meaningful number.
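As a rough sketch of the distinction on this slide, the patient table can be summarized in Python: the categorical variable (diagnosis) is reported as counts and proportions, while the quantitative variable (age at death) can be averaged. The variable names are illustrative and not part of the lecture.

```python
from collections import Counter
from statistics import mean

# Patient table from the slide: (individual, diagnosis, age at death)
patients = [
    ("Patient A", "Heart disease", 56),
    ("Patient B", "Stroke", 70),
    ("Patient C", "Stroke", 75),
    ("Patient D", "Lung cancer", 60),
    ("Patient E", "Heart disease", 80),
    ("Patient F", "Accident", 73),
    ("Patient G", "Diabetes", 69),
]

diagnoses = [diagnosis for _, diagnosis, _ in patients]
ages = [age for _, _, age in patients]

# Categorical variable: report counts (frequency) and proportions (relative frequency)
diagnosis_counts = Counter(diagnoses)
diagnosis_proportions = {d: c / len(patients) for d, c in diagnosis_counts.items()}

# Quantitative variable: an average is meaningful
mean_age = mean(ages)

print(diagnosis_counts)
print(diagnosis_proportions)
print(mean_age)
```

Averaging the diagnoses would make no sense, which is a quick practical check that the variable is categorical.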

6 A study examined the condition of deer after a particularly nasty winter. The sex and condition (good or poor) of a random sample of 61 deer were noted. Data from such a study could appear in either of two formats: raw data or a frequency table. Who/what are the individuals? What are the sampling units? What are the variables, and are they quantitative or categorical?

7 Ways to chart categorical data
Most common ways to graph categorical data:
• Bar graphs: Each characteristic, or level, is represented by a bar. The height of a bar represents either the count of individuals with that characteristic (the frequency) or the percent of individuals with that characteristic (the relative frequency).
• Mosaic plots (bivariate): Mosaic plots are graphical displays that allow the examination of relationships among two or more categorical variables. All dimensions of each bar represent the proportion found in the sample.
• Pie charts: Are well known to be bad for human consumption.

8 Do you like…? (six subjects each answer yes or no for carrots, peas and spinach)
Percent who like: Carrots 67%, Peas 50%, Spinach 33%
Bar graph only

Which one do you prefer? (six subjects each name one preference)
Percent who prefer: Carrots 50%, Peas 33%, Spinach 17%
Bar graph or pie chart
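One plausible reading of the two labels on this slide, assuming the reason is that a pie chart needs slices that are parts of one whole, is to check whether the percents add up to 100. The helper name `can_use_pie_chart` and the rounding tolerance are assumptions made for this sketch.

```python
def can_use_pie_chart(percents, tolerance=1.0):
    """Return True when the percents add up to about 100 (parts of one whole)."""
    return abs(sum(percents.values()) - 100) <= tolerance

# "Do you like...?": each subject can say yes to several vegetables
percent_who_like = {"Carrots": 67, "Peas": 50, "Spinach": 33}

# "Which one do you prefer?": each subject picks exactly one vegetable
percent_who_prefer = {"Carrots": 50, "Peas": 33, "Spinach": 17}

print(sum(percent_who_like.values()), can_use_pie_chart(percent_who_like))
print(sum(percent_who_prefer.values()), can_use_pie_chart(percent_who_prefer))
```

The first set sums to 150 because the categories overlap, so only a bar graph fits. The second sums to 100, so either graph is possible.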

9

10 Percent of current marijuana users in each of four age groups: USA, 2004 Who/what are the individuals? What are the variables, and are they quantitative or categorical?

11 Common ways to chart quantitative data
• Histograms: This is a summary graph for a variable. Histograms are useful to understand the pattern of variability in the data, especially for large data sets.
• Dot density plots (aka dotplots): These are graphs which show every data point. They are useful to describe the pattern of variability in the data.
• Line graphs (time plots): Use them when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.
• Other graphs to display numerical summaries (see chapter 2)

12 Making a dotplot
Data: 28 12 23 14 40 18 22 33 26 27 29 11 35 30 34 22 23 35
Sorted data: 11 12 14 18 22 22 23 23 26 27 28 29 30 33 34 35 35 40
1) Create a single axis representing the quantitative variable’s range
2) Represent each data point as a dot positioned according to its numerical value
3) When two or more data points have the same value, stack them up
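The three steps can be sketched as a plain-text dotplot in Python, using the data on this slide. This is only an illustration: the function name `text_dotplot` is made up here, and a real plot would normally be drawn with a graphics package.

```python
from collections import Counter

# Data from the slide
data = [28, 12, 23, 14, 40, 18, 22, 33, 26, 27, 29, 11, 35, 30, 34, 22, 23, 35]

def text_dotplot(values):
    """Return the rows of a text dotplot, top row first.

    Each column is one integer position on the axis (step 1), each value
    gets a '*' in its column (step 2), and repeated values stack up (step 3).
    """
    counts = Counter(values)
    lowest, highest = min(values), max(values)
    tallest_stack = max(counts.values())
    rows = []
    for level in range(tallest_stack, 0, -1):
        row = "".join(
            "*" if counts[v] >= level else " "
            for v in range(lowest, highest + 1)
        )
        rows.append(row)
    return rows

rows = text_dotplot(data)
for row in rows:
    print(row)
print("-" * len(rows[0]))  # the axis, running from min(data) to max(data)
```

The top row contains only the stacked dots, that is, the values that occur more than once (22, 23 and 35).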

13 Making a histogram
1) The range of values that the quantitative variable takes is divided into equal-size intervals, or classes. This makes up the horizontal axis.
2) The vertical axis represents either
• the frequency (counts) or
• the relative frequency (percents of total).
3) For each class on the horizontal axis, draw a column. The height of the column represents the count (or percent) of data points that fall in that class interval.

14 Guinea pig survival time (in days) after inoculation with a pathogen (n = 72) Let’s build a histogram with classes of size 50, starting at zero (zero is included in the first class). 43 45 53 56 56 57 58 66 67 73 74 79 80 80 81 81 81 82 83 83 84 88 89 91 91 92 92 97 99 99 100 100 101 102 102 102 103 104 107 108 109 113 114 118 121 123 126 128 137 138 139 144 145 147 156 162 174 178 179 184 191 198 211 214 243 249 329 380 403 511 522 598
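A minimal sketch of this exercise in Python, assuming each class includes its lower bound and excludes its upper bound (0 to 49, 50 to 99, and so on), which matches "zero is included in the first class". The function name `class_counts` is illustrative.

```python
# Guinea pig survival times (days) from the slide, n = 72
survival_days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 80, 81, 81, 81, 82,
    83, 83, 84, 88, 89, 91, 91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102,
    102, 103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128, 137,
    138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184, 191, 198, 211,
    214, 243, 249, 329, 380, 403, 511, 522, 598,
]

def class_counts(values, width, start=0):
    """Count how many values fall in each class [start + k*width, start + (k+1)*width)."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts

counts = class_counts(survival_days, width=50)

# Print a sideways histogram: one row per class, one '#' per guinea pig
for k, count in enumerate(counts):
    low = k * 50
    print(f"{low:3d}-{low + 49:3d} | {'#' * count} ({count})")
```

Dividing each count by 72 would give the relative frequencies for the same classes.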

15

16 Choosing the classes for a histogram
It is an iterative process: try and try again.
• Not too many classes with counts of 0 or 1
• Not so summarized that you lose all the information
• Not so detailed that it is no longer a summary
Try starting with 5 to 10 classes, then refine your class choice. (There isn’t a unique or “perfect” solution.)
Statistical Applets: One Variable Statistical Calculator
Art or Science?
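The trial-and-error process can be sketched in Python by trying several class widths on the guinea pig survival data from slide 14 and counting how many classes end up with 0 or 1 observations. The candidate widths and the helper names are arbitrary choices for illustration, not a recommended rule.

```python
# Guinea pig survival times (days) from slide 14, n = 72
survival_days = [
    43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79, 80, 80, 81, 81, 81, 82,
    83, 83, 84, 88, 89, 91, 91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102,
    102, 103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128, 137,
    138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184, 191, 198, 211,
    214, 243, 249, 329, 380, 403, 511, 522, 598,
]

def class_counts(values, width, start=0):
    """Count values per class [start + k*width, start + (k+1)*width)."""
    counts = [0] * ((max(values) - start) // width + 1)
    for v in values:
        counts[(v - start) // width] += 1
    return counts

def summarize(values, width):
    """Return (number of classes, number of classes holding 0 or 1 values)."""
    counts = class_counts(values, width)
    sparse = sum(1 for c in counts if c <= 1)
    return len(counts), sparse

for width in (10, 50, 200):
    n_classes, sparse = summarize(survival_days, width)
    print(f"width {width:3d}: {n_classes:2d} classes, {sparse:2d} with 0 or 1 counts")
```

A very small width gives many nearly empty classes, while a very large width collapses the data into a handful of bars. The judgment call lies between the two.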

17 Interpreting histograms
We look for the overall pattern and for striking deviations from that pattern. We describe the histogram’s:
• Shape
• Center
• Spread
• Possible outliers

18 Most common unimodal distribution shapes
Symmetric distribution
Left skew: The left side extends much farther out than the right side.
Right skew: The right side (side with larger values) extends much farther out than the left side.

19 Not all distributions have a simple shape (especially with few observations). Describe the shape of these histograms.

20 Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. Alaska and Florida have unusual percents of elderly in their population. A large gap in the distribution is typically a sign of an outlier.

21 Mauna Loa [CO2]

22 Mauna Loa Data in a Bar Chart

23 Graphing time series
Monthly atmospheric CO2 levels recorded at the Mauna Loa, Hawaii observatory (March 1958 to August 2009).
Data collected over time are displayed in a time plot, with time on the horizontal axis and the variable of interest on the vertical axis. We look for a possible trend (a clear overall pattern) and possible cyclical variations (variations with some regularity over time).
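To make "trend" and "cyclical variation" concrete, here is a small Python sketch on a synthetic monthly series. The numbers are made up for illustration and are NOT the Mauna Loa measurements. The 12-month moving average used to smooth out the yearly cycle is one common technique chosen for this sketch; the lecture itself does not prescribe it.

```python
import math

def make_series(n_months, start=315.0, slope=0.1, amplitude=3.0):
    """Synthetic monthly values: a straight-line trend plus a 12-month cycle.

    All parameter values are invented for this example.
    """
    return [
        start + slope * t + amplitude * math.sin(2 * math.pi * t / 12)
        for t in range(n_months)
    ]

def moving_average(values, window=12):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

series = make_series(48)
trend = moving_average(series)  # the yearly cycle averages out over 12 months

# Month-to-month change in the smoothed series recovers the underlying slope
steps = [b - a for a, b in zip(trend, trend[1:])]
print(round(min(steps), 6), round(max(steps), 6))
```

In a time plot of such a series, the raw values wiggle once per year around a steadily rising line, and the smoothed values show only the rise.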

24 Mauna Loa Examples

25 Graphic Context From Science News (2/11/2012)

26 Explanation of the Figure
• Caption to the figure: India and China would see the most deaths prevented by 14 measures reducing methane and soot. The circles at left are proportional to the number of deaths that would be prevented annually by country in 2030.

27 Bubble Graph

28 Bar Graph

29 Graphic Context From Science News (2/11/2012)

30 Cleveland’s Hierarchy (From Stewart, Brandi 2005)

31 Tufte’s Take-homes
• Increase the data-to-ink ratio: Does each bit of ink provide unique information? Erase everything that is not needed.
• Deemphasize non-data elements: Make the data points darker and bolder than other elements.
• Quick notes:
  Label the axes to avoid repeating percent signs on all data points.
  Too many decimal places or trailing zeros are a distraction.
  Avoid putting extra dimensions in your charts. Pseudo three-dimensional charts are difficult to read and provide no additional information.
  If you know the categories and the value for each category, a two-dimensional chart will be clearer than a pseudo three-dimensional one.
  True three-dimensional charts are even more difficult to read.
  Make grid lines barely perceptible, if used.

32

33 Guinea pig survival time (in days) after inoculation with a pathogen (n = 72) Let’s build a histogram with classes of size 50, starting at zero (zero is included in the first class). 43 45 53 56 56 57 58 66 67 73 74 79 80 80 81 81 81 82 83 83 84 88 89 91 91 92 92 97 99 99 100 100 101 102 102 102 103 104 107 108 109 113 114 118 121 123 126 128 137 138 139 144 145 147 156 162 174 178 179 184 191 198 211 214 243 249 329 380 403 511 522 598

34 A picture is worth a thousand words, BUT there is nothing like hard numbers.
• Look at the scales.
Scales matter: How you stretch the axes and choose your scales can give a different impression.
Figure: Death rates from cancer (U.S., 1945-95)

35

36 Shark River water salinity in the Everglades National Park, over a seven-day period in the fall of 2009. Describe these two graphs.

