Review BPS chapter 1 Picturing Distributions with Graphs What is Statistics ? Individuals and variables Two types of data: categorical and quantitative Ways to chart categorical data: bar graphs and pie charts Ways to chart quantitative data: histograms and stem plots Interpreting histograms Time plots
Example (BPS chapter 1) Indicate whether each of the following variables is categorical or quantitative. a. We have data on 20 individuals measuring the amount of time it takes to climb five flights of stairs. b. During a clinical trial, an experimental pain relief drug is administered to individuals. Each individual is then asked whether s/he experienced any pain relief. Answers: a. Quantitative. b. Categorical.
Objectives (BPS chapter 2) Describing distributions with numbers Measure of center: mean and median Measure of spread: quartiles and standard deviation The five-number summary and boxplots IQR and outliers Choosing among summary statistics
Measure of center: the mean. The mean, or arithmetic average: to calculate it, add all values, then divide by the number of individuals. It is the “center of mass.” Example: the sum of heights is 1598.3 inches; divided by 25 women, the mean is 63.9 inches.
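The arithmetic on this slide can be checked in a couple of lines. This is a minimal sketch that uses only the two figures quoted above (the sum of heights and the number of women); the variable names are illustrative.

```python
# Figures quoted on the slide: sum of 25 women's heights, in inches.
total_height = 1598.3
n_women = 25

# Mean = sum of all values divided by the number of individuals.
mean_height = total_height / n_women
print(round(mean_height, 1))  # 63.9
```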
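Mathematical notation: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$, where n is the number of observations. Learn right away how to get the mean using your calculator.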
Measure of center: the median. The median (M) is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.
1. Sort the observations from smallest to largest (n = number of observations).
2. Find the location of the median, L = (n + 1)/2.
(1) If n is odd, the median is observation (n + 1)/2 down the list. Example: n = 25, L = (n + 1)/2 = 26/2 = 13, M = 3.4.
(2) If n is even, the median is the mean of the two center observations. Example: n = 24, L = (n + 1)/2 = 12.5, M = (3.3 + 3.4)/2 = 3.35.
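The location rule above can be sketched as a short function. The two sample lists are hypothetical and chosen only to exercise the odd and even cases; they are not the data behind the slide.

```python
def median(values):
    """Median via the location rule L = (n + 1) / 2 on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: observation (n + 1)/2 down the list (index mid, 0-based).
        return ordered[mid]
    # Even n: mean of the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [5, 1, 4, 2, 3]   # hypothetical data, n = 5
even_sample = [1, 2, 3, 4]     # hypothetical data, n = 4
print(median(odd_sample))      # 3
print(median(even_sample))     # 2.5
```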
Comparing the mean and the median. [Figure: positions of the mean and median for a symmetric distribution, a left-skewed distribution, and a right-skewed distribution.] The mean and the median are the same if the distribution is exactly symmetric. In a skewed distribution, the mean is usually farther out in the long tail than is the median. The median is a measure of center that is resistant to skew and outliers. The mean is not.
Mean and median of a distribution with outliers. [Figure: percent of people dying, shown without the outliers and with the outliers.] The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2). The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).
Impact of skewed data. Mean and median of a symmetric distribution and of a right-skewed distribution. Disease X (symmetric): the mean and median are the same. Multiple myeloma (right-skewed): the mean is pulled toward the skew.
Example: STAT 200 Midterm Score
[Figure: plot of midterm scores, axis from 30 to 100.]
Descriptive Statistics: Midterm
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum
Midterm   20  53.75  18.98  30.00    40.00  47.50   63.75  100.00
Measure of spread: quartiles. The first quartile, Q1, is the value in the sample that has 25% of the data at or below it. The third quartile, Q3, is the value in the sample that has 75% of the data at or below it. Example: Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35.
Center and spread in boxplots. “Five-number summary”: Smallest = min = 0.6, Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35, Largest = max = 6.1.
Boxplots for skewed data. Comparing boxplots for a normal and a right-skewed distribution. Boxplots remain true to the data and clearly depict symmetry or skewness.
IQR and outliers. The interquartile range (IQR) is the distance between the first and third quartiles (the length of the box in the boxplot): IQR = Q3 − Q1. An outlier is an individual value that falls outside the overall pattern. How far outside the overall pattern does a value have to fall to be considered an outlier? The 1.5 × IQR rule for outliers:
Low outlier: any value < Q1 − 1.5 × IQR
High outlier: any value > Q3 + 1.5 × IQR
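Example: STAT 200 Midterm Score
IQR = Q3 − Q1 = 63.75 − 40.00 = 23.75
Low outlier: any value < Q1 − 1.5 × IQR = 40.00 − 1.5(23.75) = 4.375
High outlier: any value > Q3 + 1.5 × IQR = 63.75 + 1.5(23.75) = 99.375
[Figure: plot of midterm scores, axis from 30 to 100, with outliers marked.] Any score of 100 lies above 99.375 and is flagged as a high outlier. No score falls below 4.375, since the minimum is 30.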
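The fence calculation above can be reproduced with a small helper. This is a sketch only: the function name `iqr_fences` and its parameter `k` are illustrative, and the quartiles, minimum, and maximum are the ones reported in the midterm summary.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (low, high) cutoffs of the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles from the STAT 200 midterm summary.
low, high = iqr_fences(40.00, 63.75)
print(low, high)    # 4.375 99.375

# Check the reported minimum and maximum against the fences.
print(30.00 < low)    # False: the minimum is not a low outlier
print(100.00 > high)  # True: the maximum is a high outlier
```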
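Measure of spread: standard deviation. The standard deviation is used to describe the variation around the mean. 1) First calculate the variance, $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. 2) Then take the square root to get the standard deviation, $s = \sqrt{s^2}$. [Figure: distribution with the interval mean ± 1 s.d. marked.]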
Calculations: women’s height (inches). We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator.
Mean = 63.4
Sum of squared deviations from the mean = 85.2
Degrees of freedom (df) = (n − 1) = 13
s² = variance = 85.2/13 = 6.55 inches squared
s = standard deviation = √6.55 = 2.56 inches
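As a check on the numbers above, the following sketch redoes the last two steps. It assumes n = 14, which is implied by df = n − 1 = 13; the raw heights are not given on the slide, so the sum of squared deviations is taken as stated.

```python
import math

sum_sq_dev = 85.2   # sum of squared deviations from the mean (from the slide)
n = 14              # assumption: implied by df = n - 1 = 13

variance = sum_sq_dev / (n - 1)   # s^2, in inches squared
std_dev = math.sqrt(variance)     # s, in inches

print(round(variance, 2))  # 6.55
print(round(std_dev, 2))   # 2.56
```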
Choosing among summary statistics. Because the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don’t have outliers. Plot the mean and use the standard deviation for error bars. Otherwise, use the median in the five-number summary, which can be plotted as a boxplot. [Figure: the same data shown as a boxplot and as mean ± s.d.]
Example 1. Suppose a sample of twelve lab rats is found to have the following glucose levels: 3 4 4 6 6 6 8 8 9 10 12 15
1. Find the five-number summary of the data and construct a boxplot.
2. Based on the boxplot, the data set is a. skewed to the left b. roughly symmetric c. skewed to the right
Answer to 1: Min = 3, Q1 = 5, M = 7, Q3 = 9.5, Max = 15
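The five-number summary for the glucose data can be verified with a short sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, which reproduces the answer given above. Statistical software often interpolates instead and may report slightly different quartiles. The function name is illustrative, and it expects at least two observations.

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of each half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # observations below the median position
    upper = ordered[half + (n % 2):]   # observations above the median position
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


glucose = [3, 4, 4, 6, 6, 6, 8, 8, 9, 10, 12, 15]
print(five_number_summary(glucose))  # (3, 5.0, 7.0, 9.5, 15)
```

In this summary the upper whisker (9.5 to 15) is longer than the lower whisker (3 to 5), which is the feature to look at when answering part 2.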
Example 2. Suppose a researcher is recording fifty values in a database. Suppose she records every value correctly except the lowest value, which is supposed to be “2” but which she incorrectly types as “200”. In the above scenario, the effect of the researcher’s error on the mean and median is:
a. Her calculated mean will be lower than it would have been without the error, but her calculated median will remain unchanged.
b. Her calculated mean will be higher than it would have been without the error, but her calculated median will remain unchanged.
c. Her calculated mean will remain unchanged, but her calculated median will be lower than it would have been without the error.
d. Her calculated mean will remain unchanged, but her calculated median will be higher than it would have been without the error.
Example 2 In the above scenario, the effect of the researcher’s error on standard deviation is: a. The error will not affect standard deviation. b. Her calculated standard deviation will be smaller than it would have been without the error. c. Her calculated standard deviation will be larger than it would have been without the error. d. The error is likely to make the calculated standard deviation negative.
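The scenario in Example 2 can be explored numerically. The sketch below uses a hypothetical data set (the fifty integers 2 through 51), because the researcher's actual values are not given. With real data the exact numbers will differ, but the direction of each change is the point.

```python
from statistics import mean, median, stdev

correct = list(range(2, 52))     # hypothetical: fifty values, 2 through 51
mistyped = [200] + correct[1:]   # lowest value 2 entered as 200 by mistake

for label, data in (("correct", correct), ("mistyped", mistyped)):
    # Report center (mean, median) and spread (standard deviation).
    print(label, mean(data), median(data), round(stdev(data), 2))
```

In this hypothetical data set the mean and the standard deviation both rise noticeably. The median shifts by only one position in the sorted list, so it barely moves, which illustrates why it is called resistant.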
Example 3. There are three children in a room, ages 3, 4, and 5. If a four-year-old child enters the room, the
a. mean age and variance will stay the same.
b. mean age and variance will increase.
c. mean age will stay the same but the variance will increase.
d. mean age will stay the same but the variance will decrease.
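Example 3 is small enough to check directly. This sketch uses Python's standard `statistics` module; `variance` is the sample variance (divisor n − 1) and `pvariance` is the population variance (divisor n), and both are shown because the question does not say which is meant.

```python
from statistics import mean, pvariance, variance

before = [3, 4, 5]     # ages of the three children
after = before + [4]   # a four-year-old enters the room

# Mean, sample variance, and population variance before and after.
print(mean(before), variance(before), pvariance(before))
print(mean(after), variance(after), pvariance(after))
```

The new child's age equals the mean, so the mean does not move, while the typical squared distance from the mean gets smaller under either definition of variance.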