Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative.

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Describing Data: One Quantitative Variable SECTIONS 2.2, 2.3 One quantitative variable (2.2, 2.3)

The Big Picture
[Diagram labels: Population, Sample, Sampling, Statistical Inference, Descriptive Statistics]

Descriptive Statistics
In order to make sense of data, we need ways to summarize and visualize it.
Summarizing and visualizing variables, and relationships between two variables, is often known as descriptive statistics (also known as exploratory data analysis).
The types of summary statistics and visualization methods depend on the type of variable(s) being analyzed (categorical or quantitative).
Today: one quantitative variable.

Question of the Day
How obese are Americans?

[Figure: Obesity Trends* Among U.S. Adults, BRFSS, 1990, 2000, 2010. Map legend: No Data, <10%, 10%–14%, 15%–19%, 20%–24%, 25%–29%, ≥30%.]
*BMI ≥ 30, or about 30 lbs. overweight for a 5’4” person.
Source: Behavioral Risk Factor Surveillance System, CDC

Obesity in America
Obesity is a HUGE problem in America.
We’ll explore the question of obesity in America with two different types of data, both collected by the CDC:
• Proportion of adults who are obese in each state
• BMI for a random sample of Americans

Behavioral Risk Factor Surveillance System

Obesity by State: 2013

Dotplot
In a dotplot, each case is represented by a dot, and dots are stacked.
Easy way to see each case.
Minitab: Graph -> Dotplot -> One Y -> Simple

Histogram
The height of each bar corresponds to the number of cases within that range of the variable.
[Figure annotation, partly lost in extraction: 5 states with obesity rate between … and … (… to 33.75)]
Minitab: Graph -> Histogram -> Simple

Shape
[Figure panels: Symmetric; Left-Skewed; Right-Skewed (long right tail)]

National Health and Nutrition Examination Survey

BMI of Americans

BMI of Americans
The distribution of BMI for American adults is
a) Symmetric
b) Left-skewed
c) Right-skewed

Notation
The sample size, the number of cases in the sample, is denoted by $n$.
We often let $x$ or $y$ stand for any variable, and $x_1, x_2, \ldots, x_n$ represent the $n$ values of the variable $x$.
Example: $x_1 = 32.4$, $x_2 = 28.4$, $x_3 = 26.8$, …

Mean
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics
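The formula shown on this slide did not survive in the transcript. As a reminder, and assuming the standard definition with the notation introduced on the Notation slide, the mean of a sample of $n$ values is the sum of the values divided by $n$ (the symbol $\bar{x}$ for a sample mean is the usual convention and is an assumption here; the deck uses µ for a population mean):

```latex
\bar{x} \;=\; \frac{x_1 + x_2 + \cdots + x_n}{n} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i
```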

Mean
The average obesity rate across the 50 states is µ = …

Median
The median, m, is the middle value when the data are ordered.
If there is an even number of values, the median is the average of the two middle values.
The median splits the data in half.
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics
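For readers working outside Minitab, here is a minimal Python sketch of the mean and median. The first three values are the BMI values from the Notation slide; the remaining three are made up purely for illustration, so the results say nothing about the real data.

```python
from statistics import mean, median

# 32.4, 28.4, 26.8 come from the Notation slide; the last three values are
# invented so that the example has an even number of cases.
bmi = [32.4, 28.4, 26.8, 24.1, 35.0, 22.7]

x_bar = mean(bmi)   # sum of the values divided by n
m = median(bmi)     # even n: average of the two middle ordered values

print(f"n = {len(bmi)}, mean = {x_bar:.2f}, median = {m:.2f}")
```

With six values, the median is the average of the third and fourth ordered values (26.8 and 28.4).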

Measures of Center
For symmetric distributions, the mean and the median will be about the same.
For skewed distributions, the mean is pulled farther than the median in the direction of the skew.

Measures of Center
[Figure: distribution with the mean and the median m marked; values not preserved in the transcript]
The mean is “pulled” in the direction of skewness.

Skewness and Center
A distribution is left-skewed. Which measure of center would you expect to be higher?
a) Mean
b) Median

Outlier
An outlier is an observed value that is notably distinct from the other values in a dataset.

Outliers
More info here

Resistance
A statistic is resistant if it is relatively unaffected by extreme values.
The median is resistant while the mean is not.
[Table comparing Mean and Median, With Outlier and Without Outlier; values not preserved in the transcript]
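Since the table values are missing, the idea of resistance can be illustrated with a small sketch. The numbers below are hypothetical and are not the data from the slide; they only show how one extreme value moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical values, chosen only to illustrate resistance.
without_outlier = [22, 24, 25, 27, 29]
with_outlier = without_outlier + [60]   # add one extreme value

mean_without, median_without = mean(without_outlier), median(without_outlier)
mean_with, median_with = mean(with_outlier), median(with_outlier)

print(f"Without outlier: mean = {mean_without:.2f}, median = {median_without:.2f}")
print(f"With outlier:    mean = {mean_with:.2f}, median = {median_with:.2f}")
```

In this toy example the mean rises by almost 6 units while the median rises by 1.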

Outliers
When using statistics that are not resistant to outliers, stop and think about whether the outlier is a mistake.
If not, you have to decide whether the outlier is part of your population of interest or not.
Usually, for outliers that are not a mistake, it’s best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) affect the results.

Standard Deviation
The standard deviation for a quantitative variable measures the spread of the data.
Sample standard deviation: s
Population standard deviation: σ (“sigma”)
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics

Standard Deviation
The larger the standard deviation, the more variability there is in the data and the more spread out the data are.
The standard deviation gives a rough estimate of the typical distance of a data value from the mean.
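The slides do not show a formula for the standard deviation. A rough sketch of the usual computation follows, assuming the standard definitions: the sample version s divides the sum of squared deviations by n - 1, and the population version σ divides by n. The data values are hypothetical.

```python
import math
from statistics import mean, pstdev, stdev

# Hypothetical data, chosen so the arithmetic is easy to follow.
x = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(x)
squared_deviations = [(xi - x_bar) ** 2 for xi in x]

# Sample standard deviation s: divide by n - 1.
s_manual = math.sqrt(sum(squared_deviations) / (len(x) - 1))
# Population standard deviation sigma: divide by n.
sigma_manual = math.sqrt(sum(squared_deviations) / len(x))

# The standard library provides both versions directly.
s = stdev(x)
sigma = pstdev(x)

print(f"s = {s:.3f}, sigma = {sigma:.3f}")
```

For anything but very small samples the two versions are close; which one Minitab reports should be checked in its documentation.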

Standard Deviation
Both of these distributions are bell-shaped.

Two Ways of Measuring Obesity
Differences?
• States as cases
• Individual people as cases

95% Rule
If a distribution of data is approximately symmetric and bell-shaped, about 95% of the data should fall within two standard deviations of the mean.
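As an illustration of the rule, the sketch below builds the interval "mean ± 2 standard deviations" and checks what share of a simulated bell-shaped sample falls inside it. The center 21 and spread 5 are the ACT values given later in the deck; the helper name `two_sd_interval`, the seed, and the simulated sample are assumptions made only for this example.

```python
import random
from statistics import mean, stdev

def two_sd_interval(center, sd):
    """Return the interval center ± 2 standard deviations."""
    return (center - 2 * sd, center + 2 * sd)

# ACT scores in the deck: mean 21, standard deviation 5.
act_low, act_high = two_sd_interval(21, 5)
print(f"About 95% of ACT scores should fall between {act_low} and {act_high}")

# Check the rule on simulated bell-shaped data (not real scores).
random.seed(250)
sample = [random.gauss(21, 5) for _ in range(10_000)]
low, high = two_sd_interval(mean(sample), stdev(sample))
share = sum(low <= value <= high for value in sample) / len(sample)
print(f"Share of simulated values within 2 SDs: {share:.1%}")
```

The rule is only a rough guide, and it should not be expected to hold for strongly skewed data.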

The 95% Rule

95% Rule
Give an interval that will likely contain 95% of obesity rates of states.
47/50 = 94%

95% Rule
Could we use the same method to get an interval that will contain 95% of BMIs of American adults?
a) Yes
b) No

The 95% Rule
StatKey

The 95% Rule
The standard deviation for hours of sleep per night is closest to
a) ½
b) 1
c) 2
d) 4
e) I have no idea

z-score
The z-score measures the number of standard deviations a value is away from the mean.

z-score
A z-score puts values on a common scale.
A z-score is the number of standard deviations a value falls from the mean.
Challenge: For symmetric, bell-shaped distributions, 95% of all z-scores fall between what two values?

z-score
Which is better, an ACT score of 28 or a combined SAT score of 2100?
ACT: μ = 21, σ = 5
SAT: μ = 1500, σ = 325
Assume ACT and SAT scores have approximately bell-shaped distributions.
a) ACT score of 28
b) SAT score of 2100
c) I don’t know
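The z-score formula itself is missing from the transcript. Assuming the standard definition, z = (value - mean) / standard deviation, the comparison on this slide can be sketched as follows; the function name `z_score` is just a label for this example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Means and standard deviations as given on the slide.
z_act = z_score(28, mean=21, sd=5)
z_sat = z_score(2100, mean=1500, sd=325)

print(f"ACT 28:   z = {z_act:.2f}")
print(f"SAT 2100: z = {z_sat:.2f}")
```

The score with the larger z-score is farther above its own mean, measured in standard deviations, which is what puts the two tests on a common scale.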

Other Measures of Location
Maximum = largest data value
Minimum = smallest data value
Quartiles:
• $Q_1$ = median of the values below m
• $Q_3$ = median of the values above m

Five Number Summary
Five Number Summary: Min, $Q_1$, m, $Q_3$, Max
[Figure: the intervals between consecutive values of the summary are each labeled 25%]
Minitab: Stat -> Basic Statistics -> Display Descriptive Statistics
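A minimal sketch of the five number summary, using the quartile definition from the previous slide (the median of the values below and above m). It assumes that "below m" means the lower half of the ordered data, excluding the middle value when n is odd. Software such as Minitab or R may use a slightly different quartile convention, so results can differ a little. The data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]          # values below the median position
    upper_half = data[(n + 1) // 2 :]    # values above the median position
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

# Hypothetical weekly study hours, for illustration only.
hours = [1, 3, 4, 6, 7, 8, 10, 12, 15]
summary = five_number_summary(hours)
print("Min, Q1, m, Q3, Max =", summary)
```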

Five Number Summary
The distribution of number of hours spent studying each week is
a) Symmetric
b) Right-skewed
c) Left-skewed
d) Impossible to tell
> summary(study_hours)
Min. 1st Qu. Median 3rd Qu. Max

Percentile
The Pth percentile is the value that is greater than P% of the data.
We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better.
We could also have used percentiles:
• ACT score of 28: 91st percentile
• SAT score of 2100: 97th percentile
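The slide does not say how its percentiles were obtained. One way to approximate them, assuming the scores follow a normal (bell-shaped) model with the means and standard deviations from the z-score slide, is sketched below. The normal model is an assumption, so the results only roughly match the slide's figures.

```python
from statistics import NormalDist

# Normal models with the means and standard deviations from the z-score slide.
act = NormalDist(mu=21, sigma=5)
sat = NormalDist(mu=1500, sigma=325)

# cdf(x) is the proportion of the model's values that fall below x.
act_share_below = act.cdf(28)
sat_share_below = sat.cdf(2100)

print(f"ACT 28:   about {act_share_below:.1%} of scores are lower")
print(f"SAT 2100: about {sat_share_below:.1%} of scores are lower")
```

Either way, the SAT score sits at the higher percentile, which agrees with the z-score comparison.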

Five Number Summary
Five Number Summary: Min, $Q_1$, m, $Q_3$, Max
• Min: 0th percentile
• $Q_1$: 25th percentile
• m: 50th percentile
• $Q_3$: 75th percentile
• Max: 100th percentile

Measures of Spread
Range = Max – Min
Interquartile Range (IQR) = $Q_3$ – $Q_1$
Is the range resistant to outliers?
a) Yes
b) No
Is the IQR resistant to outliers?
a) Yes
b) No
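One way to explore these two questions is to compute both measures on a small dataset and then make the largest value extreme. The sketch below uses hypothetical numbers and the median-of-halves quartile rule from the earlier slide; the helper names are assumptions for this example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    return median(data[: n // 2]), median(data[(n + 1) // 2 :])

def data_range(values):
    return max(values) - min(values)

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data; the second list replaces the largest value with an extreme one.
original = [1, 3, 4, 6, 7, 8, 10, 12, 15]
with_extreme = [1, 3, 4, 6, 7, 8, 10, 12, 150]

print(f"Range: {data_range(original)} -> {data_range(with_extreme)}")
print(f"IQR:   {iqr(original)} -> {iqr(with_extreme)}")
```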

Comparing Statistics
Measures of Center:
• Mean (not resistant)
• Median (resistant)
Measures of Spread:
• Standard deviation (not resistant)
• IQR (resistant)
• Range (not resistant)
Most often, we use the mean and the standard deviation, because they are calculated from all the data values and so use all the available information.

Boxplot
[Figure labels: $Q_1$, Median, $Q_3$; middle 50% of data]
Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier.
Minitab: Graph -> Boxplot -> One Y -> Simple

Boxplot
[Figure label: Outlier]
*For boxplots, outliers are defined as any point more than 1.5 IQRs beyond the quartiles (although you don’t have to know that).
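The 1.5 IQR rule in the footnote can be sketched as follows. The data are hypothetical, the quartiles use the median-of-halves rule from the earlier slide, and the function name `boxplot_outliers` is an assumption for this example; plotting software may compute quartiles slightly differently.

```python
from statistics import median

def boxplot_outliers(values):
    """Flag points more than 1.5 IQRs below Q1 or above Q3."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Whiskers end at the most extreme values that are not outliers.
    whiskers = (inside[0], inside[-1])
    return outliers, whiskers, (lower_fence, upper_fence)

# Hypothetical data with one unusually large value.
values = [1, 3, 4, 6, 7, 8, 10, 12, 40]
outliers, whiskers, fences = boxplot_outliers(values)
print("Fences:", fences)
print("Outliers:", outliers)
print("Whiskers end at:", whiskers)
```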

Boxplot
This boxplot shows a distribution that is
a) Symmetric
b) Left-skewed
c) Right-skewed

Summary: One Quantitative Variable
Summary Statistics
• Center: mean, median
• Spread: standard deviation, range, IQR
• 5 number summary
• Percentiles
Visualization
• Dotplot
• Histogram
• Boxplot
Other concepts
• Shape: symmetric, skewed, bell-shaped
• Outliers, resistance
• z-scores

To Do
Read Sections 2.2 and 2.3
Do Homework 2.2, 2.3 (due Friday, 9/18)