Last chapter... Four Corners: Go to your corner based on if your birthday falls in the Winter, Spring, Summer, or Fall; 1 minute In your group, come to.


Last chapter... Four Corners: Go to your corner based on if your birthday falls in the Winter, Spring, Summer, or Fall; 1 minute In your group, come to a consensus about the three most important topics we learned and list them on the board. 5 minutes.

Last chapter, we learned... Appropriate graphical representations (numerical & categorical data) Always graph the data; always. Always embed context. Always. Describing numerical distributions/data sets via SOCS (the basics; we will get more sophisticated with our descriptions soon); do we use SOCS to describe categorical data distributions? Why or why not?

SOCS... Shape, Outlier(s), Center, Spread We loosely defined ‘center’ and ‘spread’ Now we will be much more specific & detailed... And remember, always embed context Here we go...

Word association time... When I say a word, you immediately write down what you think it means; don’t think, just write. Don’t talk; don’t say anything to anyone. Ready?

Word association time... Average

Patrons in a diner... The annual salaries of 7 patrons in a diner are listed below. Find the mean and the median using Stat Crunch. Are the mean and the median similar? Would they represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data (let’s practice a histogram; then a box plot) using Stat Crunch. What shape is the distribution? Salaries: $45,000; $48,000; $52,000; $40,000; $35,000; $58,000; $46,000

Now, Bill Gates walks into the diner... Find the mean and the median using Stat Crunch. Are the mean and the median similar? Would both or either represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data (histogram; box plot) using Stat Crunch. What shape is the distribution? Salaries: $45,000; $48,000; $52,000; $40,000; $35,000; $58,000; $46,000; $3,710,000,000
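The same comparison can be checked outside Stat Crunch. This is only a minimal sketch, assuming Python's standard-library `statistics` module; the variable names `patrons` and `with_gates` are made up for illustration.

```python
from statistics import mean, median

# Annual salaries of the 7 diner patrons (from the slide)
patrons = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]

# The same list after the eighth salary on the slide is added
with_gates = patrons + [3_710_000_000]

for label, data in [("7 patrons", patrons), ("with Bill Gates", with_gates)]:
    # Print both measures of center side by side for each version of the data
    print(f"{label}: mean = {mean(data):,.0f}, median = {median(data):,.0f}")
```

The median only moves from $46,000 to $47,000, while the mean jumps from roughly $46,286 to roughly $463.8 million.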

What’s the moral of this story? Means are excellent measures of central tendency if the data are (fairly) symmetric. However, means are highly influenced by outliers. So, if the data have one or more outliers, a better measure of central tendency is the median, which is hardly influenced by outliers; this property is called ‘resistant’. So, consider the shape of the data/distribution, then wisely choose an appropriate measure of central tendency.

Which measure of central tendency should we use?

Which is larger: mean or median? Which should we use to describe the ‘typical’ or middle value?

The ‘C’ in SOCS... So, when we are analyzing a numerical distribution (like looking at a histogram, stem plot, box plot, etc.), we need to wisely choose which ‘C’ to use: mean or median. Generally, if symmetric, use the mean (or median) as a measure of central tendency; they will be similar in value (or the same). If skewed (left or right), use the median as a measure of central tendency; why?

Measures of Spread... What is the median of each of the following data sets (you can use Stat Crunch if you need to): (4, 4, 5, 6, 6) and (5, 5, 5, 5, 5)? Are they the same distribution/data set? Another characteristic that is helpful in describing distributions/data sets is the measure of spread (or the typical distance from the center).

Spread... The second ‘S’ in SOCS Another characteristic that is helpful in describing distributions/data sets is the measure of spread (or the typical distance from the center) Two measures of spread that we will focus on in this course are the standard deviation & inter-quartile range

Standard Deviation is... a typical distance of the observations from their mean; a number that measures how far away the typical observation is from the center of the distribution

Let’s play the standard deviation game... Your team’s task: Create a data set of four whole numbers (from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) with the lowest standard deviation value possible Input your four numbers (again use numbers from 0 to 10 only) into Stat Crunch, then calculate the standard deviation Change a value or values until you get the lowest possible standard deviation you can. 3 minutes. Go. Now create a data set (again only from 0 to 10) with the largest possible standard deviation.
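Once your team has played the game by hand, the answer can be checked by brute force. The sketch below is one possible check, not the method used in class; it assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every possible set of four whole numbers from 0 to 10
# (order does not matter for the standard deviation, so each multiset appears once)
candidates = list(combinations_with_replacement(range(11), 4))

smallest = min(candidates, key=stdev)   # first data set with the lowest SD
largest = max(candidates, key=stdev)    # data set with the highest SD

print("lowest SD :", smallest, round(stdev(smallest), 4))
print("highest SD:", largest, round(stdev(largest), 4))
```

Any four identical numbers give a standard deviation of 0; the search reports only the first such set it finds. The largest spread comes from pushing the values to opposite ends of the allowed range.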

Which has the largest SD?

Calculating the standard deviation...
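The worked calculation from this slide is not part of the transcript. As a stand-in, here is a sketch of the usual sample standard deviation steps, applied to the data set (4, 4, 5, 6, 6) from the spread slide; dividing by n - 1 is an assumption about which version of the formula the course uses.

```python
from math import sqrt

data = [4, 4, 5, 6, 6]   # one of the data sets from the measures-of-spread slide

n = len(data)
mean = sum(data) / n                      # step 1: find the mean
deviations = [x - mean for x in data]     # step 2: distance of each value from the mean
squared = [d ** 2 for d in deviations]    # step 3: square each deviation
variance = sum(squared) / (n - 1)         # step 4: add the squares, divide by n - 1
sd = sqrt(variance)                       # step 5: square root returns to the original units

print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared)
print("variance:", variance)
print("standard deviation:", sd)
```

For the other data set, (5, 5, 5, 5, 5), every deviation is 0, so the standard deviation is 0 even though both data sets have the same median.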

Variance... Another measure of spread. Not used very often; usually, if we use a mean as a measure of central tendency, we use the standard deviation as our measure of spread. Variance is related to standard deviation: $\text{variance} = (\text{standard deviation})^2$ and $\text{standard deviation} = \sqrt{\text{variance}}$

The Empirical Rule... When distributions are uni-modal, ≈ symmetric, & mean ≈ median, then... life is beautiful. The distribution is said to be ≈ Normal: about 68% of data within 1 standard deviation of the mean; about 95% of data within 2 standard deviations of the mean; about 99.7% of data within 3 standard deviations of the mean

68-95-99.7 Rule (Empirical Rule) For (≈)Normal Distributions Only

Empirical ‘Model’...

Let’s practice...What percentage of adult females have a height that is: as tall or taller than 64.5”? Is this typical? 64.5” or shorter? Is this unlikely? between 62” and 67”? How common is this? either shorter than 59.5” or taller than 69.5”? taller than 67”? Is this unlikely? between 62” and 64.5”? Between 57” and 59.5”?

More practice with the Empirical ‘Model’... The weight of a certain type of chocolate bar is Normally distributed with a mean weight of 8.1 ounces and a standard deviation of 0.1 ounces. Draw the density curve and label 1, 2, & 3 standard deviation values on it. What proportion of chocolate bars weigh: 1) between 8 ounces and 8.2 ounces? How likely is this? 2) more than 8.3 ounces? Is this common? 3) less than 7.8 ounces? Is this weight expected? 4) either less than 7.9 ounces or more than 8.3 ounces?
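After sketching the density curve by hand, the four chocolate bar answers can be checked with a small lookup table. This is only a sketch: the table `AREA_BELOW` and the helper names `z` and `below` are made up for illustration, and the areas are the rounded Empirical Rule values (68%, 95%, 99.7%), not exact Normal probabilities.

```python
# Approximate area to the LEFT of each whole-number z-score under the Empirical Rule
AREA_BELOW = {-3: 0.0015, -2: 0.025, -1: 0.16, 0: 0.5, 1: 0.84, 2: 0.975, 3: 0.9985}

MEAN, SD = 8.1, 0.1   # chocolate bar weights in ounces (from the slide)

def z(x):
    """Whole number of standard deviations that x lies from the mean."""
    value = (x - MEAN) / SD
    whole = round(value)
    # The rule only gives answers at exactly 0, 1, 2, or 3 SDs from the mean
    if abs(value - whole) > 1e-6 or whole not in AREA_BELOW:
        raise ValueError("the Empirical Rule only covers whole z-scores from -3 to 3")
    return whole

def below(x):
    """Approximate proportion of bars weighing less than x ounces."""
    return AREA_BELOW[z(x)]

between_8_and_8_2 = below(8.2) - below(8.0)          # question 1
more_than_8_3 = 1 - below(8.3)                       # question 2
less_than_7_8 = below(7.8)                           # question 3
outside_7_9_to_8_3 = below(7.9) + (1 - below(8.3))   # question 4

for name, p in [("between 8 and 8.2", between_8_and_8_2),
                ("more than 8.3", more_than_8_3),
                ("less than 7.8", less_than_7_8),
                ("below 7.9 or above 8.3", outside_7_9_to_8_3)]:
    print(f"{name}: {p:.2%}")
```

A weight that is not a whole number of standard deviations from the mean raises an error, which is a reminder that the Empirical Rule alone cannot answer that kind of question.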

Your turn... Suppose that the age of retirement in a country is Normally distributed with a mean of 64 years of age with a standard deviation of 3.5 years. 1. What would you consider a ‘common’ age range at which to retire? 2. What percentage of people retire at age 71? 3. What percentage of people retire either after age 74.5 or before age 53.5? 4. What percentage of people retire between the ages of 64 and 67.5? 5. How likely is it that someone will retire at the age of 74.5 or older?

Slight detour... We will get back to Empirical Rule in a few minutes... Is 120 big or small? Think – Pair - Share

TPS... Is 120 big or small? Big if... day’s temperature in LA in degrees Fahrenheit or # units a student takes during a semester (really big!) Small if... monthly rent paid for an apartment in LA Usual or ‘average’ if... weight in pounds for a 15-year-old girl or systolic blood pressure Nearly impossible to answer how unusual 120 is unless we know what we are comparing 120 to.

Something else to consider... A student’s ACT score was 25.9; their SAT score was Which is a better score? ACT scores’ (national) mean = 21, standard deviation 4.7 SAT (national) mean (critical reading & math) = 1010, standard deviation = 163

When we have a Normal distribution, then... z-scores, standardizing... When we have a Normal distribution, we can calculate z-scores (standardize the data); that is, convert each raw value into the # of SD’s it lies away from the mean: z = (value – mean) / standard deviation
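Standardizing can be sketched in a couple of lines. The function name `z_score` is made up for illustration. Only the ACT score is standardized here, because the SAT score is missing from the transcript; plugging it in with mean 1010 and standard deviation 163 would allow the comparison.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# ACT score and national ACT mean / standard deviation from the slide
act_z = z_score(25.9, mean=21, sd=4.7)
print(f"ACT z-score: {act_z:.2f}")
```

The ACT score sits a little more than 1 standard deviation above the national mean; whichever test has the larger z-score is the better score relative to its own distribution.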

Let’s practice with some of our heights...

Data gathering time again... # siblings you have on board & enter into Stat Crunch Numerical analysis (statistical summary in Stat Crunch) and graphical representation Describe the distribution

Skewed? Shouldn’t use mean & SD. But we still need to describe the center and the spread of the distribution. Use median and IQR (Inter-quartile Range). Median & IQR are not affected by outlier(s) (resistant). IQR = Q3 – Q1. IQR is the amount of space the middle 50% of the data occupy.

Range of data... Another measure of variability (used with any distribution) is range Range = maximum value – minimum value Range for our data =

Boxplots...based on 5-number summary
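The 5-number summary behind a boxplot can be sketched as follows, reusing the 7 diner salaries since the class siblings data are not in the transcript. The function name `five_number_summary` is made up for illustration, and quartile conventions differ between software packages, so Stat Crunch may report slightly different Q1 and Q3 values on other data sets.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    # using Python's default 'exclusive' method
    q1, med, q3 = quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

# Annual salaries of the 7 diner patrons (from the earlier slide)
salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]

low, q1, med, q3, high = five_number_summary(salaries)
iqr = q3 - q1            # space occupied by the middle 50% of the data
data_range = high - low  # maximum value minus minimum value

print("5-number summary:", low, q1, med, q3, high)
print("IQR:", iqr, " range:", data_range)
```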

Boxplots...

Modified boxplot – shows outlier(s)

Two modified boxplots...

What are outliers? Boxplots are the only graphical representation where we specifically define an outlier. Potential outliers are values that are more than 1.5 IQRs below Q1 or above Q3. Calculate IQR x 1.5; add that product to Q3; any value(s) beyond that point is an outlier to the right. Subtract that product from Q1; any value(s) beyond that point is an outlier to the left.
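The 1.5 x IQR rule can be sketched in code using the diner salaries with Bill Gates included. The function name `outlier_fences` is made up for illustration, and the quartiles come from Python's default method, which is an assumption; Stat Crunch may place Q1 and Q3 slightly differently.

```python
from statistics import quantiles

def outlier_fences(data, k=1.5):
    """Return the (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4)   # Q1, median (unused here), Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Diner salaries after Bill Gates walks in (from the earlier slide)
salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000, 3_710_000_000]

lower, upper = outlier_fences(salaries)

# Any value beyond either cutoff is flagged as a potential outlier
outliers = [x for x in salaries if x < lower or x > upper]

print("lower cutoff:", lower, " upper cutoff:", upper)
print("potential outliers:", outliers)
```

Only the $3,710,000,000 salary falls beyond a cutoff, which matches what the modified boxplot would show.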

Go back to our Siblings data... Using Stat Crunch, calculate descriptive statistics Let’s calculate (by hand) to see if we have any outliers Q3 – Q1 = IQR IQR x 1.5; add this product to Q3; are there any values in our data set beyond this point to the right? IQR x 1.5; subtract product from Q1; are there any values in our data set beyond this point to the left? Now use Stat Crunch to create a boxplot; are our calculations confirmed with our boxplot?

Be careful with outliers... Are they really an outlier? Is your data correct? Was it input accurately? COC’s recent 99-year-old graduate Don’t automatically throw out an unusual piece of data; investigate

Be careful... one more thing...

Partner Practice...

Your turn... In pairs, choose a set of data from the Math 140 spreadsheet that is skewed (to left or right); you probably won’t know if the data is skewed until you copy and paste into Stat Crunch and create a graph Create a box plot; print out; put your names on it Label (on the graph) the 5-number summary (with arrows pointing to each value on the graph) Analyze through SOCS (which measure of central tendency should you use? Which measure of spread should you use?); be sure you show your work to justify that a point/points are outliers Now, using the same data, create a histogram. What characteristics of the data does the histogram show that the box plot does not?

Homework... Practice , 3.3, 3.4, 3.11, 3.13, 3.14, 3.21, 3.28, 3.33, 3.34, 3.42, 3.49

Let’s talk about Exam #1...