POPULATION VS. SAMPLE Population: a collection of ALL outcomes, responses, measurements or counts that are of interest. Sample: a subset of a population.

Presentation transcript:

POPULATION VS. SAMPLE Population: a collection of ALL outcomes, responses, measurements, or counts that are of interest. Sample: a subset of a population. What are some examples of populations? Ex: BHS. What are some examples of samples? Ex: A sample of BHS could be the senior class.

INTERPRETING HISTOGRAMS Look at the OVERALL PATTERN: center, spread, and shape. Is the shape symmetric, skewed right (tail to the right), or skewed left (tail to the left)? Is it unimodal, bimodal, or multimodal? Then look for striking DEVIATIONS, called OUTLIERS (values that lie outside the overall pattern).

PANCAKES VS. SKYSCRAPERS Histograms with too many intervals → Pancakes. Histograms with too few intervals → Skyscrapers.

HISTOGRAM ON CALCULATOR
1. Stat, 1 (to enter data)
2. Stat plot (to choose histogram)
3. Zoom 9 (to set up the axes)
4. Window (to modify the widths of the bars)

WHY USE A STEMPLOT? It is easier to find the middle, and it shows the shape of the distribution. Details: If the data have too many digits, you can round off: 4.1385 → 4.14, 5.2273 → 5.23. If the data fall into too few stems, you can split each stem in two (leaves 0-4 and leaves 5-9) so each stem appears twice. Babe Ruth Home Runs becomes:
2 | 2
2 | 5
3 | 4
3 | 5
4 | 11
4 | 66679
5 | 44
5 | 9
6 | 0
Key: 7 | 2 = 72
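The split-stem idea above can be sketched in a few lines of Python. This is only an illustration: the function name `split_stemplot` is made up here, it assumes two-digit whole numbers, and unlike a hand-drawn stemplot it skips stems that have no leaves.

```python
def split_stemplot(data):
    """Build a split-stem stemplot: leaves 0-4 and 5-9 get separate rows."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)      # 46 -> stem 4, leaf 6
        half = 0 if leaf < 5 else 1         # which half of the stem
        rows.setdefault((stem, half), []).append(str(leaf))
    return [f"{stem} | {''.join(leaves)}"
            for (stem, _), leaves in sorted(rows.items())]

# Babe Ruth home run counts, read off the stemplot on the slide
babe_ruth = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]

for row in split_stemplot(babe_ruth):
    print(row)
```

Run on the Babe Ruth data, this prints the same nine rows as the stemplot on the slide.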

Median is a resistant measure of center. When are the mean and median the same? When are they different? Is the mean or median enough? A measure of center only gives information about the center. We also want to know about spread and variability. The range (max - min) includes outliers, so it is often best to look at the middle two quarters (the middle half of the data).
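A quick way to see what "resistant" means is to change one value into an extreme one and compare the two measures of center. The numbers below are made up for illustration and are not from the slides.

```python
from statistics import mean, median

# Hypothetical data set (not from the slides)
original = [10, 12, 13, 15, 20]
with_outlier = [10, 12, 13, 15, 200]   # same data, largest value made extreme

print(mean(original), median(original))          # mean 14, median 13
print(mean(with_outlier), median(with_outlier))  # mean 50, median 13
```

The mean jumps from 14 to 50, while the median stays at 13. This is why the median is usually preferred when a distribution is skewed or has outliers.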

Quartiles divide the data into 4 equal parts:
1. Arrange the data in order and locate the median (also called Q2).
2. The first quartile, Q1, is the median of the first half of the data.
3. The third quartile, Q3, is the median of the second half of the data.
Example: 5, 7, 10, 14, 18, 19, 25, 29, 31, 33. Q1 = 10, Q2 = 18.5, Q3 = 29.
Note: If there is an odd number of data values, do not include the median in either half when finding Q1 and Q3.
Babe Ruth Data: 22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60. Q1 = 35, Q2 = 46, Q3 = 54.
Interquartile Range (IQR) = Q3 - Q1 = 54 - 35 = 19
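The quartile steps above can be checked with a short Python sketch. The helper name `quartiles` is an assumption made for this example. It follows the median-of-halves rule from the slide, which can give slightly different answers from the quartile rules built into some calculators and software.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]               # first half of the data
    upper = values[half + (n % 2):]     # second half; skips the median when n is odd
    return median(lower), median(values), median(upper)

small_example = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]
babe_ruth = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]

print(quartiles(small_example))   # (10, 18.5, 29)

q1, q2, q3 = quartiles(babe_ruth)
iqr = q3 - q1
print(q1, q2, q3, iqr)            # 35 46 54 19
```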

One rule of thumb to identify outliers is to compute 1.5 * IQR. If a value falls above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR, then the value is an outlier.

For example, with Babe Ruth the IQR = 19. So what is an outlier? 1.5 * 19 = 28.5, so the cutoffs are 54 + 28.5 = 82.5 and 35 - 28.5 = 6.5. No value is above 82.5 or below 6.5, so there are no outliers.
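The 1.5 * IQR rule is easy to turn into a small check. In this sketch the function name `outlier_fences` and the parameter `k` are chosen just for illustration, and Q1 and Q3 are taken from the slide rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs for the k * IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

babe_ruth = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]

low, high = outlier_fences(35, 54)   # Q1 = 35, Q3 = 54 from the slide
outliers = [x for x in babe_ruth if x < low or x > high]

print(low, high)   # 6.5 82.5
print(outliers)    # []
```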

FIVE NUMBER SUMMARY A convenient way to describe the center and spread of a data set is the five number summary. The five number summary is defined as: Min (value), Q1, Median, Q3, Max (value). Here is an example: A Swiss study looked at the number of hysterectomies performed by 15 male doctors: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86. Find the 5 number summary for this data set.

Min = 20, Q1 = 27, Med = 34, Q3 = 50, Max = 86. Here is a box plot of this data. A very powerful graph. ** A box plot must have a scale below it: make the scale first, then plot the five number summary.
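The five number summary for the hysterectomy data can be reproduced with a short sketch. The helper name `five_number_summary` is an assumption for this example, and it uses the same median-of-halves quartile rule as the earlier slide. As an extra step that is not on the slide, it also applies the 1.5 * IQR rule of thumb to the same data.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + (n % 2):]   # skip the median itself when n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

hysterectomies = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

summary = five_number_summary(hysterectomies)
print(summary)   # (20, 27, 34, 50, 86)

# Extra check (not on the slide): apply the 1.5 * IQR rule to this data
_, q1, _, q3, _ = summary
iqr = q3 - q1
flagged = [x for x in hysterectomies
           if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print(flagged)   # [85, 86]
```

With IQR = 23, the upper cutoff is 50 + 34.5 = 84.5, so the rule of thumb flags 85 and 86 as outliers in this data set.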

CALCULATOR INSTRUCTIONS
BOXPLOT
1. Enter data into list
2. Stat plot
3. Use down arrows to select the box plot and press ENTER
4. Press ZOOM
5. Press down arrow to ZoomStat and press ENTER
5 NUMBER SUMMARY
1. Press STAT
2. Press the right arrow to display the choices for STAT CALC
3. Press ENTER to choose 1-Var Stats

BOXPLOT WITH OUTLIERS

ANOTHER MEASURE OF SPREAD: STANDARD DEVIATION One way we have described spread is with the 5 number summary. We use it when the median is the measure of center. The measure of spread used when the mean is the measure of center is called the standard deviation.
Measure of Center    Measure of Spread
Median               5 number summary
Mean                 standard deviation

CALCULATION OF STANDARD DEVIATION
1. Find the mean.
2. Find the difference between each data item and the mean (its distance from the mean).
3. Square each difference and add the squares. Squaring gets rid of any negatives and makes the larger differences even larger.
4. Find the average (mean) of these squared differences, but divide by n-1 rather than n. This is roughly the average squared distance from the mean.
5. Take the square root of this average. This is roughly the typical distance from the mean.
Why n-1? The sum of the deviations (before squaring) is always 0, so once we know n-1 of them we know the nth. Dividing by n-1 also gives a slightly larger, more conservative value than dividing by n.

HERE IS HOW TO COMPUTE THE STANDARD DEVIATION. DATA SET: 12, 13, 13, 27, 27, 28. WHAT IS THE MEAN? 20
xi    xi - mean       (xi - mean)^2
12    12 - 20 = -8    64
13    13 - 20 = -7    49
13    13 - 20 = -7    49
27    27 - 20 = 7     49
27    27 - 20 = 7     49
28    28 - 20 = 8     64
Add all the (xi - mean)^2 values, divide by n-1, and take the square root: 64 + 49 + 49 + 49 + 49 + 64 = 324. 324 / (6 - 1) = 64.8. √64.8 ≈ 8.05
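The same calculation can be followed step by step in Python. This is a sketch for checking the arithmetic, with variable names chosen here for illustration. The last line compares the result with `statistics.stdev`, which also divides by n-1.

```python
from math import isclose, sqrt
from statistics import stdev

data = [12, 13, 13, 27, 27, 28]
n = len(data)

mean = sum(data) / n                       # step 1: the mean
deviations = [x - mean for x in data]      # step 2: distance from the mean
squared = [d ** 2 for d in deviations]     # step 3: square each difference
variance = sum(squared) / (n - 1)          # step 4: divide by n-1, not n
s = sqrt(variance)                         # step 5: take the square root

print(mean, sum(squared), variance)        # 20.0 324.0 64.8
print(round(s, 2))                         # 8.05
print(isclose(s, stdev(data)))             # True
```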

The standard deviation (s) is the square root of the variance. The variance (s^2) is the average of the squares of the differences of each observation from the mean, where the average is taken by dividing by n-1. Here is the formula for standard deviation:
$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$
where x_i = each data point, \bar{x} (xbar) = sample mean, and n = number of values.

WHY N-1? The idea of variance is the average of the squares of the deviations of the observations from the mean. So why do we average by dividing by n-1 instead of n? The sum of the deviations (before squaring) is always 0. So once we know n-1 of the deviations, the nth one is known also. We are not averaging n unrelated numbers.

SO… The numbers are related. Only n-1 of the squared deviations can vary freely so we average by dividing by n-1. n-1 is called the degrees of freedom. Ex: If you have 4 markers and 4 people to choose their markers, how many of them will have FREEDOM of choice?
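The claim that only n-1 deviations can vary freely can be checked numerically. The sketch below reuses the data set 12, 13, 13, 27, 27, 28 from the standard deviation example and recovers the last deviation from the other five.

```python
data = [12, 13, 13, 27, 27, 28]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

print(deviations)        # [-8.0, -7.0, -7.0, 7.0, 7.0, 8.0]
print(sum(deviations))   # 0.0

# Knowing the first n-1 deviations fixes the last one,
# because all n of them must add up to 0.
known = deviations[:-1]
last = -sum(known)
print(last == deviations[-1])   # True
```

Only 5 of the 6 deviations are free to vary, which matches the n-1 = 5 used as the divisor in that example.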

Probability Histogram Cumulative Histogram

ALWAYS PLOT DATA FIRST!