AP Statistics Overview

AP Statistics Overview
Text: Mind On Statistics, by Jessica Utts and Robert Heckard Pg. 1—Stress definition of Statistics.

What is Statistics? Statistics is the science of learning from data.
Ex. Take a sample of 50 seniors and record the number of AP classes they are taking. Use this to make a prediction, or educated guess, about how many AP classes ALL seniors are taking.
Parameter – summary measurement (ex: p, µ) that describes the population
Statistic – summary measurement (ex: p̂, x̄) that describes the sample
Thanks to Texas A&M University at College Station, TX for giving me a wonderful opportunity to advance my teaching of Statistics. A special thanks to Dr. Jim Matis and Dr. Julie H. Carroll for their inspiration and dedication to improving the field of teaching statistics at the undergraduate level.
Ask students why they would want to learn statistics. Besides the requirement for graduation, ask them if they ever read the sports summary statistics after games, watch analysts predict stock market movements, or watch the weather news forecast tomorrow’s climatic changes, ALL of which involve statistics. Have they ever participated in a survey or experiment (e.g., phone surveys, internet surveys, medical experiments)? So what is Statistics? See if any volunteers will attempt to answer.

AP Statistics – At a Glance
Exploring Data (Chapters 1 – 4)
Create Distributions (graph of data)
Describe / Compare Distributions
Observational Studies and Experiments (Ch 5)
Anticipating Patterns (Chapters 6 – 9)
Statistical Inference (Chapters 10 – 15)

The key to AP Stats: THINK—SHOW—TELL
Think first! Know where you’re headed and why. It will save you a lot of work.
Show is what most people think Statistics is about. The mechanics of calculating statistics and making displays are important, but not the most important part of Statistics.
Tell what you’ve learned. Until you’ve explained your results so that someone else can understand your conclusions, the job is not done.
Humpty Dumpty sat on the wall, Humpty Dumpty had a great fall. All the king’s horses and all the king’s men couldn’t put Humpty Dumpty together again.
We could all make a moral for this story, such as “Stay focused” or “The higher you get, the greater the fall.” Ch 1 contains 7 case studies that will be referred to continuously in the textbook. Read the moral first, then the case study. STAY FOCUSED!

WHO is being described? How many?
Individuals are the objects described by a set of data. These individuals go by different names depending on the situation.
Respondents – individuals who answer a survey.
Subjects / Participants – people on whom we experiment.
Experimental Units – animals, plants, Web sites, and other inanimate subjects on which we experiment.
Keep this simple. May want to discuss Discrete vs. Continuous quantitative variables.

WHAT are the variables? Units?
Variables – characteristics recorded about each individual
Categorical – group of category names with no order. Ex: Eye Color (brown, blue, green)
Quantitative – numerical values. Ex: Weight (117 lbs, 170 oz)
Univariate Data – one variable. Ex: Final Exam Scores
Bivariate Data – 2 paired variables. Ex: Homework % vs. Final Exam Scores
Discrete – numbers have specific values. Ex: # of desks, money
Continuous – estimated numbers. Ex: Time, height, age

CHAPTER 1 Exploring Data

Summarize Categorical Data using a Bar Chart or Pie Chart
[Bar chart: FREQUENCY (vertical axis, 0 to 30) for each category of AP SCORES (horizontal axis)]

Dotplot for Univariate Quantitative Data

Stemplot for Quantitative Data
Ages of Death of U.S. First Ladies
Stem | Leaf
3 | 4, 6
4 | 3
5 | 2, 4, 5, 7, 8
6 | 0, 0, 1, 2, 4, 4, 4, 5, 6, 9
7 | 0, 1, 3, 4, 6, 7, 8, 8
8 | 1, 1, 2, 3, 3, 6, 7, 8, 9, 9
9 | 7
Key: 3 | 4 indicates 34 years old
Leaf must be a single digit. Do not skip stems. Leaves in order from smallest to largest.

Split Stemplot
Age of 27 students randomly selected from Stat 303 at A&M
1 | 7
1 | 8, 9, 9, 9, 9, 9
2 | 0, 0, 0, 0, 1, 1, 1, 1, 1, 1
2 | 2, 2, 2, 3, 3
2 | 4, 5
2 |
2 | 8
3 | 0, 1
Stem is split for every 2 leaf digits: (0, 1), (2, 3), (4, 5), (6, 7), and (8, 9)

Split Stemplot
Age of 27 students randomly selected from Stat 303 at A&M
1 |
1 | 7, 8, 9, 9, 9, 9, 9
2 | 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4
2 | 5, 8
3 | 0, 1
3 |
Stem is split for every 5 leaf digits: (0 thru 4) AND (5 thru 9)

Back-to-back Stemplot
Number of home runs in a season
Babe Ruth | Stem | Roger Maris
 | 0 | 8
 | 1 | 3, 4, 6
5, 2 | 2 | 3, 6, 8
5, 4 | 3 | 3, 9
9, 7, 6, 6, 6, 1, 1 | 4 |
9, 4, 4 | 5 |
0 | 6 | 1

Cumulative Relative Frequency
Frequency – # of times something occurs
Cumulative Frequency – keep adding
Relative Frequency – percents
Cumulative Relative Frequency – add percents (AKA ogive)
See graphs on page 62
Table to fill in, with columns: Letter Grade | Frequency | Cumulative Frequency | Relative Frequency | Cumulative Relative Frequency
Rows: A, B, C, D, F
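The four columns of the table can be built one from another. The sketch below shows one way to do it in Python; the grade counts are hypothetical (the slide leaves the table blank), and the function name `frequency_table` is only an illustration.

```python
from itertools import accumulate

# Hypothetical letter-grade counts (NOT from the slides), for illustration only.
grade_counts = {"A": 6, "B": 10, "C": 8, "D": 4, "F": 2}

def frequency_table(counts):
    """Return rows of (category, freq, cum freq, rel freq, cum rel freq)."""
    total = sum(counts.values())
    cumulative = list(accumulate(counts.values()))  # "keep adding" the counts
    rows = []
    for (category, freq), cum in zip(counts.items(), cumulative):
        rows.append((category, freq, cum, freq / total, cum / total))
    return rows

table = frequency_table(grade_counts)
for category, freq, cum, rel, cum_rel in table:
    print(f"{category}: {freq:2d} {cum:2d} {rel:6.1%} {cum_rel:6.1%}")
```

The last row's cumulative frequency equals the total count, and its cumulative relative frequency equals 100%, which is a quick check on a hand-built table.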

Histogram—Univariate Quantitative data
Vertical axis: frequency (count)
Classes should be equal width
Reasonable width
Reasonable starting point
Roughly 7 bars
Bars should touch
This is not a bar graph!
Example: univariate variable Age
A histogram is used to graphically display univariate quantitative data, as the example shows. Smaller data sets can be sketched by hand with 5 to 7 equal-width intervals. (Note: In Stat 303, we will be using the computer to generate graphs.) The vertical axis represents count (frequency), or it could represent percent (relative frequency).

Histograms Discrete vs. Continuous

Location—pth Percentile
The pth percentile of a distribution (set of data) is the value such that p percent of the observations fall at or below it. Suppose your Math SAT score is at the 80th percentile of all Math SAT scores. This means that 80% of all Math SAT scores fall at or below your score.
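As a rough illustration of the "at or below" definition, the sketch below counts the share of observations at or below a given value. The scores are hypothetical and the helper name `percent_at_or_below` is an assumption, not something from the text.

```python
def percent_at_or_below(data, value):
    """Percent of observations in data that are at or below value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

# Hypothetical Math SAT scores (NOT from the slides).
scores = [450, 480, 510, 520, 560, 590, 610, 640, 700, 760]

# 8 of the 10 scores are at or below 640, so 640 sits at the 80th percentile.
print(percent_at_or_below(scores, 640))
```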

5 Number Summary Minimum, Q1, Median, Q3, Maximum
Q1 (Quartile 1) is the 25th percentile of ordered data, or the median of the lower half of ordered data
Median (Q2) is the 50th percentile of ordered data
Q3 (Quartile 3) is the 75th percentile of ordered data, or the median of the upper half of ordered data
Range = Maximum – Minimum
IQR = Interquartile Range (Q3 – Q1), the spread of the middle 50%
The percentile concept is implied, but stress to students that other percentiles exist, such as the 56th or the 98th percentile, and ask what these mean. The kth percentile means that k% of the ordered data values are at or below that data value. For example, if the median is 100, then 50% of the ordered data values fall at or below 100. Also, (100 – k)% represents the amount of ordered data that falls above the percentile data value.

IQR(Interquartile Range) = Q3 – Q1
Calculating OUTLIERS: “1.5 IQR above Q3 or below Q1”
Any point that falls outside the interval from Q1 – 1.5(IQR) to Q3 + 1.5(IQR) is considered an outlier. We often use this rule combined with boxplots.

Calculate the 5 Number Summary
Calculate the 5 Number Summary and Check for Outliers
Data Set 1: 121, 132, 134, 154, 164, 175, 188, 192, 201, 203, 203
Data Set 2: 3, 4, 4, 5, 10, 12, 13, 24

Boxplot - Using 5 Number Summary
5# Summary of Computers: 250, 1000, 2950, 5400, 8600 (min = 250, Q1 = 1000, median = 2950, Q3 = 5400, max = 8600)
Data from The Presence of Computers in American Schools, by Ronald E. Anderson and Amy Ronnkvist, Teaching, Learning, and Computing: 1998 Survey, Report #2, Center for Research on Information Technology and Organizations, The University of California, Irvine and The University of Minnesota.
Stress that 25% of the ordered data falls within each interval: from min to Q1, from Q1 to the median, from the median to Q3, and from Q3 to max. Although the intervals appear different in length, the amount of data is the same within each one. The difference in length indicates a difference in data variation within each interval (not in the amount of data). Remember that the IQR contains the middle 50% of the ordered data set.

Boxplot and Modified Boxplot
Modified – show outliers
25% of data in each section

Comparative Parallel (Side by Side) Boxplots
A boxplot can be used to graphically display univariate data. As in the example here, a quick comparison can be made by separating the data by the categorical variable, gender. The values of the five number summary (minimum, quartile 1, median, quartile 3, and maximum) are the breaking points in a boxplot. If outliers exist, the modified boxplot shows them as separate points, and the whiskers extend only to the most extreme values that are not outliers.

Mean or Median?

Robust (Resistant) Statistic
Median is resistant to extreme values (outliers) in data set. Mean is NOT robust against extreme values. Mean is pulled away from the center of the distribution toward the extreme value (“tails of graph”).
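A quick numeric check of this claim is sketched below. It uses two five-value data sets that also appear in the standard deviation table later in these slides; they differ only in the last value, which becomes an outlier.

```python
from statistics import mean, median

without_outlier = [90, 90, 100, 110, 110]
with_outlier = [90, 90, 100, 110, 320]   # last value replaced by an extreme one

print("mean:  ", mean(without_outlier), "->", mean(with_outlier))
print("median:", median(without_outlier), "->", median(with_outlier))
```

The mean moves from 100 to 142 toward the extreme value, while the median stays at 100.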

Of the 2 segments, where is the Mean with respect to the Median?
Remember the mean is pulled toward extreme values.

Where’s the Mean with respect to the Median?

Roughly speaking, standard deviation is the average distance values fall from the mean (center of graph). Let the arrow mark the center, the mean. Each ring measures an average distance from the center, the mean. Stress that the rings are of equal width (standard).

Population and Sample Standard Deviation
σ² – population variance
s² – sample variance
Be sure to go back over what each letter stands for in both formulas. Remind students what operation is performed by the summation symbol AND that a calculator or computer software will calculate these for them. Variance is another measure of spread and is calculated by squaring the standard deviation value. Students may ask why the population formula divides by n while the sample formula divides by n – 1. Later in the course it hopefully will become clearer.
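The formulas themselves did not survive in this transcript. For reference, the standard textbook definitions are given below, assuming x_i denotes the individual observations and n the number of observations; the exact notation on the original slide may differ.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}
\qquad\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

Squaring either expression gives the corresponding variance, σ² or s².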

What is Variance?
Variance = (Standard deviation)²

Calculated Standard Deviation is a measure of Variation in data
Sample Data Set | Mean | Standard Deviation
100, 100, 100, 100, 100 | 100 | 0
90, 90, 100, 110, 110 | 100 | 10
30, 90, 100, 110, 170 | 100 | 50
90, 90, 100, 110, 320 | 142 | 99.85
The first data set contains all the same value, so the mean is obvious and hopefully the standard deviation is too: since there is no variation in the data set, the standard deviation is zero. The second data set has a simple mean to calculate (mentally), and the standard deviation can be calculated using the formula on a side board. The third data set has values that are more spread out, so the standard deviation should be higher. The fourth data set contains an outlier, which dramatically affects both the mean and the standard deviation.
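The table values can be checked with a few lines of Python. This sketch uses the sample standard deviation (dividing by n – 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

data_sets = [
    [100, 100, 100, 100, 100],
    [90, 90, 100, 110, 110],
    [30, 90, 100, 110, 170],
    [90, 90, 100, 110, 320],
]

# One (mean, sample standard deviation) pair per data set.
summary = [(mean(d), stdev(d)) for d in data_sets]

for d, (m, s) in zip(data_sets, summary):
    print(d, "mean =", m, "sd =", round(s, 2))
```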

LET’S CUSS!

To describe a distribution: LET’S CUSS!
Center
Unusual Features
Spread
Shape

Center: Mean, Median
Unusual Features: Gaps, Outliers, Clusters
Spread: Standard Deviation, Range, IQR
Shape: Normal, Symmetric, Skewed Right (Left)
CSS: Center, Spread, Shape. You need to be able to eyeball this information from a graph. Dotplot has 500 temperatures recorded at the South Pole for 379 months.
Approximately where is the center? Median –54.6 or Mean –49.4
Approximately how spread out is the data? Overall from –69 to –21, or 48 degrees of variability
Approximately what shape does the data show? Trimodal, representing the “3” seasons (short summer, short fall & spring, and long winter)
Any outliers? Potentially, but not positive without further investigation.

CENTER
Mean (µ, x̄): add up data values and divide by number of data values
Median: list data values in order, locate middle data value
Data Set: 19, 20, 20, 21, 22
Mean is 20.4 (102 ÷ 5). Median is 20 since it is the middle number of the ranked (ordered) data values.
Mean and median of a data set may or may NOT be one of the values in the data set.
Inform students that the mu symbol (µ) represents the population mean and x bar (x̄) represents the sample mean. Remind students that the data must be ranked prior to finding the median by hand. Ask them to refer to their textbook for step-by-step instructions.

UNUSUAL FEATURES
Clusters – Gaps – Potential Outliers

SHAPE
Skewed Right – “tail” points to the right
Normal – bell-shaped
The shape can also be skewed left, symmetric, or uniform.

SPREAD
Standard Deviation (about 10) or Range (80 to 150, or 70) or IQR (about 100 to 130, or 30)

Summary Features of Quantitative Variables
Center – Location
Unusual Features – Outliers, Gaps, Clusters
Spread – Variability
Shape – Distribution Pattern

How to Choose Measures of Center and Spread?
NON-SKEWED DISTRIBUTIONS – use mean and standard deviation
SKEWED DISTRIBUTIONS – use 5# Summary

Comparing Distributions
CUSS
COMPARE in CONTEXT
GENERAL CONCLUSION

Linear Transformations using the height of all LHS Seniors (in inches)
What happens to the center and spread if everyone is put in 3-inch heels (add 3 inches)?
What happens to the center and spread if we change everyone’s height to feet (divide by 12)?

Summary of Linear Transformations
Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b. Adding the same number a (either positive, negative, or zero) to each observation adds a to measures of center and to quartiles but does not change measures of spread. NOTE: The shape NEVER changes!
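Both rules can be verified numerically. The sketch below uses a small set of hypothetical heights in inches (not data from the slides) and checks the mean, median, and standard deviation; the IQR behaves the same way as the standard deviation.

```python
from statistics import mean, median, stdev

# Hypothetical heights in inches (NOT from the slides).
heights = [62, 64, 65, 67, 68, 70, 72]

def center_and_spread(data):
    """Return the mean, median, and sample standard deviation of data."""
    return {"mean": mean(data), "median": median(data), "sd": stdev(data)}

original = center_and_spread(heights)
in_heels = center_and_spread([h + 3 for h in heights])   # add a = 3 inches
in_feet = center_and_spread([h / 12 for h in heights])   # multiply by b = 1/12

print("original:", original)
print("in heels:", in_heels)   # center shifts by 3, spread unchanged
print("in feet: ", in_feet)    # center and spread both divided by 12
```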