AP Statistics Student Curriculum Review- Fall 2018

Mrs. Daniel, Alonzo & Tracy Mourning Sr. High

This PowerPoint is posted at: teachers.dadeschools.net/sdaniel

Agenda
Warm-Up: 2018 #5
Smelling Parkinson's Activity
Chapter 2: Z-scores & Normal Distributions
FRQ: 2011 #1
FRQ: 2013 #6
Chapter 3: Scatterplots, Correlation & Linear Regression
FRQ: 2016 #6

Warm-Up: 2018 #5

Smelling Parkinson's Activity

Video Link

Normal Distributions
All Normal curves are symmetric, single-peaked, and bell-shaped. A specific Normal curve is described by giving its mean µ and standard deviation σ.
Figure: Two Normal curves, showing the mean µ and standard deviation σ.

Normal Distributions We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. The mean of a Normal distribution is the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side.

Normal Distributions are Useful…
Normal distributions are good descriptions for some distributions of real data. Normal distributions are good approximations of the results of many kinds of chance outcomes. Many statistical inference procedures are based on Normal distributions. Normal Distributions will appear AGAIN and AGAIN. Chapter 6, Chapter 8, Chapter 9 and Chapter 10!

Importance of Standardizing
There are infinitely many different Normal distributions, each with its own mean and standard deviation. To compare different Normal distributions more effectively, we “standardize”. Standardizing allows us to compare apples to apples. For example, we can compare SAT and ACT scores by standardizing.

The Standardized Normal Distribution
All Normal distributions are the same if we measure in units of size σ from the mean µ as center. The standardized Normal distribution is the Normal distribution with mean 0 and standard deviation 1.

Z-score Formula
z = (x − µ) / σ
x = value of the variable, µ = mean, σ = standard deviation

Let’s Practice… Venus Williams has a very fast first serve. Historically, Ms. Williams’ first serve averages 88 mph with a standard deviation of 12 mph. What is the standard normal score (z-score) for a first serve at 75 mph? What is the standard normal score (z-score) for a first serve at 105 mph?

Let’s Practice… Venus Williams has a very fast first serve. Historically, Ms. Williams’ first serve averages 88 mph with a standard deviation of 12 mph.
A. What is the standard normal score for a first serve at 75 mph? z = (75 − 88) / 12 = −1.08
B. What is the standard normal score for a first serve at 105 mph? z = (105 − 88) / 12 = 1.42
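The same arithmetic can be checked in a few lines of Python. This is only a sketch; the helper name `z_score` is made up for illustration and is not part of any calculator or library.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Venus Williams' first serve: mean 88 mph, standard deviation 12 mph
z_slow = z_score(75, 88, 12)
z_fast = z_score(105, 88, 12)

print(round(z_slow, 2))  # -1.08
print(round(z_fast, 2))  # 1.42
```

A negative z-score means the serve is slower than average; a positive z-score means it is faster than average.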

FRQ: 2011 #1

FRQ 2011 #1a No, it is not reasonable to believe that the distribution of 40-yard running times is approximately normal, because the minimum time is only 1.33 standard deviations below the mean: z = (4.4 − 4.6) / 0.15 = −1.33. In a normal distribution, we would expect values as far as about 3 standard deviations below the mean.

FRQ 2011 #1b The z-score for a player who can lift a weight of 370 pounds is z = (370 − 310) / 25 = 2.4. The z-score indicates that the amount of weight the player can lift is 2.4 standard deviations above the mean for all previous players in this position. He is very strong!

FRQ 2011 #1c Player A. Although the z-scores are similar for weight, Player A has a significantly lower 40-yard dash time, as evidenced by the z-scores (−1.2 for A vs. −0.2 for B).

Calculating Probabilities/ Percentages using z-scores

By Hand/Table
1. Draw and label the Normal curve.
2. Use the z-score formula: plug in values and solve.
3. Using the z-score, look up the probability (area) in the Standard Normal Table.
4. Conclude in context.

Calculator
1. Draw and label the Normal curve.
2. Plug in lower bound, upper bound, mean and standard deviation.
3. Conclude in context.

By Hand vs. Calculator
AP awards FULL credit for answers done “by hand/table” or using the calculator.
Must show work:
Hand/Table: z-score formula with values plugged in
Calculator: syntax with labels (lower/upper bounds, mean and standard deviation)
The calculator leads to fewer errors and is faster.

TI-84 Calculator: NormalCDF
1. Press 2nd, VARS (Distr).
2. Select 2:normalcdf(
3. Enter the following information:
Lower: (the lower bound of the region OR -1E99)
Upper: (the upper bound of the region OR 1,000,000)
µ: (mean)
σ: (standard deviation)
4. Press enter; the number that appears is the probability.

TI-Nspire: NormalCDF
1. Select Calculator (on home screen), press center button.
2. Press menu, press enter.
3. Select 6: Statistics, press enter.
4. Select 5: Distributions, press enter.
5. Select 2: Normal Cdf, press enter.
6. Enter the following information:
Lower: (the lower bound of the region OR -1E99/-∞)
Upper: (the upper bound of the region OR 1,000,000/+∞)
µ: (mean)
σ: (standard deviation)
7. Press enter; the number that appears is the probability.

Let’s Practice… According to Edmunds, 2015 Honda Civics have an average fuel efficiency of 25 mpg with a standard deviation of 4.5 mpg. What is the probability that a randomly selected car will have a gas mileage of 30 or lower?

Solution
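One way to check the Honda Civic answer away from the calculator is Python's standard-library `NormalDist`. This is a sketch that assumes gas mileage is Normally distributed, as the question implies; the variable names are made up for illustration.

```python
from statistics import NormalDist

# Fuel efficiency modeled as N(25, 4.5), per the problem statement
mpg = NormalDist(mu=25, sigma=4.5)

# P(X <= 30): area under the Normal curve to the left of 30
p_30_or_lower = mpg.cdf(30)
print(f"P(X <= 30) = {p_30_or_lower:.4f}")
```

The result is roughly 0.87, which plays the same role as the value returned by normalcdf with a very low lower bound and an upper bound of 30.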

Let’s Practice… According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. Jose knows he was in the 82nd percentile. What was his ACT score?

Solution
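Working backward from a percentile to a score is the job of InvNorm on the calculator. A rough Python equivalent, assuming ACT scores follow N(20.8, 4.8) as stated in the question, might look like this (variable names are illustrative only):

```python
from statistics import NormalDist

# ACT scores modeled as N(20.8, 4.8), per the problem statement
act = NormalDist(mu=20.8, sigma=4.8)

# 82nd percentile: the score with 82% of the area to its left
jose_score = act.inv_cdf(0.82)
print(f"82nd percentile score = {jose_score:.2f}")
```

Because ACT scores are whole numbers, the result (a little above 25) would be reported as a score of about 25.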

Normal Calculations on Calculator
Function | Calculates | Example
NormalCDF | Probability of obtaining a value BETWEEN two values | What percent of students scored between 70 and 95 on the test?
InvNorm | X-value given probability or percentile | Tommy scored in the 92nd percentile on the test; what was his raw test score?
NormalPDF (RARE) | Probability of obtaining PRECISELY or EXACTLY a specific x-value | What is the probability that Suzy scored exactly a 75 on the test?

Let’s Practice… According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. A. What percentage of college bound seniors scored lower than 19 on the ACT? B. What percentage of college bound seniors scored between 27 and 32 on the ACT?

Let’s Practice… According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. A. What percentage of college bound seniors scored lower than 19 on the ACT? Normalcdf(0, 19, 20.8, 4.8)= … B. What percentage of college bound seniors scored between 27 and 32 on the ACT? Normalcdf(27, 32, 20.8, 4.8)= …

Let’s Practice… According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. C. What percentage of college bound seniors scored a 33 or greater on the ACT? D. If Juan scored in the 90th percentile, what was his ACT score?

Let’s Practice… According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. C. What percentage of college bound seniors scored a 33 or greater on the ACT? Normalcdf(33, 36, 20.8, 4.8)= … D. If Juan scored in the 90th percentile, what was his ACT score? Invnorm(.90, 20.8, 4.8)= 26.95 (or 27 on ACT)
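The four ACT calculations (A through D) can be reproduced with a small helper that mimics the calculator syntax. This is a sketch: the function name `normalcdf` is a made-up stand-in for the calculator command, not a real Python library call, and the bounds 0 and 36 are the ones used in the slides.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Area under N(mean, sd) between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

a = normalcdf(0, 19, 20.8, 4.8)    # A. scored lower than 19
b = normalcdf(27, 32, 20.8, 4.8)   # B. scored between 27 and 32
c = normalcdf(33, 36, 20.8, 4.8)   # C. scored 33 or greater (36 is the maximum score)
d = NormalDist(20.8, 4.8).inv_cdf(0.90)  # D. 90th percentile score

for label, value in [("A", a), ("B", b), ("C", c), ("D", d)]:
    print(label, round(value, 4))
```

Using 0 as the lower bound in part A instead of a huge negative number changes the answer only negligibly, because 0 is more than four standard deviations below the mean.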

MC #1 Scores on the ACT college entrance exam follow a bell-shaped distribution with mean 18 and standard deviation 6. Wayne’s standardized score on the ACT was −0.7. What was Wayne’s actual ACT score? (a) 4.2 (b) −4.2 (c) 13.8 (d) 17.3 (e) 22.2

MC #2 Which of the following is least likely to have a nearly Normal distribution? (a) Heights of all female students taking STAT 001 at State Tech. (b) IQ scores of all students taking STAT 001 at State Tech. (c) SAT Math scores of all students taking STAT 001 at State Tech. (d) Family incomes of all students taking STAT 001 at State Tech. (e) All of (a)–(d) will be approximately Normal.

MC #3 The scores on the real estate licensing exam given in Florida are Normally distributed with a standard deviation of 70. What is the mean test score if 25% of the applicants score above 475? a. 416 b. 428 c. 468 d. 522 e. Not enough information to answer question.

MC #4 Polly takes three standardized tests. She scores 600 on all three tests. The scores are Normally distributed. Rank her performance on the three tests. a. I, II and III b. III, II, and I c. I, III and II d. III, I, and II e. II, I and III

MC #5 The heights of American men aged 15 to 24 are approximately normally distributed with a mean of 68 inches and a standard deviation of 2.5 inches. About 20% of these men are taller than… a. 66 inches b. 68 inches c. 70 inches d. 72 inches e. 74 inches

FRQ: 2013 #6

Solutions a. The Western Pacific Ocean had more typhoons than the Eastern Pacific Ocean in all but one of these years. The average seems to have been about 31 typhoons per year in the Western Pacific Ocean, which is higher than the average of about 19 typhoons per year in the Eastern Pacific Ocean. The Western Pacific Ocean also saw more variability (in number of typhoons per year) than the Eastern Pacific Ocean; for example, the range of the frequencies for the Western Pacific is about 21 typhoons and only 10 typhoons for the Eastern Pacific.

Solutions b. The Western Pacific Ocean had a decreasing trend in number of typhoons per year over this time period, especially from about 2001 through 2010. In contrast, the Eastern Pacific Ocean was fairly consistent in the number of typhoons per year over this time period, with a slight increasing trend in the later years from 2005 through 2010.

Solutions c.

Solutions d.

Solutions e. (i) The overall trends across this time period were more apparent with the moving averages than with the original frequencies. The moving averages reduce variability, making more apparent the overall decreasing trend in number of typhoons in the Western Pacific Ocean and the slight increasing trend in the number of typhoons in the Eastern Pacific Ocean. (ii) The year-to-year variability in number of typhoons is less apparent with the moving averages than with the original frequencies.

Scatterplots & Correlation

Scatterplots
Make a scatterplot of the relationship between body weight and pack weight. Body weight is our eXplanatory variable.
Body weight (lb): 120 187 109 103 131 165 158 116
Backpack weight (lb): 26 30 26 24 29 35 31 28

Making a Scatterplot: TI-84
1. Using the List editor, enter data into list 1 and list 2.
2. Press 2nd, Y= (Stat Plot), 1, enter.
3. Select: On
4. Select: Scatter
5. X: list 1, Y: list 2
6. Select: Box
7. Press “Zoom”, 9

Constructing a Scatterplot: TI-Nspire
1. Enter x values into list 1 and y values into list 2.
2. Label each column: label column x as weight and column y as bpack.
3. Press HOME/On, click Add Data & Statistics.

Constructing a Scatterplot: TI-Nspire
4. Move the cursor to the bottom of the screen and “click to add variable”. Select weight.
5. Move the cursor to the left of the screen and “click to add variable”. Select bpack.

What is Correlation? A mathematical value that describes the strength of a linear relationship between two quantitative variables. Correlation values are between -1 and 1. Correlation is abbreviated: r The strength of the linear relationship increases as r moves away from 0 towards -1 or 1.

What does “r” tell us?! Notice that the formula is the sum of the products of the z-scores of x and the z-scores of y, divided by n − 1.

Scatterplots and Correlation

What does “r” mean?
r Value | Strength
-1 | Perfectly linear; negative
-0.75 | Strong negative relationship
-0.50 | Moderately strong negative relationship
-0.25 | Weak negative relationship
0 | Nonexistent
0.25 | Weak positive relationship
0.50 | Moderately strong positive relationship
0.75 | Strong positive relationship
1 | Perfectly linear; positive

Calculate Correlation: TI-84
1. Enter x values in list 1 and y values in list 2.
2. Press STAT, arrow right to CALC.
3. Select Option 4: LinReg(ax + b).
4. Enter Information: Xlist: L1, Ylist: L2
5. Select Calculate.
Correlation should be 0.79

Calculate Correlation: TI-Nspire
1. Enter x values in list 1 and y values in list 2.
2. Press MENU, then 4: Statistics.
3. Select Option 1: Stat Calculations.
4. Select Option 3: Linear Regression (mx + b).
5. X: a[], Y: b[], ENTER
Correlation = r
Correlation should be 0.79

Facts about Correlation
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
Correlation is not resistant. r is strongly affected by a few outlying observations.
Correlation makes no distinction between explanatory and response variables.
r does not change when we change the units of measurement of x, y, or both.
r does not change when we add or subtract a constant to either x, y or both.
The correlation r itself has no unit of measurement.

R: Ignores distinctions between X & Y

R: Highly Affected By Outliers

Why?! Since r is calculated using standardized values (z-scores), the correlation value will not change if the units of measure are changed (feet to inches, etc.). Adding a constant to x, y, or both will not change the correlation because neither the standard deviations nor the distances from the means are affected.

Correlation Formula: Suppose that we have data on variables x and y for n individuals. The values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on. The means and standard deviations of the two variables are x-bar and sx for the x-values and y-bar and sy for the y-values. The correlation r between x and y is:
r = [1 / (n − 1)] · Σ [(xi − x-bar) / sx] · [(yi − y-bar) / sy]
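The correlation is the sum of the products of the z-scores of x and y, divided by n − 1. The sketch below computes r that way for the body weight and backpack weight data. Note the assumption: the third backpack weight (26) is supplied here because only seven backpack values survive in the slide text. The function name `correlation` is illustrative only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z-score of x) * (z-score of y), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

body = [120, 187, 109, 103, 131, 165, 158, 116]  # body weight (lb)
pack = [26, 30, 26, 24, 29, 35, 31, 28]          # backpack weight (lb); third value is assumed

r = correlation(body, pack)
r_kg = correlation([w * 0.4536 for w in body], pack)   # change units: lb to kg
r_shifted = correlation(body, [p + 5 for p in pack])   # add a constant to y
r_swapped = correlation(pack, body)                    # swap x and y

print(round(r, 2), round(r_kg, 2), round(r_shifted, 2), round(r_swapped, 2))
```

All four values agree, which illustrates three of the facts about correlation: changing units, adding a constant, and swapping x and y leave r unchanged.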

Least Squares Regressions

Regression Lines A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other. A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Least-Squares Regression Line
Different regression lines produce different residuals. The regression line we use in AP Stats is Least-Squares Regression. The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Regression Line Equation
Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form: ŷ = ax + b In this equation, ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x. a is the slope, the amount by which y is predicted to change when x increases by one unit. b is the y intercept, the predicted value of y when x = 0.
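To see what "least squares" means in practice, the sketch below computes the slope a and intercept b directly and compares the sum of squared residuals against a slightly different line. It uses the body weight and backpack weight data, with the assumption that the third backpack weight is 26 (only seven backpack values survive in the slide text). The function names are made up for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a*x + b, where a is the slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual y - predicted y)^2 for the line y-hat = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

body = [120, 187, 109, 103, 131, 165, 158, 116]  # body weight (lb)
pack = [26, 30, 26, 24, 29, 35, 31, 28]          # backpack weight (lb); third value is assumed

a, b = least_squares_line(body, pack)
best = sum_squared_residuals(body, pack, a, b)
other = sum_squared_residuals(body, pack, a + 0.01, b)  # a slightly steeper line

print(round(a, 4), round(b, 2))
print(best < other)  # True: the least-squares line has the smaller sum
```

Any other line, such as the steeper one tried here, produces a larger sum of squared residuals than the least-squares line.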

Regression Line Equation

Format of Regression Lines
Format 1: ŷ = ax + b
ŷ = predicted backpack weight
x = student’s weight
Format 2: Predicted backpack weight = a(student’s weight) + b

TI-NSpire: LSRL
1. Enter x data into list 1 and y data into list 2.
2. Press MENU, 4: Statistics, 1: Stat Calculations.
3. Select Option 4: Linear Regression.
4. Insert either the name of the list or a[] for x and the name of the list or b[] for y.
5. Press ENTER.

TI-84: LSRL
1. Enter x values in list 1 and y values in list 2.
2. Press STAT, arrow right to CALC.
3. Select Option 4: LinReg(ax + b).
4. Enter Information: Xlist: L1, Ylist: L2, StoreRegEQ: Y1 (VARS, arrow right to Y-VARS, enter, enter, 1: Y1)
5. Select Calculate.
To view: “Zoom”, 9

TI-NSPIRE: LSRL to View Graph
1. Enter x data into list 1 and y data into list 2. Be sure to name lists.
2. Press HOME/ON, Add Data & Statistics.
3. Enter variables to x and y axis.
4. Click MENU, 4: Analyze.
5. Select Option 6: Regression.
6. Select Option 2: Show Linear (a + bx), ENTER.

Interpreting Computer Regression Output
Be sure you can locate: the slope, the y intercept and determine the equation of the LSRL. 𝒚 = x 𝒚 = predicted.... x = explanatory variable

r²: Coefficient of Determination
r² tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset. In this example, r² equals 60.6%. 60.6% of the variation in pack weight is explained by the linear relationship with body weight.
General template: (Insert r²)% of the variation in y is explained by the linear relationship with x.
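The comparison between the LSRL and "just guess the mean" can be made concrete. The sketch below uses a tiny made-up data set (not from the slides) and computes r² as 1 minus the ratio of the two sums of squared errors; all names are illustrative.

```python
from statistics import mean

# Hypothetical data, chosen only to keep the arithmetic easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Squared error when we always guess the mean of y
sse_mean = sum((y - y_bar) ** 2 for y in ys)
# Squared error when we predict with the least-squares line
sse_line = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse_line / sse_mean
print(round(r_squared, 2))  # 0.6
```

Here the line removes 60% of the squared prediction error that remains when guessing the mean, so about 60% of the variation in y is explained by the linear relationship with x.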

Self Check Quiz! The data is a random sample of 10 trains comparing number of cars on the train and fuel consumption in pounds of coal.
1. What is the regression equation? Be sure to define all variables.
2. What is r² telling you?
3. Define and interpret the slope in context. Does it have a practical interpretation?
4. Define and interpret the y-intercept in context.
5. What is s telling you?

1. ŷ = x
ŷ = predicted fuel consumption in pounds of coal
x = number of rail cars
2. % of the variation in fuel consumption is explained by the linear relationship with the number of rail cars.
3. Slope = With each additional car, the predicted fuel consumption increases by pounds of coal, on average. This makes practical sense.
4. Y-intercept = When there are no cars attached to the train, the predicted fuel consumption is pounds of coal. This has no practical interpretation because there is always at least one car, the engine.
5. s = The actual fuel consumption typically differs from the fuel consumption predicted by the least-squares regression line by about pounds of coal.

FRQ: 2016 #6

There is a positive linear association between number of semesters studied and starting salary in euros, which supports the newspaper’s claim.

The slope means that for each additional semester studied, the predicted starting salary increases by 1,159.4 euros, on average.

There is a strong, negative linear relationship between the number of semesters studied and starting salary when looking at each major individually.

Business majors have the lowest median starting salary, physics majors are in the middle and chemistry majors have the highest median starting salary.

Within each major there is a negative association between the number of semesters studied and median salary. The students that take longer within each major earn less, on average. However, the programs of study, like chemistry, that require more semesters of study result in higher median earnings.

MC #1 If women always married men who were 2 years older than themselves, what would the correlation between the ages of husband and wife be? (a) 2 (b) 1 (c) 0.5 (d) 0 (e) Can’t tell without seeing the data

MC #2 Smokers don’t live as long (on average) as nonsmokers, and heavy smokers don’t live as long as light smokers. You perform least-squares regression on the age at death of a group of male smokers y and the number of packs per day they smoked x. The slope of your regression line (a) will be greater than 0. (b) will be less than 0. (c) will be equal to 0. (d) You can’t perform regression on these data. (e) You can’t tell without seeing the data.

MC #3 Measurements on young children in Mumbai, India, found this least-squares line for predicting height (y) from arm span (x): ŷ = 6.4 + 0.93x. Measurements are in centimeters (cm). How much does height increase on average for each additional centimeter of arm span? (a) 0.93 cm (b) 1.08 cm (c) 5.81 cm (d) 6.4 cm (e) 7.33 cm

MC #4 Measurements on young children in Mumbai, India, found this least-squares line for predicting height (y) from arm span (x): ŷ = 6.4 + 0.93x. Measurements are in centimeters (cm). According to the regression line, the predicted height of a child with an arm span of 100 cm is about (a) cm. (b) 99.4 cm. (c) 93 cm. (d) 15.7 cm. (e) 7.33 cm.

MC #5 Measurements on young children in Mumbai, India, found this least-squares line for predicting height (y) from arm span (x): ŷ = 6.4 + 0.93x. Measurements are in centimeters (cm). One child in the Mumbai study had height 59 cm and arm span 60 cm. This child’s residual is (a) −3.2 cm. (b) −2.2 cm. (c) −1.3 cm. (d) 3.2 cm. (e) 62.2 cm.

Survey & Experimental Design

Sampling Design
Sampling Design: method used to choose the sample from the population
Types of Samples:
Simple Random Sample
Stratified Random Sample
Systematic Random Sample
Cluster Sample
Multistage Sample

Simple Random Sample (SRS)
Consists of n individuals from the population chosen in such a way that every individual has an equal chance of being selected and every set of n individuals has an equal chance of being selected.
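A simple random sample is easy to sketch with Python's `random.sample`, which draws without replacement so that every set of n individuals is equally likely. The roster below is hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

# Hypothetical roster of 500 students (names are made up)
population = [f"student_{i}" for i in range(1, 501)]

rng = random.Random(2018)           # fixed seed so the draw is reproducible
srs = rng.sample(population, k=80)  # 80 students, no repeats

print(len(srs), srs[:3])
```

In practice the hard part is the first line: an SRS requires a complete list of the population to draw from.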

SRS
Advantages
Unbiased
Easy
Disadvantages
Large variance/high variability
May not be representative
Must be able to identify entire population

Stratified Random Sample
Population is divided into homogeneous (alike) groups called strata.
Stratum 1: Seniors
Stratum 2: Juniors
SRSs are pulled from each stratum.
Helps control for lurking variables.
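A stratified random sample is just a separate SRS inside each stratum. The sketch below uses the Seniors and Juniors strata from the slide; the roster sizes and the choice of 40 students per stratum are assumptions made for illustration.

```python
import random

rng = random.Random(2018)  # fixed seed so the draw is reproducible

# Hypothetical rosters for each stratum
strata = {
    "Seniors": [f"senior_{i}" for i in range(1, 301)],
    "Juniors": [f"junior_{i}" for i in range(1, 351)],
}

# Take an SRS of 40 from every stratum
stratified = {name: rng.sample(members, k=40)
              for name, members in strata.items()}

for name, chosen in stratified.items():
    print(name, len(chosen))
```

Unlike a single SRS of 80, this design guarantees that both grade levels are represented.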

Advantages
More precise unbiased estimator than SRS
Less variability
Cost reduced if strata already exist
Disadvantages
Difficult to do if you must divide stratum
Formulas for SD & confidence intervals are more complicated

Common Strata What are some common strata in the following areas?
Politics
School

Systematic Random Sample
Pick a method of identifying subjects randomly before starting Requires strict adherence Example: Suppose a supermarket wants to study buying habits of their customers, then using systematic sampling they can choose every 10th or 15th customer entering the supermarket and conduct the study on this sample.
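The supermarket example can be sketched as: pick a random starting customer among the first 10, then take every 10th customer after that. The function name and the list of 200 customers are hypothetical.

```python
import random

def systematic_sample(items, k, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)  # random index from 0 to k - 1
    return items[start::k]

customers = list(range(1, 201))  # hypothetical: customers numbered 1 to 200
chosen = systematic_sample(customers, 10, random.Random(2018))

print(len(chosen), chosen[:3])
```

The random start is what makes the sample random; after that, the rule must be followed strictly.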

Cluster Sample
Based upon location
Randomly pick a location & sample all there
Examples:
All houses on a certain block
All houses in a specific zip code
All students at specific schools in MDCPS
All students in specific homeroom classes
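In a cluster sample the random choice is made among the clusters, and then everyone in the chosen clusters is sampled. The sketch below uses homeroom classes as clusters; the number of homerooms, their size, and the names are all made up.

```python
import random

# Hypothetical school: 20 homerooms with 25 students each
homerooms = {
    f"homeroom_{h}": [f"hr{h}_student_{s}" for s in range(1, 26)]
    for h in range(1, 21)
}

rng = random.Random(2018)  # fixed seed so the draw is reproducible
chosen_rooms = rng.sample(sorted(homerooms), k=3)  # randomly pick 3 clusters

# Sample ALL students in each chosen homeroom
cluster_sample = [s for room in chosen_rooms for s in homerooms[room]]

print(chosen_rooms, len(cluster_sample))
```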

Cluster Samples
Advantages
Unbiased
Cost is reduced
Disadvantages
Clusters may not be representative of population
Formulas are complicated

Multistage Sample
At least two separate levels/stages of SRS.
Example:
Stage 1: Juniors vs. Seniors
Stage 2: Divide the above groups (Juniors and Seniors) by AP, Regular and Honors, then select 10 from each of the groups for a total of 60.

Sampling at a School Assembly
Describe how you would use the following sampling methods to select 80 students to complete a survey. (a) Simple Random Sample (b) Stratified Random Sample (c) Cluster Sample

Identify the Sampling Design
1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.) Then they randomly selected 3 colleges from each group.

2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.

3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 & 10. He then gives a survey to that customer, and to every 10th customer after them, to fill it out before they leave.

Principle of Experimental Design:
Random Assignment of Treatment
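Random assignment can be sketched by shuffling the list of subjects and splitting it in half. The 20 subjects and the two group names below are hypothetical.

```python
import random

subjects = [f"subject_{i}" for i in range(1, 21)]  # 20 hypothetical volunteers

rng = random.Random(2018)  # fixed seed so the assignment is reproducible
shuffled = subjects[:]     # copy so the original list is left untouched
rng.shuffle(shuffled)

treatment = shuffled[:10]  # first 10 after shuffling get the treatment
control = shuffled[10:]    # remaining 10 serve as the control group

print(len(treatment), len(control))
```

Because chance alone decides who lands in each group, lurking variables should be spread roughly evenly between the groups.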

Specific Types of Experimental Design
Double-Blind
Single-Blind
Matched Pairs
Block Design

Double-Blind In a double-blind experiment, neither the subjects nor the experimenters know which treatment a subject received.

Matched Pair Design In a matched pair design, subjects are paired by matching common important attributes. Sometimes the results are a pre-test and post-test with the unit being “matched” to itself.

Matched Pair Design Example: Tire wear and tear.
Put one set of tires on the left side of the car and a different set on the right side of the car. This would help control the lurking variable of different driving styles (between teenage boys vs. teachers) and mileage driven.

FRQ 2011 # 3

a. Step 1: Generate a random integer between 1 and 9, inclusive, using a calculator, a computer program, or a table of random digits. Select all four apartments on the floor corresponding to the selected integer. Step 2: Generate another random integer between 1 and 9, inclusive. If the generated integer is the same as the integer generated in step 1, continue generating random integers between 1 and 9 until a different integer appears. Again select all four apartments on the floor corresponding to the second selected integer. The cluster sample consists of the eight apartments on the two randomly selected floors.
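The two-step procedure in part (a) might be simulated as follows. This is a sketch: the function name is made up, and apartments are labeled simply as (floor, unit) pairs with units numbered 1 to 4.

```python
import random

def select_floors(rng, n_floors=9, n_clusters=2):
    """Generate random integers from 1 to n_floors until n_clusters different floors appear."""
    chosen = []
    while len(chosen) < n_clusters:
        floor = rng.randint(1, n_floors)  # inclusive on both ends
        if floor not in chosen:           # ignore a repeated floor and draw again
            chosen.append(floor)
    return chosen

rng = random.Random(2011)  # fixed seed so the draw is reproducible
floors = select_floors(rng)

# The cluster sample is all four apartments on each selected floor
apartments = [(floor, unit) for floor in floors for unit in range(1, 5)]

print(floors, len(apartments))
```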

b. Because the amount of wear on the carpets in apartments with children could be different from the wear on the carpets in apartments without children, it would be advantageous to have apartments with children represented in the sample. The cluster sampling procedure in part (a) could produce a sample with no children in the selected apartments; for example, a cluster sample of the apartments on the third and sixth floors would consist entirely of apartments with no children. Stratified random sampling, where the two strata are apartments with children and apartments without children, guarantees a sample that includes apartments with and without children, which, in turn, would yield sample data that are representative of both types of apartments.
