Conditions for Inference: What Are They and Why Do We Need Them?

1 Conditions for Inference: What Are They and Why Do We Need Them?
Dr. Ellen Breazel Dr. Christy Brown Clemson University

2 Presenter Bios: Ellen Breazel
BS in Computer Engineering, Clemson, 2000
MS in Statistics, UGA, 2003
PhD in Statistics, UGA, 2008
Teaching Introductory Statistics at Clemson since 2008
Introductory Business Statistics Course Coordinator, 2013 – present
Clemson AP Statistics Practice Exam Coordinator, 2012 & 2013
AP Statistics Reader, Table Leader, Scoring Rubric Team, 2013 & 2014
Question Leader, 2015 & 2016
AP Statistics Consultant for the College Board, 2013 – present
AP Statistics Test Development Committee, 2014 – present

3 Presenter Bios: Christy Brown
BSEd in Math Education, UGA, 2004
MS in Statistics, UGA, 2010
PhD in Psychometrics, UGA, 2013
AP Statistics Teacher at Loganville High School in GA
Teaching Introductory Statistics at Clemson since 2013
Introductory Statistics Course Coordinator starting Spring 2015
Clemson AP Statistics Practice Exam Coordinator, 2014 – 2016
AP Statistics Reader, Table Leader, 2016

4 Presentation Outline
Model Solutions for Checking Inference Conditions
  Inference for a Proportion
  Inference for the Difference of Two Proportions
  Inference for a Mean
  Inference for the Difference of Two Means
  Chi-Square Tests
Activity: Conditions for Inference for a Proportion
2016 Free Response Question 5
Best Practices for AP Teachers

5 Student Perceptions of Checking Conditions
Many students have trouble checking the appropriate conditions for inference:
They forget to check conditions at all.
They make incorrect statements.
They write everything they can think of, whether relevant or not.
They make a list of the conditions without showing any work demonstrating that the conditions are satisfied.

6 Student Perceptions of Checking Conditions

7 WHY Do We Need to Check Conditions?
From Bock’s AP Central article:
All of mathematics is based on If-Then statements. Example: If we have a right triangle, then a² + b² = c².
Statistical inference is also based on If-Then statements, but most of the theorems and proofs are beyond the scope of an introductory course.
The “If” part of a statement sets out the underlying assumptions in the proof that the statistical method works.

8 WHY Do We Need to Check Conditions?
Bock states that there are 3 types of assumptions:
1. Unverifiable. We must simply accept these as reasonable, after careful thought (e.g., independent samples).
2. Plausible, based on evidence. We test a condition to see if it’s reasonable to believe that the assumption is true (e.g., approximately normal population).
3. False, but close enough (e.g., if np and n(1 – p) are both more than 10, the normal approximation to the binomial is close enough to provide reliable results).

9 WHY Do We Need to Check Conditions?
What happens if the distributional assumption in an inference procedure is not met?
A 95% confidence interval based on the assumptions being met might not successfully capture the actual parameter value in 95% of all random samples.
A calculated p-value might not be all that close to the actual probability that the test statistic takes the observed value or something more extreme when the null hypothesis is true.
If your p-value is larger than it should be, the test is conservative: power is lower and the Type II error rate is higher.
If your p-value is smaller than it should be, the Type I error rate is higher than the stated significance level.
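To make the coverage claim concrete, here is a minimal Python sketch that computes the exact capture rate of the usual interval p̂ ± 1.96·√(p̂(1 – p̂)/n) by summing binomial probabilities. The function name `wald_coverage` and the two (n, p) settings are illustrative choices, not part of the presentation.

```python
from math import comb, sqrt

def wald_coverage(n, p, z=1.96):
    """Exact chance that p_hat +/- z*SE captures p when X ~ Binomial(n, p)."""
    covered = 0.0
    for x in range(n + 1):
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            # Add the binomial probability of observing exactly x successes.
            covered += comb(n, x) * p**x * (1 - p) ** (n - x)
    return covered

small = wald_coverage(20, 0.2)   # np = 4, so the large-counts check fails
large = wald_coverage(200, 0.5)  # np = n(1 - p) = 100
print(round(small, 3), round(large, 3))
```

With n = 20 and p = 0.2 the nominal 95% interval captures p only about 92% of the time, while the larger setting lands much closer to the stated level.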

10 WHY Do We Need to Check Conditions?
What should you do if the conditions aren’t met?
If it’s a free response question on the AP exam, you should finish answering the question in hopes of partial credit (you probably made a mistake in checking the conditions!).
In reality, you should use nonparametric methods or resampling techniques, which are typically taught in a second course in Statistics.

11 What Are the Conditions for Inference?
What is the one condition required whenever we use sample data to make decisions or predictions about a population?
RANDOMIZATION
Random sampling for surveys/observational studies
Random assignment in experiments

12 What Are the Conditions for Inference?
Several fun activities demonstrating that we aren’t that good at using our own judgment to pick representative samples:
Random Rectangles: Burrill, G., Franklin, C., Godbold, L., & Young, L. (2003). Navigating through Data Analysis in Grades 9 – 12. Reston, VA: NCTM.
Gettysburg Address: Rossman, A., & Chance, B. (2012). Workshop Statistics: Discovery with Data (4th ed.). Hoboken, NJ: Wiley.
Show Me the Money!: Breazel, E. H., Duggins, J. W., & Tyson, D. (2013). AP Statistics Curriculum Module: Understanding Random Sampling and Random Assignment. New York: The College Board.

13 Random Sampling vs. Random Assignment

14 Inference for ONE Proportion
Conditions: Inference for ONE Proportion

15 Inference for a Proportion
Conditions for Inference for a Proportion:
1. Randomization in gathering data.
2. If we expect at least 10 successes (np ≥ 10) and at least 10 failures (n(1 – p) ≥ 10), then the binomial distribution can be considered approximately normal. (Some sources use 5 or 15 as the cutoff instead of 10.)
Check for CIs: np̂ ≥ 10 and n(1 – p̂) ≥ 10
Check for significance tests: np₀ ≥ 10 and n(1 – p₀) ≥ 10

16 Inference for a Proportion – 2011 Q6a
Recently, a random sample of 9,600 twelfth-grade students were administered a multiple-choice United States history exam. One of the questions is below. (The correct answer is C.)
In 1935 and 1936 the Supreme Court declared that important parts of the New Deal were unconstitutional. President Roosevelt responded by threatening to
(A) impeach several Supreme Court justices
(B) eliminate the Supreme Court
(C) appoint additional Supreme Court justices who shared his views
(D) override the Supreme Court’s decisions by gaining three-fourths majorities in both houses of Congress
Of the 9,600 students, 28 percent answered the question correctly. Let p be the proportion of all United States twelfth-grade students who would answer the question correctly. Construct and interpret a 99 percent confidence interval for p.

17 Inference for a Proportion – 2011 Q6a
Model Solution: The appropriate inference procedure is a one-sample z-interval for a population proportion p, where p is the proportion of all United States twelfth-grade students who would answer the question correctly. The conditions for this inference procedure are satisfied because:
1. The question states that the students are a random sample from the population, and
2. n×p̂ = 9,600×0.28 = 2,688 and n×(1 – p̂) = 9,600×0.72 = 6,912 are both much larger than 10.
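As a rough illustration of the procedure named in the model solution, the sketch below checks the two counts and then computes the 99 percent one-sample z-interval from the numbers given in the question. The helper name `one_prop_z_interval` is made up for this example; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(p_hat, n, confidence):
    """One-sample z-interval: p_hat +/- z* sqrt(p_hat(1 - p_hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

n, p_hat = 9600, 0.28
successes, failures = n * p_hat, n * (1 - p_hat)  # about 2,688 and 6,912
print("large counts OK:", successes >= 10 and failures >= 10)

low, high = one_prop_z_interval(p_hat, n, 0.99)
print(round(low, 3), round(high, 3))  # 0.268 0.292
```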

18 Inference for a Proportion – 2011 Q6a
Scoring: Essentially correct (E) if the response includes the following three components:
1. Identifies the correct inference procedure.
2. Checks the randomness condition.
3. Checks the large sample size condition.
Notes:
“Random sample given” is sufficient for the second component.
To satisfy the third component, the response:
  Must check both the number of successes and the number of failures.
  Must use a reasonable criterion (for example, ≥ 5 or ≥ 10).
  Must provide numerical evidence (for example, 2,688 ≥ 10 and 6,912 ≥ 10, or 9,600×0.28 ≥ 10 and 9,600×0.72 ≥ 10).
Any checks of reasonable conditions, such as independence of observations, sample size less than 10 percent of population size, 9,600 > 30, etc. should be considered extraneous. However, if a response includes an incorrect condition, such as population normality, reduce the score from E to P or from P to I.
Any reference to the central limit theorem should be treated as extraneous and not sufficient for the large sample size condition.

19 Inference for a Proportion – 2011 Q6a
Common Student Errors:
Many students did not check conditions for inference.
Some students incorrectly checked the sample size condition, for example, by merely comparing the sample size to 30, or by simply stating that the Central Limit Theorem holds, or by commenting ambiguously that the distribution is normal.
Recommendations for Teachers:
Provide much guidance and practice for determining which inference procedure to use, depending on the type of parameter (e.g., proportion or mean) involved, which in turn depends on the type of variable(s) (categorical or quantitative) in the study.
Make sure that students know they must check conditions for inference with confidence interval procedures, as well as with significance test procedures.
Help students to realize that the specific conditions to be checked depend on the type of parameter involved (e.g., proportion or mean, one group or two).

20 Inference for a Proportion
What about the 10% condition?
For p̂ = X/n to have an approximately normal distribution, X is assumed to be a binomial random variable with a probability of success that remains constant from trial to trial.
In sampling without replacement, the probability of success in later trials is affected by the number of successes in previous trials (e.g., the probability of drawing an ace from a deck of cards changes with each card drawn).
If the sample size is small relative to the size of the population (say, less than 10%), then these success probabilities will stay approximately the same (think about 4/52 to 3/51 versus 4000/5200 to 3999/5199).
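The closing comparison can be checked with a few lines of arithmetic. This is only a sketch: the function name `drop_after_one_success` is invented here, and it looks at a single draw rather than a whole sample.

```python
from fractions import Fraction

def drop_after_one_success(successes, population):
    """Change in P(success) after one success is drawn without replacement."""
    before = Fraction(successes, population)
    after = Fraction(successes - 1, population - 1)
    return float(before - after)

deck = drop_after_one_success(4, 52)       # 4/52 versus 3/51
big = drop_after_one_success(4000, 5200)   # 4000/5200 versus 3999/5199
print(f"{deck:.4f} {big:.6f}")  # 0.0181 0.000044
```

The shift is about 400 times smaller in the large population, which is why treating the draws as having a constant success probability is considered close enough there.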

21 Inference for TWO Proportions
Conditions: Inference for TWO Proportions

22 Inference for Two Proportions
Conditions for Inference for Two Proportions:
1. The data come from independent random samples or from random assignment to two groups.
2. If there are at least 10 successes and at least 10 failures in both groups, then the test statistic will follow an approximately normal distribution.
Why independence? In finding the standard error for the difference between two proportions, we are adding the variances of two random variables, which requires independence.

23 Inference for Two Proportions – 2015 Q4
A researcher conducted a medical study to investigate whether taking a low-dose aspirin reduces the chance of developing colon cancer. As part of the study, 1,000 adult volunteers were randomly assigned to one of two groups. Half of the volunteers were assigned to the experimental group that took a low-dose aspirin each day, and the other half were assigned to the control group that took a placebo each day. At the end of six years, 15 of the people who took the low-dose aspirin and 26 of the people who took the placebo had developed colon cancer. At the significance level α = 0.05, do the data provide convincing statistical evidence that taking a low-dose aspirin each day would reduce the chance of developing colon cancer among all people similar to the volunteers?

24 Inference for Two Proportions – 2015 Q4
Model Solution: The appropriate procedure is a two-sample z-test for comparing proportions. Because this is a randomized experiment, the first condition is that the volunteers were randomly assigned to one treatment group or the other. The condition is satisfied because we are told that the volunteers were randomly assigned to take a low-dose aspirin or a placebo. The second condition is that the sample sizes are large, relative to the proportions involved. The condition is satisfied because all sample counts are large enough; that is, 15 with colon cancer in aspirin group, 26 with colon cancer in placebo group, 500 – 15 = 485 cancer-free in aspirin group, and 500 – 26 = 474 cancer-free in placebo group.
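For readers who want to see the arithmetic behind the procedure, here is a hedged sketch of the count check and the pooled two-sample z-test using the counts from the question. The function name `two_prop_z_test` is illustrative, and the lower-tail p-value reflects the one-sided question being asked (does aspirin reduce the chance).

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 = p2 against Ha: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)  # lower-tail p-value

# Aspirin group: 15 of 500 developed colon cancer; placebo group: 26 of 500.
counts = [15, 500 - 15, 26, 500 - 26]
print("all four counts at least 10:", min(counts) >= 10)

z, p_value = two_prop_z_test(15, 500, 26, 500)
print(round(z, 2), round(p_value, 3))  # -1.75 0.04
```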

25 Inference for Two Proportions – 2015 Q4
Scoring: Essentially correct (E) if the response includes the following three components:
1. Identifies the correct test procedure (by name or formula).
2. Notes that the use of random assignment satisfies the randomness condition.
3. Checks for approximate normality of the test statistic by citing that all four counts are larger than some standard criterion such as 5 or 10.
Notes:
For the randomness component, it is (minimally) acceptable to say “random assignment – check” but not acceptable to say “random – check” or “SRS – check.” The important concept here is that it is random assignment, and not random sampling, that is required.
If the response implies that the study used a random sample, the randomness component is not satisfied, regardless of whether random assignment is correctly addressed.
The normality check may use the expected counts under the null hypothesis in place of the observed counts.

26 Inference for Two Proportions – 2015 Q4
Common Student Errors:
Many students had trouble checking the appropriate conditions for the test. For instance:
  Students incorrectly stated that the randomness condition was satisfied because a simple random sample was chosen, rather than because of random assignment.
  Students incorrectly stated that the normality condition was satisfied because both groups were larger than 30.
Recommendations for Teachers:
Teachers should emphasize the distinction between random samples (generally used for surveys and some observational studies) and random assignment (generally used in experimental studies).
Teachers should avoid the use of abbreviations such as “SRS” as an acceptable way of describing randomness conditions generally, and require that students describe in words (complete sentences) the conditions they are checking and whether or not those conditions are satisfied.

27 Conditions: Inference for ONE MEAN

28 Inference for a Mean
Conditions for Inference for a Mean:
1. Randomization in gathering data.
2. The population has a normal distribution OR the sample size is large.
To check if the normal population assumption is reasonable in the case of a small sample size, create a graphical display of the data and make sure that it does not display strong skewness or outliers.
Acceptable graphical displays include a dotplot, a stem-and-leaf plot, a histogram (usually reserved for when n is larger), or a normal probability plot with a boxplot to check for outliers.
Avoid using only a boxplot, as complete shape information cannot be determined from a boxplot.

29 Inference for a Mean – 2013 Q1b
An environmental group conducted a study to determine whether crows in a certain region were ingesting food containing unhealthy levels of lead. The lead levels of a random sample of 23 crows in the region were measured and recorded. The data are shown in the stemplot below. The mean lead level of the 23 crows in the sample was 4.90 ppm and the standard deviation was 1.12 ppm. Construct and interpret a 95 percent confidence interval for the mean lead level of crows in the region.

30 Inference for a Mean – 2013 Q1b
Model Solution: The appropriate procedure is a one-sample t-interval for a population mean.
Conditions:
1. The sample is randomly selected from the population.
2. The population has a normal distribution, or the sample size is large.
The first condition is met because we were told that the crows were randomly selected.
The sample size of 23 is not considered large, so we need to examine the sample data to assess whether it is reasonable to assume that the population distribution of lead levels for all crows in this region is normal. The stem-and-leaf plot shows no strong skewness or outliers, so we will consider the second condition to be met.
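Once the conditions are accepted, the interval itself is a short calculation. The sketch below uses the summary statistics from the question; the critical value 2.074 is the standard t-table entry for 22 degrees of freedom at 95 percent confidence, hard-coded here because Python's standard library has no t quantile function.

```python
from math import sqrt

n, mean, sd = 23, 4.90, 1.12  # summary statistics given in the question (ppm)
t_star = 2.074                # t-table value for df = n - 1 = 22, 95% confidence

standard_error = sd / sqrt(n)
margin = t_star * standard_error
low, high = mean - margin, mean + margin
print(round(low, 2), round(high, 2))  # 4.42 5.38
```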

31 Inference for a Mean – 2013 Q1b
Scoring: Essentially correct (E) if the response identifies a one-sample t-interval for a population mean (either by name or formula) AND also checks both the random sampling and the normality/large sample condition correctly. Note: Any reasonable comment about the normality displayed in the stem-and-leaf plot (or another appropriately sketched plot) is acceptable.

32 Inference for a Mean – 2013 Q1b
Common Student Errors: Students who successfully generated a list of conditions to be checked were not always able to check them correctly. Many students mistakenly thought that a sample size less than 30 determined that the normality condition was not satisfied, without realizing that with a small sample it’s important to check whether the sample data indicate that the population distribution might reasonably be considered normal. A surprising number of students entered the data into a calculator and produced a graphical display in addition to the stem-and-leaf plot provided in the question. This was not an error, but was an unnecessary step that wasted valuable time.

33 Inference for a Mean – 2013 Q1b
Recommendations for Teachers: Provide many opportunities for students to practice with identifying the appropriate inference procedure to address a particular research question. Also emphasize that t-procedures are used with quantitative data, where the relevant parameter is a population mean. Emphasize to students not only what the validity conditions are for particular inference procedures, but also how to check whether the conditions are satisfied in a given context. Students could also benefit by learning about why checking validity conditions is necessary.

34 Inference for a Mean Simulation
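[Figure: population distribution of X, with the mean μ marked.]
For n = 5, 15, and 50, I drew 1000 samples from the population and calculated t = (x̄ – 2)/(s/√n).
[Figure: simulated distributions of t for each sample size, compared with the t(4), t(14), and t(49) distributions.]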

35 Inference for a Mean Simulation
Suppose we are testing the hypotheses H₀: μ = 2 versus Hₐ: μ < 2 at the 5% significance level, and a sample of size 5 gives an observed value of t = –4.00.
The p-value according to the t-distribution with 4 degrees of freedom is .0081, leading to an incorrect rejection of the null hypothesis.
According to the simulated distribution of test statistics, we would observe a value of t less than –4.00 about 9% of the time.
Mistakenly using the t procedures for this test would lead to a Type I error rate of approximately 20%.
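A simulation along these lines can be sketched in a few lines of Python. The transcript does not say which population the presenters used, so this sketch assumes a right-skewed exponential population with mean 2; the rejection rates it produces will therefore differ from the 20% quoted on the slide. The critical values are standard t-table entries for a 5% lower-tail test.

```python
import random
from math import sqrt

MU = 2.0  # true mean, so H0: mu = 2 is true in every simulated sample
# Lower-tail 5% critical values from a t table for df = 4, 14, 49.
T_CRIT = {5: -2.132, 15: -1.761, 50: -1.677}

def lower_tail_rejection_rate(n, reps, rng):
    """Fraction of samples whose t statistic falls below the t critical value."""
    rejections = 0
    for _ in range(reps):
        # ASSUMPTION: exponential population with mean MU (strongly right-skewed).
        sample = [rng.expovariate(1 / MU) for _ in range(n)]
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        t = (x_bar - MU) / (s / sqrt(n))
        if t < T_CRIT[n]:
            rejections += 1
    return rejections / reps

rng = random.Random(2016)
rates = {n: lower_tail_rejection_rate(n, 5000, rng) for n in T_CRIT}
print(rates)
```

If the t procedures were valid, each rate would sit near 0.05. Under this skewed population the rate for n = 5 is well above that and shrinks toward 0.05 as n grows, which is the pattern the slides describe.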

36 Inference for a Mean
What about the 10% condition for means?
Use of the t procedures relies on the sampled individuals being independent draws from the same population. Whenever we sample without replacement, this assumption is false.
The standard error is technically (s/√n)·√(1 – n/N).
If n/N = 0.10, then the standard error is (s/√n)(0.94868), which is considered “close enough.”
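The correction factor on this slide is easy to tabulate. This small sketch uses the slide's form of the factor, √(1 – n/N); the function name and the list of sampling fractions are illustrative.

```python
from math import sqrt

def correction_factor(sampling_fraction):
    """Multiplier on s/sqrt(n) when sampling without replacement: sqrt(1 - n/N)."""
    return sqrt(1 - sampling_fraction)

for fraction in (0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"n/N = {fraction:.2f}  factor = {correction_factor(fraction):.5f}")
# At n/N = 0.10 the factor is 0.94868, matching the slide.
```

Below a 10% sampling fraction the uncorrected standard error overstates the true one by about 5% or less, so intervals are slightly wider than necessary rather than too narrow.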

37 Inference for TWO MEANS
Conditions: Inference for TWO MEANS

38 Inference for Two Means
Conditions for Inference for Two Means: The data come from independent random samples or from random assignment to two groups. The populations are normally distributed, OR both sample sizes are large.

39 Inference for Two Means – 2012 Q3b
Independent random samples of 500 households were taken from a large metropolitan area in the United States for the years 1950 and 2000. Histograms of household size (number of people in a household) for the years are shown below. A researcher wants to use these data to construct a confidence interval to estimate the change in mean household size in the metropolitan area from the year 1950 to the year 2000. State the conditions for using a two-sample t-procedure, and explain whether the conditions for inference are met.

40 Inference for Two Means – 2012 Q3b
Model Solution: The conditions for applying a two-sample t-procedure are:
1. The data come from independent random samples or from random assignment to two groups;
2. The populations are normally distributed, or both sample sizes are large;
3. The population sizes are at least 10 times the sample size.
The first condition is satisfied because independent random samples were selected for the years 1950 and 2000.
The second condition is satisfied because the sample sizes (500 in each group) are quite large, despite the right skewness of the distributions of household sizes in the sample data.
The third condition is satisfied because the number of households in the large metropolitan area in both 1950 and 2000 would easily exceed 10 × 500 = 5,000.

41 Inference for Two Means – 2012 Q3b
Scoring: Essentially correct (E) if the response correctly states and checks the following two conditions:
1. The data come from independent random samples.
2. Normality/sample size conditions.
Note: The population size condition does not need to be checked to earn E or P.

42 Inference for Two Means – 2012 Q3b
Common Student Errors:
Many students mistakenly believed that the sample had to be normally distributed in order for a t-procedure to be valid.
Many students considered the normality and sample size conditions to be separate issues, not realizing that a large sample size allows for a t-procedure to be valid even with a population that is not normally distributed.
Many students did not clearly specify that both samples needed to be randomly selected from their populations.
Many students did not clearly distinguish between stating and checking conditions for inference.
Some students tried to implement completely inappropriate checks, such as np > 10.
Some students attempted to check the condition that the population size is at least 10 times larger than the sample size, but they often seemed to be unaware of why this condition matters and how it relates to other conditions.

43 Inference for Two Means – 2012 Q3b
Recommendations for Teachers: Expect students to clearly state and check conditions for inference often. In addition, help students understand the reason behind the conditions. For example, the t-distribution is not a close approximation to the sampling distribution of a sample mean when the sample size is small and the population distribution is nonnormal, so a 95 percent confidence interval based on the t-distribution might not successfully capture the actual parameter value in 95 percent of all random samples.

44 Conditions: CHI-SQUARE TESTS

45 Chi-Square Tests Conditions for Chi-Square Tests:
Randomization in gathering data. The expected counts for all cells of the table are at least 5.

46 Chi-Square Tests – 2013 Q4
The Behavioral Risk Factor Surveillance System is an ongoing health survey system that tracks health conditions and risk behaviors in the United States. In one of their studies, a random sample of 8,866 adults answered the question “Do you consume five or more servings of fruits and vegetables per day?” The data are summarized by response and by age-group in the frequency table below. Do the data provide convincing statistical evidence that there is an association between age-group and whether or not a person consumes five or more servings of fruits and vegetables per day for adults in the United States?

Age Group (years)   Yes      No       Total
18 – 34             231      741      972
35 – 54             669      2,242    2,911
55 or older         1,291    3,692    4,983
Total               2,191    6,675    8,866

47 Chi-Square Tests – 2013 Q4 Model Solution:
The appropriate test is a chi-square test of independence. The conditions for this test were satisfied because:
1. The question states that the sample was randomly selected.
2. The expected counts for all six cells of the table were at least 5, as seen in the following table that lists expected counts in parentheses beside the observed counts:

Age Group (years)   Yes               No                Total
18 – 34             231 (240.2)       741 (731.8)       972
35 – 54             669 (719.4)       2,242 (2191.6)    2,911
55 or older         1,291 (1231.4)    3,692 (3751.6)    4,983
Total               2,191             6,675             8,866
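The expected counts in the model solution come from (row total × column total) / grand total. A minimal sketch of that calculation, using the observed counts from the question (variable and function names are illustrative):

```python
# Observed counts (Yes, No) by age group, from the 2013 Q4 table.
observed = {
    "18-34": (231, 741),
    "35-54": (669, 2242),
    "55 or older": (1291, 3692),
}

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

expected = expected_counts(observed)
smallest = min(min(row) for row in expected)
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed.values(), expected)
    for o, e in zip(obs_row, exp_row)
)

for group, row in zip(observed, expected):
    print(group, [round(e, 1) for e in row])
print("smallest expected count:", round(smallest, 1))  # 240.2
print("chi-square statistic:", round(chi_square, 2))   # 8.98
```

Reporting the smallest expected count (240.2) alongside the criterion is the minimal numerical evidence that the condition was checked.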

48 Chi-Square Tests – 2013 Q4 Scoring:
Essentially correct (E) if the response includes the following three components:
1. Identifies a chi-square test of independence by name or formula.
2. States AND verifies the random sampling condition.
3. States AND verifies the technical condition that all expected counts were greater than 5.
Notes:
A response that identifies the test procedure as a chi-square test for homogeneity of proportions does not receive credit for component 1.
Stating the condition that the expected counts must be greater than 5 is not in itself sufficient for satisfying component 3; the condition must be checked by reporting expected counts, or minimally reporting the value of the smallest expected count and indicating that it is at least 5.
If the response includes an incorrect technical condition, such as “n ≥ 30” or “normality,” then this will be considered a parallel solution and credit will not be granted for component 3.
If the response states and verifies the condition that 80 percent of all expected counts must be ≥ 5 and all expected counts must be ≥ 1, then the response can receive credit for component 3.

49 Chi-Square Tests – 2013 Q4 Common Student Errors:
Some students stated an appropriate validity condition in terms of expected counts, but did not clearly demonstrate that the condition had been checked numerically.
Some students neglected to mention that the condition of having selected a random sample was satisfied.
Some students listed incorrect or inappropriate conditions, involving normality or Central Limit Theorem or sample size greater than 30.
Recommendations for Teachers:
Teachers should make students aware of the importance of always checking conditions for inference, based on the specific details of the study at hand, rather than merely stating assumptions for inference.
Make students aware that checks require examination of the sample data and consideration of how sample data were collected.

50 Chi-Square Tests Homogeneity of Proportions vs. Independence:
Homogeneity of proportions asks whether the distribution of ONE VARIABLE is the same in TWO (or more) POPULATIONS. Group members will be identified in advance of collecting data, and so the table’s row (or perhaps column) totals will be fixed. Independence asks whether there’s an association between TWO VARIABLES in ONE POPULATION. Only the table’s total number of respondents is known in advance; the row and column totals appear only after the data have been tallied.

51 Conditions for Inference for One Proportion
PLOTTING ACTIVITY Conditions for Inference for One Proportion

52 ACTIVITY: Inference for a Proportion
Activity created by Bob Lochel (Hatboro-Horsham High School): Have students play with a binomial distribution explorer with sliders for values of n and p. Ask students to provide two settings of n and p which provide firm normality, and two settings which provide a clearly non-normal distribution.

53 Inference for a Proportion
Plot the values of n and p leading to normality in green and values that don’t seem normal in red. Overlay a graph of np ≥ 10. Overlay a graph of n(1 – p) ≥ 10.

54 Conditions for Inference for One Proportion
QUIZ ACTIVITY Conditions for Inference for One Proportion

55 We need a Binomial Experiment
Set number of trials: multiple-choice test with n = 20 questions
Success and failure: “correct” or “incorrect” responses
Probability of success remains constant: 5 choices for each question, where you are truly guessing the “correct” response
Trials are independent: each question is answered independently of the others

56 Random Variable X Let X = number of “correct” responses out of 20 Then X ~ Binomial (n=20, p = 0.2)

57 Take the Quiz
Use your cell phone or any other device with internet (please share with your neighbor if they don’t have access).
Go to: testmoz.com/775938
Take a guess at the “correct” response.
The “correct” responses were chosen by a random number generator.

58 Distribution of X Now divide each of these values by 20 …..

59 Distribution of p̂
np = 20(0.2) = 4
n(1 – p) = 20(0.8) = 16

60 Random Variable X NOW… Take a similar quiz with 35 questions and only 3 choices per question. Let Y = number of “correct” responses out of 35 Then Y ~ Binomial (n=35, p = 0.33)

61 Take the Quiz
Use your cell phone or any other device with internet (please share with your neighbor if they don’t have access).
Go to: testmoz.com/775882
Take a guess at the “correct” response.
The “correct” responses were chosen by a random number generator.

62 Distribution of Y Now divide each of these values by 35 …..

63 Distribution of p̂
np = 35(1/3) ≈ 11.67
n(1 – p) = 35(2/3) ≈ 23.33
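The two quiz settings can be compared side by side with a small helper. This is a sketch: `large_counts_check` is a made-up name, the cutoff of 10 follows the condition used throughout the presentation, and p = 1/3 is taken for the three-choice quiz (the slide rounds it to 0.33).

```python
def large_counts_check(n, p, cutoff=10):
    """Return expected successes, expected failures, and whether both meet the cutoff."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    passes = min(expected_successes, expected_failures) >= cutoff
    return expected_successes, expected_failures, passes

quiz1 = large_counts_check(20, 0.2)    # 20 questions, 5 choices each
quiz2 = large_counts_check(35, 1 / 3)  # 35 questions, 3 choices each

for label, (s, f, ok) in (("Quiz 1", quiz1), ("Quiz 2", quiz2)):
    print(f"{label}: np = {s:.2f}, n(1 - p) = {f:.2f}, condition met: {ok}")
```

The first quiz fails the check because np = 4, while the second passes with np of about 11.67, which matches the contrast the activity is designed to show.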

64 Parameters of p-hat distribution
Divide X by n to get p̂ = X/n.
μ_X = np, so μ_p̂ = np/n = p
σ_X = √(np(1 – p)), so σ_p̂ = √(np(1 – p))/n = √(p(1 – p)/n)

65 Results for multiple 99% confidence intervals

66 Conditions for Inference for One Proportion
2016 QUESTION 5 Conditions for Inference for One Proportion

67 Question
A polling agency showed the following two statements to a random sample of 1,048 adults in the United States. The order in which the statements were shown was randomly selected for each person in the sample. After reading the statements, each person was asked to choose the statement that was most consistent with his or her opinion. The results are shown in the table.
Environment statement: Protection of the environment should be given priority over economic growth
Economy statement: Economic growth should be given priority over protection of the environment

                    Environment Statement   Economy Statement   No Preference
Percent of sample   58%                     37%                 5%

68 How does the ACTIVITY match up with Question?
ACTIVITY
Population: Collection of outcomes from ALL questions asked (list of correct and incorrect) from all adults had they been asked
Sample: Responses from one quiz (n = 35 or 20)
Replicates: Every quiz that was taken
Question 5
Population: Collection of ALL responses (list of economy, ecology, or no pref.) from US adults at that time had they been asked
Sample: Responses from the 1,048
Replicates: None

69 Question 5 – Part (a) (a) Assume the conditions for inference have been met. Construct and interpret a 95 percent confidence interval for the proportion of all adults in the United States who would have chosen the economy statement.
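The transcript does not include a worked answer to part (a), so the following is only a sketch of how the interval could be computed from the figures in the question (p̂ = 0.37 for the economy statement, n = 1,048). The helper name is illustrative, and the printed numbers are this sketch's output rather than an official solution.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(p_hat, n, confidence=0.95):
    """One-sample z-interval for a proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(0.37, 1048)
print(round(low, 3), round(high, 3))  # 0.341 0.399
```

An interpretation in context would still be needed: it describes plausible values for the proportion of all adults in the United States who would have chosen the economy statement.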

70 Question 5 – part (b) (b) One of the conditions for inference that was met is that the number who chose the economy statement and the number who did not choose the economy statement are both greater than 10. Explain why it is necessary to satisfy that condition.

71 MODEL SOLUTION Question 5 – part (b)
The condition is necessary because the formula for the confidence interval relies on the fact that the binomial distribution can be approximated by a normal distribution, which then results in the sampling distribution of p̂ being approximately normal. The approximation does not work well unless both np̂ and n(1 – p̂) are at least 10.

72 SCORING – Part (b):
1. Part (b) states the condition implies the sampling distribution of p̂ is approximately normal OR the normal approximation to the binomial distribution is appropriate.
Student responses to compare:
The sampling distribution of p̂ is approximately normal
The distribution of the sample proportions is approximately normal
The sample distribution is approximately normal

73 Sampling Distribution of p̂ (Approximately Normal)
Population (NEVER Normal)
Sample Distribution (NEVER Normal)
Sampling Distribution of p̂ (Approximately Normal)

74 Question 5 – part (c)
(c) A suggestion was made to use a two-sample z-interval for a difference between proportions to investigate whether the difference in proportions between adults in the United States who would have chosen the environment statement and adults in the United States who would have chosen the economy statement is statistically significant. Is the two-sample z-interval for a difference between proportions an appropriate procedure to investigate the difference? Justify your answer.

75 MODEL SOLUTION Question 5 – part (c)
The suggested procedure is not appropriate since one of the requirements for using a two-sample z-interval for a difference between proportions is that the two proportions are based on two independent samples. In the situation described in this problem the two proportions come from a single sample, and thus are not independent.

76 SCORING – Part (c):
2. Part (c) indicates the procedure is not appropriate because the two proportions come from a single sample (dependent) rather than two (independent) samples.
Student responses to compare:
No, there is only one sample and I need 2 independent samples
No, I need 2 samples and there is only one sample

77 Common Mistake
Discussing the two-sample z-interval versus the two-proportion z-interval:
“A two sample interval would not be appropriate because you have proportions, so you should use a two proportion interval. Also, we only have one sample you would need two.”
WHAT???

78 “You are selecting the wrong procedure, you chose a 2-SampZInt but you should be doing 2-PropZInt for a difference in proportions.”

79 Best Practices for AP Teachers
Show students how to actually check the conditions, rather than merely stating what they are.
Remind students to think about whether they are working with categorical or quantitative data at the start of any analysis.
Emphasize the difference between random samples (generally used for surveys and some observational studies) and random assignment (generally used in experimental studies).
Incorporate activities that help students visualize underlying distributional assumptions in inference procedures.
Provide scenarios where the inference conditions are not satisfied, and examine the consequences.

80 Questions? Presentation materials will be available at
Ellen’s website: Christy’s website: Ellen Breazel Christy Brown

