9-1: Sampling Distributions Preparing for Inference! Parameter: A number that describes the population (usually not known). Statistic: A number that can be computed from the sample data without making use of any unknown parameters.
Example Sample surveys show that fewer people enjoy shopping than in the past. A recent survey asked a nationwide random sample of 2500 adults if they agreed or disagreed that “I like buying new clothes but shopping is often frustrating and time-consuming.” Of the respondents, 1650, or 66%, said they agreed.
Example cont’d: p-hat = 66% = statistic = sample proportion. Population = what we want to draw conclusions about = all US residents aged 18 and older. Parameter = p = % of all adult US residents who would agree with the statement.
Sampling Variability Sampling Variability: the value of a statistic varies in repeated random sampling. Simulation, Example 9.3 p. 565
Figure 9.1 (p. 566) Sampling distribution of p-hat: Histogram of values of p-hat from 1000 SRSs of size 100 from a population with p = 0.70. The histogram approximates the ideal pattern that would emerge if we looked at all possible samples of size 100 from our population.
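The simulation behind a histogram like Figure 9.1 can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's own procedure: the function name `simulate_p_hats` and the fixed seed are illustrative, and each individual is modeled as an independent draw with success probability p, which only approximates an SRS from a very large population.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Return the sample proportion (p-hat) from each of num_samples samples.

    Assumption: each sampled individual is an independent success with
    probability p, which approximates an SRS from a very large population.
    """
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# 1000 samples of size 100 from a population with p = 0.70
p_hats = simulate_p_hats(p=0.70, n=100, num_samples=1000)
print("mean of p-hat values:", round(statistics.mean(p_hats), 3))
print("std dev of p-hat values:", round(statistics.stdev(p_hats), 3))
print("smallest and largest p-hat:", min(p_hats), max(p_hats))
```

A histogram of `p_hats` should look roughly symmetric and centered near 0.70, matching the shape and center described for the figure.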
Describing Sampling Distributions Overall shape: symmetric/approx. normal Outliers/deviations from overall pattern: None Center: close to the true value of p Spread: the values of p-hat vary widely from sample to sample, but because the distribution is approximately normal, we can use the standard deviation (sigma) to describe the spread.
Are you a Survivor Fan? Suppose that the true proportion of US adults who watched Survivor II is p = 0.37. The graph shows the results of drawing 1000 SRSs of size n = 100 from a population with p = 0.37. Shape: Center: Spread: Outliers/Deviations:
Top: Results of drawing 1000 SRSs of size n = 1000 drawn from a population with p = 0.37 Bottom: Results of drawing 1000 SRSs of size n = 100 drawn from a population with p = 0.37 What happened when we took n = 1000 vs. n = 100? Notes on top picture: Center: close to 0.37 Spread: small; range is 0.321 to 0.421. Shape: hard to see, since values of p-hat cluster so tightly about 0.37
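The n = 1000 versus n = 100 comparison can be reproduced with a short simulation. This is a sketch under the same simplifying assumption of independent draws with success probability p = 0.37; the helper name `p_hat_values` and the seeds are illustrative, and the exact numbers printed will differ from those on the slide because the random samples differ.

```python
import random
import statistics

def p_hat_values(p, n, num_samples, seed):
    """Simulate num_samples sample proportions from samples of size n.

    Assumption: independent draws with success probability p, standing in
    for an SRS from a very large population.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(n) if rng.random() < p) / n
            for _ in range(num_samples)]

small_n = p_hat_values(p=0.37, n=100, num_samples=1000, seed=1)
large_n = p_hat_values(p=0.37, n=1000, num_samples=1000, seed=2)

for label, values in (("n = 100", small_n), ("n = 1000", large_n)):
    print(label,
          "| center:", round(statistics.mean(values), 3),
          "| std dev:", round(statistics.stdev(values), 4),
          "| range:", min(values), "to", max(values))
```

Both sets of values should center near 0.37, while the n = 1000 values cluster much more tightly: multiplying the sample size by 10 shrinks the standard deviation by a factor of roughly the square root of 10.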
Random sampling… …gives us regular and predictable shapes …patterns of behavior over many repetitions …these distributions are approximately normal.
Unbiased Statistic Bias: Concerns the center of the sampling distribution A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
Examples of Unbiased Estimators If we draw an SRS from a population in which 60% find shopping frustrating, the mean of the sampling distribution of p-hat is: If we draw an SRS from a population in which 50% find shopping frustrating, the mean of p-hat is:
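Both answers follow from the definition of an unbiased statistic. A brief sketch of the reasoning, where X (the number of sampled individuals who find shopping frustrating) and n (the sample size) are symbols introduced here for illustration: each sampled individual has probability p of finding shopping frustrating, so the expected count is np.

```latex
\hat{p} = \frac{X}{n},
\qquad
\mu_{\hat{p}} = E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p
```

So for p = 0.60 the mean of the sampling distribution of p-hat is 0.60, and for p = 0.50 it is 0.50, whatever the sample size.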
Variability of a statistic… As long as the candy is well mixed (so that the scoop selects a random sample), the variability of the result depends only on the size of the scoop and not on the size of the container.
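The scoop-and-container claim can be checked with a small simulation. This is a hedged sketch: the container sizes (10,000 and 1,000,000 candies), the proportion p = 0.5 of some color, and the function name `p_hat_spread` are all hypothetical choices made for illustration, and the claim holds only when the container is much larger than the scoop.

```python
import random
import statistics

def p_hat_spread(pop_size, p, n, num_samples, seed):
    """Std dev of p-hat over repeated SRSs of size n from a finite population."""
    rng = random.Random(seed)
    cutoff = int(pop_size * p)  # units 0..cutoff-1 count as "successes"
    p_hats = []
    for _ in range(num_samples):
        scoop = rng.sample(range(pop_size), n)  # SRS without replacement
        p_hats.append(sum(1 for unit in scoop if unit < cutoff) / n)
    return statistics.stdev(p_hats)

# Same scoop size, very different container sizes
small_container = p_hat_spread(10_000, 0.5, 100, 1000, seed=1)
large_container = p_hat_spread(1_000_000, 0.5, 100, 1000, seed=2)
# Same large container, bigger scoop
bigger_scoop = p_hat_spread(1_000_000, 0.5, 400, 1000, seed=3)

print("n = 100, container of 10,000:   ", round(small_container, 4))
print("n = 100, container of 1,000,000:", round(large_container, 4))
print("n = 400, container of 1,000,000:", round(bigger_scoop, 4))
```

The first two spreads should be nearly identical even though one container is 100 times larger, while the bigger scoop should cut the spread roughly in half.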
Bulls Eye Analogy True value of population parameter: bull’s-eye, sample statistic: arrow fired at the target Bias: our aim is off, we consistently miss the bull’s-eye in the same direction High Variability: repeated shots are widely scattered on the target Goal: low bias, low variability Take random samples with big n!
In items 1–3, classify each underlined number as a parameter or statistic. Give the appropriate notation for each. 1. Forty-two percent of today’s 15-year-old girls will get pregnant in their teens. 2. A 1993 survey conducted by the Richmond Times-Dispatch one week before election day asked voters which candidate for the state’s attorney general they would vote for. Thirty-seven percent of the respondents said they would vote for the Democratic candidate. On election day, 41% actually voted for the Democratic candidate. 3. The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 and the standard deviation is 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure for these executives is 126.07.
Below are histograms of the values taken by three sample statistics in several hundred samples from the same population. The true value of the population parameter is marked on each histogram. 4. Which statistic has the largest bias among these three? Justify your answer. 5. Which statistic has the lowest variability among these three? 6. Based on the performance of the three statistics in many samples, which is preferred as an estimate of the parameter? Why?