Review Statistical inference and test of significance.

1 Review Statistical inference and test of significance

2 Basic concepts Suppose we want to study a population, and the parameter (average or percentage) is unknown. We use ____ to estimate the parameter. With a simple random sample, the sample average/percentage can be used to estimate the population average/percentage. But the sample estimate will be off by some amount, due to chance error. The standard error measures the likely size of this chance error. When the composition of the population is unknown, we have to use the bootstrap method to estimate the SD of the population. What is the bootstrap method? The SD of the population is estimated by the SD of the sample. This bootstrap estimate is good when the sample is large.
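A minimal Python sketch of the bootstrap idea described on this slide. The sample values and the function name `se_of_average` are illustrative assumptions, not taken from the slides, and a sample this small is used only to keep the example short.

```python
import math
import statistics

def se_of_average(sample):
    """Estimate the SE of the sample average, using the SD of the sample
    as the bootstrap estimate of the unknown population SD."""
    sd = statistics.pstdev(sample)  # SD of the sample (divides by n)
    return sd / math.sqrt(len(sample))

# Hypothetical household sizes, for illustration only.
sample = [1, 2, 2, 3, 4, 2, 1, 5, 3, 2]
estimate = statistics.mean(sample)
se = se_of_average(sample)
print(f"estimate = {estimate:.2f}, give or take about {se:.2f}")
```

With a real survey the sample would be far larger, which is when the bootstrap estimate of the SD can be trusted.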

3 Basic concepts What is a confidence interval for the population parameter (average or percentage)? A confidence interval for the population parameter is obtained by going the right number of SEs either way from the sample estimate. The confidence level is read off the normal curve. Because this relies on the central limit theorem, it should only be used with large samples. How do you interpret the confidence level in terms of the frequency theory of probability? It is not the probability that the parameter lies in the interval, because parameters are not subject to chance variation. Instead, it is the percentage of samples, over many repetitions of the sampling, for which the corresponding confidence interval covers the true value of the parameter.
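The frequency interpretation can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a made-up mean and SD, the function name `coverage` is hypothetical, and the interval is "sample average plus or minus 2 SEs".

```python
import math
import random
import statistics

def coverage(pop_mean=50.0, pop_sd=10.0, n=400, trials=1000, seed=0):
    """Fraction of repeated samples whose interval (average +/- 2 SE)
    covers the true population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        avg = statistics.mean(sample)
        se = statistics.pstdev(sample) / math.sqrt(n)  # bootstrap SE
        if avg - 2 * se <= pop_mean <= avg + 2 * se:
            hits += 1
    return hits / trials

print(f"share of intervals covering the parameter: {coverage():.3f}")
```

The parameter stays fixed throughout; only the intervals change from sample to sample, and roughly 95% of them cover it.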

4 Basic concepts The formulas for simple random samples may not apply to other kinds of samples. For instance, with samples of convenience, standard errors usually do not make sense. Even if the sample is drawn by a probability method, the SE formula for simple random samples still does not apply unless that method is simple random sampling.

5 Basic concepts What is a test of significance? What is the null hypothesis, and what is the alternative hypothesis? A test of significance gets at the question of whether an observed difference is real (the alternative hypothesis) or just chance variation (the null hypothesis). The null must be based on the chance process alone (assuming no other factors or bias), and the alternative is based on the question or claim we want to investigate. We can use a test of significance to reject a statement (the null) and so support another statement (the alternative).

6 Basic concepts What is a test statistic? A test statistic measures the difference between the data and what is expected based on the null hypothesis. This means the calculation is based on the null. What is a z-statistic? The z-statistic says how many SEs away an observed value is from its expected value, where the expected value is calculated using the null hypothesis.

7 Basic concepts What is the observed significance level or P-value? How do you interpret it? The P-value is not the chance of the null being correct. It is the chance of getting a test statistic as extreme as or more extreme than the observed one. (The calculation is based on the null.) Small P-values are evidence against the null: Less than 5%: statistically significant or significant. Less than 1%: highly significant.
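A short Python sketch of how a z-statistic is turned into a P-value using the normal curve. The coin-tossing numbers (60 heads in 100 tosses) and the function names are illustrative assumptions, not from the slides.

```python
from statistics import NormalDist

def z_statistic(observed, expected, se):
    """How many SEs the observed value is from the value expected under the null."""
    return (observed - expected) / se

def p_upper_tail(z):
    """One-sided P-value: chance of a z-statistic this large or larger."""
    return 1 - NormalDist().cdf(z)

def p_two_sided(z):
    """Two-sided P-value: chance of a z-statistic this far from 0 in either direction."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: 60 heads in 100 tosses. Under the null (fair coin),
# the expected number of heads is 50 and the SE is sqrt(100) * 0.5 = 5.
z = z_statistic(observed=60, expected=50, se=5)
one_sided = p_upper_tail(z)
two_sided = p_two_sided(z)
print(f"z = {z:.1f}, one-sided P = {one_sided:.1%}, two-sided P = {two_sided:.1%}")
```

Both the expected value and the SE are computed from the null, which is why the P-value is a statement about the data given the null and not about the null itself.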

8 Basic concepts Suppose we only have a small sample, say of size 5. If the observed values (or the errors) follow the normal curve and the SD of the population is unknown, do we still use the z-test? No. We use the t-test instead. Suppose we have a randomized controlled experiment and want to compare the data from the treatment group and the control group. To show that the treatment really has an effect, what kind of test shall we use? How do we set up the null and the alternative? We use the two-sample z-test. The null is based on chance variation, so it says the treatment has no effect. The alternative is based on what we want to show: the treatment has an effect.
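Both test statistics mentioned on this slide can be sketched in a few lines of Python. All numbers and function names below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math
import statistics

def t_statistic(sample, expected):
    """t-statistic for a small sample: the SE uses SD+, which divides by
    n - 1, and the result is compared with Student's curve on n - 1
    degrees of freedom."""
    n = len(sample)
    sd_plus = statistics.stdev(sample)  # divides by n - 1
    se = sd_plus / math.sqrt(n)
    return (statistics.mean(sample) - expected) / se

def two_sample_z(avg_t, sd_t, n_t, avg_c, sd_c, n_c):
    """Two-sample z-statistic. Under the null (no treatment effect) the
    expected difference is 0, and the SE for the difference combines the
    two SEs as sqrt(SE_t^2 + SE_c^2)."""
    se_diff = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return (avg_t - avg_c) / se_diff

# Hypothetical sample of 5 measurements, null value 70.
t = t_statistic([78, 83, 68, 72, 88], expected=70)

# Hypothetical experiment: 200 treated subjects vs. 250 controls.
z = two_sample_z(avg_t=66, sd_t=21, n_t=200, avg_c=59, sd_c=20, n_c=250)
print(f"t = {t:.2f} on 4 degrees of freedom, two-sample z = {z:.2f}")
```

For 4 degrees of freedom the two-sided 5% cutoff on Student's curve is about 2.78, so a t-statistic needs to be larger than a z-statistic would to count as significant.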

9 Basic concepts Suppose we want to test whether a coin is fair. What kind of test shall we use? The one-sample z-test (with a two-sided P-value) or the χ²-test. But what if we want to test whether a die is fair? (More than 2 categories.) We use the χ²-test. The χ²-statistic is never negative. (Compare with the z-statistic, which can take either sign.) The χ²-test can also be used to test for independence.
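A minimal Python sketch of the χ²-statistic for testing whether a die is fair. The observed counts are hypothetical, and the cutoff 11.07 is the standard 5% table value for 5 degrees of freedom.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1..6 in 60 rolls.
observed = [4, 6, 17, 16, 8, 9]
# Under the null (fair die) each face is expected 60 / 6 = 10 times.
expected = [sum(observed) / 6] * 6

chi2 = chi_square(observed, expected)
df = len(observed) - 1            # 6 categories give 5 degrees of freedom
CRITICAL_5_PERCENT_DF5 = 11.07    # 5% cutoff from the chi-square table, 5 df
significant = chi2 > CRITICAL_5_PERCENT_DF5
print(f"chi-square = {chi2:.1f} on {df} df, significant at 5%: {significant}")
```

Every term in the sum is a square divided by a positive count, which is why the statistic cannot be negative.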

10 Calculation and formula
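The formulas on this slide did not survive in the transcript. As an assumption, the standard formulas matching the concepts reviewed earlier are summarized here, where n is the number of draws and p is the fraction of 1's in a 0-1 box:

```latex
\begin{align*}
\text{SE for sum} &= \sqrt{n} \times \text{SD of box} \\
\text{SE for average} &= \frac{\text{SE for sum}}{n} = \frac{\text{SD of box}}{\sqrt{n}} \\
\text{SE for percentage} &= \frac{\text{SE for number}}{n} \times 100\% \\
\text{SD of a 0-1 box} &= \sqrt{p\,(1-p)} \\
z &= \frac{\text{observed} - \text{expected}}{\text{SE}}
\end{align*}
```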

12 Example 1 A survey organization takes a simple random sample of 625 households from a city of 80,000 households. On the average, there are 2.30 persons per sample household, and the SD is 1.75. Find a 95%-confidence interval for the average household size in the city.

13 Solution
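The worked solution on this slide did not survive in the transcript. A sketch of the calculation in Python, assuming the usual "average plus or minus 2 SEs" interval and using the sample SD as the bootstrap estimate of the population SD:

```python
import math

n = 625            # households in the sample
sample_avg = 2.30  # average persons per sample household
sample_sd = 1.75   # SD of the sample, used in place of the population SD

se_avg = sample_sd / math.sqrt(n)   # 1.75 / 25 = 0.07
low = sample_avg - 2 * se_avg
high = sample_avg + 2 * se_avg
print(f"95% CI: {low:.2f} to {high:.2f} persons")
```

The sample is a tiny fraction of the 80,000 households, so treating the draws as if made with replacement changes the SE very little.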

14 Remark A variant of this problem could be: Suppose 30% of the sample households have 3 or more persons. Find a 95%-confidence interval for the percentage of households in the city with 3 or more persons. In this case, you are doing a 0-1 box problem. You may also look at the statement (true or false): 95% of the households in the city contain between 2.16 and 2.44 persons. This is false. It confuses the SD with the SE. The SE measures how much the sample average varies from sample to sample by chance; the SD measures the spread of the data within a single sample.
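The 0-1 box variant in this remark can be sketched the same way. This is an illustrative calculation, not the author's worked solution; the sample size of 625 is taken from Example 1 and the sample percentage stands in for the unknown percentage of 1's in the box.

```python
import math

n = 625     # households in the sample (from Example 1)
p = 0.30    # sample fraction of households with 3 or more persons

sd_box = math.sqrt(p * (1 - p))          # SD of the 0-1 box, bootstrap estimate
se_pct = sd_box / math.sqrt(n) * 100     # SE for the sample percentage
low = 100 * p - 2 * se_pct
high = 100 * p + 2 * se_pct
print(f"SE = {se_pct:.2f}%, 95% CI: {low:.1f}% to {high:.1f}%")
```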

15 Example 2 According to the census, the median household income in Atlanta (1.5 million households) was $52,000 in 1999. In June 2003, a market research organization takes a simple random sample of 750 households in Atlanta; 56% of the sample households had incomes over $52,000. Did median household income in Atlanta increase over the period 1999 to 2003? Formulate the null and alternative hypotheses, and use a test of significance to answer the question.

16 Solution This problem asks whether the median increased or not. But we don’t have enough information about the incomes overall. Even if we knew the observed median in 2003, we still wouldn’t know how to compute the corresponding SE. So instead of looking at the incomes (a quantitative variable), we look at a qualitative variable: whether a household had income over $52,000 or not. The idea is that, since the median (50%) income in 1999 was $52,000, if the percentage of households having income over $52,000 was really greater than 50% in 2003 (not due to chance), then the median must have increased.

17 Solution So a 0-1 box is needed (to classify the qualitative data): The box has one ticket for each household in 2003. If the income is over $52,000, the ticket is marked 1; otherwise, 0. The null says: the median did not increase, or equivalently, the percentage of the households having incomes over $52,000 is 50%. (The percentage of 1’s in the box is 50%.) The alternative says, this percentage is bigger than 50%. (The median did increase.) The sample is just like 750 draws from the box.

18 Solution
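The calculation on this slide did not survive in the transcript. A sketch in Python of the one-sided z-test set up on the previous slides, assuming the SD of the box is computed from the null (50% of the tickets are 1's):

```python
import math
from statistics import NormalDist

n = 750              # draws from the 0-1 box
observed_pct = 56.0  # sample percentage with income over $52,000
null_pct = 50.0      # percentage of 1's in the box under the null

sd_box = math.sqrt(0.5 * 0.5)             # SD of the box under the null
se_pct = sd_box / math.sqrt(n) * 100      # SE for the sample percentage
z = (observed_pct - null_pct) / se_pct
p = 1 - NormalDist().cdf(z)               # one-sided: alternative is "bigger than 50%"
print(f"SE = {se_pct:.2f}%, z = {z:.2f}, P = {p:.4f}")
```

A z-statistic above 3 gives a P-value well under 1%, so on this reading the difference is highly significant and the null (median did not increase) is hard to sustain.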

19 Good Luck!

