Confidence Intervals, Hypothesis Testing

1 Confidence Intervals, Hypothesis Testing

2 Example 1 You manage a large educational nonprofit and are trying to estimate the amount of deductions your teachers apply for, in order to comment to the media (you can write off $250 for supplies annually on your federal tax return). Your assistant randomly samples 50 employees. The mean write-off was $150 with an SD of $55. What is the probability that the mean write-off is between $140 and $160?

3 What can we say about the question before we even start calculations?
We know we can use the z distribution because of the sample size (n = 50, which is more than 30). In asking for a range of values (between $140 and $160), we can convert the endpoints to normalized scores (the number of standard errors from the mean, i.e., z scores) and take the area under the curve between them. This is an example of how one would apply the confidence interval concept.

4 Step one Calculate what you can:
Standard error = \( \sigma/\sqrt{n} = 55/\sqrt{50} \approx 7.78 \). z score = \( (160 - 150)/7.78 \approx 1.285 \), which we look up in the z table as 1.28.

5 Visualized

6 Find z score

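7 Put it in plain English The z table gives an area of about 0.3997 between the mean and z = 1.28. Multiply this area under the curve (probability) by 2 to cover both sides of the mean, which gives about 0.799. The probability that the mean write-off falls within $10 of the mean is therefore roughly 80%. Comment on these results: think of the alpha levels we commonly use in class, 0.05 and 0.1.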
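A quick way to check the Example 1 arithmetic is a short Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+). The variable names are illustrative. The exact z score is about 1.286, so the exact probability comes out a little above 0.80, while rounding z to 1.28 for the table lookup gives a value just under 0.80.

```python
from math import sqrt
from statistics import NormalDist

# Example 1 inputs, taken from the slide
n = 50              # employees sampled
mean = 150.0        # mean write-off in dollars
sd = 55.0           # standard deviation in dollars
lower, upper = 140.0, 160.0

se = sd / sqrt(n)            # standard error, about 7.78
z = (upper - mean) / se      # distance to each endpoint in standard errors

std_normal = NormalDist()    # standard normal: mean 0, SD 1
# Area under the curve between -z and +z
prob_exact = std_normal.cdf(z) - std_normal.cdf(-z)
# The same area with z rounded to 1.28, as when reading a printed z table
prob_table = 2 * (std_normal.cdf(1.28) - 0.5)

print(f"s.e. = {se:.2f}, z = {z:.3f}")
print(f"P({lower:.0f} < mean < {upper:.0f}): exact {prob_exact:.4f}, table {prob_table:.4f}")
```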

8 What is a confidence interval?
Definition: the best estimate of a range for a population value (parameter) that we can come up with given a sample (sample statistic). The general formula for n > 30: X bar plus or minus (critical value * s.e.). Here is a list of critical values at the most common confidence levels: 90%: 1.645; 95%: 1.96; 99%: 2.576.
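Instead of reading critical values off a table, they can be computed. A minimal sketch in Python, assuming the standard library's `statistics.NormalDist`; `z_critical` and `confidence_interval` are hypothetical helper names, and the interval is applied to the Example 1 numbers purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-tailed critical value: leaves (1 - confidence)/2 in each tail."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_interval(xbar: float, s: float, n: int, confidence: float):
    """X bar plus or minus (critical value * s.e.), for n > 30."""
    se = s / sqrt(n)
    margin = z_critical(confidence) * se
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: critical value = {z_critical(level):.3f}")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576

# Example 1 numbers: mean $150, SD $55, n = 50
low, high = confidence_interval(150.0, 55.0, 50, 0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
```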

9 T versus z The general formula for n < 30: X bar plus or minus (t critical value * s.e.)
If your n is less than 30, you need to look up the critical value in the t table, at the intersection of the degrees of freedom (df = n - 1) and the significance level, which depends on whether it is a one-tailed or two-tailed test. Thus the critical value changes depending on your sample size n and the confidence level you desire.
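Python's standard library has no t distribution, so a t table lookup can be sketched with a small numerical helper. This is a sketch under stated assumptions: `t_cdf` and `t_critical` are hypothetical names, the CDF is approximated by integrating the t density with Simpson's rule, and the critical value is found by bisection. In practice a statistics library (for example SciPy's `scipy.stats.t`) would be used instead. The printed values should match a printed t table (for example 2.093 at df = 19) and shrink toward the z value 1.96 as df grows.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even) and uses the symmetry of the distribution.
    """
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x: float) -> float:
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_critical(p: float, df: int) -> float:
    """Value t with P(T <= t) = p (0.5 < p < 1), found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 95% confidence, two-tailed: 2.5% in each tail, so look up p = 0.975.
# A one-tailed test at alpha = 0.05 would use p = 0.95 instead.
for df in (5, 9, 19, 29, 60, 120):
    print(f"df = {df:>3}: t critical = {t_critical(0.975, df):.3f}")
print(f"normal (z): {NormalDist().inv_cdf(0.975):.3f}")
```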

10 Example 2 Example: we know the test scores for 20 people out of a class of 300. The mean score is 82 and the sample standard deviation is 15. At what confidence level is the population mean between 75 and 89?

11 What can we say about the question before we even start calculations?
We know we should use the t distribution because of the small sample size (n = 20, which is less than 30). In asking for a range of values (between 75 and 89), we can convert the endpoints to t scores (the number of standard errors from the mean) and take the area under the curve between them. This is an example of how one would apply the confidence interval concept.

12 Calculate what you can:
Standard error = s/sqrt(n) = 15/sqrt(20) = 3.35. t score = (89 - 82)/3.35 = 2.0896, with df = n - 1 = 19. Don't want to use the t table? An online t calculator works too, but be careful: it usually reports a cumulative probability, not the area between -t and +t.

13 T table

14 Visualize your answer

15 Put your answer in words
We are 94.96% confident that the population mean of exam scores is between 75 and 89.
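To see where the 94.96% comes from without a t table, here is a sketch that uses a numerical t CDF (a Simpson's-rule approximation; `t_cdf` is a hypothetical helper, not a standard-library function). The area between -t and +t with df = 19 is the confidence level. Using the unrounded standard error (about 3.354) gives about 94.9%; the slide's figure comes from rounding the s.e. to 3.35.

```python
from math import exp, lgamma, log, log1p, pi, sqrt

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for Student's t, via Simpson's rule on the density."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x: float) -> float:
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


# Example 2 inputs, taken from the slide
n, xbar, s = 20, 82.0, 15.0
df = n - 1                         # 19 degrees of freedom

se = s / sqrt(n)                   # about 3.354
t_exact = (89.0 - xbar) / se       # unrounded t score
t_slide = (89.0 - xbar) / 3.35     # t score with the rounded s.e.

# Confidence level = area between -t and +t
conf_exact = t_cdf(t_exact, df) - t_cdf(-t_exact, df)
conf_slide = t_cdf(t_slide, df) - t_cdf(-t_slide, df)

print(f"t = {t_exact:.4f}: confidence = {conf_exact:.2%}")
print(f"t = {t_slide:.4f}: confidence = {conf_slide:.2%}")
```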

16 Hypothesis Testing A null hypothesis: nothing has changed or happened; change has not occurred; the effect has not been realized. It is a statement of no difference. It always refers to the population and therefore cannot be tested directly, so it is an implied hypothesis. The null hypothesis is a statement of equality. The purpose of the null: it acts as a starting point or benchmark against which the actual outcomes of a study can be measured. Until you prove there is a difference, you assume there is no difference.

17 Research hypothesis Definition: a definite statement that there is a relationship between variables. It posits a relationship between variables, not an equality. Research hypotheses always refer to the sample, not the population.

18 One tailed v. two tailed Non-directional: says two variables are different. Directional: specifies whether one is more than or less than the other. One-tailed tests reflect a directional hypothesis; they see greater use than two-tailed tests. Two-tailed tests reflect a non-directional hypothesis: there is a difference, but in no particular direction. Example one-tailed test: the arrest rate is higher after a crackdown on prostitution. Example two-tailed test: the arrest rate after the crackdown does not equal the arrest rate before.

19 Steps to work through a hypothesis test
General steps to take to test a null hypothesis:
1. State the null hypothesis.
2. Set the level of risk associated with the null hypothesis (alpha).
3. Select the appropriate test statistic (z or t score, depending on n).
4. Compute the test statistic.
5. Determine the value needed for rejection of the null from a table of critical values for that particular statistic. Each test statistic has a critical value: the cutoff beyond which the obtained value would be unlikely if the null were true.
6. If the obtained value is more extreme than the critical value, reject the null; chance alone (the null) is not the best explanation of the events.
7. If the obtained value doesn't exceed the critical value, you do not reject the null.

20 Example There has been a series of complaints made to the local police department about prostitution. Before the crackdown, there were 3.4 arrests per day. The chief wants to show that the crackdown has worked. What are the null and research hypotheses?

21 Hypotheses \( H_0 \): following the crackdown, arrests after = arrests before, i.e., \( \mu_{\text{after}} = 3.4 \). \( H_A \): following the crackdown, \( \mu_{\text{after}} > 3.4 \).

22 Here is the random sample of arrests per day
Day   Prostitution arrests
1     3
2     5
3     7
4     2
5     3
6     6
7     4
8     3
9     6
10    1
Step 1: Compute the sample mean, which is our estimate of the population mean. There are a lot of sites out there that do this for you.

23 Use these estimates to calculate the standard error
Sample mean: 4. Sample SD (hint: divide by n - 1) = 1.94. S.e. = s/sqrt(n) = 1.94/sqrt(10) = 0.61.
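Instead of an online calculator, Python's standard `statistics` module can do Step 1 and the standard error. A minimal sketch; the list below is just the table typed in. `statistics.stdev` divides by n - 1 (the sample SD the hint asks for), while `statistics.pstdev` divides by n and is shown only for contrast.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Prostitution arrests per day for the 10 sampled days (from the table)
arrests = [3, 5, 7, 2, 3, 6, 4, 3, 6, 1]

n = len(arrests)
sample_mean = mean(arrests)            # estimate of the population mean
sample_sd = stdev(arrests)             # sample SD: divides by n - 1
population_style_sd = pstdev(arrests)  # divides by n (not what we want here)
se = sample_sd / sqrt(n)               # standard error of the mean

print(f"mean = {sample_mean}, s = {sample_sd:.2f} (dividing by n: {population_style_sd:.2f})")
print(f"s.e. = {se:.2f}")
```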

24 Test the hypothesis with these numbers
When you're told to test a hypothesis, this is asking you to find the probability of drawing a random sample of 10 days with a mean of 4.0 or higher if the population mean is actually 3.4. Get the t score for 4.0.

25 T score \( t = (4.0 - 3.4)/0.61 \approx 0.98 \). Look up the t score in the t table (df = 9).

26 Interpret the t score You can see it falls between the columns for 0.15 and 0.2 (the computer shows it is 0.176). This means that the probability of drawing a sample of 10 with a mean of 4 or higher, if the population mean is really 3.4, is between 0.15 and 0.2. Should we reject the null?
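The computer value for the one-tailed probability can be reproduced without a table. Python's standard library has no t distribution, so this sketch approximates the t CDF numerically with Simpson's rule; `t_cdf` is a hypothetical helper, not a library function (SciPy users would call `scipy.stats.t.sf(t, df)` instead). Computing t from the unrounded data gives about 0.976 and a one-tailed probability of about 0.18, in line with the 0.15 to 0.2 range read from the table.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, stdev

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for Student's t, via Simpson's rule on the density."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x: float) -> float:
        return exp(log_c - (df + 1) / 2 * log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


arrests = [3, 5, 7, 2, 3, 6, 4, 3, 6, 1]   # data from the table
mu0 = 3.4                                   # arrests per day under the null
n = len(arrests)

t = (mean(arrests) - mu0) / (stdev(arrests) / sqrt(n))
p_one_tailed = 1.0 - t_cdf(t, n - 1)        # P(T >= t) if the null is true

print(f"t = {t:.3f}, one-tailed p = {p_one_tailed:.3f}")
```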

27 Why is the t score not enough?
Typically we'd require 95% confidence (alpha = 0.05) to reject the null. For a one-tailed test with df = 9, the critical value there is 1.833, and we only got 0.98.

28 Significance levels (alpha)
The risk that what you observe is due to chance rather than the treatment. Also, the risk you're willing to take that you'll reject a null hypothesis when it is actually true. Example: the increase in test scores is due to chance, not the after-school program. If an article reports significance at the 0.05 level, this means that if the null were true (no treatment effect), there would be at most a 1 in 20 chance of observing a result this extreme by chance alone. The researcher picks this value (the risk they're willing to accept).

29 How sure must you be? If the t score you generate EXCEEDS the t score associated with the alpha, we can reject the null hypothesis and accept the research hypothesis. The alpha is the probability that you SELECT as the threshold for rejecting the null. When our alpha is 0.05 and our t score exceeds the t score associated with 0.05 (at the df), we reject the null, but there is still a 5% chance that we have rejected a null that is actually true (a Type I error).

30 In the previous example
Returning to the problem above, the t score is 0.98. If our alpha is 0.05 (one-tailed, df = 9), the critical t score is 1.833, and 0.98 does not exceed it, so we cannot reject the null. If the null were true, there would be about a 17.6% chance of drawing a sample mean of 4 or higher, and that's too high.

31 Language in psets "Evaluate your hypothesis at alpha = 0.1 and at alpha = 0.05." This is asking you to see whether the t/z score you calculated exceeds the critical t/z score at each chosen level (alpha = 0.05 is the same as 95% confidence). If you're using the t distribution, make sure you look up the critical value at the proper degrees of freedom.

32 Handy graphic for errors

33 Interpreting the previous graphic
The null can be either true or false; you'll never know for sure because you're not testing the whole population. You can either reject it or fail to reject (accept) it. Type I: the probability of a Type I error is alpha, the risk you're willing to take, and it is conventionally between 0.01 and 0.05. If it is 0.05, there is a 5% chance you'll reject the null when it is actually true. Reduce the chance of a Type I error by using smaller alphas, but lowering the alpha increases the chance you commit a Type II error!

34 Interpreting the previous graphic
Type II: you accept (fail to reject) a null by mistake, and conclude there are no differences when there actually are. Reduce your likelihood of committing a Type II error by increasing the sample size.
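The trade-offs in the two slides above can be seen numerically for a one-tailed z test. A minimal sketch under hypothetical assumptions (the true mean sits 0.5 above the null value and sigma = 2 is known): the Type I error rate is alpha by construction, and the Type II rate (beta) is the chance the sample z score stays below the critical value when the null is false. `type_ii_rate` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def type_ii_rate(alpha: float, effect: float, sigma: float, n: int) -> float:
    """Beta for an upper one-tailed z test of H0: mu = mu0.

    `effect` is how far the true mean sits above mu0. H0 is rejected when
    the z score exceeds the critical value, so the Type I error rate is
    alpha by construction.
    """
    z_crit = Z.inv_cdf(1 - alpha)          # rejection cutoff under H0
    shift = effect / (sigma / sqrt(n))     # true mean, in standard errors
    return Z.cdf(z_crit - shift)           # P(fail to reject | H0 false)

# Hypothetical scenario: true mean is 0.5 above mu0, sigma = 2
for n in (25, 100):
    for alpha in (0.10, 0.05, 0.01):
        beta = type_ii_rate(alpha, effect=0.5, sigma=2.0, n=n)
        print(f"n = {n:>3}, alpha = {alpha:.2f}: Type II rate = {beta:.3f}")
```

Reading down each group, smaller alphas raise beta; moving from n = 25 to n = 100 lowers beta at every alpha.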

35 Sample size
When you test a hypothesis with a small sample, the critical t scores associated with each alpha will be higher than those for larger samples. This is because when you estimate a population value from a small sample, the estimate contains more error. As the number of df goes up, the t values needed to reject the null go down. If n is bigger than 30, use the normal distribution.

36 FORMULA TO DETERMINE SAMPLE SIZE
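How to determine the sample size: \( n = \left[ \frac{z \times s}{E} \right]^2 \), where z is the critical value (e.g., 1.96 for 95% confidence), s is the standard deviation, and E is the error we can tolerate.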

37 Example We need to determine, for the welfare office, the average income of all residents who receive welfare. They want to be 95% confident that the estimate of average income is within $100 of the actual average. How large a sample do we need to reduce the error to $100 (the SD is $442)?

38 Solve by plugging in Step 1: to build a 95% confidence interval we use 1.96 (that is the critical value that we want). Step 2: n = [(1.96 * 442)/100]^2 = 75.05, which we round up to 76. In English: we need a sample of at least 76 respondents.
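The sample size formula translates directly into code. A minimal sketch; `required_sample_size` is a hypothetical helper. It rounds up, since a sample of 75.05 people is not possible and rounding down would leave the error slightly above what we can tolerate.

```python
import math

def required_sample_size(critical_value: float, sd: float, tolerable_error: float) -> int:
    """n = [(critical value * s) / error]^2, rounded up to a whole respondent."""
    raw = (critical_value * sd / tolerable_error) ** 2
    # Round off floating-point noise first, so an exact whole number
    # (e.g. 2401.0000000001) is not pushed up by one.
    return math.ceil(round(raw, 6))

# Welfare income example: 95% confidence (1.96), SD $442, error $100
print(required_sample_size(1.96, 442, 100))  # 76
```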

39 Example We are testing the effect of a drug by injecting 100 people with it and recording their response time. The mean response time for those who did not get the drug was 1.2 seconds, and the mean response time for those who were injected with the drug is 1.05 seconds. The sample standard deviation is 0.5 seconds. Do you think the drug affects the response time?

40 Step one Set the hypotheses:
     Null: the response time is equal between those injected and those not injected (the drug has no effect). Research hypothesis: the response time for those injected is less than for those not injected (\( \mu_{\text{drug}} < 1.2 \) seconds).

41 Step 2 If the null were true, what is the probability we would have gotten this sample mean? (If that probability is really small, then we can reject the null.) We know that n > 30, so we can use the critical value from the z distribution.

42 Next steps Step 3: Estimate the s.e. = s/sqrt(n) = 0.5/sqrt(100) = 0.5/10 = 0.05
Step 4: Get the test statistic. Conceptualize the problem by drawing it out: 1.2 s is the mean, so how many standard errors is 1.05 s away from 1.2 s? The z score for 1.05 answers exactly this question.

43 Get the z score Get the z score using this formula: \( z = (\bar{X} - \mu)/\text{s.e.} = (1.05 - 1.2)/0.05 = -3.0 \)
In English, this means that 1.05 seconds is 3 standard errors below the mean. So in setting up this test, you're asking what the odds are of getting a sample mean 3 standard errors below the mean (1.05 s) completely by chance. Since it is far out in the tail, intuition says the odds are low. Given we set our hypothesis up this way, we are only testing whether the drug lowers response time. This calls for a one-tailed test.

44 Draw it out to help

45 Look at the z table You look at the z table and see that z = 3.0 has an area of 0.4987 between mu and the score. Adding 0.5 gives 0.9987, so the odds of getting a score this far below the mean by chance are 1 - 0.9987, or about 0.0013 (0.00135 before rounding). How to put this into plain English?
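The drug example's z score and one-tailed probability can be checked with `statistics.NormalDist` from Python's standard library. A minimal sketch; the probability it computes is the unrounded lower-tail area, about 0.00135, which a printed z table rounds.

```python
from math import sqrt
from statistics import NormalDist

n = 100          # people in the study
mu0 = 1.2        # response time (s) under the null
xbar = 1.05      # mean response time (s) with the drug
s = 0.5          # sample standard deviation (s)

se = s / sqrt(n)                      # 0.05
z = (xbar - mu0) / se                 # about -3.0 standard errors
p_lower = NormalDist().cdf(z)         # one-tailed: P(Z <= z) if the null is true
alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)  # about -1.645 for a lower-tailed test
reject = z < z_crit

print(f"z = {z:.2f}, one-tailed p = {p_lower:.5f}, reject null: {reject}")
```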

46 Estimating population proportions

Proportions You can set up confidence intervals around them just like we did with means. Here are the steps:
1. Estimate the proportion p.
2. Take the SD with this formula: s = sqrt(p * (1 - p))
3. Find the s.e. with this formula: s / sqrt(n)
4. Set up the confidence interval with this formula: proportion plus or minus critical value * s.e. (with n > 30, the critical value comes from the z distribution, e.g., 1.96 for 95%)

48 Example The warden wants to estimate how many re-admits he is getting because of a new job training program taking place in the jail. He takes a sample of 100 inmates who went through the program and finds that 68 became inmates again. Give a 95% confidence interval for this population proportion.

49 Calculate what we can Step 1: estimate the population proportion
     p = 68/100 = 0.68 are readmitted. Step 2: get the sample standard deviation using this formula: s = sqrt(p * (1 - p)) = sqrt(0.68 * 0.32) = 0.47

50 Next steps Step 3: Use this to find the standard error: s / sqrt(n) = 0.47/sqrt(100) = 0.047. Step 4: What are the 95% confidence limits of the proportion? Since n is bigger than 30, the normal curve can be used, so the critical value is 1.96. Set up a confidence interval using this formula: proportion plus or minus critical value * s.e. = \( 0.68 \pm 1.96 \times 0.047 = 0.68 \pm 0.092 \), which gives 0.59 to 0.77.
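The four steps for a proportion fit in a few lines of Python. A minimal sketch using `statistics.NormalDist` for the critical value; `proportion_ci` is a hypothetical helper. Because it does not round the s.e. to 0.047 along the way, its limits (about 0.589 and 0.771) differ from the hand calculation only in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n                                   # step 1: estimate p
    s = sqrt(p * (1 - p))                               # step 2: SD
    se = s / sqrt(n)                                    # step 3: standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    return p - z * se, p + z * se                       # step 4: p +/- z * s.e.

# Warden example: 68 of 100 program graduates were readmitted
low, high = proportion_ci(68, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```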

51 In English We are 95% confident that the population proportion readmitted to prison is between 0.59 and 0.77.

52 Is the program working? Why or why not?

53 Example Your boss wants to estimate how many welfare recipients own cars. She wants to know the proportion within 2% and wants to be 95% certain. What is the sample size she needs in order to do this?

54 Steps to solve Step one: the sample size formula is n = [(z * sigma)/error]^2, where error is the amount of error we can tolerate. Since she can tolerate 2% error, this becomes n = [(1.96 * sigma)/0.02]^2. Step two: we insert 0.5 for sigma, since the largest possible standard deviation of a proportion is 0.5 (at a proportion of 0.5). Thus, if we don't know the population proportion and need to estimate a sample size, 0.5 is the safest proportion to use (it has an SD of 0.5). n = [(1.96 * 0.5)/0.02]^2 = 2401. Step 3: she needs to sample 2401 welfare recipients.
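The reasoning for plugging in 0.5 can be checked by computing the required n for a few guessed proportions. A minimal sketch; `proportion_sample_size` is a hypothetical helper that uses sigma = sqrt(p(1 - p)) and the exact 95% critical value from `statistics.NormalDist` (1.95996... rather than 1.96), which still rounds up to 2401 at p = 0.5.

```python
import math
from statistics import NormalDist

def proportion_sample_size(error: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """n = [(z * sigma) / error]^2 with sigma = sqrt(p * (1 - p)), rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    sigma = math.sqrt(p * (1 - p))   # largest (0.5) when p = 0.5
    raw = (z * sigma / error) ** 2
    return math.ceil(round(raw, 6))  # strip floating-point noise, then round up

# Any guess other than 0.5 needs a smaller sample, so 0.5 is the safe choice
for guess in (0.1, 0.3, 0.5):
    print(f"p = {guess}: n = {proportion_sample_size(0.02, p=guess)}")
```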

