Week 10 Comparing Two Means or Proportions


1 Week 10 Comparing Two Means or Proportions

2 Generalising from sample
Individuals: Children aged 10. Measurement: Mark in maths test. Groups: Boys & girls. Question: Are male marks higher on average?
Individuals: Plots in field. Measurement: Yield of wheat. Groups: Varieties A & B. Question: Which gives higher yields?
Individuals: Cars leaving production line. Measurement: CO emissions from exhaust. Groups: Production lines 1 & 2. Question: Are both lines the same?

3 Generalising from sample
Individuals: Children aged 10. Measurement: Pass/fail in maths test. Groups: Boys & girls. Question: Are males more likely to pass?
Individuals: Cabbages in field. Measurement: Infected by cabbage butterfly. Groups: Varieties A & B. Question: Which is less likely to be infected?
Individuals: Cars leaving production line. Measurement: Rattle in exhaust. Groups: Production lines 1 & 2. Question: Do both lines have the same chance of a rattle?

4 Numerical measurements: means
What is the difference in average weight loss between those who diet and those who exercise to lose weight?
What is the difference between the mean foot lengths of men and women?
Population parameter: $\mu_2 - \mu_1$ = difference between population means
Sample estimate: $\bar{x}_2 - \bar{x}_1$ = difference between sample means

5 Categorical measurements: propns
What is the difference between the proportions that would quit smoking if taking the antidepressant bupropion (Zyban) versus wearing a nicotine patch?
What is the difference between the proportions who have heart disease among men who snore and men who don't snore?
Population parameter: $\pi_2 - \pi_1$ = difference between population proportions
Sample estimate: $p_2 - p_1$ = difference between sample proportions

6 Requirement: independent samples
Random samples taken separately from 2 populations
Randomised experiment with 2 treatments
One random sample, but a categorical variable splits individuals into 2 groups
Two samples are called independent samples when the measurements in one sample are not related to the measurements in the other sample.

7 Model for numerical data
Sample 1 ~ population (mean $\mu_1$, s.d. $\sigma_1$)
Sample 2 ~ population (mean $\mu_2$, s.d. $\sigma_2$)
Estimation: estimate $(\mu_2 - \mu_1)$ with $(\bar{x}_2 - \bar{x}_1)$. Standard error? Confidence interval?
Testing: is $(\mu_2 - \mu_1)$ zero? p-value

8 Model for categorical data
Sample 1 ~ population (proportion $\pi_1$)
Sample 2 ~ population (proportion $\pi_2$)
Estimation: estimate $(\pi_2 - \pi_1)$ with $(p_2 - p_1)$. Standard error? Confidence interval?
Testing: is $(\pi_2 - \pi_1)$ zero? p-value

9 Distribution of difference
In both cases, we need to find the distribution of the difference, $(p_2 - p_1)$ or $(\bar{x}_2 - \bar{x}_1)$.
Independent samples >> difference of independent random variables.
We already know the distributions of the two parts; what is the distribution of their difference?

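10 Sum of 2 variables
Same distns (a random sample of $n$ values, each with mean $\mu$ and s.d. $\sigma$):
Sample mean: mean $\mu$, s.d. $\sigma/\sqrt{n}$
Sample total: mean $n\mu$, s.d. $\sqrt{n}\,\sigma$
Different distns (independent $X_1$ and $X_2$): $X_1 + X_2$ has mean $\mu_1 + \mu_2$ and s.d. $\sqrt{\sigma_1^2 + \sigma_2^2}$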

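11 Difference between 2 variables
$X_1 - X_2$ has mean $\mu_1 - \mu_2$ and s.d. $\sqrt{\sigma_1^2 + \sigma_2^2}$ (same standard devn as the sum).
If $X_1$ and $X_2$ are normal, then $X_1 - X_2 \sim \text{normal}\left(\mu_1 - \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right)$.
Remember that $X_1$ and $X_2$ must be independent.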

12 Example
Husband height ~ normal(1.85, 0.1)
Wife height ~ normal(1.7, 0.08)
Assume independent. (Probably not!!)
Prob that wife is taller than husband?
(Husband - Wife) ~ ?

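13 Example
Husband height ~ normal(1.85, 0.1)
Wife height ~ normal(1.7, 0.08)
Husband - Wife ~ normal(0.15, 0.1281), since $\sqrt{0.1^2 + 0.08^2} = 0.1281$
[Plot: normal density of Husband - Wife; axis ticks at -0.11, 0.02, 0.15, 0.28, 0.41]
P(diff ≤ 0) = area to the left of 0, where $z = (0 - 0.15)/0.1281 = -1.17$
Prob = 0.121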
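
The calculation in this example can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original slides, and it relies on the example's own assumption that the two heights are independent.

```python
from math import sqrt
from statistics import NormalDist

# Distributions stated in the example: normal(mean, s.d.), heights in metres.
husband = NormalDist(mu=1.85, sigma=0.10)
wife = NormalDist(mu=1.70, sigma=0.08)

# Difference of independent normals: means subtract, variances add.
diff_mean = husband.mean - wife.mean
diff_sd = sqrt(husband.stdev ** 2 + wife.stdev ** 2)
diff = NormalDist(mu=diff_mean, sigma=diff_sd)

# The wife is taller exactly when (Husband - Wife) is at or below zero.
prob_wife_taller = diff.cdf(0.0)

print(f"Husband - Wife ~ normal({diff_mean:.2f}, {diff_sd:.4f})")
print(f"P(wife taller) = {prob_wife_taller:.3f}")
```

The s.d. of the difference comes out as 0.1281 and the probability as roughly 0.12.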

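14 Difference between proportions
If $X_1$ and $X_2$ are independent, $\text{s.d.}(X_1 - X_2) = \sqrt{\sigma_1^2 + \sigma_2^2}$.
If $p_1$ and $p_2$ are independent, $\text{s.d.}(p_1 - p_2) = \sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}$.
For large samples, $p_1$ and $p_2$ are approx normal, so their difference is too.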

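15 Std error for difference in propns
Nicotine patches vs Antidepressant (Zyban)?
$n_1 = n_2 = 244$ randomly assigned to each treatment
Zyban: 85 out of 244 quit smoking
Patch: 52 out of 244 quit smoking
So, $\text{s.e.}(p_1 - p_2) = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}} = \sqrt{\dfrac{0.348 \times 0.652}{244} + \dfrac{0.213 \times 0.787}{244}} = 0.040$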

16 Approximate 95% C.I.
For sufficiently large samples, the interval Estimate $\pm$ 2 $\times$ Standard error is an approximate 95% C.I.
Best you can do for difference between proportions.
For means, CI can be improved by replacing '2' by a different value.

17 Patch vs Antidepressant
Study: $n_1 = n_2 = 244$ randomly assigned to each group
Zyban: 85 of the 244 Zyban users quit smoking = .348
Patch: 52 of the 244 patch users quit smoking = .213
So, $p_1 - p_2 = .348 - .213 = .135$ with standard error .040
Approx 95% C.I.: $.135 \pm 2(.040) \Rightarrow .135 \pm .080 \Rightarrow .055$ to $.215$
We are 95% confident that Zyban raises the probability of quitting smoking by between 5.5 and 21.5 percentage points compared with the patch.
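
The standard error and approximate interval for the Zyban vs patch study can be reproduced in a few lines. This is a sketch only: the function name `two_proportion_ci` and its `multiplier` parameter are hypothetical, and the multiplier 2 is the rough value used on the slides rather than an exact normal quantile.

```python
from math import sqrt


def two_proportion_ci(x1, n1, x2, n2, multiplier=2.0):
    """Return (difference, standard error, (lower, upper)) for p1 - p2.

    x1, x2 are the counts of successes; n1, n2 are the group sizes.
    The interval is estimate +/- multiplier * standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)


# Zyban: 85 of 244 quit; patch: 52 of 244 quit.
diff, se, (lower, upper) = two_proportion_ci(85, 244, 52, 244)
print(f"difference = {diff:.3f}, s.e. = {se:.3f}")
print(f"approx 95% C.I.: {lower:.3f} to {upper:.3f}")
```

Working from unrounded proportions, the upper limit comes out marginally above the slide's .215, which was computed from values rounded to three decimals.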

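18 Difference between means
If $X_1$ and $X_2$ are independent, $\text{s.d.}(X_1 - X_2) = \sqrt{\sigma_1^2 + \sigma_2^2}$, so for means of independent samples $\text{s.d.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.
If both populations are normal, so is the difference.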

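19 Std error for difference in means
Lose More Weight by Diet or Exercise?
$n_1 = 42$ men on diet, $n_2 = 47$ men on exercise routine
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
So, $\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{3.7^2}{42} + \dfrac{3.9^2}{47}} = 0.81$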

20 Diet vs Exercise
Study: $n_1 = 42$ men on diet, $n_2 = 47$ men on exercise routine
Diet: Lost an average of 7.2 kg with std dev of 3.7 kg
Exercise: Lost an average of 4.0 kg with std dev of 3.9 kg
So, $\bar{x}_1 - \bar{x}_2 = 7.2 - 4.0 = 3.2$ kg with standard error 0.81 kg
Approximate 95% Confidence Interval: $3.2 \pm 2(.81) \Rightarrow 3.2 \pm 1.62 \Rightarrow 1.58$ to $4.82$ kg
We are 95% confident that those who diet lose on average 1.58 to 4.82 kg more than those who exercise.
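
The diet vs exercise interval can be checked from the summary statistics alone. The sketch below is illustrative: the function name `two_mean_ci` is hypothetical, and it takes the exercise group size as $n_2 = 47$, which is an assumption chosen because it is the size consistent with the quoted standard error of 0.81.

```python
from math import sqrt


def two_mean_ci(mean1, sd1, n1, mean2, sd2, n2, multiplier=2.0):
    """Return (difference, standard error, (lower, upper)) for mean1 - mean2.

    Uses the unpooled standard error sqrt(sd1^2/n1 + sd2^2/n2) and the
    rough multiplier 2 from the slides in place of an exact t* value.
    """
    diff = mean1 - mean2
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff, se, (diff - multiplier * se, diff + multiplier * se)


# Diet: mean 7.2 kg, s.d. 3.7 kg, n = 42.
# Exercise: mean 4.0 kg, s.d. 3.9 kg, n = 47 (assumed, see the note above).
diff, se, (lower, upper) = two_mean_ci(7.2, 3.7, 42, 4.0, 3.9, 47)
print(f"difference = {diff:.1f} kg, s.e. = {se:.2f} kg")
print(f"approx 95% C.I.: {lower:.2f} to {upper:.2f} kg")
```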

21 Better C.I. for mean
A CI for the Difference Between Two Means (Independent Samples):
$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
where $t^*$ is a value from t-tables.
d.f. = min($n_1 - 1$, $n_2 - 1$)
Welch's approx gives a different d.f. (higher) but is a complicated formula.
$t^*$ is approx 1.96 if d.f. is high.

22 Effect of a stare on driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
Estimate difference between the mean crossing times.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8

23 Checking data
No outliers; no strong skewness.
Crossing times in stare group seem faster & less variable.

24 Effect of stare on driving
Using df = min($n_1 - 1$, $n_2 - 1$) = 12 gives $t^* = 2.179$
A 95% CI for $\mu_2 - \mu_1$ is $\bar{x}_2 - \bar{x}_1 \pm 2.179 \times \text{s.e.}(\bar{x}_2 - \bar{x}_1)$

25 Effect of stare on driving
Minitab
N.B. C.I. is based on df = 21 (Welch's approx)
Slightly narrower C.I. than we got with d.f. = 12.
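
The slides quote Welch's d.f. without showing how it is obtained. The sketch below computes it with the Welch-Satterthwaite formula, which is assumed here to be the approximation the slides refer to; the function name `welch_df` is hypothetical, and the data are the crossing times listed for the stare experiment.

```python
from statistics import variance


def welch_df(sample1, sample2):
    """Welch-Satterthwaite approximate degrees of freedom for two samples."""
    n1, n2 = len(sample1), len(sample2)
    a = variance(sample1) / n1  # estimated variance of the first sample mean
    b = variance(sample2) / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

df_welch = welch_df(no_stare, stare)
df_conservative = min(len(no_stare) - 1, len(stare) - 1)
print(f"Welch d.f. = {df_welch:.1f}, conservative d.f. = {df_conservative}")
```

Rounding the Welch value down to a whole number gives 21, matching the d.f. quoted for the Minitab output, while the conservative rule gives 12.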

26 Interpretation
A 95% CI for $\mu_2 - \mu_1$ is 0.17 to 1.91 sec
We are 95% confident that it takes drivers between 0.17 and 1.91 seconds less on average to cross the intersection if someone stares at them.

27 Testing two proportions
Hypotheses
$H_0: \pi_1 - \pi_2 = 0$
$H_A: \pi_1 - \pi_2 \neq 0$ or $\pi_1 - \pi_2 < 0$ or $\pi_1 - \pi_2 > 0$
Watch how Population 1 and 2 are defined.
Data requirements
Independent samples
$n_1 p_1$, $n_1(1-p_1)$, $n_2 p_2$, $n_2(1-p_2)$ all at least 5, preferably ≥ 10

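28 Test statistic
Based on $p_1 - p_2$
Standardise: $\dfrac{(p_1 - p_2) - 0}{\sqrt{\pi(1-\pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\pi$ is the common value of $\pi_1$ and $\pi_2$ under $H_0$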

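29 Test statistic
If $H_0$ is true, best estimate of the common proportion $\pi$ is $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ (the overall sample proportion)
So we use test statistic $z = \dfrac{p_1 - p_2}{\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
If $H_0$ is true, this has standard normal distn
p-value from normal distn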

30 Prevention of Ear Infections
Does the use of sweetener xylitol reduce the incidence of ear infections?
Randomized Experiment: Of 165 children on placebo, 68 got ear infection. Of 159 children on xylitol, 46 got ear infection.
Hypotheses: $H_0: \pi_1 - \pi_2 = 0$, $H_a: \pi_1 - \pi_2 > 0$ (1 = placebo, 2 = xylitol)
Data check: At least 5 successes & failures in each group

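31 Prevention of Ear Infections
Sample proportions: $p_1 = 68/165 = 0.412$ (placebo), $p_2 = 46/159 = 0.289$ (xylitol)
Overall propn getting infection: $p = \dfrac{68 + 46}{165 + 159} = 0.352$
Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{p(1-p)\left(\dfrac{1}{165} + \dfrac{1}{159}\right)}} \approx 2.31$
p-value = 0.01
Conclusion: Strong evidence xylitol reduces chance of ear infection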
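
The xylitol test can be reproduced with the pooled-proportion z statistic. This is a minimal sketch with a hypothetical function name, `two_proportion_z_test`; it computes the one-sided (upper-tail) p-value, which assumes group 1 is placebo and the alternative is that placebo has the higher infection rate.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, upper-tail p-value) for H0: pi1 - pi2 = 0 vs HA: pi1 - pi2 > 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # best estimate of the common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Placebo: 68 of 165 infected; xylitol: 46 of 159 infected.
z, p_value = two_proportion_z_test(68, 165, 46, 159)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```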

32 Testing two means
Hypotheses
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$ or $\mu_1 - \mu_2 < 0$ or $\mu_1 - \mu_2 > 0$
Watch how Population 1 and 2 are defined.
Data requirements
Fairly large $n_1$ and $n_2$ (say 30 or more), or
Not much skewness & no outliers (normal model reasonable)

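33 Test statistic
Based on $\bar{x}_1 - \bar{x}_2$
Standardise: $\dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$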

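34 Test
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
If $H_0$ is true, this has approx t-distn with d.f. = min($n_1 - 1$, $n_2 - 1$)
Same d.f. as CI for $\mu_1 - \mu_2$
p-value from t distn: use Minitab or Excel
If $n_1$ and $n_2$ ≥ 30: use normal tables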

35 Effect of a stare on driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
Test whether stare speeds up crossing times.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8

36 Checking data
Small sample sizes, but
No outliers; no strong skewness.

37 Effect of stare on driving
Hypotheses
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 > 0$
where 1 = no-stare, 2 = stare

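38 Effect of stare on driving
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \approx 2.41$
df = min($n_1 - 1$, $n_2 - 1$) = 12
P-value: upper tail area of t-distn (12 d.f.), p = 0.016
Strong evidence that stare speeds up crossing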
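
The test statistic and p-value for the stare experiment can be reproduced from the raw crossing times. The sketch below stays within Python's standard library, so the upper-tail area of the t distribution is obtained by numerically integrating the t density with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are hypothetical, and in practice a statistics package would normally be used for this step.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


no_stare = [8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, 7.1, 5.7, 6.5, 4.7, 6.9, 5.2, 4.7]
stare = [5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, 6.1, 4.8, 4.9, 4.5, 7.2, 5.8]

n1, n2 = len(no_stare), len(stare)
# Unpooled standard error of the difference between the two sample means.
se = sqrt(variance(no_stare) / n1 + variance(stare) / n2)
t_stat = (mean(no_stare) - mean(stare)) / se
df = min(n1 - 1, n2 - 1)  # conservative d.f. used on the slides
p_value = t_upper_tail(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, one-sided p-value = {p_value:.3f}")
```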

39 Effect of stare on driving
Minitab
N.B. Test is based on df = 21 (Welch's approx)
Very similar p-value and same conclusion
Strong evidence that stare speeds up crossing

40 Paired data and 2-sample data
Make sure you distinguish between:
Paired data: 2 measurements on each individual (e.g. before & after). Example: same cars assessed by both garages.
2 independent samples: measurements from 2 independent groups. Example: different cars assessed for insurance claims in garages A and B.

