# Fall 2012, Biostat 511 (Biostatistics 511: Medical Biometry I), Discussion Section Week 8, C. Jason Liang


Discussion Outline

- Calculating a confidence interval for the population mean (μ)
  - When the population standard deviation (σ) is known
  - When the population standard deviation (σ) is not known
- I have a confidence interval. What is it really telling me?
- Two-sided hypothesis testing
  - z-test (σ known) and t-test (σ not known)
  - Three different ways, all equivalent
  - A little Stata
- Putting it all together
  - Connections, more interpretations

Confidence intervals for population mean (population σ known)

What we want to know: what is the population mean cholesterol for hypertensive men?

What we have: a random sample of 25 hypertensive men and their cholesterol, plus the knowledge that the population cholesterol standard deviation for hypertensive men is 45 mg/ml.

The data (mg/ml): 233.47 203.76 204.66 279.39 189.35 227.17 187.55 234.37 234.37 274.89 241.58 160.53 189.35 167.74 205.56 231.67 160.53 266.79 163.23 222.67 202.86 272.19 229.87 219.06 297.40

What would be an estimate of the population mean cholesterol for hypertensive men?
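The slides that worked through this calculation did not survive in the transcript, so the following is only a sketch of the standard approach, not necessarily the slides' own presentation: take the sample mean as the point estimate and form a 95% z-based interval using the known σ = 45. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Cholesterol values (mg/ml) for the 25 sampled hypertensive men, from the slide.
cholesterol = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]
SIGMA = 45.0  # known population standard deviation, as stated on the slide

n = len(cholesterol)
x_bar = mean(cholesterol)              # point estimate of the population mean
std_err = SIGMA / sqrt(n)              # standard error of the sample mean
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval

ci_low = x_bar - z_crit * std_err
ci_high = x_bar + z_crit * std_err

print(f"sample mean = {x_bar:.2f}")
print(f"95% CI (sigma known) = ({ci_low:.2f}, {ci_high:.2f})")
```

With these data the sample mean comes out at about 220 and the interval at roughly (202.36, 237.64).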


What would happen if our sample size was larger? What if we wanted a 99% CI? 90% CI? What if σ was larger/smaller?

General formula for a confidence interval of the mean: $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$. For a 95% confidence interval (α = 1 − 0.95 = 0.05) the multiplier is $z_{0.975} \approx 1.96$. Use a calculator or Stata to find the multiplier for other confidence levels.

Larger sample size: it would mean plugging in a larger n, which would make for a tighter CI, i.e. the endpoints would be closer to the sample mean.

99% or 90% CI: a 99% CI means a larger value for z and thus a wider interval. A 90% CI means a smaller value for z and a tighter interval. The confidence level also affects the interpretation.

Larger or smaller σ: larger σ means a wider interval. Smaller σ means a tighter interval. This makes sense: sampling from less diffuse data should mean less uncertainty.
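To see these three effects numerically, here is a small sketch that changes one input at a time. The baseline (mean 220, σ = 45, n = 25) comes from the slides; the alternative values (n = 100, σ = 60, σ = 30) are made up purely for illustration, and `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided z-based confidence interval for a mean (sigma known)."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def width(interval):
    """Length of an interval given as (low, high)."""
    low, high = interval
    return high - low


# Baseline from the slides: sample mean 220, sigma 45, n 25, 95% level.
baseline = z_interval(220, 45, 25)

# Hypothetical variations, changing one input at a time.
scenarios = {
    "baseline (n=25, sigma=45, 95%)": baseline,
    "larger n (n=100)": z_interval(220, 45, 100),
    "99% level": z_interval(220, 45, 25, confidence=0.99),
    "90% level": z_interval(220, 45, 25, confidence=0.90),
    "larger sigma (60)": z_interval(220, 60, 25),
    "smaller sigma (30)": z_interval(220, 30, 25),
}

for label, (low, high) in scenarios.items():
    print(f"{label:32s} ({low:7.2f}, {high:7.2f})  width = {high - low:6.2f}")
```

Because n enters through $\sqrt{n}$, quadrupling the sample size only halves the width of the interval.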

Confidence intervals for population mean (population σ NOT known)

If we drew another sample of the same size, which values would be the same and which would likely change? What would happen if our sample size was larger? What if we wanted a 99% CI? 90% CI?

General formula for a confidence interval of the mean: $\bar{x} \pm t_{n-1,\,1-\alpha/2}\, s/\sqrt{n}$, where s is the sample standard deviation. For a 95% confidence interval the multiplier is $t_{n-1,\,0.975}$. Use a calculator or Stata to find it.

Larger sample size: it would mean plugging in a larger n, which would make for a tighter CI, i.e. the endpoints would be closer to the sample mean.

99% or 90% CI: a 99% CI means a larger value for t and thus a wider interval. A 90% CI means a smaller value for t and a tighter interval.
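As a rough sketch of the σ-unknown case, the interval can be computed from the 25 cholesterol values given earlier, with the sample standard deviation standing in for σ. The critical value 2.0639 for 24 degrees of freedom is hard-coded here as an assumption taken from a standard t table, because Python's standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Cholesterol values (mg/ml) for the 25 sampled hypertensive men, from the slides.
cholesterol = [
    233.47, 203.76, 204.66, 279.39, 189.35,
    227.17, 187.55, 234.37, 234.37, 274.89,
    241.58, 160.53, 189.35, 167.74, 205.56,
    231.67, 160.53, 266.79, 163.23, 222.67,
    202.86, 272.19, 229.87, 219.06, 297.40,
]

# Assumption: t critical value for df = 24 and a 95% two-sided interval,
# looked up in a t table rather than computed.
T_CRIT_24 = 2.0639

n = len(cholesterol)
x_bar = mean(cholesterol)
s = stdev(cholesterol)        # sample sd (divides by n - 1)
std_err = s / sqrt(n)         # estimated standard error of the mean

ci_low = x_bar - T_CRIT_24 * std_err
ci_high = x_bar + T_CRIT_24 * std_err

print(f"sample mean = {x_bar:.2f}, sample sd = {s:.2f}")
print(f"95% CI (sigma not known) = ({ci_low:.2f}, {ci_high:.2f})")
```

The result, about (204.07, 235.93), is slightly narrower than the σ-known interval here only because the sample sd (about 38.6) happens to be smaller than 45. For the same standard deviation, the t multiplier is larger than 1.96 and the interval would be wider.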

Confidence interval of sample mean - interpretation

Scientific collaborator asking statistician some questions:

Q: What is your best estimate of the population mean?

A: The sample mean! For our sample, it is 220.

Q: But how sure are you that it is the population mean?

A: I don't know if it is or not, but I can tell you the 95% confidence interval calculated from our data is (204.07, 235.93).

Q: OK, so there's a 95% chance that the pop. mean is in that interval, right?

A: Not quite! The true mean either is or it isn't in that confidence interval, so we can't put a probability on it. However, I can tell you that if I were to repeat this experiment over and over again, 95% of the confidence intervals produced would contain the truth.
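The "repeat this experiment over and over" statement can be checked with a simulation. The sketch below rests on loudly stated assumptions: individual cholesterol values are drawn from a normal distribution, the true mean is set to a hypothetical 220 (in real life it is unknown), and σ = 45 is treated as known so a z-interval applies. The seed is arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(511)     # arbitrary seed so the run is repeatable

TRUE_MEAN = 220.0    # hypothetical "truth"; unknown in a real study
SIGMA = 45.0         # population sd, treated as known
N = 25               # sample size per experiment
REPEATS = 10_000     # number of repeated experiments

z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * SIGMA / sqrt(N)

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Each experiment yields a different interval; the truth stays fixed.
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land close to 0.95. The randomness lives in the intervals, not in the true mean, which is why the 95% describes the procedure rather than any single interval.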

Hypothesis testing for population mean (population σ known: z-test)

Known facts: In the general population, men have mean cholesterol of 211 mg/ml with standard deviation 45 mg/ml.

What we want to know: Do men in the hypertensive population have a different mean cholesterol than men in the general population?

What we have: A random sample of 25 hypertensive men and their cholesterol, plus the knowledge that the population std. dev. for hypertensive men is the same as that of the general population (45 mg/ml).


It is not a coincidence that all three methods (testing on the mean scale, on the Z-score scale, and on the p-value scale) produced the same conclusion. They are mathematically equivalent! When doing an analysis yourself, just pick the one you feel most comfortable with. When reading research papers, though, it is good to be familiar with all three. Let's use some pictures to help illustrate why the three methods are equivalent.
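The slides with the three calculations are missing from the transcript, so the following is a reconstruction sketch rather than the slides' own working. It assumes the numbers used elsewhere in the deck (sample mean 220, null mean 211, σ = 45, n = 25, α = 0.05) and makes the two-sided decision in each of the three ways.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 211.0    # null mean (general male population)
SIGMA = 45.0    # known population sd
N = 25          # sample size
X_BAR = 220.0   # observed sample mean
ALPHA = 0.05    # two-sided significance level

std_normal = NormalDist()
std_err = SIGMA / sqrt(N)
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)

# Method 1, mean scale: reject if the sample mean falls outside
# the range of sample means that are plausible under H0.
plausible_low = MU_0 - z_crit * std_err
plausible_high = MU_0 + z_crit * std_err
reject_by_mean = not (plausible_low <= X_BAR <= plausible_high)

# Method 2, Z-score scale: reject if |Z| exceeds the critical value.
z_score = (X_BAR - MU_0) / std_err
reject_by_z = abs(z_score) > z_crit

# Method 3, p-value scale: reject if the two-sided p-value is below alpha.
p_value = 2 * (1 - std_normal.cdf(abs(z_score)))
reject_by_p = p_value < ALPHA

print(f"mean scale: plausible range ({plausible_low:.2f}, {plausible_high:.2f}), reject = {reject_by_mean}")
print(f"z scale:    Z = {z_score:.2f}, critical value = {z_crit:.2f}, reject = {reject_by_z}")
print(f"p scale:    p = {p_value:.4f}, reject = {reject_by_p}")
```

All three flags agree (no rejection here), because each comparison is an algebraic rearrangement of the same inequality.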

Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the sample mean for each of these samples. A histogram of these millions of sample means:

[Figure: histogram of the sample means]

Z-score scale. Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the Z-score for each of these samples. A histogram of these millions of Z-scores:

[Figure: histogram of the Z-scores, with the extreme tails shaded red]

If H0 is true, the probability of observing a Z-score in the extreme red area is 5% (recall α = 0.05).

p-value scale. Suppose we live in a world where hypertensive men actually are the same as everyone else (i.e. H0 is true). Say we took MANY samples of 25 hypertensive male cholesterols and found the sample mean for each of these samples. A histogram of these millions of sample means:

[Figure: histogram of the sample means, p-value scale]
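The histograms themselves are not in the transcript, but the thought experiment behind them is easy to simulate. This sketch assumes individual cholesterol values are normally distributed with the null mean 211 and σ = 45 (the normality is an assumption made for the simulation), and it uses 20,000 repeated samples rather than millions. The seed is arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(8)       # arbitrary seed so the run is repeatable

MU_0 = 211.0         # mean in a world where H0 is true
SIGMA = 45.0
N = 25
ALPHA = 0.05
REPEATS = 20_000

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)
std_err = SIGMA / sqrt(N)

extreme = 0
for _ in range(REPEATS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    z_score = (sum(sample) / N - MU_0) / std_err
    # Count the Z-scores that land in the two-sided rejection region.
    if abs(z_score) > z_crit:
        extreme += 1

fraction_extreme = extreme / REPEATS
print(f"fraction of Z-scores beyond +/-{z_crit:.2f}: {fraction_extreme:.3f}")
```

The fraction should come out near 0.05: even when H0 is true, about 5% of samples produce a Z-score in the rejection region, which is exactly what choosing α = 0.05 means.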

Hypothesis testing for population mean (population σ NOT known: t-test)

In the previous example, we knew the population sd. What if we don't?

Known facts: In the general population, men have mean cholesterol of 211 mg/ml with standard deviation 45 mg/ml.

What we want to know: Do men in the hypertensive population have a different mean cholesterol than men in the general population?

What we have: A random sample of 25 hypertensive men and their cholesterol. This time we do not assume that the population sd for hypertensive men is the same as that of the general population (45 mg/ml), so we estimate it with the sample sd.


One sample t-test example in Stata

We can do all of this in Stata using the ttesti command. Its arguments are, in order: sample size (25), sample mean (220), sample std. dev. (38.6), and null mean (211).

    ttesti 25 220 38.6 211

    One-sample t test
    ------------------------------------------------------------------------------
             |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |      25         220        7.72        38.6    204.0667    235.9333
    ------------------------------------------------------------------------------
        mean = mean(x)                                                t =   1.1658
    Ho: mean = 211                                   degrees of freedom =       24

        Ha: mean < 211               Ha: mean != 211               Ha: mean > 211
     Pr(T < t) = 0.8724         Pr(|T| > |t|) = 0.2551          Pr(T > t) = 0.1276

In the output, t = 1.1658 is the T-score. Each of the three probabilities in the last row is a p-value, one for each alternative hypothesis:

H0: μ = 211, Ha: μ < 211. In a world where H0 is true, the probability of seeing a sample mean even smaller than the one we observed (<220) is 87.24%.

H0: μ = 211, Ha: μ > 211. In a world where H0 is true, the probability of seeing a sample mean even greater than the one we observed (>220) is 12.76%.

H0: μ = 211, Ha: μ ≠ 211. In a world where H0 is true, the probability of seeing a sample mean more extreme than the one we observed (>220 or <202) is 25.51%.
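For readers without Stata, the T-score and the three p-values can be approximated from the same four summary numbers. This is only an illustrative sketch and not how Stata computes them: Python's standard library has no t distribution, so the helper functions `t_pdf` and `t_cdf` (hypothetical names) integrate the t density numerically with Simpson's rule.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


# Same inputs as the ttesti call: n, sample mean, sample sd, null mean.
n, x_bar, s, mu_0 = 25, 220.0, 38.6, 211.0

df = n - 1
std_err = s / sqrt(n)
t_score = (x_bar - mu_0) / std_err

p_less = t_cdf(t_score, df)                  # Ha: mean < 211
p_greater = 1 - p_less                       # Ha: mean > 211
p_two_sided = 2 * min(p_less, p_greater)     # Ha: mean != 211

print(f"t = {t_score:.4f}, df = {df}")
print(f"Pr(T < t) = {p_less:.4f}")
print(f"Pr(|T| > |t|) = {p_two_sided:.4f}")
print(f"Pr(T > t) = {p_greater:.4f}")
```

The values should agree with the Stata output to about four decimal places. Note that the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.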

One sample t-test example in Stata: writing up the result

In our sample of cholesterol measurements from 25 hypertensive males, we observed a mean cholesterol of 220 mg/ml (95% CI: 204.07, 235.93). We conducted a two-sided hypothesis test, using the t-test, with the null hypothesis that the mean cholesterol of hypertensive males is the same as the mean cholesterol of the general male population (211 mg/ml). Our test resulted in a T-score of 1.17 (two-sided p-value 0.2551). This does not fall in the two-sided α = 0.05 rejection region, so it is not a statistically significant result. We thus conclude that we do not have sufficient evidence to reject the null hypothesis. Note that this does not mean the null hypothesis is true, just that we do not have sufficient evidence to rule it out.

In practice, conclusions/interpretations may not be this wordy. We do so here for thoroughness.

Summary

Some takeaways:

- Hypothesis testing can be done on the mean scale, the z-scale (t-scale if we don't know σ), or the p-value scale.
- Another way, if we are doing two-sided testing: just calculate the 100(1 − α)% confidence interval (e.g. the 95% CI for α = 0.05). If the null mean is not in the interval, reject the null hypothesis.
- These are all mathematically equivalent.
- If we do not reject the null, it does not imply the null is true! It simply means we don't have sufficient evidence to reject it.
- What does α = 0.05 mean? One overly simplified example: in clinical trials it means we are willing to let through 5% of the drugs that have no effect. We don't know how many drugs have no effect. We just know we are willing to let through 5% of them.
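The equivalence between the two-sided test and the confidence interval can be written out in one line. The notation here is chosen for illustration and is not taken from the slides: $t^{*}$ stands for the critical value $t_{n-1,\,1-\alpha/2}$, $\mu_0$ for the null mean, and $s$ for the sample standard deviation.

```latex
\[
|t| \le t^{*}
\iff
\left| \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right| \le t^{*}
\iff
\bar{x} - t^{*}\frac{s}{\sqrt{n}} \;\le\; \mu_0 \;\le\; \bar{x} + t^{*}\frac{s}{\sqrt{n}}
\]
% With the numbers from the Stata example (t* is about 2.064 for 24 df):
\[
|1.17| \le 2.064
\iff
204.07 \le 211 \le 235.93
\]
```

So "do not reject H0 at level α" and "the null mean lies inside the 100(1 − α)% confidence interval" are the same statement, which is why the interval can be used as a fourth way of carrying out the test.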
