
1 Lunch & Learn Statistics By Jay

2 Goals
Introduce / reinforce statistical thinking
Understand statistical models
Appreciate model assumptions
Perform simple statistical tests

3 What topics will we cover?
Statistical concepts. Probability. Definitions. Descriptive statistics. Hypothesis formulation. Hypothesis testing. Normal distribution. α and β errors. Student’s t distribution. Paired and unpaired t tests. Analysis of variance. Regression. Categorical data. Sensitivity / specificity. Chi-square tests.

4 Session #1 Review
Observations vary: sample vs. population
Observational vs. experimental data
Graphing data

5 Session #2 Review
Statistics are functions of the data
Useful statistics have known distributions
Statistical inference
- Estimation
- Testing hypotheses
Tests seek to disprove a “null” hypothesis

6 Session #3 Review
Tests involve a NULL hypothesis (H0) and an ALTERNATIVE hypothesis (HA)
Try to disprove H0
4 steps in hypothesis testing
- Identify the test statistic
- State the null and alternative hypotheses
- Identify the rejection region
- State your conclusion

7 Session #4 Concepts
α (type I) and β (type II) errors.
Normal distribution and the z-statistic: a mathematical construct.
The Central Limit Theorem: a divine gift for statistical inference?
We will use the Normal distribution to:
- perform hypothesis tests
- calculate power and sample size

8 Type I error
…rejecting the null hypothesis when, in fact, it is true
= P(reject H0 | H0 true) = α
Generally, α = 0.05
For one-sided tests, it is conservative to choose α = 0.025

9 Type II error
…accepting H0 when, in fact, HA is true
= P(accepting H0 | HA true) = β
Often we pick β = 0.20 or 0.10 and calculate the sample size to achieve this goal.
In drug development, it is wasteful (but less expensive) to choose β > 0.10

10 What is Power?
Power is 1 - β, that is…
= 1 - P(accepting H0 | HA true)
= P(rejecting H0 | HA true)
…rejecting H0 when, in fact, HA is true.
This is something we want to happen: our goal!
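To make the two error rates concrete, here is a minimal simulation sketch. It assumes a one-sided z-test with known σ and borrows μ0 = 85, σ = 8.006 and n = 20 from the C-section example used later in this session, plus an alternative mean of 80; the function name `rejection_rate`, the seed and the number of trials are illustrative choices, not part of the original slides.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mean, mu0=85.0, sigma=8.006, n=20, alpha=0.05,
                   trials=5000, seed=1):
    """Fraction of simulated samples where the one-sided z-test rejects H0: mu >= mu0."""
    rng = random.Random(seed)                   # fixed seed for repeatability
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = 0.05
    se = sigma / n ** 0.5                       # standard error of the average
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if z < -z_alpha:                        # rejection region: z < -z_alpha
            rejections += 1
    return rejections / trials

# H0 true (mean exactly 85): the rejection rate estimates alpha (type I error).
type_one_rate = rejection_rate(true_mean=85.0)
# HA true (mean 80): the rejection rate estimates power = 1 - beta.
power_estimate = rejection_rate(true_mean=80.0)
print(type_one_rate, power_estimate)
```

With H0 true the rejection rate should land near α; with HA true it should land near the power, so one minus that value estimates β.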

11 Normal Distribution
Parametric: mean μ and variance σ²
Mean = point of symmetry
σ² = “spread” of bell curve
Complicated mathematical formula: f(x) = exp[-(x-μ)²/(2σ²)] / [σ√(2π)]
Looks like a bell centered at μ
Distance from midline to inflection point = σ
Importance: Central Limit Theorem
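The density formula can be checked numerically. The sketch below writes it out directly and compares it with `statistics.NormalDist` from the Python standard library; the function name `normal_pdf` and the example values of μ and σ are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = exp[-(x - mu)^2 / (2 sigma^2)] / [sigma * sqrt(2 pi)]"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 85.0, 8.0          # example parameters, chosen only for illustration
reference = NormalDist(mu, sigma)

for x in (mu - 2 * sigma, mu - sigma, mu, mu + sigma, mu + 2 * sigma):
    # The hand-written formula and the library density should agree.
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))

# Symmetry about the mean: equal heights at mu - d and mu + d.
left, right = normal_pdf(mu - 3.0, mu, sigma), normal_pdf(mu + 3.0, mu, sigma)
```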

12 Central Limit Theorem
Averages of random variables are approximately N(μ, σ²/n)
Proof: uses Taylor and Maclaurin expansions of infinite series and other nasty mathematical tricks
When n ~ teens → averages ~ Normal for reasonable distributions
When n > 30 → averages ~ Normal no matter how weird the original distribution (a rule of thumb; the distribution must have finite variance)
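A quick simulation can illustrate the theorem. This sketch assumes an exponential distribution with mean 1 and variance 1 as the "weird" (strongly skewed) starting point, and n = 30; the seed and the number of trials are arbitrary illustrative choices.

```python
import random
from statistics import fmean

rng = random.Random(0)     # fixed seed so the run is repeatable
n = 30                     # observations per average
trials = 20000             # number of averages simulated

# Each entry is the average of n draws from a skewed exponential distribution.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = fmean(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / trials

# If the averages are roughly N(mu, sigma^2/n), about 95% of them should fall
# within 1.96 standard errors of mu = 1.
se = (1.0 / n) ** 0.5
within = sum(abs(m - 1.0) < 1.96 * se for m in sample_means) / trials

print(mean_of_means, var_of_means, within)
```

The mean of the averages should sit near μ = 1, their variance near σ²/n = 1/30, and the coverage near 0.95, even though individual exponential draws are far from bell-shaped.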

13 Z-statistic
If xbar ~ N(μ, σ²/n), then…
(xbar - μ)/(σ/√n) ~ N(0,1) [Note change in the denominator]
We say that “xbar has been normalized”

14 Why the z-statistic is useful
Hypothesis tests require a test statistic with a known distribution
The z-statistic distribution is known
Averages of anything (if n > 30) can use the z-statistic

15 Why the z-statistic is useful
Hypothesis tests require a test statistic with a known distribution
The z-statistic distribution is known
Averages of anything (if n > 30) can use the z-statistic
ASSUMPTION: WE KNOW THE VARIANCE σ² OF THE POPULATION STUDIED

16 Example: C-section data
Test if initial SBP is too low, i.e., < 85 mmHg
Four steps in testing:
1. Identify test statistic
2. State hypotheses
3. Identify rejection region
4. State conclusions

17 Example: C-section data
Identify a test statistic:
… minimum value? How is it distributed?
… median value? How is it distributed?
… average value? How is it distributed? N(μ, σ²/n)

18 Example: C-section data
Identify a test statistic: xbar ~ N(μ, σ²/n)
Therefore, z = (xbar - μ)/(σ/√n) is the test statistic.
We know its distribution: N(0,1)

19 Example: C-section data
Identify a test statistic: z
State null and alternative hypotheses:
H0: μ ≥ 85 (remember, put = with H0)
HA: μ < 85

20 Example: C-section data
Identify a test statistic: z for average SBP
State the null and alternative hypotheses: H0: μ ≥ 85, HA: μ < 85
Identify the rejection region
Under H0, we use μ0 for the population mean: z = (xbar - μ0)/(σ/√n)
Here μ0 is 85 mmHg
When z < -zα, we reject H0.

21 Example: C-section data
Final step: state conclusion
Calculate z = (xbar - μ0)/(σ/√n)
xbar = 80.25
μ0 = 85 (according to H0)
For now, we will use 8.006 for σ
n = 20
z = -2.65, which is < -1.645 (z0.05 = 1.645)
Therefore, we reject H0 and conclude: data not consistent with SBP of at least 85 mmHg

22 Example: C-section data
Final step: state conclusion
Calculate z = (xbar - μ0)/(σ/√n)
xbar = 80.25
μ0 = 85 (according to H0)
For now, we will use 8.006 for σ
n = 20
z = -2.65, which is < -1.645 (z0.05 = 1.645)
Therefore, we reject H0 and conclude: data not consistent with SBP of at least 85 mmHg
“Nominal p-value”: look up the α value associated with z = -2.65: get 0.004025
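The four steps for this example can be sketched in a few lines of Python. The numbers (xbar = 80.25, μ0 = 85, σ = 8.006, n = 20, α = 0.05) come from the slide; the function name `one_sided_z_test` is an illustrative assumption, and the normal table lookup is replaced by `statistics.NormalDist`.

```python
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Lower-tailed z-test of H0: mu >= mu0 against HA: mu < mu0, sigma known."""
    z = (xbar - mu0) / (sigma / n ** 0.5)        # test statistic
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # critical value, about 1.645
    p_value = NormalDist().cdf(z)                # nominal p-value: P(Z < z)
    reject = z < -z_alpha                        # rejection region
    return z, p_value, reject

z, p_value, reject = one_sided_z_test(xbar=80.25, mu0=85.0, sigma=8.006, n=20)
print(round(z, 2), round(p_value, 4), reject)
```

The unrounded z is slightly below -2.65, so the computed p-value comes out marginally smaller than the table value for z = -2.65.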

23 Calculating Power of a Test
Recall, power = P(rejecting H0 | HA true). [reject H0 at α level, when HA is true]
Start with “reject H0 at α level”:
P(reject H0 | H0 true) = α
= P(z < -zα | H0 true) for our example
= P[(xbar - μ0)/(σ/√n) < -zα | μ = μ0]
= P[xbar < μ0 - zα(σ/√n)]

24 Calculating Power
Note in this case (σ/√n = 8.006/√20 ≈ 1.79):
P[(xbar - 85)/1.79 < -1.645] = 0.05
P[xbar < 85 - (1.645)(1.79)] = 0.05
P(xbar < 82.05545) = 0.05
(Another way to write the rejection region of the hypothesis test)
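The same rejection region can be computed on the xbar scale. This sketch assumes the example values μ0 = 85, σ = 8.006, n = 20 and α = 0.05; the variable names are illustrative.

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 85.0, 8.006, 20, 0.05

se = sigma / n ** 0.5                          # standard error, about 1.79
z_alpha = NormalDist().inv_cdf(1 - alpha)      # about 1.645

# Reject H0 whenever the sample average falls below this cutoff.
critical_xbar = mu0 - z_alpha * se

# Check: under H0 (xbar ~ N(mu0, se^2)), the chance of falling below the cutoff is alpha.
prob_below = NormalDist(mu0, se).cdf(critical_xbar)

print(round(critical_xbar, 3), round(prob_below, 3))
```

The cutoff differs from the slide's 82.05545 only in the later decimals, because the slide rounds σ/√n to 1.79 and zα to 1.645 before multiplying.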

25 Calculating Power
Step 2: re-write the probability for type II, now with HA true (xbar centered at μA):
Power = P(reject H0 | HA true)
= P[xbar < μ0 - zα(σ/√n)]
= P[(xbar - μA)/(σ/√n) < (μ0 - zα(σ/√n) - μA)/(σ/√n)]
= P[zA < (μ0 - μA)/(σ/√n) - zα]
where zA = (xbar - μA)/(σ/√n) ~ N(0,1) when HA is true

26 Calculating Power
Power = P[zA < (μ0 - μA)/(σ/√n) - zα]
= P[zA < (85 - 80)/1.79 - 1.645] (taking μA = 80)
= P(zA < 1.148)
= 0.875
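The power formula translates directly into code. This is a minimal sketch assuming the example values μ0 = 85, μA = 80, σ = 8.006, n = 20 and α = 0.05; the function name `z_test_power` is an illustrative assumption.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the lower-tailed z-test: P[zA < (mu0 - muA)/(sigma/sqrt(n)) - z_alpha]."""
    se = sigma / n ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    threshold = (mu0 - mu_a) / se - z_alpha      # about 1.148 for the example
    return NormalDist().cdf(threshold)

power = z_test_power(mu0=85.0, mu_a=80.0, sigma=8.006, n=20)
print(round(power, 3))

# Power grows with n: the same 5 mmHg difference is easier to detect with more patients.
power_n40 = z_test_power(mu0=85.0, mu_a=80.0, sigma=8.006, n=40)
```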

27 Calculating Sample Size
Power = P[zA < (μ0 - μA)/(σ/√n) - zα]
Choose power of 0.90 → zA = 1.282 (table: the 0.90 quantile of N(0,1))
1.282 < (μ0 - μA)/(σ/√n) - zα
Solve for n.
Usually μ0 = 0; μA, α specified; σ² estimated.
For two-sided test, use zα/2

28 Calculating Sample Size
Power = P[zA < (μ0 - μA)/(σ/√n) - zα]
Choose power of 0.90 → zA = 1.282 (table)
1.282 < (μ0 - μA)/(σ/√n) - zα
1.282 < (85 - 80)/(8.006/√n) - 1.645
Solve for n: n > 21.97, round up to 22
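Solving the inequality for n gives n > [(zα + zβ)·σ/(μ0 - μA)]², where zβ is the normal quantile for the chosen power. A minimal sketch under the example's assumptions (μ0 = 85, μA = 80, σ = 8.006, α = 0.05, power 0.90); the function name `z_test_sample_size` is illustrative.

```python
import math
from statistics import NormalDist

def z_test_sample_size(mu0, mu_a, sigma, alpha=0.05, power=0.90):
    """Smallest n for a one-sided z-test to reach the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # about 1.645
    z_beta = NormalDist().inv_cdf(power)         # about 1.282 for power 0.90
    raw_n = ((z_alpha + z_beta) * sigma / (mu0 - mu_a)) ** 2
    return raw_n, math.ceil(raw_n)               # always round up

raw_n, n_required = z_test_sample_size(mu0=85.0, mu_a=80.0, sigma=8.006)
print(round(raw_n, 2), n_required)
```

For a two-sided test, the slides' advice corresponds to passing `alpha / 2` in place of `alpha`. The unrounded n can differ from the slide's 21.97 in the second decimal because the code uses unrounded quantiles.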

29 Review of Session #4
α (type I) and β (type II) errors.
Normal distribution and the z-statistic
The Central Limit Theorem
Hypothesis testing using z and N(0,1)
Calculating power
Calculating sample size

30 Session #4 Homework
Using the C-section data…
(1) Determine whether or not the increase in SBP exceeds 20 mmHg. [Hint: form paired differences. Calculate σ² on diffs.]
(2) What is the power of this test to detect an increase of 10 mmHg in SBP?
(3) Extra Credit: Find sample size that provides ≥ 90% chance of detecting an increase in SBP of 5 mmHg or more.

