Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida,


1 Statistics for clinicians
Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center
University of South Florida, College of Nursing
Professor, College of Public Health
Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute
Morsani College of Medicine
Tampa, FL, USA

2 SECTION 5.1 Parameters and factors that affect sample size Sample size estimation and correlation

3 SECTION 5.6 Sample size estimates for a two sample (independent groups) dichotomous outcome

4 Learning Outcome: Calculate and interpret sample size estimates for a two-sample (independent groups) dichotomous outcome
---Estimate for a confidence interval
---Estimate for a hypothesis test

5 Dichotomous Outcome – Two Independent Samples

Sample Size to Estimate C.I.: C.I. for $(p_1 - p_2)$
$$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{Z}{E}\right)^2$$

Sample Size for Hypothesis Test: $H_0: p_1 = p_2$
$$n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2, \qquad ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$$

6 Dichotomous Outcome – Two Independent Samples (C.I.)

Sample Size to Estimate C.I.: C.I. for $(p_1 - p_2)$
$$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{Z}{E}\right)^2$$

Example: Estimate the required sample size for a 95% C.I. for the difference in the incidence proportion of adults over 50 who develop prostate cancer (over 30 years) by smoking status (non-smokers vs. heavy smokers).

Parameters:
Margin of error: 5%
Assumed prevalence (Non-smoker): $p_1 = 0.17$
Assumed prevalence (Smoker): $p_2 = 0.34$
Assumed dropout rate: 20%
Desired C.I.: 95% (i.e. z = 1.96)

$$n_i = \left[0.17(1-0.17) + 0.34(1-0.34)\right]\left(\frac{1.96}{0.05}\right)^2 = 561.6$$
$n_1 = 561.6$, $n_2 = 561.6$, $n = 1123.3$

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = 1123.3 / 0.80 = 1404 subjects

7 Dichotomous Outcome – Two Independent Samples (C.I.) (Practice)

Sample Size to Estimate C.I.: C.I. for $(p_1 - p_2)$
$$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{Z}{E}\right)^2$$

Example: Estimate the required sample size for a 95% C.I. for the difference in the annual incidence proportion of depression among teenagers by psychological trauma (trauma vs. no trauma).

Parameters:
Margin of error: 5%
Assumed prevalence (No trauma): $p_1 = 0.06$
Assumed prevalence (Trauma): $p_2 = 0.12$
Assumed dropout rate: 10%
Desired C.I.: 95% (i.e. z = 1.96)

$n_i$ = [ _____ ]
$n_1$ = _____, $n_2$ = _____, $n$ = _____

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = ________________________

8 Dichotomous Outcome – Two Independent Samples (C.I.) (Practice)

Sample Size to Estimate C.I.: C.I. for $(p_1 - p_2)$
$$n_i = \left[p_1(1-p_1) + p_2(1-p_2)\right]\left(\frac{Z}{E}\right)^2$$

Example: Estimate the required sample size for a 95% C.I. for the difference in the annual incidence proportion of depression among teenagers by psychological trauma (trauma vs. no trauma).

Parameters:
Margin of error: 5%
Assumed prevalence (No trauma): $p_1 = 0.06$
Assumed prevalence (Trauma): $p_2 = 0.12$
Assumed dropout rate: 10%
Desired C.I.: 95% (i.e. z = 1.96)

$$n_i = \left[0.06(1-0.06) + 0.12(1-0.12)\right]\left(\frac{1.96}{0.05}\right)^2 = 248.9$$
$n_1 = 248.9$, $n_2 = 248.9$, $n = 497.9$

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = 497.9 / 0.90 = 553.2 subjects
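The confidence-interval calculation on slides 6 and 8 can be checked with a few lines of Python. This is only a sketch: the function names `n_per_group_ci` and `number_to_enroll` are made up for illustration, and the code assumes equal group sizes and the slides' z = 1.96.

```python
def n_per_group_ci(p1, p2, margin, z=1.96):
    """Per-group sample size so the C.I. for (p1 - p2) has the given margin of error."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2


def number_to_enroll(n_total, dropout_rate):
    """Inflate the total sample size to allow for the assumed dropout rate."""
    return n_total / (1 - dropout_rate)


# Slide 6: prostate cancer by smoking status (p1 = 0.17, p2 = 0.34, 20% dropout)
n_i_smoking = n_per_group_ci(0.17, 0.34, margin=0.05)
enroll_smoking = number_to_enroll(2 * n_i_smoking, dropout_rate=0.20)

# Slide 8: depression by trauma (p1 = 0.06, p2 = 0.12, 10% dropout)
n_i_trauma = n_per_group_ci(0.06, 0.12, margin=0.05)
enroll_trauma = number_to_enroll(2 * n_i_trauma, dropout_rate=0.10)

print(f"{n_i_smoking:.1f} per group, {enroll_smoking:.1f} to enroll")
print(f"{n_i_trauma:.1f} per group, {enroll_trauma:.1f} to enroll")
```

The per-group values match the slides (561.6 and 248.9). In practice the number to enroll would normally be rounded up to a whole subject.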

9 Dichotomous Outcome – Two Independent Samples ($H_0$ Test)

Sample Size for Hypothesis Test: $H_0: p_1 = p_2$
$$n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2, \qquad ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$$

Example: Compare the prevalence of hypertension in a trial of a new drug versus placebo.

Parameters/Assumptions:
Effect to detect: 20% relative reduction
Assumed prevalence (Placebo): $p_1 = 0.30$
Assumed prevalence (Drug): $p_2 = 0.24$
Assumed dropout rate: 10%
2-sided type I error rate (α): 0.05
Desired power (1-β): 0.80

$p = 0.27$ (the mean of $p_1$ and $p_2$)
$$ES = \frac{|0.30 - 0.24|}{\sqrt{0.27(1 - 0.27)}} = 0.135$$
$$n_i = 2\left(\frac{1.96 + 0.84}{0.135}\right)^2 = 858.5 \quad \text{(using the unrounded ES)}$$
$n_1 = 858.5$, $n_2 = 858.5$, $n = 1717$

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = 1717 / 0.90 = 1908 subjects

A sample size of N = 1908 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 20% reduction in the prevalence of hypertension attributed to the new drug.

10 Dichotomous Outcome – Two Independent Samples ($H_0$ Test) (Practice)

Sample Size for Hypothesis Test: $H_0: p_1 = p_2$
$$n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2, \qquad ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$$

Example: Compare the prevalence of hyperglycemia in a trial of a new drug versus placebo.

Parameters/Assumptions:
Effect to detect: 40% relative reduction
Assumed prevalence (Placebo): $p_1 = 0.50$
Assumed prevalence (Drug): $p_2 = 0.30$
Assumed dropout rate: 15%
2-sided type I error rate (α): 0.05
Desired power (1-β): 0.80

$p = 0.40$
ES = _____
$n_i$ = _____
$n_1$ = ____, $n_2$ = ____, $n$ = _____

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = ________________________

11 Dichotomous Outcome – Two Independent Samples ($H_0$ Test) (Practice)

Sample Size for Hypothesis Test: $H_0: p_1 = p_2$
$$n_i = 2\left(\frac{Z_{1-\alpha/2} + Z_{1-\beta}}{ES}\right)^2, \qquad ES = \frac{|p_1 - p_2|}{\sqrt{p(1-p)}}$$

Example: Compare the prevalence of hyperglycemia in a trial of a new drug versus placebo.

Parameters/Assumptions:
Effect to detect: 40% relative reduction
Assumed prevalence (Placebo): $p_1 = 0.50$
Assumed prevalence (Drug): $p_2 = 0.30$
Assumed dropout rate: 15%
2-sided type I error rate (α): 0.05
Desired power (1-β): 0.80

$p = 0.40$ (the mean of $p_1$ and $p_2$)
$$ES = \frac{|0.50 - 0.30|}{\sqrt{0.40(1 - 0.40)}} = 0.408$$
$$n_i = 2\left(\frac{1.96 + 0.84}{0.408}\right)^2 = 94.1$$
$n_1 = 94.1$, $n_2 = 94.1$, $n = 188.2$

Take into account the dropout rate: N (number to enroll) = n / (% retained)
N = 188.2 / 0.85 = 221.4 subjects

A sample size of N = 222 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 40% reduction in the prevalence of hyperglycemia attributed to the new drug.
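The hypothesis-test sample size on slides 9 and 11 can be sketched the same way. The function names below are illustrative, not from the course. The code assumes equal group sizes, takes p as the mean of p1 and p2 (which is what the slides' 0.27 and 0.40 correspond to), and uses the slides' rounded z-values 1.96 and 0.84.

```python
import math


def effect_size(p1, p2):
    """ES = |p1 - p2| / sqrt(p(1 - p)), with p taken as the mean of p1 and p2."""
    p = (p1 + p2) / 2
    return abs(p1 - p2) / math.sqrt(p * (1 - p))


def n_per_group_test(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for a 2-sided test of H0: p1 = p2."""
    return 2 * ((z_alpha + z_beta) / effect_size(p1, p2)) ** 2


# Slide 9: hypertension, placebo 0.30 vs. drug 0.24, 10% dropout
n_i_htn = n_per_group_test(0.30, 0.24)
enroll_htn = 2 * n_i_htn / (1 - 0.10)

# Slide 11: hyperglycemia, placebo 0.50 vs. drug 0.30, 15% dropout
n_i_glu = n_per_group_test(0.50, 0.30)
enroll_glu = 2 * n_i_glu / (1 - 0.15)

print(f"ES = {effect_size(0.30, 0.24):.3f}, n_i = {n_i_htn:.1f}, enroll = {math.ceil(enroll_htn)}")
print(f"ES = {effect_size(0.50, 0.30):.3f}, n_i = {n_i_glu:.1f}, enroll = {math.ceil(enroll_glu)}")
```

Because the effect size is kept unrounded, the first example gives 858.5 per group, as on the slide. Substituting the rounded ES = 0.135 by hand gives a slightly larger figure.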

12 SECTION 5.7 Introduction to correlation

13 Learning Outcome: Describe the conceptual basis and properties of the correlation coefficient.

14 Correlation and Regression are both measures of association
“Association”: Statistical dependence between two variables:
---Exposure (e.g. risk factor, protective factor, predictor variable, treatment)
---Outcome (e.g. disease, event)

15 Correlation and Regression are both measures of association
“Association” Example: The degree to which the rate of disease in persons with a specific exposure is either higher or lower than the rate of disease among those without that exposure.

16 Correlation and Regression are both measures of association
Some terms for “association” variables:
Variable 1: “x” variable, independent variable, predictor variable, exposure variable
Variable 2: “y” variable, dependent variable, outcome variable

17 Correlation Coefficient
Different types depending on numerical properties of “x” and “y” variables:
• Pearson: 2 continuous variables (~ normally distributed)
• Spearman: 2 continuous variables (≥1 variable not normally distributed)
• Point bi-serial: one continuous and one binary variable
• Phi-coefficient: two dichotomous variables

18 Correlation Coefficient
Properties of correlation coefficients:
• Range of -1.0 to 1.0
• Value of -1.0 (perfect negative correlation)
• Value of 1.0 (perfect positive correlation)
• Value of 0 (no correlation (“association”))
As a rule of thumb, correlation coefficients (in absolute value):
0.0 to 0.30: “weak”
0.30 to 0.70: “moderate”
0.70 to 1.0: “high”
Usually, the p-value generated for r is based on the null hypothesis $H_0$ that r = 0.

19 Other points to note:
• The correlation coefficient is unaffected by units of measurement
• Correlation does not imply causation
• Correlation should not be used when:
a) There is a non-linear relationship between variables
b) There are outliers
c) There are distinct sub-group effects
In these situations, correlation coefficients are spurious.

20 SECTION 5.8 Calculate and interpret correlation coefficients

21 Learning Outcome: Calculate and interpret correlation coefficients: Pearson and Spearman (interpretation only)

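22 Correlation Coefficient
Computation Form: Pearson correlation (“r”)
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$
where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y, and $s_x$ and $s_y$ are the sample standard deviations of X and Y. The numerator is the co-variation of X and Y.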

24 The t-test for the correlation coefficient
A t-test can be used to test whether the correlation between two variables is significant. The test statistic is
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Guidelines: Using the t-test for the correlation coefficient
1. State $H_0$ and $H_1$.
2. Specify α.
3. Determine the degrees of freedom: d.f. = n – 2
4. Find the critical value(s) from table 2 with n – 2 degrees of freedom.
5. Compute the test statistic.

25 Example: Assume a correlation coefficient of 0.28 is observed with a sample size of n = 26. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State $H_0$ and $H_1$: $H_0: r = 0$; $H_1: r \neq 0$
2. Specify α: α = 0.05 (2-sided)
3. Determine the degrees of freedom: d.f. = n – 2 = 26 – 2 = 24
4. Find the critical value(s) from table 2 with d.f. = n – 2: 2.064
5. Compute the test statistic:
$$t = \frac{0.28}{\sqrt{\dfrac{1 - 0.28^2}{26 - 2}}} = 1.43$$
Conclusion: 1.43 < 2.064, so do not reject $H_0$.

26 Practice: Assume a correlation coefficient of 0.43 is observed with a sample size of n = 22. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State $H_0$ and $H_1$: $H_0$: _____; $H_1$: _____
2. Specify α: α = ___________
3. Determine the degrees of freedom: d.f. = n – 2 = ______
4. Find the critical value(s) from table 2 with d.f. = n – 2: _____
5. Compute the test statistic: t = _____
Conclusion: Reject or do not reject $H_0$

27 Practice: Assume a correlation coefficient of 0.43 is observed with a sample size of n = 22. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State $H_0$ and $H_1$: $H_0: r = 0$; $H_1: r \neq 0$
2. Specify α: α = 0.05 (2-sided)
3. Determine the degrees of freedom: d.f. = n – 2 = 22 – 2 = 20
4. Find the critical value(s) from table 2 with d.f. = n – 2: 2.086
5. Compute the test statistic:
$$t = \frac{0.43}{\sqrt{\dfrac{1 - 0.43^2}{22 - 2}}} = 2.13$$
Conclusion: 2.13 > 2.086, so reject $H_0$.
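The two t-test examples (r = 0.28 with n = 26, and r = 0.43 with n = 22) can be reproduced with a short Python sketch of the statistic t = r / sqrt((1 − r²) / (n − 2)). The function names are hypothetical. The critical values are passed in as the figures quoted on the slides rather than computed, so no statistics library is needed.

```python
import math


def t_statistic_for_r(r, n):
    """t statistic for H0: r = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


def reject_h0(t, critical_value):
    """Two-sided decision: reject H0 when |t| exceeds the tabulated critical value."""
    return abs(t) > critical_value


# Critical values are the ones quoted on the slides (alpha = 0.05, 2-sided).
t_example = t_statistic_for_r(0.28, 26)   # d.f. = 24, critical value 2.064
t_practice = t_statistic_for_r(0.43, 22)  # d.f. = 20, critical value 2.086

print(f"t = {t_example:.2f}, reject H0: {reject_h0(t_example, 2.064)}")
print(f"t = {t_practice:.2f}, reject H0: {reject_h0(t_practice, 2.086)}")
```

The same-looking correlation can land on either side of the critical value depending on both r and n, which is why the degrees of freedom have to be fixed before reading the table.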

28
Subject ID | “x” | “y” | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 12 | 22 | -2.13 | 2.00 | -4.25
2 | 15 | 31 | 0.88 | 11.00 | 9.625
3 | 10 | 14 | -4.13 | -6.00 | 24.75
4 | 8 | 7 | -6.13 | -13.00 | 79.625
5 | 16 | 14 | 1.88 | -6.00 | -11.25
6 | 28 | 38 | 13.88 | 18.00 | 249.75
7 | 14 | 20 | -0.13 | 0.00 | 0
8 | 10 | 14 | -4.13 | -6.00 | 24.75
Sum of all observations | 113 | 160 | | | 373.00
Mean value | 14.125 | 20.0 | | |
Standard deviation | 6.24 | 10.18 | | |

$$r_{xy} = \frac{373}{(8 - 1) \times (6.24 \times 10.18)} = \frac{373}{444.66} = 0.84$$
See SAS page 1

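29 Practice Calculation
Subject ID | “x” | “y” | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 8 | 16 | ? | ? | ?
2 | 12 | 10 | ? | ? | ?
3 | 6 | 10 | ? | ? | ?
4 | 11 | 6 | ? | ? | ?
5 | 14 | 12 | ? | ? | ?
6 | 16 | 30 | ? | ? | ?
7 | 8 | 16 | ? | ? | ?
8 | 6 | 12 | ? | ? | ?
Sum of all observations | ? | ? | | | ?
Mean value | ? | ? | | |
Standard deviation | 3.72 | 7.25 | | |

So, $r_{xy}$ = _________________________________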

30
Subject ID | “x” | “y” | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$
1 | 8 | 16 | -2.13 | 2.00 | -4.25
2 | 12 | 10 | 1.88 | -4.00 | -7.50
3 | 6 | 10 | -4.13 | -4.00 | 16.50
4 | 11 | 6 | 0.88 | -8.00 | -7.00
5 | 14 | 12 | 3.88 | -2.00 | -7.75
6 | 16 | 30 | 5.88 | 16.00 | 94.00
7 | 8 | 16 | -2.13 | 2.00 | -4.25
8 | 6 | 12 | -4.13 | -2.00 | 8.25
Sum of all observations | 81 | 112 | | | 88.00
Mean value | 10.125 | 14.0 | | |
Standard deviation | 3.72 | 7.25 | | |

$$r_{xy} = \frac{88}{(8 - 1) \times (3.72 \times 7.25)} = \frac{88}{188.79} = 0.47$$
See SAS page 2
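The hand calculations for the two eight-subject tables can be checked with a minimal Python sketch that follows the same steps: deviations from the means, their summed products, then division by (n − 1) times the two sample standard deviations. The helper name `pearson_r` is an assumption for illustration; only the standard library is used.

```python
import statistics


def pearson_r(x, y):
    """Pearson r as sum of cross-products of deviations over (n - 1) * s_x * s_y."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    co_variation = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    return co_variation / ((n - 1) * statistics.stdev(x) * statistics.stdev(y))


# Worked example (sum of cross-products 373)
x1 = [12, 15, 10, 8, 16, 28, 14, 10]
y1 = [22, 31, 14, 7, 14, 38, 20, 14]

# Practice calculation (sum of cross-products 88)
x2 = [8, 12, 6, 11, 14, 16, 8, 6]
y2 = [16, 10, 10, 6, 12, 30, 16, 12]

r1 = pearson_r(x1, y1)
r2 = pearson_r(x2, y2)
print(f"r1 = {r1:.2f}, r2 = {r2:.2f}")
```

Both results agree with the slides to two decimals (0.84 and 0.47). Small differences in the third decimal come from the slides rounding the standard deviations before dividing.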

31 Correlation Coefficient
Computation Form: Pearson correlation (“r”)
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$
From the formula above, it should be intuitive that the Pearson R is sensitive to extreme values.

32 ID, X, Y: 11624 2128 31419 41114 52428 61822 7137 8298 94227 10712 1121 121417 13 26 141921 15248 161721 173 58 182221 19189 2011
R = 0.161
See SAS pages 3-4

33 ID, X, Y: 11624 2128 31419 41114 52428 61822 7137 8298 942100 10712 1121 121417 13 26 141921 15248 161721 173 58 182221 19189 2011
R = 0.573
See SAS pages 5-6

34 Correlation Coefficient
With extreme values, you can use the Spearman “rank” correlation procedure to remove the undue influence of the extreme values.
Assuming no ties in ranks:
$$R_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ for each observation.

35 Example: Incorrect use of Pearson R
ID | X | Y
1 | 24 | 100
2 | 16 | 22
3 | 10 | 13
4 | 8 | 14
5 | 18 | 11
6 | 14 | 10
7 | 13 | 9
8 | 11 | 12
9 | 9 | 15
10 | 17 | 8
R = 0.696
See SAS page 7
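To see how much of the R = 0.696 on this slide is driven by the single extreme observation (ID 1, Y = 100), the sketch below recomputes Pearson's r with and without that row. This is purely an illustration of sensitivity under the assumption that ID 1 is the extreme value in question. It is not a recommendation to delete data, and the helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of squares and cross-products of deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [24, 16, 10, 8, 18, 14, 13, 11, 9, 17]
y = [100, 22, 13, 14, 11, 10, 9, 12, 15, 8]

r_all = pearson_r(x, y)              # all 10 observations
r_without = pearson_r(x[1:], y[1:])  # same data with observation 1 left out

print(f"with ID 1: r = {r_all:.3f}; without ID 1: r = {r_without:.3f}")
```

With all ten observations the result matches the slide's 0.696. Without the extreme point the coefficient falls close to zero, which is the behaviour the rank-based Spearman procedure is meant to guard against.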

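36
ID | X | Y | X (sorted) | Y (paired) | Rank X | Rank Y | $d_i$ | $d_i^2$
1 | 24 | 100 | 8 | 14 | 1 | 7 | -6 | 36
2 | 16 | 22 | 9 | 15 | 2 | 8 | -6 | 36
3 | 10 | 13 | 10 | 13 | 3 | 6 | -3 | 9
4 | 8 | 14 | 11 | 12 | 4 | 5 | -1 | 1
5 | 18 | 11 | 13 | 9 | 5 | 2 | 3 | 9
6 | 14 | 10 | 14 | 10 | 6 | 3 | 3 | 9
7 | 13 | 9 | 16 | 22 | 7 | 9 | -2 | 4
8 | 11 | 12 | 17 | 8 | 8 | 1 | 7 | 49
9 | 9 | 15 | 18 | 11 | 9 | 4 | 5 | 25
10 | 17 | 8 | 24 | 100 | 10 | 10 | 0 | 0
Pearson R = 0.696
Sum of $d_i^2$ = 178

$$R_s = 1 - \frac{6 \times 178}{10(100 - 1)} = 1 - \frac{1068}{990} = -0.08$$
See SAS page 8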
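The rank-based calculation on this slide can be sketched in Python as follows. The helpers `ranks` and `spearman_rs` are illustrative names. The ranking helper assumes no tied values, which holds for this data set, and the formula is the no-ties shortcut R_s = 1 − 6Σd² / (n(n² − 1)).

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation via the no-ties formula; also returns the sum of d squared."""
    n = len(x)
    d_squared = [(rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y))]
    sum_d2 = sum(d_squared)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


x = [24, 16, 10, 8, 18, 14, 13, 11, 9, 17]
y = [100, 22, 13, 14, 11, 10, 9, 12, 15, 8]

rs, sum_d2 = spearman_rs(x, y)
print(f"sum of d^2 = {sum_d2}, R_s = {rs:.2f}")
```

The sum of squared rank differences is 178 and R_s rounds to -0.08, as in the table. If the data contained ties, average ranks and the full rank-correlation formula would be needed instead of this shortcut.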

37 SECTION 5.9 Use of correlation in Excel, PowerPoint, and SPSS

38 Learning Outcomes:
---Calculate correlation coefficients in Excel and SPSS
---Produce a scatter plot in PowerPoint to depict correlation

39 Calculate Correlation Coefficients in Excel, Plot in PowerPoint
Excel: (refer to Excel spreadsheet)
=CORREL(Array 1,Array 2)
=CORREL(A4:A15,B4:B15)
X | Y
26 | 248
42 | 366
51 | 592
44 | 469
80 | 634
62 | 381
74 | 668
38 | 435
50 | 572
42 | 391
76 | 663
64 | 410

40 PowerPoint:
---Insert Chart
---X-Y Scatter
---Add Trend Line (click on data points)
r = 0.76

41 SPSS: Analyze > Correlate > Bivariate
Correlation coefficients: Pearson, Spearman
Variables: Age, Body Mass Index

42 SPSS: Analyze > Correlate > Bivariate
Correlation coefficients: Pearson, Spearman
Variables: Glucose, Triglycerides

