Statistics for clinicians: Biostatistics course by Kevin E. Kip, Ph.D., FAHA, Professor and Executive Director, Research Center, University of South Florida


Statistics for clinicians Biostatistics course by Kevin E. Kip, Ph.D., FAHA Professor and Executive Director, Research Center University of South Florida, College of Nursing Professor, College of Public Health Department of Epidemiology and Biostatistics Associate Member, Byrd Alzheimer’s Institute Morsani College of Medicine Tampa, FL, USA 1

SECTION 5.1 Parameters and factors that affect sample size (module: Sample size estimation and correlation)

SECTION 5.6 Sample size estimates for a two-sample (independent groups) dichotomous outcome

Learning Outcome: Calculate and interpret sample size estimates for a two-sample (independent groups) dichotomous outcome
---Estimate for a confidence interval
---Estimate for a hypothesis test

Dichotomous Outcome – Two Independent Samples

Sample size to estimate a C.I. for (p1 – p2):
n_i = [p1(1 – p1) + p2(1 – p2)] (Z / E)^2

Sample size for a hypothesis test of H0: p1 = p2:
n_i = 2 [(Z_(1–α/2) + Z_(1–β)) / ES]^2, where ES = |p1 – p2| / √(p(1 – p)) and p is the pooled proportion
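As a quick check of the C.I. formula above, here is a minimal Python sketch (the function name and the dropout adjustment step are my own, not from the course materials):

```python
def n_per_group_ci(p1, p2, z=1.96, margin=0.05):
    """Per-group sample size to estimate a C.I. for p1 - p2:
    n_i = [p1(1 - p1) + p2(1 - p2)] * (Z / E)^2
    """
    return (p1 * (1 - p1) + p2 * (1 - p2)) * (z / margin) ** 2

# Smoking example from the slide that follows: p1 = 0.17, p2 = 0.34, 20% dropout
ni = n_per_group_ci(0.17, 0.34)  # per-group n, about 561.6
total = 2 * ni                   # both groups, about 1123.3
enroll = total / 0.80            # inflated for 20% dropout, about 1404
```

Rounding each quantity up to a whole subject at the end gives the enrollment target quoted on the slides.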

Dichotomous Outcome – Two Independent Samples (C.I.)

C.I. for (p1 – p2): n_i = [p1(1 – p1) + p2(1 – p2)] (Z / E)^2

Example: Estimate the required sample size for a 95% C.I. for the difference in the incidence proportion of adults over 50 who develop prostate cancer (over 30 years) by smoking status (non-smokers vs. heavy smokers).

Parameters:
Margin of error: 5%
Assumed incidence (non-smokers): p1 = 0.17
Assumed incidence (heavy smokers): p2 = 0.34
Assumed dropout rate: 20%
Desired C.I.: 95% (i.e. Z = 1.96)

n_i = [0.17(1 – 0.17) + 0.34(1 – 0.34)] (1.96 / 0.05)^2 = 0.3655 × 1536.64 ≈ 561.6

n1 = n2 = 561.6, so n ≈ 1123.2

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = 1123.2 / 0.80 ≈ 1404 subjects

Dichotomous Outcome – Two Independent Samples (C.I.) (Practice)

C.I. for (p1 – p2): n_i = [p1(1 – p1) + p2(1 – p2)] (Z / E)^2

Example: Estimate the required sample size for a 95% C.I. for the difference in the annual incidence proportion of depression among teenagers by psychological trauma (trauma vs. no trauma).

Parameters:
Margin of error: 5%
Assumed incidence (no trauma): p1 = 0.06
Assumed incidence (trauma): p2 = 0.12
Assumed dropout rate: 10%
Desired C.I.: 95% (i.e. Z = 1.96)

n_i = [ __________ ] ( ____ / ____ )^2 = _____

n1 = _____   n2 = _____   n = _____

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = ________________________

Dichotomous Outcome – Two Independent Samples (C.I.) (Practice answer)

C.I. for (p1 – p2): n_i = [p1(1 – p1) + p2(1 – p2)] (Z / E)^2

Example: Estimate the required sample size for a 95% C.I. for the difference in the annual incidence proportion of depression among teenagers by psychological trauma (trauma vs. no trauma).

Parameters:
Margin of error: 5%
Assumed incidence (no trauma): p1 = 0.06
Assumed incidence (trauma): p2 = 0.12
Assumed dropout rate: 10%
Desired C.I.: 95% (i.e. Z = 1.96)

n_i = [0.06(1 – 0.06) + 0.12(1 – 0.12)] (1.96 / 0.05)^2 = 0.1620 × 1536.64 ≈ 248.9

n1 = n2 = 248.9, so n ≈ 497.9

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = 497.9 / 0.90 ≈ 554 subjects

Dichotomous Outcome – Two Independent Samples (H0 Test)

H0: p1 = p2    n_i = 2 [(Z_(1–α/2) + Z_(1–β)) / ES]^2, where ES = |p1 – p2| / √(p(1 – p))

Example: Compare the prevalence of hypertension in a trial of a new drug versus placebo.

Parameters/Assumptions:
Detectable effect: 20% reduction
Assumed prevalence (placebo): p1 = 0.30
Assumed prevalence (drug): p2 = 0.24
Assumed dropout rate: 10%
2-sided type I error rate (α): 0.05
Desired power (1 – β): 0.80

p = (0.30 + 0.24) / 2 = 0.27
ES = |0.30 – 0.24| / √(0.27(1 – 0.27)) = 0.06 / 0.4440 = 0.1351

n_i = 2 [(1.96 + 0.84) / 0.1351]^2 ≈ 858.5

n1 = n2 = 858.5, so n = 1717

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = 1717 / 0.90 ≈ 1908 subjects

A sample size of n = 1908 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 20% reduction in the prevalence of hypertension attributed to the new drug.
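A minimal Python sketch of the hypothesis-test formula above (function and variable names are mine; the pooled p is taken as the simple average of p1 and p2, which matches the equal-group design on the slide):

```python
from math import sqrt

def n_per_group_test(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for H0: p1 = p2:
    ES = |p1 - p2| / sqrt(p(1 - p)),  n_i = 2 * ((z_alpha + z_beta) / ES)^2
    where p is the average of p1 and p2 (equal group sizes assumed).
    """
    p = (p1 + p2) / 2
    es = abs(p1 - p2) / sqrt(p * (1 - p))
    return 2 * ((z_alpha + z_beta) / es) ** 2

# Hypertension example: placebo p1 = 0.30, drug p2 = 0.24, 10% dropout
ni = n_per_group_test(0.30, 0.24)  # about 858.5 per group
total = 2 * ni                     # about 1717 before dropout
enroll = total / 0.90              # about 1908 to enroll
```

The default z values correspond to a 2-sided α of 0.05 and 80% power; swap in other quantiles for different designs.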

Dichotomous Outcome – Two Independent Samples (H0 Test) (Practice)

H0: p1 = p2    n_i = 2 [(Z_(1–α/2) + Z_(1–β)) / ES]^2, where ES = |p1 – p2| / √(p(1 – p))

Example: Compare the prevalence of hyperglycemia in a trial of a new drug versus placebo.

Parameters/Assumptions:
Detectable effect: 40% reduction
Assumed prevalence (placebo): p1 = 0.50
Assumed prevalence (drug): p2 = 0.30
Assumed dropout rate: 15%
2-sided type I error rate (α): 0.05
Desired power (1 – β): 0.80

p = 0.40
ES = | ____ – ____ | / √( ____ (1 – ____) ) = _____

n_i = 2 [( ____ + ____ ) / ____ ]^2 = _____

n1 = ____   n2 = ____   n = _____

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = ________________________

Dichotomous Outcome – Two Independent Samples (H0 Test) (Practice answer)

H0: p1 = p2    n_i = 2 [(Z_(1–α/2) + Z_(1–β)) / ES]^2, where ES = |p1 – p2| / √(p(1 – p))

Example: Compare the prevalence of hyperglycemia in a trial of a new drug versus placebo.

Parameters/Assumptions:
Detectable effect: 40% reduction
Assumed prevalence (placebo): p1 = 0.50
Assumed prevalence (drug): p2 = 0.30
Assumed dropout rate: 15%
2-sided type I error rate (α): 0.05
Desired power (1 – β): 0.80

p = (0.50 + 0.30) / 2 = 0.40
ES = |0.50 – 0.30| / √(0.40(1 – 0.40)) = 0.20 / 0.4899 = 0.4082

n_i = 2 [(1.96 + 0.84) / 0.4082]^2 ≈ 94.1

n1 = n2 = 94.1, so n = 188.2

Take the dropout rate into account: N (number to enroll) = n / (% retained)
N = 188.2 / 0.85 ≈ 222 subjects

A sample size of n = 222 will ensure that a 2-sided test with α = 0.05 has 80% power to detect a 40% reduction in the prevalence of hyperglycemia attributed to the new drug.

SECTION 5.7 Introduction to correlation

Learning Outcome: Describe the conceptual basis and properties of the correlation coefficient.

Correlation and Regression are both measures of association

“Association”: statistical dependence between two variables:
Exposure (e.g. risk factor, protective factor, predictor variable, treatment)
Outcome (e.g. disease, event)

Correlation and Regression are both measures of association

“Association” example: the degree to which the rate of disease in persons with a specific exposure is either higher or lower than the rate of disease among those without that exposure.

Correlation and Regression are both measures of association

Some terms for “association” variables:
Variable 1: “x” variable, independent variable, predictor variable, exposure variable
Variable 2: “y” variable, dependent variable, outcome variable

Correlation Coefficient

Different types depending on the numerical properties of the “x” and “y” variables:
---Pearson: two continuous variables (both ~ normally distributed)
---Spearman: two continuous variables (one or both not normally distributed)
---Point-biserial: one continuous and one binary variable
---Phi coefficient: two dichotomous variables

Correlation Coefficient

Properties of correlation coefficients:
---Range of -1.0 to 1.0
---Value of -1.0: perfect negative correlation
---Value of 1.0: perfect positive correlation
---Value of 0: no correlation (“association”)

As a rule of thumb, correlation coefficients of 0.0 to 0.30 are “weak,” 0.30 to 0.70 are “moderate,” and 0.70 to 1.0 are “high.”

Usually, the p-value generated for r is based on the null hypothesis H0 that r = 0.

Other points to note:
---The correlation coefficient is unaffected by the units of measurement
---Correlation does not imply causation
---Correlation should not be used when:
a) There is a non-linear relationship between the variables
b) There are outliers
c) There are distinct sub-group effects
In these situations, correlation coefficients can be spurious.

SECTION 5.8 Calculate and interpret correlation coefficients

Learning Outcome: Calculate and interpret correlation coefficients: Pearson and Spearman (Spearman: interpretation only)

Correlation Coefficient

Computational form: Pearson correlation (“r”)

r = Σ(x_i – x̄)(y_i – ȳ) / [(n – 1) s_x s_y]

where x̄ and ȳ are the sample means of X and Y, and s_x and s_y are the sample standard deviations of X and Y. The numerator captures the co-variation of X and Y.
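The computational form above translates directly into Python; this is an illustrative sketch with made-up data, not the slide's table:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r = sum((x_i - xbar)(y_i - ybar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))   # sample SD of X
    sy = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))   # sample SD of Y
    covariation = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    return covariation / (sx * sy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 0.77
```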

The t-test for the correlation coefficient

A t-test can be used to test whether the correlation between two variables is significant. The test statistic is:

t = r √(n – 2) / √(1 – r^2)

Guidelines for using the t-test for the correlation coefficient:
1. State H0 and H1.
2. Specify α.
3. Determine the degrees of freedom: d.f. = n – 2.
4. Find the critical value(s) from Table 2 with n – 2 degrees of freedom.
5. Compute the test statistic and compare it with the critical value(s).

Example: Assume a correlation coefficient of 0.28 is observed with a sample size of n = 26. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State H0 and H1: H0: r = 0; H1: r ≠ 0
2. Specify α: α = 0.05 (2-sided)
3. Determine the degrees of freedom: d.f. = n – 2 = 26 – 2 = 24
4. Critical value from Table 2 with d.f. = 24: 2.064
5. Compute the test statistic:
t = 0.28 √(26 – 2) / √(1 – 0.28^2) = 1.43
Conclusion: 1.43 < 2.064, so do not reject H0.
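The worked example above can be verified with a short sketch (the function name is mine):

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing H0: r = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

t = t_for_r(0.28, 26)  # about 1.43, below the critical value of 2.064
```

The same function reproduces the practice slide that follows: t_for_r(0.43, 22) gives t ≈ 2.13.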

Practice: Assume a correlation coefficient of 0.43 is observed with a sample size of n = 22. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State H0 and H1: H0: _____; H1: _____
2. Specify α: α = ___________
3. Determine the degrees of freedom: d.f. = n – 2 = ______
4. Critical value from Table 2 with d.f. = n – 2 = _____
5. Compute the test statistic: t = _____
Conclusion: Accept or reject H0?

Practice: Assume a correlation coefficient of 0.43 is observed with a sample size of n = 22. We wish to test this relationship in a 2-sided manner with α = 0.05.
1. State H0 and H1: H0: r = 0; H1: r ≠ 0
2. Specify α: α = 0.05 (2-sided)
3. Determine the degrees of freedom: d.f. = n – 2 = 22 – 2 = 20
4. Critical value from Table 2 with d.f. = 20: 2.086
5. Compute the test statistic:
t = 0.43 √(22 – 2) / √(1 – 0.43^2) = 2.13
Conclusion: 2.13 > 2.086, so reject H0.

Worked example (n = 8): [table of Subject ID, x, y, x_i – x̄, y_i – ȳ, and (x_i – x̄)(y_i – ȳ), with the sum, mean, and standard deviation of each column; s_x = 6.24, s_y = 10.18]

So, r_xy = Σ(x_i – x̄)(y_i – ȳ) / [(8 – 1) × (6.24 × 10.18)] = 0.84

See SAS page 1

Practice Calculation (n = 8): [table of Subject ID, x, y, x_i – x̄, y_i – ȳ, and (x_i – x̄)(y_i – ȳ), with the deviation and cross-product cells left as ??? for the reader to fill in]

So, r_xy = _________________________________

Practice answer (n = 8): [completed table; s_x = 3.72, s_y = 7.25]

So, r_xy = Σ(x_i – x̄)(y_i – ȳ) / [(8 – 1) × (3.72 × 7.25)] = 0.47

See SAS page 2

Correlation Coefficient

Computational form: Pearson correlation (“r”): r = Σ(x_i – x̄)(y_i – ȳ) / [(n – 1) s_x s_y]

From the formula above, it should be intuitive that the Pearson r is sensitive to extreme values.
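To see that sensitivity concretely, here is an illustrative sketch with invented data (not the slide's SAS examples): a single extreme point drags a strongly positive Pearson r negative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via sums of deviations about the means."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = sqrt(sum((a - xbar) ** 2 for a in x)) * sqrt(sum((b - ybar) ** 2 for b in y))
    return num / den

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
r_clean = pearson_r(x, y)                 # about 0.90
r_outlier = pearson_r(x + [50], y + [0])  # about -0.43: one point flips the sign
```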

[Table of ID, X, Y values] R = 0.161. See SAS pages 3-4

[Table of ID, X, Y values] R = 0.573. See SAS pages 5-6

Correlation Coefficient

With extreme values, you can use the Spearman “rank” correlation procedure to remove the undue influence of the extreme values. Assuming no ties in ranks:

r_s = 1 – 6 Σd_i^2 / [n(n^2 – 1)]

where d_i = rank(x_i) – rank(y_i), the difference between the ranks of each observation.
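A minimal sketch of the Spearman procedure above, assuming no ties (helper names and the example data are mine):

```python
def ranks(values):
    """Rank observations 1..n by value (no tie handling, per the no-ties assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), with d_i = rank(x_i) - rank(y_i)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# An extreme pair (100, 50) that would make Pearson look near-perfect
# is just the top rank in each list here, so it cannot dominate r_s
rs = spearman_rs([1, 2, 3, 4, 5, 100], [5, 2, 4, 1, 3, 50])  # about 0.14
```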

Example: Incorrect use of Pearson R. [Table of ID, X, Y values] R = 0.696. See SAS page 7

[Table of ID, X, Y, Rank X, Rank Y, and d_i for n = 10 observations; Pearson R = 0.696; Σd_i^2 = 178]

So, R_s = 1 – (6 × 178) / [10(10^2 – 1)] = 1 – 1068 / 990 = -0.08

See SAS page 8

SECTION 5.9 Use of correlation in Excel, PowerPoint, and SPSS

Learning Outcomes: Calculate correlation coefficients in Excel and SPSS; produce a scatter plot in PowerPoint to depict correlation.

Calculate correlation coefficients in Excel; plot in PowerPoint

Excel (refer to Excel spreadsheet):
=CORREL(Array1, Array2)
e.g. =CORREL(A4:A15, B4:B15)
[Table of X, Y values]

PowerPoint:
---Insert Chart
---X-Y Scatter
---Add Trend Line (click on the data points)
r = 0.76

SPSS: Analyze > Correlate > Bivariate; select Pearson and/or Spearman. Example variables: Age, Body Mass Index.

SPSS: Analyze > Correlate > Bivariate; select Pearson and/or Spearman. Example variables: Glucose, Triglycerides.