Power and Sample Size Adapted from: Boulder 2004 Benjamin Neale Shaun Purcell I HAVE THE POWER!!!

Slides:



Advertisements
Similar presentations
Tests of Hypotheses Based on a Single Sample
Advertisements

Bivariate analysis HGEN619 class 2007.
1 COMM 301: Empirical Research in Communication Lecture 15 – Hypothesis Testing Kwan M Lee.
Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
Anthony Greene1 Simple Hypothesis Testing Detecting Statistical Differences In The Simplest Case:  and  are both known I The Logic of Hypothesis Testing:
Hypothesis Testing Steps in Hypothesis Testing:
CHAPTER 21 Inferential Statistical Analysis. Understanding probability The idea of probability is central to inferential statistics. It means the chance.
Inferential Statistics & Hypothesis Testing
Statistical Significance What is Statistical Significance? What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant?
Multiple regression analysis
Power and Sample Size + Principles of Simulation Benjamin Neale March 4 th, 2010 International Twin Workshop, Boulder, CO.
Statistical Significance What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant? How Do We Know Whether a Result.
9-1 Hypothesis Testing Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental.
Hypothesis : Statement about a parameter Hypothesis testing : decision making procedure about the hypothesis Null hypothesis : the main hypothesis H 0.
Lec 6, Ch.5, pp90-105: Statistics (Objectives) Understand basic principles of statistics through reading these pages, especially… Know well about the normal.
Univariate Analysis in Mx Boulder, Group Structure Title Type: Data/ Calculation/ Constraint Reading Data Matrices Declaration Assigning Specifications/
4-1 Statistical Inference The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population.
Genetic Dominance in Extended Pedigrees: Boulder, March 2008 Irene Rebollo Biological Psychology Department, Vrije Universiteit Netherlands Twin Register.
BCOR 1020 Business Statistics Lecture 21 – April 8, 2008.
Inferences About Process Quality
Chapter 9 Hypothesis Testing.
Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.
Today Concepts underlying inferential statistics
Hypothesis Testing Using The One-Sample t-Test
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Descriptive Statistics
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 8 Tests of Hypotheses Based on a Single Sample.
Inferential Statistics
AM Recitation 2/10/11.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Karri Silventoinen University of Helsinki Osaka University.
Hypothesis Testing:.
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Chapter 9 Hypothesis Testing.
Chapter 4 Hypothesis Testing, Power, and Control: A Review of the Basics.
Overview of Statistical Hypothesis Testing: The z-Test
8 - 1 © 2003 Pearson Prentice Hall Chi-Square (  2 ) Test of Variance.
4-1 Statistical Inference The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population.
Statistical Significance R.Raveendran. Heart rate (bpm) Mean ± SEM n In men ± In women ± The difference between means.
Power & Sample Size and Principles of Simulation Michael C Neale HGEN 691 Sept Based on Benjamin Neale March 2014 International Twin Workshop,
1 Today Null and alternative hypotheses 1- and 2-tailed tests Regions of rejection Sampling distributions The Central Limit Theorem Standard errors z-tests.
1 Power and Sample Size in Testing One Mean. 2 Type I & Type II Error Type I Error: reject the null hypothesis when it is true. The probability of a Type.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Inferential Statistics.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 22 Using Inferential Statistics to Test Hypotheses.
9-1 Hypothesis Testing Statistical Hypotheses Definition Statistical hypothesis testing and confidence interval estimation of parameters are.
Statistics 11 Correlations Definitions: A correlation is measure of association between two quantitative variables with respect to a single individual.
Learning Objectives In this chapter you will learn about the t-test and its distribution t-test for related samples t-test for independent samples hypothesis.
4 Hypothesis & Testing. CHAPTER OUTLINE 4-1 STATISTICAL INFERENCE 4-2 POINT ESTIMATION 4-3 HYPOTHESIS TESTING Statistical Hypotheses Testing.
Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 8 Hypothesis Testing.
1 9 Tests of Hypotheses for a Single Sample. © John Wiley & Sons, Inc. Applied Statistics and Probability for Engineers, by Montgomery and Runger. 9-1.
Copyright ©2013 Pearson Education, Inc. publishing as Prentice Hall 9-1 σ σ.
© Copyright McGraw-Hill 2004
Statistical Inference Statistical inference is concerned with the use of sample data to make inferences about unknown population parameters. For example,
Education 793 Class Notes Inference and Hypothesis Testing Using the Normal Distribution 8 October 2003.
Chapter 13 Understanding research results: statistical inference.
Hypothesis Testing. Statistical Inference – dealing with parameter and model uncertainty  Confidence Intervals (credible intervals)  Hypothesis Tests.
Chapter 7: Hypothesis Testing. Learning Objectives Describe the process of hypothesis testing Correctly state hypotheses Distinguish between one-tailed.
Educational Research Inferential Statistics Chapter th Chapter 12- 8th Gay and Airasian.
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Categorical Data HGEN
Logic of Hypothesis Testing
Power and p-values Benjamin Neale March 10th, 2016
Hypothesis Testing: Hypotheses
Statistical Inference about Regression
Inferential Statistics
Power and Sample Size I HAVE THE POWER!!! Boulder 2006 Benjamin Neale.
Power Calculation Practical
Power Calculation Practical
Multivariate Genetic Analysis: Introduction
Presentation transcript:

Power and Sample Size Adapted from: Boulder 2004 Benjamin Neale Shaun Purcell I HAVE THE POWER!!!

Overview Introduce Concept of Power via Correlation Coefficient (ρ) Example Introduce Concept of Power via Correlation Coefficient (ρ) Example Discuss Factors Contributing to Power Discuss Factors Contributing to Power Practical: Practical: Simulating data as a means of computing power Simulating data as a means of computing power Using Mx for Power Calculations Using Mx for Power Calculations

3 Simple example Investigate the linear relationship between two random variables X and Y:  =0 vs.  0 using the Pearson correlation coefficient. Sample subjects at random from population Sample subjects at random from population Measure X andY Measure X andY Calculate the measure of association  Calculate the measure of association  Test whether   0. Test whether   0.

4 How to Test   0 Assume data are normally distributed Define a null-hypothesis (  = 0) Choose an  level (usually.05) Use the (null) distribution of the test statistic associated with  =0 t=  √ [(N-2)/(1-  2 )]

5 How to Test   0 Sample N=40 r=.303, t=1.867, df=38, p=.06  =.05 Because observed p > , we fail to reject  = 0 Have we drawn the correct conclusion that p is genuinely zero?

6  = type I error rate probability of deciding  0 (while in truth  =0)  is often chosen to equal.05...why? DOGMA

7 N=40, r=0, nrep=1000, central t(38),  =0.05 (critical value 2.04)

8 Observed non-null distribution (  =.2) and null distribution

9 In 23% of tests that  =0, |t|>2.024 (  =0.05), and thus correctly conclude that  0. The probability of correctly rejecting the null-hypothesis (  =0) is 1- , known as the power.

Hypothesis Testing Correlation Coefficient hypotheses: Correlation Coefficient hypotheses: h o (null hypothesis) is ρ=0 h o (null hypothesis) is ρ=0 h a (alternative hypothesis) is ρ ≠ 0 h a (alternative hypothesis) is ρ ≠ 0 Two-sided test, where ρ > 0 or ρ 0 or ρ < 0 are one-sided Null hypothesis usually assumes no effect Null hypothesis usually assumes no effect Alternative hypothesis is the idea being tested Alternative hypothesis is the idea being tested

11 Summary of Possible Results H-0 trueH-0 false accept H-01-  reject H-0  1-   =type 1 error rate  =type 2 error rate 1-  =statistical power

Rejection of H 0 Non-rejection of H 0 H 0 true H A true STATISTICS R E A L I T Y Nonsignificant result (1-  ) Type II error at rate  Significant result (1-  ) Type I error at rate 

Power The probability of rejecting the null-hypothesis depends on: The probability of rejecting the null-hypothesis depends on: the significance criterion (  ) the significance criterion (  ) the sample size (N) the sample size (N) the effect size (NCP) the effect size (NCP) “The probability of detecting a given effect size in a population from a sample of size N, using significance criterion  ”

P(T)P(T) T alpha 0.05 Sampling distribution if H A were true Sampling distribution if H 0 were true   POWER = 1 -  Standard Case Effect Size (NCP)

P(T)P(T) T alpha 0.1 Sampling distribution if H A were true Sampling distribution if H 0 were true POWER = 1 -  ↑ Impact of less conservative Impact of less conservative   

P(T)P(T) T alpha 0.01 Sampling distribution if H A were true Sampling distribution if H 0 were true POWER = 1 -  ↓ Impact of more conservative Impact of more conservative   

P(T)P(T) T alpha 0.05   Impact of increased sample size Reduced variance of sampling distribution if H A is true Sampling distribution if H 0 is true POWER = 1 -  ↑

P(T)P(T) T alpha 0.05 Sampling distribution if H A were true Sampling distribution if H 0 were true   POWER = 1 -  ↑ Impact of increase in Effect Size Effect Size (NCP)↑

Summary: Factors affecting power Effect Size Effect Size Sample Size Sample Size Alpha Level Alpha Level Type of Data: Type of Data: Binary, Ordinal, Continuous Binary, Ordinal, Continuous Research Design Research Design

Uses of power calculations Planning a study Planning a study Possibly to reflect on ns trend result Possibly to reflect on ns trend result No need if significance is achieved No need if significance is achieved To determine chances of study success To determine chances of study success

Power Calculations via Simulation Simulate Data under theorized model Simulate Data under theorized model Calculate Statistics and Perform Test Calculate Statistics and Perform Test Given α, how many tests p < α Given α, how many tests p < α Power = (#hits)/(#tests) Power = (#hits)/(#tests)

Practical: Empirical Power 1 Simulate Data under a model online Simulate Data under a model online Fit an ACE model, and test for C Fit an ACE model, and test for C Collate fit statistics on board Collate fit statistics on board

Practical: Empirical Power 2 First get ower-raw.mx and put it into your directory First get ower-raw.mx and put it into your directory Second, open this script in Mx, and note both places where we must paste in the data Second, open this script in Mx, and note both places where we must paste in the data Third, simulate data (see next slide) Third, simulate data (see next slide) Fourth, fit the ACE model and then fit the AE submodel Fourth, fit the ACE model and then fit the AE submodel

Practical: Empirical Power 3 Simulation Conditions Simulation Conditions 30% A 2 20% C 2 50% E 2 30% A 2 20% C 2 50% E 2 Input: Input: A C of E of A C of E of MZ 350 DZ 350 MZ 350 DZ Simulate and use “Space Delimited” option at Simulate and use “Space Delimited” option at or click here in slide show mode or click here in slide show modehere Click submit after filling in the fields and you will get a page of data Click submit after filling in the fields and you will get a page of data

Practical: Empirical Power 4 With the data page, use ctrl-a to select the data, control-c to copy, switch to Mx (e.g. with alt-tab) and in Mx control-v to paste in both the MZ and DZ groups. With the data page, use ctrl-a to select the data, control-c to copy, switch to Mx (e.g. with alt-tab) and in Mx control-v to paste in both the MZ and DZ groups. Run the ace.mx script with the data pasted in and modify it to run the AE model. Run the ace.mx script with the data pasted in and modify it to run the AE model. Report the -2log-likelihoods on the whiteboard Report the -2log-likelihoods on the whiteboard Optionally, keep a record of A, C, and E estimates of the first model, and the A and E estimates of the second model Optionally, keep a record of A, C, and E estimates of the first model, and the A and E estimates of the second model

Simulation of other types of data Use SAS/R/Matlab/Mathematica Use SAS/R/Matlab/Mathematica Any decent random number generator will do Any decent random number generator will do See ower/sim1.sas See ower/sim1.sas ower/sim1.sas ower/sim1.sas

27 R R is in your future R is in your future Can do it manually with rnorm Can do it manually with rnorm Easier to use mvrnorm Easier to use mvrnorm runmx at Matt Keller’s site: runmx at Matt Keller’s site: library (MASS) mvrnorm(n=100,c(1,1),matrix(c(1,.5,.5,1),2,2),empirical=FALSE)

Mathematica Example In[32]:= (mu={1,2,3,4}; sigma={{1,1/2,1/3,1/4},{1/2,1/3,1/4,1/5},{1/3,1/4,1/5,1/6},{1/4,1/5,1/6, 1/7}}; Timing[Table[Random[MultinormalDistribution[mu,sigma]],{1000}]][[1]]) Out[32]= 1.1 Second In[33]:= Timing[RandomArray[MultinormalDistribution[mu,sigma],1000]][[1]] Out[33]= 0.04 Second In[37]:= TableForm[RandomArray[MultinormalDistribution[mu,sigma],10]] Obtain mathematica from VCU

Theoretical Power Calculations Based on Stats, rather than Simulations Based on Stats, rather than Simulations Can be calculated by hand sometimes, but Mx does it for us Can be calculated by hand sometimes, but Mx does it for us Note that sample size and alpha-level are the only things we can change, but can assume different effect sizes Note that sample size and alpha-level are the only things we can change, but can assume different effect sizes Mx gives us the relative power levels at the alpha specified for different sample sizes Mx gives us the relative power levels at the alpha specified for different sample sizes

Theoretical Power Calculations We will use the power.mx script to look at the sample size necessary for different power levels We will use the power.mx script to look at the sample size necessary for different power levels In Mx, power calculations can be computed in 2 ways: In Mx, power calculations can be computed in 2 ways: Using Covariance Matrices (We Do This One) Using Covariance Matrices (We Do This One) Requiring an initial dataset to generate a likelihood so that we can use a chi-square test Requiring an initial dataset to generate a likelihood so that we can use a chi-square test

Power.mx 1 ! Simulate the data ! 30% additive genetic ! 20% common environment ! 50% nonshared environment #NGroups 3 G1: model parameters Calculation Begin Matrices; X lower 1 1 fixed Y lower 1 1 fixed Z lower 1 1 fixed End Matrices; Matrix X Matrix Y Matrix Z Begin Algebra; A = X*X' ; C = Y*Y' ; E = Z*Z' ; End Algebra; End

Power.mx 2 G2: MZ twin pairs Calculation Matrices = Group 1 Covariances A+C+E|A+C _ A+C|A+C+E / Options MX%E=mzsim.cov End G3: DZ twin pairs Calculation Matrices = Group 1 H Full 1 1 H Full 1 1 Covariances _ / Matrix H 0.5 Options MX%E=dzsim.cov End

Power.mx 3 ! Second part of script ! Fit the wrong model to the simulated data ! to calculate power #NGroups 3 G1 : model parameters Calculation Begin Matrices; X lower 1 1 free Y lower 1 1 fixed Z lower 1 1 free End Matrices; Begin Algebra; A = X*X' ; C = Y*Y' ; E = Z*Z' ; End Algebra; End

Power.mx 4 G2 : MZ twins Data NInput_vars=2 NObservations=350 CMatrix Full File=mzsim.cov Matrices= Group 1 Covariances A+C+E|A+C _ A+C | A+C+E / Option RSiduals End G3 : DZ twins Data NInput_vars=2 NObservations=350 CMatrix Full File=dzsim.cov Matrices= Group 1 H Full 1 1 H Full 1 1 Covariances _ | A+C+E / Matix H 0.5 Option RSiduals ! Power for alpha = 0.05 and 1 df Option Power= 0.05,1 End

35 Model Identification Necessary Conditions Necessary Conditions Sufficient Conditions Sufficient Conditions Algebraic Tests Algebraic Tests Empirical Tests Empirical Tests 35

36 Necessary Conditions Number of Parameters < or = Number of Statistics Number of Parameters < or = Number of Statistics Structural Equation Model usually count variances & covariances to identify variance components Structural Equation Model usually count variances & covariances to identify variance components What is the number of statistics/parameters in a univariate ACE model? Bivariate? What is the number of statistics/parameters in a univariate ACE model? Bivariate? 36

37 Sufficient Conditions No general sufficient conditions for SEM No general sufficient conditions for SEM Special case: ACE model Special case: ACE model Distinct Statistics (i.e. have different predicted values Distinct Statistics (i.e. have different predicted values VP = a2 + c2 + e2 VP = a2 + c2 + e2 CMZ = a2 + c2 CMZ = a2 + c2 CDZ =.5 a2 + c2 CDZ =.5 a2 + c2 37

38 Sufficient Conditions 2 Arrange in matrix form Arrange in matrix form a2 VP a2 VP c2 = CMZ c2 = CMZ e2 CDZ e2 CDZ A x = b A x = b If A can be inverted then can find A -1 b If A can be inverted then can find A -1 b 38

39 Sufficient Conditions 3 39 Solve ACE modelCalc ng=1Begin Matrices; A full 3 3 b full 3 1End Matrices;Matrix A Labels Col A A C ELabels Row A VP CMZ CDZMatrix b ! Data, essentially1.8.5Labels Col B StatisticLabels Row B VP CMZ CDZBegin Algebra; C = A~; x = A~*b;End Algebra;Labels Row x A C EEnd

40 Sufficient Conditions 4 What if not soluble by inversion? What if not soluble by inversion? Empirical: Empirical: 1 Pick set of parameter values T 1 1 Pick set of parameter values T 1 2 Simulate data 2 Simulate data 3 Fit model to data starting at T 2 (not T 1 ) 3 Fit model to data starting at T 2 (not T 1 ) 4 Repeat and look for solutions to step 3 that are perfect but have estimates not equal to T 1 4 Repeat and look for solutions to step 3 that are perfect but have estimates not equal to T 1 If equally good solution but different values, reject identified model hypothesis If equally good solution but different values, reject identified model hypothesis 40

Conclusion Power calculations relatively simple to do Power calculations relatively simple to do Curse of dimensionality Curse of dimensionality Different for raw vs summary statistics Different for raw vs summary statistics Simulation can be done many ways Simulation can be done many ways No substitute for research design No substitute for research design