Statistical Power Calculations Boulder, 2007 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.

Slides:



Advertisements
Similar presentations
15 The Genetic Basis of Complex Inheritance
Advertisements

Psych 5500/6500 t Test for Two Independent Groups: Power Fall, 2008.
Lecture (11,12) Parameter Estimation of PDF and Fitting a Distribution Function.
Genetic research designs in the real world Vishwajit L Nimgaonkar MD, PhD University of Pittsburgh
METHODS FOR HAPLOTYPE RECONSTRUCTION
Lecture 2: Null Hypothesis Significance Testing Continued Laura McAvinue School of Psychology Trinity College Dublin.
The search process: A way to rank endophenotypes (e) for relevance to a disease (i) phenotype.
Basics of Linkage Analysis
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Multiple Testing, Permutation, False Discovery Benjamin Neale Pak Sham Shaun Purcell.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
Estimating “Heritability” using Genetic Data David Evans University of Queensland.
Power in QTL linkage: single and multilocus analysis Shaun Purcell 1,2 & Pak Sham 1 1 SGDP, IoP, London, UK 2 Whitehead Institute, MIT, Cambridge, MA,
Quantitative Genetics
Lecture 6: Multiple Regression
Thoughts about the TDT. Contribution of TDT: Finding Genes for 3 Complex Diseases PPAR-gamma in Type 2 diabetes Altshuler et al. Nat Genet 26:76-80, 2000.
Linkage Analysis in Merlin
Shaun Purcell & Pak Sham Advanced Workshop Boulder, CO, 2003
Rare and common variants: twenty arguments G.Gibson Homework 3 Mylène Champs Marine Flechet Mathieu Stifkens 1 Bioinformatics - GBIO K.Van Steen.
Copy the folder… Faculty/Sarah/Tues_merlin to the C Drive C:/Tues_merlin.
Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Module 7: Estimating Genetic Variances – Why estimate genetic variances? – Single factor mating designs PBG 650 Advanced Plant Breeding.
QTL mapping in animals. It works QTL mapping in animals It works It’s cheap.
Karri Silventoinen University of Helsinki Osaka University.
Beak of the Finch Natural Selection Statistical Analysis.
Inference for Regression Simple Linear Regression IPS Chapter 10.1 © 2009 W.H. Freeman and Company.
Linkage in selected samples Manuel Ferreira QIMR Boulder Advanced Course 2005.
Lecture 19: Association Studies II Date: 10/29/02  Finish case-control  TDT  Relative Risk.
Permutation Analysis Benjamin Neale, Michael Neale, Manuel Ferreira.
Power and Sample Size Boulder 2004 Benjamin Neale Shaun Purcell.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Type 1 Error and Power Calculation for Association Analysis Pak Sham & Shaun Purcell Advanced Workshop Boulder, CO, 2005.
Power of linkage analysis Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.
1 B-b B-B B-b b-b Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
An quick overview of human genetic linkage analysis
Association analysis Genetics for Computer Scientists Biomedicum & Department of Computer Science, Helsinki Päivi Onkamo.
Errors in Genetic Data Gonçalo Abecasis. Errors in Genetic Data Pedigree Errors Genotyping Errors Phenotyping Errors.
Epistasis / Multi-locus Modelling Shaun Purcell, Pak Sham SGDP, IoP, London, UK.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
An quick overview of human genetic linkage analysis Terry Speed Genetics & Bioinformatics, WEHI Statistics, UCB NWO/IOP Genomics Winterschool Mathematics.
Fast test for multiple locus mapping By Yi Wen Nisha Rajagopal.
Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.
Lecture 22: Quantitative Traits II
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Mx Practical TC20, 2007 Hermine H. Maes Nick Martin, Dorret Boomsma.
David M. Evans Multivariate QTL Linkage Analysis Queensland Institute of Medical Research Brisbane Australia Twin Workshop Boulder 2003.
Using Merlin in Rheumatoid Arthritis Analyses Wei V. Chen 05/05/2004.
A simple method to localise pleiotropic QTL using univariate linkage analyses of correlated traits Manuel Ferreira Peter Visscher Nick Martin David Duffy.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
A Fine Mapping Theorem to Refine Results from Association Genetics Studies S.J. Schrodi, V.E. Garcia, C.M. Rowland Celera, Alameda, CA ABSTRACT Justification.
Power and Meta-Analysis Dr Geraldine M. Clarke Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for.
15 Inferential Statistics.
Power in QTL linkage analysis
Regression Models for Linkage: Merlin Regress
Stratification Lon Cardon University of Oxford
Genome Wide Association Studies using SNP
Can resemblance (e.g. correlations) between sib pairs, or DZ twins, be modeled as a function of DNA marker sharing at a particular chromosomal location?
I Have the Power in QTL linkage: single and multilocus analysis
Regression-based linkage analysis
Linkage in Selected Samples
Why general modeling framework?
Multivariate Linkage Continued
Lecture 9: QTL Mapping II: Outbred Populations
Univariate Linkage in Mx
Power Calculation for QTL Association
Presentation transcript:

Statistical Power Calculations Boulder, 2007 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston

Outline 1. Aim 2. Statistical power 3. Estimate the power of linkage / association analysis 4. Improve the power of linkage analysis Analytically Empirically

1. Aim

1. Know what type-I error and power are 2. Know that you can/should estimate the power of your linkage/association analyses (analytically or empirically) 4. Be aware that there are MANY factors that increase type-I error and decrease power 3. Know that there a number of tools that you can use to estimate power

2. Statistical power

Type-1 error H 0 is true α In reality… Type-2 error β 1 - α Power 1 - β H 0 : Person A is not guilty H 1 : Person A is guilty – send him to jail H 1 is true H 0 is true H 1 is true We decide… Power: probability of declaring that something is true when in reality it is true.

xxxxxx x x x x x x x x x x x x x x x x x x x x xxxxxx x x x x x x x x x x x x x x x x x x x x H 0 : There is NO linkage between a marker and a trait H 1 : There is linkage between a marker and a trait Linkage test statistic has different distributions under H 0 and H 1 x x

Where should I set the threshold to determine significance? x ThresholdPower (1 – β)Type-1 error (α) To low High I decide H 1 is true (Linkage) I decide H 0 is true

Where should I set the threshold to determine significance? x ThresholdPower (1 – β)Type-1 error (α) To low High To high Low I decide H 1 is true I decide H 0 is true

How do I maximise Power while minimising Type-1 error rate? x I decide H 1 is true I decide H 0 is true Power (1 – β) Type-1 error (α) 1. Set a high threshold for significance (i.e. results in low α [e.g ]) 2. Try to shift the distribution of the linkage test statistic when H 1 is true as far as possible from the distribution when H 0 is true.

Non-centrality parameter H0H0 H1H1 NCP Mean (μ) Variance (σ 2 ) Central Χ 2 df 2*(df) Non-central Χ 2 df + NCP 2*(df) + 4*NCP These distributions ARE NOT chi-sq with 1df!! Just for illustration.. Run R script in folder to see what they really look like..

H0H0 H1H1 NCP Small NCP Big overlap between H 0 and H 1 distributions Lower power Large NCP Small overlap between H 0 and H 1 distributions Greater power

Short practical on GPC Genetic Power Calculator is an online resource for carrying out basic power calculations. For our 1st example we will use the probability function calculator to play with power

1. Go to: ‘ ’ Click the ‘Probability Function Calculator’ tab.Probability Function Calculator 2. We’ll focus on the first 3 input lines. These refer to the chi-sq distribution that we’re interested in right now. Using the Probability Function Calculator of the GPC NCP Degrees of freedom of your test. E.g. 1df for univariate linkage (ignoring for now that it’s a mixture distribution)

1. Let’s start with a simple exercise. Determine the critical value (X) of a chi-square distribution with 1 df and NCP = 0, such that P(X>x) = Exercises df = 1 NCP = 0 P(X>x) = 0.05 X = ? Determine the P(X>x) for a chi-square distribution with 1 df and NCP = 0 and X = df = 1 NCP = 0 P(X>x) = ? X = 3.84

2. Find the power when the NCP of the test is 5, degrees of freedom=1, and the critical X is Exercises df = 1 NCP = 5 P(X>x) = ? X = 3.84 What if the NCP = 10? df = 1 NCP = 10 P(X>x) = ? X = NCP = 5 NCP =

3. Find the required NCP to obtain a power of 0.8, for degrees of freedom=1 and critical X = Exercises df = 1 NCP = ? P(X>x) = 0.8 X = 3.84 What if the X = 13.8? df = 1 NCP = ? P(X>x) = 0.8 X = NCP = ? = NCP = ? = 0.8

2. Estimate power for linkage and association

Why is it important to estimate power? To determine whether the study you’re designing/analysing can in fact localise the QTL you’re looking for. Study design and interpretation of results. You’ll need to do it for most grant applications. When and how should I estimate power? Study design stage Analysis stage How? Theoretically, empirically Empirically When?

Theoretical power estimation NCP determines the power to detect linkage NCP = μ(H 1 is true) - df If we can predict what the NCP of the test will be, we can estimate the power of the test

4. Marker informativeness (i.e. Var(π) and Var(z)) Theoretical power estimation *Linkage* Variance Components linkage analysis (and some HE extensions) 1. The number of sibs in the sibship (s) 2. Residual sib correlation (r) 3. Squared variance due to the additive QTL component (V A ) 5. Squared variance due to the dominance QTL component (V D ). ^ Sham et al AJHG 66: 1616

Another short practical on GPC The idea is to see how genetic parameters and the study design influence the NCP – and so the power – of linkage analysis

1. Go to: ‘ ’ Click the ‘VC QTL linkage for sibships’ tab.VC QTL linkage for sibships Using the ‘VC QTL linkage for sibships’ of the GPC

1. Let’s estimate the power of linkage for the following parameters: Exercises QTL additive variance: 0.2 QTL dominance variance: 0 Residual shared variance: 0.4 Residual nonshared variance: 0.4 Recombination fraction: 0 Sample Size: 200 Sibship Size: 2 User-defined type I error rate: 0.05 User-defined power: determine N : 0.8 Power = 0.36 (alpha = 0.05) Sample size for 80% power = 681 families

2. We can now assess the impact of varying the QTL heritability Exercises QTL additive variance: 0.4 QTL dominance variance: 0 Residual shared variance: 0.4 Residual nonshared variance: 0.4 Recombination fraction: 0 Sample Size: 200 Sibship Size: 2 User-defined type I error rate: 0.05 User-defined power: determine N : 0.8 Power = 0.73 (alpha = 0.05) Sample size for 80% power = 237 families

3. … the sibship size Exercises QTL additive variance: 0.2 QTL dominance variance: 0 Residual shared variance: 0.4 Residual nonshared variance: 0.2 Recombination fraction: 0 Sample Size: 200 Sibship Size: 3 User-defined type I error rate: 0.05 User-defined power: determine N : 0.8 Power = 0.99 (alpha = 0.05) Sample size for 80% power = 78 families

CaTS performs power calculations for large genetic association studies, including two stage studies. Theoretical power estimation *Association: case-control*

TDT Power calculator, while accounting for the effects of untested loci and shared environmental factors that also contribute to disease risk Theoretical power estimation *Association: TDT*

Theoretical power estimation Advantages: Fast, GPC, CaTS Disadvantages: Approximation, may not fit well individual study designs, particularly if one needs to consider more complex pedigrees, missing data, ascertainment strategies, different tests, etc…

Empirical power estimation Mx: simulate covariance matrices for 3 groups (IBD 0, 1 and 2 pairs) according to an FQE model (i.e. with V Q > 0) and then fit the wrong model (FE). The resulting test statistic (minus 1df) corresponds to the NCP of the test. See powerFEQ.mx script. Still has many of the disadvantages of the theoretical approach, but is a useful framework for general power estimations. Simulate data: generate a dataset with a simulated marker that explains a proportion of the phenotypic variance. Test the marker for linkage with the phenotype. Repeat this N times. For a given α, Power = proportion of replicates with a P-value < α (e.g. < 0.05).

Empirical power estimation *Linkage / Association* Example with LINX

3. How to improve power

Factors that influence type-1 error and power 1. Ascertainment 2. Disease model QTL heritability, MAF, disease prevalence Family structure, selective sampling 3. Deviations in trait distribution 4. Pedigree errors 5. Genotyping errors 7. Genome coverage LinkageAssociation Family-based Case-control 6. Missing data

Pedigree errors Definition. When the self-reported familial relationship for a given pair of individuals differs from the real relationship (determined from genotyping data). Similar for gender mix-ups. Impact on linkage and FB association analysis. Increase type-1 error rate (can also decrease power) Detection. Can be detected using genome-wide patterns of allele sharing. Some errors are easy to detect. Software: GRR. Correction. If problem cannot be resolved, delete problematic individuals (family) Boehnke and Cox (1997), AJHG 61: ; Broman and Weber (1998), AJHG 63:1563-4; McPeek and Sun (2000), AJHG 66: ; Epstein et al. (2000), AJHG 67:

Pedigree errors *Impact on linkage* CSGA (1997) A genome-wide search for asthma susceptibility loci in ethnically diverse populations. Nat Genet 15: ~15 families with wrong relationships No significant evidence for linkage Error checking is essential!

GRR Pedigree errors *Detection/Correction*

Practical Aim: Identify pedigree errors with GRR 1. Go to: ‘ Egmondserver\share\Programs ’ Copy entire ‘GRR’ folder into your desktop. 2. Go into the ‘GRR’ folder in your desktop, and run the GRR.exe file. 3. Press the ‘Load’ button, and navigate into the same ‘GRR’ folder on the desktop. Select the file ‘sample.ped’ and press ‘Open’. Note that all sibpairs in ‘sample.ped’ were reported to be fullsibs or half-sibs. I’ll identify one error. Can you identify the other two?

Summmary 1. Statistical power 2. Estimate the power of linkage analysis 3. Improve the power of linkage analysis