Power and Meta-Analysis. Dr Geraldine M. Clarke. Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21st–26th June 2015, Africa Centre for Health and Population Studies.


Power and Meta-Analysis
Dr Geraldine M. Clarke
Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21st–26th June 2015
Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Durban, South Africa

[Figure: WTAC Durban module summaries (V2), grouped under Epidemiology, Genetics and Bioinformatics. Modules shown: Introductions; Public databases and resources for genetics; Whole genome sequencing and fine-mapping; GWAS results and interpretation; GWAS QC; Basic principles of measuring disease in populations; Population genetics; Principal components analyses; Basic genotype data summaries and analyses; GWAS association analyses; Meta-analysis and power of genetic studies.]

Objectives
Power
– Define and be able to calculate the power to detect a genetic association
– Understand the impact of various parameters on the power of a genetic association test, and thus ways to increase power
Meta-analysis
– Define meta-analysis and appreciate how it can be used to increase power for discovery and replication in genetic association testing
– Explore the stages in a typical genome-wide discovery meta-analysis
– Combined effect size estimates
– Appraisal of the evidence

Power

The power of a statistical test is the probability of detecting an effect, given that it is there. Equivalently, power is the probability of rejecting the null hypothesis when it is false.
E.g. a power of 0.9 (90%) means that if we repeat an association test at a locus with a real effect 1000 times, we would expect to see a statistically significant result about 900 times.
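The "repeat the test 1000 times" reading of power can be illustrated with a small simulation. This is only a sketch: it assumes the test statistic is a z-score with unit variance whose mean under the alternative is a hypothetical value `shift`, and the function name and seed are illustrative choices.

```python
import random
from statistics import NormalDist

def simulated_power(shift, alpha=0.05, n_repeats=1000, seed=1):
    """Fraction of repeated two-sided z-tests that reject the null.

    `shift` is the assumed mean of the z-statistic when the effect is real
    (shift = 0 corresponds to no effect).
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = sum(
        abs(rng.gauss(shift, 1.0)) > threshold for _ in range(n_repeats)
    )
    return rejections / n_repeats

print(simulated_power(shift=3.24))  # roughly 0.9: about 900 of 1000 tests reject
print(simulated_power(shift=0.0))   # roughly alpha: false positives only
```

With no real effect, the rejection fraction falls back to about the significance level, which is the Type I error rate rather than power.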

Power and Significance
Type I error [$\alpha$]
– Probability of incorrectly detecting an effect
– Significance level; $\alpha$; false positive; P(detected | effect is false)
– Typical significance thresholds are P < 5% or P < 1%
Type II error [$\beta$]
– Probability of incorrectly rejecting an effect, i.e. missing a true effect
– Power = $1-\beta$; true positive; P(detected | effect is true)
– Typical power values are > 80%
Decreasing the Type I error rate increases the Type II error rate, and vice versa
Aim to minimise $\alpha$ and maximise power
Effect   Detect       Reject
True     $1-\beta$    $\beta$
False    $\alpha$     $1-\alpha$

[Figure] E.g. power to detect association at a SNP with risk allele frequency = 0.3 in cases and an allelic OR = 1.1

Significance levels in GWAS: frequentist approach
Suppose you have m = 20 tests and $\alpha = 0.05$:
P(one or more false positives) = 1 − P(no false positives) = $1-(1-\alpha)^m = 0.64$
Bonferroni: test each hypothesis at $\alpha/m = 0.05/m$
– Controls the probability of one or more false positives
– P(one or more false positives) = $1-(1-\alpha/m)^m \approx m(\alpha/m) = \alpha$
– The formula assumes the tests are independent; the correction is conservative when they are correlated
False discovery rate
– Controls the proportion of false positives among all significant results
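The arithmetic on this slide can be checked directly. The sketch below assumes independent tests, as the slide does; the function name is illustrative.

```python
def familywise_error_rate(alpha_per_test, m):
    """P(at least one false positive) across m independent tests."""
    return 1 - (1 - alpha_per_test) ** m

m = 20
alpha = 0.05
uncorrected = familywise_error_rate(alpha, m)     # every test run at 0.05
bonferroni = familywise_error_rate(alpha / m, m)  # every test run at 0.05/20

print(round(uncorrected, 2))  # 0.64
print(round(bonferroni, 3))   # 0.049, just under the target of 0.05
```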

Significance levels in GWAS: Bayesian approach
True discovery rate
– Control the proportion of true positives among all significant results
– Depends on your prior belief of an association:
   Replication 1/100
   Candidate gene study 1/1,000
   GWAS 1/100,000
– E.g. for a prior of $1 \times 10^{-5}$, power of 0.5 and $\alpha = 5 \times 10^{-8}$, the posterior probability of a true association is ~0.99
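The worked example can be reproduced with Bayes' rule in odds form: posterior odds = prior odds × power / α. This is a minimal sketch; the function name is illustrative, and it treats a significant result as the only evidence.

```python
def posterior_probability(prior, power, alpha):
    """Posterior probability that a significant association is real."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * power / alpha  # power/alpha is the likelihood ratio
    return posterior_odds / (1 + posterior_odds)

# GWAS prior with a genome-wide significance threshold
print(round(posterior_probability(1e-5, 0.5, 5e-8), 2))  # 0.99

# Same prior and power, but a nominal 0.05 threshold
print(round(posterior_probability(1e-5, 0.5, 0.05), 4))  # 0.0001
```

The second call shows why a nominal threshold is not enough when the prior is small: nearly all significant results would be false positives.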

True Discovery Rate

Why calculate power?
– To determine the sample size required to achieve a given power to detect an anticipated effect, or whether a given sample size has sufficient power
– Also sheds light on the result of a completed study, particularly in the interpretation of negative results
– Often required as part of a grant proposal

Calculating Power
Many genetic association tests have a $\chi^2$ distribution.
The shape of the $\chi^2$ distribution depends on the non-centrality parameter (NCP) and the degrees of freedom (df). For a test statistic T:
– Under the null: NCP = 0 and E(T) = df (central $\chi^2$)
– Under the alternative: NCP ≠ 0 and E(T) = NCP + df (noncentral $\chi^2$)
Because the shapes of the central and noncentral $\chi^2$ distributions are known, we can deduce the areas under the curves, and hence the power, if we know the NCP, df and type I error $\alpha$.
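For the 1-df case the calculation needs only the normal distribution, because a 1-df noncentral chi-square variable is the square of a unit-variance normal variable with mean sqrt(NCP). The sketch below covers df = 1 only, and the function name is illustrative; other df values would need a noncentral chi-square routine from a statistics library.

```python
from math import sqrt
from statistics import NormalDist

def power_1df(ncp, alpha):
    """Power of a 1-df chi-square test with noncentrality parameter `ncp`."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # square root of the central chi-square cutoff
    shift = sqrt(ncp)
    # Area of the noncentral distribution beyond the cutoff (both tails of Z)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

print(round(power_1df(0.0, 0.05), 3))    # 0.05: under the null, power equals alpha
print(round(power_1df(10.51, 0.05), 2))  # 0.9
print(round(power_1df(29.72, 5e-8), 2))  # 0.5
```

The last line illustrates how much larger the NCP must be to reach even 50% power at a genome-wide threshold.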

E.g. Case/control allelic test
A bi-allelic causal SNP is genotyped in N samples, a given proportion of which are cases.
The true population effect, e.g. log(OR), depends on the disease prevalence, effect size and risk allele frequency f.
For an allelic test at the causal SNP, the NCP is proportional to the sample size N.
For an allelic test at a marker locus in LD $r^2$ with the causal SNP, the NCP is reduced by a factor of $r^2$.
For the same power at the marker locus, the sample size must therefore be increased by a factor of $1/r^2$.
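The $1/r^2$ factor follows from two standard properties, stated here as assumptions because the slide's own expressions are not reproduced in the transcript: the NCP grows linearly with sample size, and the NCP at a marker is $r^2$ times the NCP at the causal SNP. Writing $\lambda$ for the NCP and $c$ for the per-sample contribution (both symbols introduced here for illustration):

```latex
\begin{align*}
\lambda_{\text{causal}}(N) &= N c, \qquad
\lambda_{\text{marker}}(N) = r^2 N c \\
\lambda_{\text{marker}}(N') = \lambda_{\text{causal}}(N)
  &\;\Longrightarrow\; r^2 N' c = N c
  \;\Longrightarrow\; N' = \frac{N}{r^2}
\end{align*}
```

For example, with $r^2 = 0.8$ the marker needs $N' = 1.25N$ samples to match the power available at the causal SNP.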

Power summary
Power to detect association at a marker locus depends on
– Sample size N and the proportion of cases
– SNP allele frequency f
– Effect size, e.g. odds ratio (OR) or relative risk (RR)
– LD $r^2$ between marker and causal SNP
– Disease prevalence
– Disease model, e.g. additive (df=1), genotypic (df=2) etc.
– Type I error $\alpha$
The investigator can increase power by
– Increasing sample size
– Increasing effect size, e.g. by extreme designs or reducing measurement error
– Increasing LD between marker and causal SNP, e.g. by genotyping a region of interest more densely
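To see how these parameters trade off, the sketch below uses a common small-effect approximation for the NCP of a 1-df allelic test, NCP ≈ 2·N·φ(1−φ)·f(1−f)·log(OR)²·r², where φ is the proportion of cases. This expression is an assumption made for illustration, not necessarily the one used in the lecture, and it ignores disease prevalence; the function and parameter names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

def allelic_test_power(n, case_fraction, freq, odds_ratio, r2=1.0, alpha=5e-8):
    """Approximate power of a 1-df allelic test (small-effect approximation)."""
    beta = log(odds_ratio)
    # Assumed NCP: 2N alleles, case/control balance, allele variance, effect, LD
    ncp = (2 * n * case_fraction * (1 - case_fraction)
           * freq * (1 - freq) * beta ** 2 * r2)
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    return z.cdf(shift - crit) + z.cdf(-shift - crit)

# Risk allele frequency 0.3 and OR 1.1, with equal numbers of cases and controls
for n in (20000, 40000, 80000):
    print(n, round(allelic_test_power(n, 0.5, 0.3, 1.1), 2))

# Weaker LD at the marker costs power; 1/r2 times the samples restores it
print(allelic_test_power(40000, 0.5, 0.3, 1.1, r2=0.8)
      < allelic_test_power(40000, 0.5, 0.3, 1.1))  # True
```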

Meta-Analysis

What is Meta-analysis?
The statistical synthesis of information from multiple independent studies to obtain a summary based on evidence from the combined data
– Increase power by increasing sample size
– Reduce false positive findings
– Evaluate consistency (homogeneity) or inconsistency (heterogeneity) of results across multiple datasets
Meta-analysis can be used for the discovery of new variants or for the replication of previous findings

Typical genome-wide meta-analysis
[Figure: flow diagram with the boxes Study 1 to Study 5; Association signals; Meta-analysis; Replication in similar populations; Association testing in diverse populations; Replicating loci; Re-sequencing & fine-mapping; Causal variants.]

E.g. MalariaGEN Consortium – Individual studies

E.g. MalariaGEN Consortium – Meta-analysis

Synthesize results
There are several ways to combine datasets in a meta-analysis framework:
– P-value meta-analysis
– Effect-size meta-analysis: fixed effects, random effects, Bayesian approach
– Multivariate approaches
– Other extensions, e.g. multiple phenotypes; multiple variants; main and interaction effects
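As one example of a P-value meta-analysis, the sketch below combines two-sided p-values through signed Z-scores weighted by the square root of each study's sample size (a weighted Stouffer approach). The slide does not say which P-value method is meant, so this choice, the function name and the input layout are all assumptions.

```python
from math import copysign, sqrt
from statistics import NormalDist

def weighted_z_meta(p_values, directions, sample_sizes):
    """Combine two-sided p-values using sample-size-weighted Z-scores.

    `directions` holds +1 or -1 for the direction of effect in each study.
    Returns (combined Z, combined two-sided p-value).
    """
    z = NormalDist()
    numerator = 0.0
    weight_sq_sum = 0.0
    for p, sign, n in zip(p_values, directions, sample_sizes):
        z_i = copysign(z.inv_cdf(1 - p / 2), sign)  # signed Z-score for this study
        w = sqrt(n)
        numerator += w * z_i
        weight_sq_sum += w * w
    z_meta = numerator / sqrt(weight_sq_sum)
    p_meta = 2 * (1 - z.cdf(abs(z_meta)))  # loses precision for extremely small p
    return z_meta, p_meta

# Three hypothetical studies, each only nominally significant, same direction
z_meta, p_meta = weighted_z_meta([0.01, 0.01, 0.01], [1, 1, 1], [1000, 1000, 1000])
print(p_meta < 0.01)  # True: consistent evidence is stronger when combined
```

Effects in opposite directions cancel, which is why the direction of effect has to be carried alongside each p-value.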

Effect size meta-analysis
Given effect sizes and standard errors from multiple studies, we estimate a combined effect by computing a weighted mean
Fixed Effects Model
– One true effect size shared by all studies
– Combined effect estimates the common effect size
– Observed effect size varies due to random error in each study
– Weights assigned according to the amount of information captured in each study, i.e. large studies are given more weight
Random Effects Model
– True effect size can vary from study to study
– Each study is estimating a different effect size
– Combined effect estimates the mean of the distribution of effects
– Observed effect size varies due to random error in each study and true variation in effect size between studies
– Weights are more balanced compared to the fixed effects model
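The weighted mean can be sketched with inverse-variance weights, a standard choice that the slide does not name explicitly. Here the between-study variance `tau2` is passed in as an assumed value: `tau2 = 0` gives the fixed effects estimate, and `tau2 > 0` gives a random effects estimate. The study data are made up for illustration.

```python
from math import sqrt

def combine_effects(effects, std_errors, tau2=0.0):
    """Inverse-variance weighted mean of study effect sizes.

    Returns (combined effect, its standard error, normalised weights).
    """
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total = sum(weights)
    combined = sum(w * b for w, b in zip(weights, effects)) / total
    return combined, sqrt(1.0 / total), [w / total for w in weights]

# Hypothetical log odds ratios and standard errors from three studies
effects = [0.10, 0.20, 0.05]
std_errors = [0.05, 0.10, 0.02]

fixed_est, fixed_se, fixed_w = combine_effects(effects, std_errors)
random_est, random_se, random_w = combine_effects(effects, std_errors, tau2=0.01)

print([round(w, 2) for w in fixed_w])   # the most precise study dominates
print([round(w, 2) for w in random_w])  # weights are more balanced
```

Adding the same `tau2` to every study's variance shrinks the differences between weights, which is the "more balanced" behaviour described for the random effects model.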

Evidence for association

Heterogeneity
Genuine diversity in the genetic effects between studies may arise for a variety of reasons
– Variable LD between typed and causal variant
– Study phenotype may be correlated with the true phenotype, with variable correlation across studies, e.g. FTO
– Differences in environmental exposure
– Chance
Heterogeneity must be carefully examined against potential biases
– Differences in study design
– Population structure
– Publication bias, selective outcome bias etc.
Commonly used statistical heterogeneity metrics include Cochran's Q statistic and the between-study variance $\tau^2$
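Both metrics can be computed from per-study effect sizes and standard errors. The sketch below uses the DerSimonian-Laird moment estimator for the between-study variance, which is one common choice and an assumption here, since the slide does not say which estimator is intended. The study data are made up.

```python
def heterogeneity(effects, std_errors):
    """Cochran's Q and the DerSimonian-Laird estimate of tau^2."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, effects)) / sw  # fixed effects mean
    # Q: weighted squared deviations of each study from the fixed effects mean
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    return q, tau2

# Hypothetical log odds ratios and standard errors from three studies
q, tau2 = heterogeneity([0.10, 0.20, 0.05], [0.05, 0.10, 0.02])
print(round(q, 2))     # 2.84
print(round(tau2, 5))  # 0.00098
```

Under homogeneity Q is expected to be close to its degrees of freedom (number of studies minus one), so values well above that suggest between-study variation beyond sampling error.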

Summary
– Meta-analysis can improve the power to detect and validate associations with common variants with small effects typical in major diseases
– A wide array of methods for meta-analysis is available, including fixed effects, random effects and Bayesian approaches, each with particular advantages and disadvantages
– Meta-analysis allows the exploration of heterogeneity of genetic effects across data sets as well as providing summary effects
– Selection biases need to be carefully considered and reported in any meta-analysis
– Careful collection and quality checking of information is essential to avoid errors

Additional Reading
F.K. Kavvoura & J.P. Ioannidis. Methods for meta-analysis in genetic association studies: a review of their potential and pitfalls. Hum Genet, Feb;123(1):1-14. Epub 2007 Nov 17.
M.R. Munafo & J. Flint. Meta-analysis of genetic association studies. Trends Genet, Sep;20(9).
E. Zeggini & J.P. Ioannidis. Meta-analysis in genome-wide association studies. Pharmacogenomics, Feb;10(2).