Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA.

Presentation transcript:

Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA

Motivation
Whole-genome association study
How to perform multiple hypothesis correction
– To increase statistical power
Incorporate prior information on the molecular function of associated loci
Incorporate information on linkage disequilibrium structure

Main idea
Traditional method
– Use a single significance threshold
In practice, markers are not identical
Set a different threshold at each marker, reflecting both intrinsic (e.g. LD, allele frequency) and extrinsic information on the markers

Standard Association Study
M markers in N cases and N controls
f_i = minor allele frequency at marker i
True case/control allele frequency
Marker d: causal variant with a relative risk

Standard Association Study
Test statistic: approximately normal with unit variance, with mean given by the non-centrality parameter
Power at a single marker: the probability of detecting an association with N individuals at p-value (significance) threshold t
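
As a rough illustration of single-marker power, the sketch below assumes a two-sided test on a statistic distributed N(ncp, 1). The transcript does not preserve the formula for the non-centrality parameter, so `noncentrality` uses a common normal approximation (multiplicative relative risk, control frequency equal to the population frequency, N cases and N controls). These modelling choices and all function names are assumptions made for illustration, not the presenter's definitions.

```python
import math
from statistics import NormalDist


def upper_tail(z):
    """P(Z > z) for a standard normal Z; erfc keeps accuracy far in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def power_two_sided(t, ncp):
    """Probability that |S| exceeds the two-sided cutoff for threshold t when S ~ N(ncp, 1)."""
    z = -NormalDist().inv_cdf(t / 2.0)  # critical value for p-value threshold t
    return upper_tail(z - ncp) + upper_tail(z + ncp)


def noncentrality(f, relative_risk, n):
    """Assumed approximation for n cases and n controls at minor allele frequency f."""
    p_case = relative_risk * f / ((relative_risk - 1.0) * f + 1.0)
    p_ctrl = f  # assumption: rare disease, so controls match the population
    p_bar = 0.5 * (p_case + p_ctrl)
    return (p_case - p_ctrl) * math.sqrt(n) / math.sqrt(p_bar * (1.0 - p_bar))
```

With `ncp = 0` the power equals the threshold `t` itself, which is a quick sanity check on the two-sided cutoff.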

Multiple Hypothesis Correction
Fix the false positive rate (the probability of rejecting a correct null hypothesis) at each marker so that the total false positive rate is α
Bonferroni correction: t_i = α/M
Expected power: the per-marker power weighted by c_i, where c_i is the probability that marker i is causal
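
The Bonferroni thresholds and the expected power can be sketched as follows. The expected-power formula itself is missing from the transcript, so the code assumes the natural reading: a sum of per-marker power weighted by the causal probabilities c_i, with a two-sided N(ncp, 1) test at each marker. Function names and the per-marker `ncp` inputs are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_sided(t, ncp):
    """Power of a two-sided test at threshold t when the statistic is N(ncp, 1)."""
    z = -NormalDist().inv_cdf(t / 2.0)
    tail = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
    return tail(z - ncp) + tail(z + ncp)


def bonferroni_thresholds(alpha, m):
    """Same threshold alpha / m at every one of the m markers."""
    return [alpha / m] * m


def expected_power(c, ncp, t):
    """Sum over markers of c_i times the power at marker i (assumed form)."""
    return sum(ci * power_two_sided(ti, li) for ci, li, ti in zip(c, ncp, t))
```

For example, `expected_power([0.25] * 4, [3.0] * 4, bonferroni_thresholds(0.05, 4))` evaluates four equally likely markers at the common threshold 0.0125.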

Multi-Threshold Association
Allow a different threshold t_i for each marker
Power: the expected power evaluated at the per-marker thresholds t_i, with an adjusted false positive rate (the t_i sum to α)
Goal: set the values of t_i to maximize the power subject to this constraint
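
The goal on this slide can be written as a constrained optimization problem. The slide's own formulas are not preserved, so the notation below is an assumption: P(t_i, λ_i) stands for the power at marker i with threshold t_i and non-centrality parameter λ_i, and c_i is the probability that marker i is causal.

```latex
\begin{aligned}
\max_{t_1,\dots,t_M}\quad & \sum_{i=1}^{M} c_i \, P(t_i, \lambda_i) \\
\text{subject to}\quad & \sum_{i=1}^{M} t_i = \alpha, \qquad 0 \le t_i \le 1 .
\end{aligned}
```

Setting every t_i = α/M recovers the Bonferroni correction as one feasible point of this problem.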

Maximizing the Power
At the optimal point, the gradient at each marker is equal
Given a value of the gradient, solve for the threshold at each marker that achieves that gradient
Do a binary search over the gradient until the thresholds sum to α
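
A minimal sketch of this search, under stated assumptions: each marker uses a two-sided test on a N(ncp, 1) statistic with nonzero `ncp`, for which the derivative of power with respect to the threshold works out to exp(-ncp²/2) · cosh(ncp · z), with z the critical value for t. That closed form lets the per-marker solve be done analytically, and the binary search runs over the log of the common gradient. All names are hypothetical, and this is not necessarily how the presenter implemented it.

```python
import math
from statistics import NormalDist


def upper_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def power_two_sided(t, ncp):
    z = -NormalDist().inv_cdf(t / 2.0)
    return upper_tail(z - ncp) + upper_tail(z + ncp)


def expected_power(c, ncp, t):
    return sum(ci * power_two_sided(ti, li) for ci, li, ti in zip(c, ncp, t))


def threshold_for_gradient(log_mu, c, ncp):
    """Threshold t at which c * dP/dt equals exp(log_mu); requires ncp != 0."""
    if c <= 0.0:
        return 0.0  # a marker that cannot be causal gets no share of alpha
    log_x = log_mu - math.log(c) + 0.5 * ncp * ncp  # log of the required cosh value
    if log_x <= 0.0:
        return 1.0  # gradient is at least the target even at t = 1
    # acosh(x) is about log(2x) for large x; avoids overflow in exp
    a = math.acosh(math.exp(log_x)) if log_x < 700.0 else log_x + math.log(2.0)
    return 2.0 * upper_tail(a / abs(ncp))


def optimal_thresholds(c, ncp, alpha, iters=200):
    """Binary search over the log-gradient until the thresholds sum to alpha."""
    if not any(ci > 0.0 for ci in c):
        raise ValueError("at least one marker needs a positive causal probability")

    def total(log_mu):
        return sum(threshold_for_gradient(log_mu, ci, li) for ci, li in zip(c, ncp))

    lo, hi = 0.0, 0.0
    while total(lo) < alpha:  # smaller gradient -> larger thresholds
        lo -= 10.0
    while total(hi) > alpha:  # larger gradient -> smaller thresholds
        hi += 10.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return [threshold_for_gradient(hi, ci, li) for ci, li in zip(c, ncp)]
```

Because the power is concave in t under these assumptions, the resulting thresholds should never give lower expected power than splitting α evenly; when all markers are identical the search returns α/M at each one.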

Maximizing Power for Proxies
In practice, markers are tags for causal variation
Given K variants, assign each potential causal variant v_k to the best marker i
The effective non-centrality parameter is reduced by a factor of |r_ki|, where r_ki is the correlation coefficient between variant k and marker i
If v_k is causal, the power when observing proxy marker i is evaluated with this reduced non-centrality parameter

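Maximizing Power for Proxies
Each variant k has a probability c_k of being causal
Total power captured by marker i: the c_k-weighted sum of the proxy power over the variants assigned to it
Total power of the association study: the sum of the power captured over all markers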
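
The proxy calculation can be sketched as below. The slide formulas are missing, so this assumes that "best marker" means the marker with the largest |r_ki|, that the proxy power uses the non-centrality parameter scaled by |r_ki|, and that each marker runs a two-sided N(ncp, 1) test. The function names and the `r[k][i]` layout are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_sided(t, ncp):
    z = -NormalDist().inv_cdf(t / 2.0)
    tail = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))
    return tail(z - ncp) + tail(z + ncp)


def assign_to_best_marker(r):
    """r[k][i] is the correlation of variant k with marker i; pick the largest |r| per variant."""
    return [max(range(len(row)), key=lambda i: abs(row[i])) for row in r]


def study_power(c, ncp, r, t):
    """Return (power captured by each marker, total power of the study)."""
    best = assign_to_best_marker(r)
    per_marker = [0.0] * len(t)
    for k, i in enumerate(best):
        effective_ncp = abs(r[k][i]) * ncp[k]  # reduced by the factor |r_ki|
        per_marker[i] += c[k] * power_two_sided(t[i], effective_ncp)
    return per_marker, sum(per_marker)
```

A variant in perfect correlation with its marker (|r| = 1) keeps its full power, while weaker tagging lowers the power captured by that marker.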

Candidate Gene Study
1000 cases and controls over the ENCODE regions, using the markers on the Affymetrix 500K GeneChip

Robustness over relative risks

Whole Genome Association
Assumption
– Each SNP is equally likely to be causal, with a relative risk of 2
Power for the traditional study and for multi-threshold association over 2,614,057 SNPs
– Avg: /
– Avg over power in [0.1, 0.9]: / 0.615

Impact of extrinsic information
1. cSNPs are more likely to be involved in disease
2. Add information on sets of genes that are more likely to be involved in a specific disease
30,700 cSNPs in HapMap contribute 20% of the disease-causing variation
Cancer Gene Census: 363 genes in which mutations have been implicated in cancer; 20% of causal variation is assumed to lie in these genes
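
One way to turn such a share into per-SNP causal probabilities is sketched below. It assumes the stated share is spread uniformly over the SNPs in the class and the remainder uniformly over all other SNPs. Pairing the 30,700 cSNPs with the 2,614,057 SNPs of the whole-genome setting is also an assumption made for illustration, and `class_priors` is a hypothetical helper.

```python
def class_priors(n_total, n_class, class_share):
    """Return (prior per SNP inside the class, prior per SNP outside it)."""
    inside = class_share / n_class
    outside = (1.0 - class_share) / (n_total - n_class)
    return inside, outside


# cSNPs carry 20% of the causal probability; all other SNPs share the remaining 80%
inside, outside = class_priors(2_614_057, 30_700, 0.20)
```

Under these assumptions each cSNP is about 21 times as likely to be causal as any other SNP, and priors of this kind could serve as the c_i used when setting per-marker thresholds.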