What is an association study? Define linkage disequilibrium


What is an association study? Define linkage disequilibrium
Miranda Durkie, January 2010

What is an association study?
Association is a statistical measure of the co-occurrence of certain phenotypic traits with certain alleles. An association study examines genetic variation across a genome in order to identify genetic associations with observable traits.

How does association occur?
Direct causation: having allele A makes you susceptible to disease D. Possession of A may not be sufficient in itself to cause D, but it makes D more likely to develop.
Natural selection: people who have disease D may be more likely to survive and reproduce if they also have allele A.
Population stratification: the population contains several genetically distinct subsets, and both disease D and allele A happen to be more common in one particular subset.
Type 1 error: association studies test a large number of markers to find significant associations (p < 0.05). However, by chance alone about 5% of markers with no true association will be significant at p = 0.05, and about 1% at p = 0.01. The data therefore need correction for multiple testing. In the past this was not done adequately, so results could not be replicated.
Linkage disequilibrium: the aim of association studies is to discover associations caused by linkage disequilibrium between allele A and the variant that confers susceptibility to disease D.
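The scale of the type 1 error problem can be illustrated with a little arithmetic. The sketch below is not part of the original slides; the function names and the marker count of 500,000 are illustrative assumptions, and the second function assumes the tests are independent, which markers in LD are not.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null markers that pass the threshold by chance."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    n_markers = 500_000  # hypothetical number of markers tested
    for alpha in (0.05, 0.01):
        print(alpha, expected_false_positives(n_markers, alpha))
    print(family_wise_error_rate(100, 0.05))
```

Even with only 100 independent tests at p < 0.05, at least one false positive is almost certain, which is why uncorrected results often failed to replicate.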

Linkage
Linkage analysis is used to track the inheritance of alleles within a family. Linked markers or alleles are separated only if a recombination event occurs between them. The closer a marker is to the disease or susceptibility allele, the less likely the two are to be separated by recombination over several generations. This leads to a common haplotype that occurs more often than would be expected by chance. Within an individual family this linkage extends up to about 20 cM, but in association studies it extends only a few kb.
Linkage disequilibrium is the non-random association between two or more alleles located together on the same chromosome.

Linkage disequilibrium
Consider two markers with alleles A/a and B/b.
Frequency of allele A = p and of allele a = 1 - p.
Frequency of allele B = q and of allele b = 1 - q.
If there is no association, the haplotype AB occurs at frequency pq.
However, if the frequency of AB > pq, then A and B are in positive LD.
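The comparison of the AB haplotype frequency with pq can be turned into the usual LD statistics. The following is a minimal sketch, not taken from the slides: D is the excess of the AB frequency over pq, and D' and r² are the standard normalised measures, added here as an assumption about what "LD scores" would be computed. The function name and the example frequencies are hypothetical.

```python
def ld_measures(freq_ab, p, q):
    """Return (D, |D'|, r^2) for two biallelic markers.

    freq_ab: observed frequency of the AB haplotype
    p: frequency of allele A, q: frequency of allele B
    """
    d = freq_ab - p * q  # zero when there is no association
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p * (1 - p) * q * (1 - q)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: A = 0.6, B = 0.5, AB haplotype = 0.4
    print(ld_measures(0.4, 0.6, 0.5))
```

A positive D corresponds to the slide's case of AB being more frequent than pq; a negative D means AB is rarer than expected.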

Association vs linkage studies
Linkage is a relationship between loci, whilst association is a relationship between alleles and phenotypes. Association studies do not study families but instead look for differences in allele frequencies between groups of individuals with defined phenotypes. For both types of study, the disease-causing mutation and/or susceptibility allele does not need to be known. Instead, SNPs or other markers such as di-, tri- or tetra-nucleotide repeats that are in linkage disequilibrium with the disease/susceptibility allele are used.

Designing an association study
1. Identify SNPs to analyse
2. Genotype all SNPs in a subset of the samples
3. Identify tagSNPs
4. Genotype tagSNPs in all samples
5. Analyse data

1. Identify SNPs to analyse
Work out the region of interest, or choose regions of known homology from a mouse or other animal model.
Work out the size of the area you wish to study, e.g. choose a 1 Mb region around your locus of interest and select one SNP every 500 bp.
If possible, include SNPs that have been validated in the same ethnic group as the one you are studying.
Prioritise SNPs with higher minor allele frequencies (>10%).

Identify SNPs cont.
If looking within genes, prioritise possible functional variants, e.g. non-synonymous SNPs within exons.
Read the current literature to find out whether any of the SNPs have been associated with similar phenotypes in other studies.
Ensure that there are no SNPs under the primer or probe binding sites, as these could lead to non-amplification of one allele and skew your results.
Owing to advances in technology, the majority of current association studies now look at the whole genome: genome-wide association studies (GWAS).

2. Genotype subset of samples
Ensure cases and controls are ethnically matched.
Ensure the methodology is robust, accurate and high-throughput, e.g. SNP arrays - which one? Exonic only? Platform? Cost? Number of SNPs?
Genotype at least 96 controls and, if you wish, 96 cases.
Record the genotypes conservatively, i.e. if unsure, mark as unknown.
Analyse the data to:
Check for deviation from Hardy-Weinberg equilibrium for all SNPs - if a deviation is found it is likely that genotyping errors have been made, so re-check.
Calculate LD scores for SNPs in the region.
Identify tagSNPs (also called haplotype tagging SNPs or htSNPs).
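The Hardy-Weinberg check can be sketched as a chi-square goodness-of-fit test with one degree of freedom. This is an illustrative sketch rather than the procedure the slides had in mind: the function name and genotype counts are hypothetical, and with small counts such as 96 samples an exact test is often preferred over the chi-square approximation used here.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Takes observed genotype counts and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts in 96 controls
    print(hwe_chi_square(30, 40, 26))
```

A very small p-value for a SNP in the control group is a prompt to re-check the genotype calls for that SNP before using it further.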

3. Identify tagSNPs
There are over 10 million SNPs in the human genome.
Linked SNPs are often inherited together as a block, and the genotypes of these SNPs can be used to generate a haplotype.
The key SNPs that uniquely define the haplotype are called tagSNPs or haplotype tagging SNPs.
The HapMap project started in 2002 as an international collaboration to describe common patterns of genetic variation between individuals.
It identified around 500,000 key tagSNPs, which can be used to infer haplotypes of the surrounding SNPs.
This has made genome-wide scans more efficient and comprehensive.
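One simple way to pick tagSNPs from pairwise LD scores is a greedy search: repeatedly choose the SNP that is in high LD with the most SNPs not yet covered. The sketch below is an illustrative heuristic only, not the method used by the HapMap project; the function name, the r² threshold of 0.8 and the example matrix are assumptions.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Select tagSNPs greedily from a symmetric matrix of pairwise r^2 values.

    A SNP is 'covered' by a tag if it is the tag itself or has
    r^2 >= threshold with it. Returns the indices of the chosen tags.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []

    def coverage(i):
        return sum(1 for j in untagged if j == i or r2[i][j] >= threshold)

    while untagged:
        # Choose the SNP covering the most untagged SNPs (ties: lowest index)
        best = max(range(n), key=lambda i: (coverage(i), -i))
        tags.append(best)
        untagged -= {j for j in untagged if j == best or r2[best][j] >= threshold}
    return tags


if __name__ == "__main__":
    # Hypothetical r^2 values: SNPs 0-2 form one block, SNP 3 stands alone
    r2 = [
        [1.00, 0.90, 0.85, 0.10],
        [0.90, 1.00, 0.95, 0.10],
        [0.85, 0.95, 1.00, 0.20],
        [0.10, 0.10, 0.20, 1.00],
    ]
    print(greedy_tag_snps(r2))
```

In this toy example two tags are enough to cover four SNPs, which is the saving that tagSNPs are meant to provide.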

4. Genotype tagSNPs in all samples
Commercially available SNP arrays have been designed by several companies, e.g. Affymetrix and Illumina, to cover hundreds of thousands of SNPs across the whole genome.
They can have slightly different target SNPs, e.g. Illumina Human-1 focuses on exonic SNPs, thus concentrating on potential functional variants.
These arrays use tagSNPs to maximise the amount of data generated by as few SNPs as possible.
In recognition of the potential role of CNVs in complex disease susceptibility, many arrays also study CNVs.

How many samples?
You must ensure that sufficient cases and controls are tested to reach statistical significance.
The lower the odds ratio for an increase in susceptibility, the more samples are required for the testing to reach statistical significance.
It is estimated that common susceptibility loci are likely to have odds ratios (OR) of 1.1 to 1.5.
Therefore, for example, in order to achieve 90% power to detect an allele with a frequency of 0.2 and an OR of 1.2, more than 6,000 affected cases and more than double that number of normal controls are required.
If the frequency of the variant is only 0.05, you would need 20,000 cases.
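How sample size grows as the odds ratio or allele frequency falls can be sketched with a standard two-proportion approximation for an allelic test. Everything specific here is an assumption rather than something stated in the slides: the function name, the genome-wide significance threshold of 5e-8, an equal number of cases and controls, and the treatment of the OR as a per-allele effect.

```python
import math
from statistics import NormalDist


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.90):
    """Approximate number of cases needed, with an equal number of controls.

    Uses the normal approximation for comparing allele frequencies
    between two groups, counting two chromosomes per individual.
    """
    p0 = control_freq
    # Allele frequency in cases implied by the per-allele odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    chromosomes_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(chromosomes_per_group / 2)


if __name__ == "__main__":
    for freq, odds in [(0.2, 1.5), (0.2, 1.2), (0.05, 1.2)]:
        print(freq, odds, cases_needed(freq, odds))
```

Because the slide does not state its significance threshold or statistical model, the output should not be expected to match the figures quoted above; the point of the sketch is the trend, with required numbers rising steeply as the OR approaches 1 or the variant becomes rarer.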

5. Analyse data
Do single-point analysis first by looking at individual SNPs and calculating χ² and odds ratios.
You need to apply a correction for multiple testing, e.g. the Bonferroni correction, which is conservative when the alleles studied are in LD with each other (non-independent tests).
Once you have tested each individual SNP for association, you can construct haplotypes and study them for association with the disease/trait.
Use bioinformatics programs such as HelixTree, SNPHAP and Stata.
Because of the problems with sample size for detecting low susceptibility traits, meta-analysis has been increasingly used. Meta-analysis of GWA datasets can increase the power to detect association signals by increasing sample size and by examining more variants throughout the genome than each dataset alone.
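A single-point analysis of one SNP can be sketched as a chi-square test on a 2x2 table of allele counts, followed by a Bonferroni threshold across all SNPs tested. This is a minimal illustration, not the workflow of the programs named above; the function names and the counts are hypothetical.

```python
import math


def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square test (1 df) and odds ratio for a 2x2 allele-count table.

    case_a/case_b: counts of alleles A and B in cases
    ctrl_a/ctrl_b: counts of alleles A and B in controls
    Returns (chi2, p_value, odds_ratio).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    cross = case_a * ctrl_b - case_b * ctrl_a
    margins = (
        (case_a + case_b) * (ctrl_a + ctrl_b)
        * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = n * cross ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP
    print(allelic_test(60, 40, 40, 60))
    print(bonferroni_significant([0.001, 0.02, 0.04]))
```

Bonferroni divides the significance level by the number of tests, so when many of the SNPs are correlated through LD the effective number of independent tests is smaller and the threshold is stricter than necessary.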

Real examples 1
In 2007 the Wellcome Trust Case Control Consortium published a GWA study of 2,000 cases for each of seven common diseases and 3,000 shared controls.
It found 24 associations: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes.
It linked 10 genes to common disorders with which they had not previously been associated.
A colorectal cancer GWA study has found 10 associated SNPs, 5 of which are linked to the TGFβ superfamily signalling pathway.

Real examples 2
GWA studies have led to the discovery of at least 24 loci linked to type 2 diabetes (T2D).
These are mainly linked to the insulin secretion pathway rather than to insulin resistance.
However, it is estimated that these loci account for only 5% of the factors contributing to the heritability of T2D.
Studies of hundreds of thousands or even millions of individuals are required to identify low susceptibility alleles.
CNV associations have been found for schizophrenia, Alzheimer's disease and Parkinson's disease.

Future of GWA
Study of gene-gene and gene-environment interactions, which may be missed by single-point GWA, will be crucial.
The majority of associated variants will not be functional, so work will be required to identify the causal variants.
SNPs account for 78% of the variation in the genome but only 26% of total nucleotide differences.
Further study of CNVs will be crucial.
Study of rare rather than common variants (1000 Genomes Project).
Study of regulatory variants.
Next generation sequencing.