1 Miranda Durkie January 2010

2
• Association is a statistical measure of the co-occurrence of certain phenotypic traits with certain alleles.
• An association study is an examination of genetic variation across a given genome, designed to identify genetic associations with observable traits.

3
1. Direct causation: having allele A makes you susceptible to disease D. Possession of A may not be sufficient in itself to give you D, but it makes it more likely that you will develop D.
2. Natural selection: people who have disease D may be more likely to survive and reproduce if they have allele A.
3. Population stratification: the population contains several distinct genetic subsets, and both disease D and allele A happen to be more common in one particular subset.
4. Type 1 error: association studies test a large number of markers to find significant associations (p < 0.05). However, by chance alone about 5% of tests will be significant at p = 0.05 and 1% at p = 0.01. The data therefore need correction for multiple testing; in the past this was not done adequately, so results could not be replicated.
5. Linkage disequilibrium: the aim of association studies is to discover associations caused by linkage disequilibrium between allele A and the susceptibility allele for disease D.
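The type 1 error point can be made concrete with a little arithmetic. The sketch below is illustrative only: the marker count is hypothetical, and the family-wise calculation assumes the tests are independent, which markers in LD are not.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of markers passing the threshold by chance alone,
    assuming every marker tested is truly unassociated."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Probability of at least one false positive across n_tests tests.
    Assumes the tests are independent (an idealisation)."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n_markers = 1000  # hypothetical number of markers tested
    for alpha in (0.05, 0.01):
        print(
            f"alpha={alpha}: "
            f"expected false positives = {expected_false_positives(n_markers, alpha):.0f}, "
            f"P(at least one) = {familywise_error_rate(n_markers, alpha):.4f}"
        )
```

With many markers, some nominally significant results are almost guaranteed, which is why uncorrected findings often failed to replicate.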

4
• Linkage analysis is used to track the inheritance of alleles within a family.
• Linked markers or alleles are only separated if a recombination event occurs.
• The closer a marker is to the disease/susceptibility allele, the less likely it is to be separated from it by recombination over several generations. This leads to a common haplotype which occurs more often than would be expected by chance.
• Within an individual family this linkage can extend up to 20 cM, but in association studies it extends only a few kb.
• Linkage disequilibrium is the non-random association between two or more alleles located together on the same chromosome.

5
• Two markers with alleles A/a and B/b
• Frequency of allele A = p and of a = 1 - p
• Frequency of allele B = q and of b = 1 - q
• If there is no association then haplotype AB occurs at frequency pq
• However, if the frequency of AB > pq then A and B are in positive LD.
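The comparison of the AB haplotype frequency with pq can be sketched in a few lines. The slide defines only the comparison itself; the normalised measures D' and r² computed below are standard LD statistics added here for illustration, and the function name and example frequencies are hypothetical.

```python
def ld_stats(freq_ab: float, p: float, q: float):
    """Return (D, D', r^2) for haplotype AB.

    freq_ab: observed frequency of the AB haplotype
    p:       frequency of allele A (so a = 1 - p)
    q:       frequency of allele B (so b = 1 - q)
    """
    # D is the excess of AB over the no-association expectation pq.
    d = freq_ab - p * q

    # D' rescales D by the largest value it could take given p and q.
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two loci.
    denom = p * (1 - p) * q * (1 - q)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: AB is seen more often than pq = 0.25.
    d, d_prime, r2 = ld_stats(freq_ab=0.40, p=0.5, q=0.5)
    print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

A positive D corresponds to the slide's "positive LD"; D = 0 means AB occurs at exactly the frequency pq.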

6
• Linkage is the relationship between alleles, whilst association is the relationship between alleles and phenotypes.
• Association studies do not study families but instead look for differences in allele frequencies between different groups of individuals with defined phenotypes.
• For both types of study, the disease-causing mutation and/or susceptibility allele does not need to be known. Instead, SNPs or other markers such as di-, tri- or tetranucleotide repeats which are in linkage disequilibrium with the disease/susceptibility allele are used.

7
1. Identify SNPs to analyse
2. Genotype all SNPs in a subset of the samples
3. Identify tagSNPs
4. Genotype tagSNPs in all samples
5. Analyse data

8
• Work out the region of interest, or choose regions of known homology from a mouse or other animal model.
• Work out the size of the area you wish to study, e.g. choose a 1 Mb region around your locus of interest and choose one SNP every 500 bp.
• If possible, include SNPs that have been validated in the same ethnic group as the one you are studying.
• Prioritise SNPs with higher minor allele frequencies (>10%)

9
• If looking within genes, prioritise possible functional variants, e.g. non-synonymous SNPs within exons
• Read the current literature to find out if any of the SNPs have been associated with similar phenotypes in other studies
• Ensure that there are no SNPs under the primer or probe binding sites, which could lead to non-amplification of one allele and skew your results
• Due to advances in technology, the majority of current association studies now look at the whole genome = genome-wide association studies (GWAS)

10
• Ensure cases and controls are ethnically matched
• Ensure the methodology is robust, accurate and high-throughput, e.g. SNP arrays: which one? Exonic only? Platform? Cost? Number of SNPs?
• Genotype at least 96 controls and, if you wish, 96 cases
• Record the genotypes conservatively, i.e. if unsure mark as unknown
• Analyse the data to:
  • Check for deviation from Hardy-Weinberg equilibrium at all SNPs; if a deviation is found it is likely that genotyping errors have been made, so re-check
  • Calculate LD scores for SNPs in the region
  • Identify tagSNPs (also called haplotype tagging SNPs or htSNPs)
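The Hardy-Weinberg check can be sketched as a chi-square goodness-of-fit test on the three genotype counts of one SNP. This is a minimal illustration, not necessarily the test the author used: the function name and example counts are hypothetical, and an exact test is often preferred when counts are small (for example with 96 samples).

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 degree of freedom) for deviation from
    Hardy-Weinberg equilibrium at a biallelic SNP.

    n_aa, n_ab, n_bb: observed counts of the two homozygous genotypes
    and the heterozygous genotype (n_ab).
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequency estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts with too few heterozygotes.
    chi2, p_value = hwe_chi_square(n_aa=40, n_ab=16, n_bb=40)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

A SNP with a very small p-value in controls would be a candidate for re-checking the genotype calls, as the slide suggests.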

11
• Over 10 million SNPs in the human genome
• Linked SNPs are often inherited together as a block, and the genotypes of these SNPs can be used to generate a haplotype.
• The key SNPs that uniquely define the haplotype are called tagSNPs or haplotype tagging SNPs
• The HapMap project started in 2002 and was an international collaboration to describe common patterns of genetic variation between individuals
• It identified around 500,000 key tagSNPs which can be used to generate inferred haplotypes of surrounding SNPs
• This has made genome-wide scans more efficient and comprehensive.
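One simple way to pick tagSNPs from pairwise LD scores is a greedy heuristic: repeatedly choose the SNP that captures the most not-yet-tagged SNPs above an r² threshold. The sketch below is an illustration of that idea only, not the procedure used by HapMap or by the author; the threshold of 0.8, the SNP names and the r² values are all assumptions.

```python
def greedy_tag_snps(names, r2, threshold=0.8):
    """Pick tagSNPs greedily from a symmetric matrix of pairwise r^2 values.

    names:     list of SNP identifiers
    r2:        r2[i][j] is the LD (r^2) between SNP i and SNP j
    threshold: assumed minimum r^2 for one SNP to stand in for another
    """
    n = len(names)
    # For each SNP, the set of SNPs it captures (always including itself).
    covers = [
        {j for j in range(n) if i == j or r2[i][j] >= threshold}
        for i in range(n)
    ]

    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the untagged SNP capturing the most untagged SNPs;
        # ties go to the SNP that comes first in `names`.
        best = max(sorted(untagged), key=lambda i: len(covers[i] & untagged))
        tags.append(names[best])
        untagged -= covers[best]
    return tags


if __name__ == "__main__":
    # Hypothetical SNPs: snp1-snp3 form a block, snp4 is independent.
    names = ["snp1", "snp2", "snp3", "snp4"]
    r2 = [
        [1.00, 0.90, 0.70, 0.10],
        [0.90, 1.00, 0.85, 0.10],
        [0.70, 0.85, 1.00, 0.10],
        [0.10, 0.10, 0.10, 1.00],
    ]
    print(greedy_tag_snps(names, r2))
```

In this made-up example snp2 is in high LD with both neighbours, so it can stand in for the whole block, while snp4 must be genotyped directly.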

12
• Commercially available SNP arrays have been designed by several companies, e.g. Affymetrix and Illumina, to cover hundreds of thousands of SNPs across the whole genome.
• They can have slightly different target SNPs, e.g. Illumina Human-1 focuses on exonic SNPs, thus concentrating on potential functional variants.
• These arrays use tagSNPs to maximise the amount of data generated by as few SNPs as possible.
• In recognition of the potential role of CNVs in complex disease susceptibility, many arrays also study CNVs.

13
• Must ensure sufficient cases and controls are tested to reach statistical significance
• The lower the odds ratio for an increase in susceptibility, the more samples are required for the testing to reach statistical significance.
• It is estimated that common susceptibility loci are likely to have odds ratios (OR) of 1.1 to 1.5.
• Therefore, for example, in order to achieve 90% power to detect an allele with 0.2 frequency and an OR of 1.2, more than 6000 affected cases and more than double that number of normal controls are required.
• If the frequency of the variant is only 0.05 you would need 20,000 cases.
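The dependence of sample size on odds ratio and allele frequency can be explored with a textbook normal-approximation formula for comparing allele frequencies between cases and controls. This is a rough sketch under stated assumptions, not the calculation behind the slide's figures: the slide does not give its significance threshold, so `alpha` must be supplied, and the function names, the allelic (per-chromosome) test and the default of two controls per case are illustrative choices.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and the
    per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def cases_needed(control_freq: float, odds_ratio: float, alpha: float,
                 power: float = 0.9, controls_per_case: float = 2.0) -> int:
    """Approximate number of cases for a two-sided allelic test.

    Uses the normal approximation for a difference in two proportions,
    counting two alleles per individual.
    """
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 cannot be detected")

    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    k = controls_per_case

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    # Pooled frequency under the null, weighted by group size.
    p_bar = (p1 + k * p0) / (1.0 + k)
    null_sd = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 + 1.0 / k))
    alt_sd = math.sqrt(p1 * (1.0 - p1) + p0 * (1.0 - p0) / k)

    case_alleles = ((z_alpha * null_sd + z_beta * alt_sd) / (p1 - p0)) ** 2
    return math.ceil(case_alleles / 2.0)  # two alleles per person


if __name__ == "__main__":
    alpha = 5e-8  # assumed genome-wide threshold; not stated on the slide
    for freq, oratio in ((0.2, 1.5), (0.2, 1.2), (0.05, 1.2)):
        print(f"freq={freq}, OR={oratio}: cases ~ {cases_needed(freq, oratio, alpha)}")
```

Whatever threshold is chosen, the required number of cases rises steeply as the odds ratio approaches 1 or the allele becomes rarer, which is the point the slide makes.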

14
• Do single-point analysis first by looking at individual SNPs and calculating χ² and odds ratios.
• Need to apply a correction for multiple testing, e.g. the Bonferroni correction, which is conservative when the alleles studied are in LD with each other (non-independent tests)
• Once you have tested each individual SNP for association you can then construct haplotypes and study them for association with the disease/trait
• Use bioinformatics programs such as HelixTree, SNPHAP and Stata
• Because of the problems with sample size for detecting low susceptibility traits, meta-analysis has been increasingly used. Meta-analysis of GWA datasets can increase the power to detect association signals by increasing sample size and by examining more variants throughout the genome than each dataset alone.
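Single-point analysis of one SNP can be sketched as a chi-square test and odds ratio on a 2x2 table of allele counts, followed by a Bonferroni-adjusted threshold. This is a minimal illustration rather than what HelixTree, SNPHAP or Stata do internally; the function names and the counts are hypothetical.

```python
import math


def allelic_association(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Chi-square (1 df, no continuity correction) and odds ratio for a
    2x2 table of allele counts.

    case_a/case_b: counts of the test allele / other allele in cases
    ctrl_a/ctrl_b: the same counts in controls
    Returns (chi2, p_value, odds_ratio).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row_cases = case_a + case_b
    row_ctrls = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b

    chi2 = (n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
            / (row_cases * row_ctrls * col_a * col_b))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 df
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP.
    chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
    for n_snps in (10, 100):
        threshold = bonferroni_threshold(n_snps)
        print(f"{n_snps} SNPs tested: significant = {p_value < threshold}")
```

The same p-value can pass the corrected threshold when few SNPs are tested and fail it when many are, which is why the number of tests has to be fixed before interpreting results.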

15
• In 2007 the Wellcome Trust Case Control Consortium published a GWA study looking at 2,000 cases for each of seven common diseases and 3,000 shared controls.
• Found 24 associations: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes.
• Linked 10 genes not previously implicated to common disorders
• Colorectal cancer GWA has found 10 associated SNPs, 5 of which are linked to the TGF-β superfamily signalling pathway

16
• GWA studies have led to the discovery of at least 24 loci linked to type 2 diabetes (T2D)
• Mainly linked to the insulin secretion pathway rather than insulin resistance
• However, it is estimated that these loci only account for 5% of the factors contributing to heritability of T2D
• Studies of hundreds of thousands or even millions of individuals are required to identify low susceptibility alleles
• CNV associations have been found linked to schizophrenia, Alzheimer's disease and Parkinson's disease

17
• Study of gene-gene and gene-environment interactions, which may be missed by single-point GWA, is crucial
• The majority of associated variants will not be functional, therefore work will be required to identify causal variants
• SNPs account for 78% of the variants in the genome but only 26% of the total nucleotide differences
• Further study of CNVs will be crucial
• Study of rare rather than common variants (1000G)
• Study of regulatory variants
• Next generation sequencing
