Medical genomics BI420 Department of Biology, Boston College 1. Phenotypic effects caused by known genetic variants - Diseases of Mendelian inheritance: sickle cell, cystic fibrosis - known relationships between drug metabolic phenotypes and drug metabolizing enzyme polymorphisms 2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies - the principle of genetic mapping, genetic markers - monogenic diseases - family based linkage analysis - complex diseases - case-control association studies: strategies, marker selection and association testing 3. Genome-wide association mapping resources: the HapMap - the reason for the HapMap - HapMap physical and informational resources - the association structure of the human genome 4. Structural and epigenetic variations in disease
Lecture overview 1. Phenotypic effects caused by known genetic variants 2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies 3. Genome-wide association mapping resources – the HapMap 4. Somatic variations in disease
1. Phenotypic effects caused by known genetic variants
Many SNPs have phenotypic effects Some notable genetic diseases: cystic fibrosis (Mendelian recessive) sickle-cell anemia Badano and Katsanis, NRG 2002
Genetic variants may affect drug metabolism: Pharmacogenetics Evans and Relling, Science 1999
Are all genetic variants functional? ~40 million known SNPs. SNPs, on the scale of the genome, can be described well with the “neutral theory” of sequence variations, i.e. the vast majority of SNPs are likely to have no functional effects. How do we find the few functional variants in the background of millions of non-functional ones?
2. Genetic mapping to find genetic variants that cause diseases – linkage analysis and association studies
Recall from population genetics: sequence variations are the result of mutation events. [Figure: genealogy descending from the MRCA to the present-day sequences TAAAAAT and TAACAAT.] Mutations are propagated down through generations and determine present-day variation patterns.
Mendelian diseases have simple inheritance: there is a simple relationship between genotype and phenotype. [Figure: genotype and phenotype inheritance.]
Modulators: recombination. [Figure: aligned copies of the sequences accgttatgtaga and acggttatgtaga.] Because of recombination, DNA sequences may not have a unique common ancestor, hence phylogenetic analysis may not apply.
Genetic mapping
Linkage analysis compares the transmission of marker genotype and phenotype in families. Sequence regions of the genome to determine which loci are linked with the trait. Works well for Mendelian diseases.
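One common way to quantify "linked with the trait" is the LOD score, which compares the likelihood of the observed meioses at a recombination fraction theta against free recombination (theta = 0.5). The slide does not name a statistic, so the sketch below is an illustration only; it assumes phase-known, fully informative meioses, and the function names are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible under complete linkage.
        return float("-inf")
    rec_term = recombinants * math.log10(theta) if recombinants > 0 else 0.0
    non_rec_term = non_recombinants * math.log10(1.0 - theta)
    null_term = meioses * math.log10(0.5)
    return rec_term + non_rec_term - null_term


def best_theta(recombinants: int, meioses: int, steps: int = 50):
    """Grid search over theta in [0, 0.5]; returns (theta, LOD) at the maximum."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return theta, lod_score(recombinants, meioses, theta)


if __name__ == "__main__":
    theta, lod = best_theta(recombinants=1, meioses=10)
    print(f"best theta = {theta:.2f}, LOD = {lod:.2f}")
```

By convention a LOD score of 3 or more is usually taken as evidence for linkage; ten meioses with no recombinants just reach that level.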
However, some diseases have complex inheritance. Multiple genes may influence the trait, e.g. one form of retinitis pigmentosa requires heterozygosity for mutations in two genes. Badano and Katsanis, NRG 2002
General problem: allele frequency and relative risk. Brinkman et al. Nature Reviews Genetics 14 March 2006. For complex diseases, it is difficult to find the causal loci, since each may contribute only a little to the disease. To simplify this problem, the “common disease-common variant” hypothesis is often invoked. This is the hypothesis that common genetic diseases are caused by adding up the effects of several loci, each with a common bad variant. However, the relative risk associated with the bad variant at any individual locus is thought to be low. If this is true, to understand complex disease, you only need to sequence sites with common variants.
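To see why a low relative risk makes a locus hard to find, it helps to compute how far the allele frequency in cases is expected to move away from the population frequency. The sketch below is an illustration under stated assumptions that are not from the slide: Hardy-Weinberg proportions in the population, a multiplicative per-allele relative risk, and controls that resemble the general population. Under those assumptions the risk-allele frequency in cases is p·RR / (p·RR + 1 − p). The function name is hypothetical.

```python
def case_allele_frequency(p: float, relative_risk: float) -> float:
    """Expected risk-allele frequency among cases.

    Assumes Hardy-Weinberg proportions and a multiplicative model in which
    each copy of the risk allele multiplies disease risk by `relative_risk`.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a frequency in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    weighted = p * relative_risk
    return weighted / (weighted + (1.0 - p))


if __name__ == "__main__":
    population_frequency = 0.20
    for rr in (1.1, 1.5, 3.0):
        in_cases = case_allele_frequency(population_frequency, rr)
        print(f"RR = {rr}: controls {population_frequency:.3f}, cases {in_cases:.3f}")
```

With a relative risk close to 1, the case and control frequencies differ by only a percentage point or two, which is why large samples are needed to detect such loci.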
Allelic association (linkage disequilibrium, LD). Allelic association is the non-random assortment between alleles, i.e. it measures how well knowledge of the allele state at one site permits prediction at another. [Figure: marker site and functional site.] Significant allelic association between a marker and a functional site permits localization (mapping) even without having the functional site in our collection. Allelic association, and the use of genetic markers, is the basis for mapping functional alleles.
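Allelic association between two biallelic sites is usually summarized by D, |D'| and r². The sketch below computes all three from the frequency of the haplotype carrying allele A at the first site and allele B at the second, plus the two allele frequencies. The function and parameter names are chosen here for illustration.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first site
    p_b:  frequency of allele B at the second site
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Marker allele always travels with the functional allele: complete LD.
    print(ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Alleles assort independently: no LD.
    print(ld_stats(p_ab=0.06, p_a=0.2, p_b=0.3))
```

An r² near 1 means the marker predicts the functional site almost perfectly, which is the situation that makes indirect mapping possible.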
Case-control association testing. [Figure: clinical cases and clinical controls.] Genotype cases and controls at various polymorphisms, then search for markers with “significant” marker allele frequency differences between cases and controls, AF(cases) vs. AF(controls); these markers signify regions of possible causative alleles.
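A simple way to test one marker for an allele frequency difference is a chi-square test on the 2×2 table of allele counts (cases vs. controls, alternate vs. reference allele). The sketch below is one possible allelic test, not necessarily the one used in the studies cited in this lecture; the function name is hypothetical, and the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int):
    """Chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Counts are alleles, i.e. two per genotyped person.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        # A row or column is empty: the table carries no information.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = allelic_chi_square(case_alt=60, case_ref=40,
                                 control_alt=40, control_ref=60)
    print(f"AF(cases) = 0.60, AF(controls) = 0.40, chi2 = {chi2:.2f}, p = {p:.4f}")
```

In a genome scan the same test is repeated at many markers, so the threshold for “significant” has to be corrected for multiple testing.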
Association study strategies. Region(s) interrogated: single gene, list of candidate genes (“candidate gene study”), or entire genome (“genome scan”). Direct or indirect: genotype the causative variant itself, or a marker that is co-inherited with the causative variant. Single-SNP marker or multi-SNP haplotype marker. Single-stage or multi-stage.
Association study strategies. For economy, one cannot genotype every SNP in thousands of clinical samples; marker selection is the process where a subset of all available SNPs is chosen: 1. hypothesis-driven (i.e. based on gene function); 2. LD-driven, based entirely on the reduction of redundancy presented by the linkage disequilibrium (LD) between SNPs; tags represent the other SNPs they are correlated with. [Figure: tag SNPs and causative variant.]
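LD-driven marker selection can be sketched as a greedy procedure: compute pairwise r² between SNPs, then repeatedly pick the SNP that represents the most not-yet-tagged SNPs above a threshold. The code below is a minimal illustration, not the procedure of any particular tagging tool. The 0.8 threshold, the 0/1 haplotype encoding, and the function names are assumptions made for the example.

```python
def r_squared(x, y):
    """r^2 between two SNPs, each given as a list of 0/1 alleles per haplotype."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return (p_ab - p_a * p_b) ** 2 / denom


def select_tags(haplotypes, threshold=0.8):
    """Greedy tag selection; returns {tag SNP index: [indices of SNPs it tags]}."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # neighbours[j]: SNPs that j could stand in for (always includes j itself).
    neighbours = [
        {k for k in range(n_snps)
         if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = {}
    while untagged:
        # Pick the untagged SNP covering the most untagged SNPs (lowest index on ties).
        best = max(sorted(untagged), key=lambda j: len(neighbours[j] & untagged))
        covered = neighbours[best] & untagged
        tags[best] = sorted(covered)
        untagged -= covered
    return tags


if __name__ == "__main__":
    haplotypes = [
        [0, 0, 1, 0],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 0, 1],
    ]
    print(select_tags(haplotypes))
```

In this toy data the first three SNPs are perfectly correlated, so genotyping one of them plus the fourth SNP captures all four.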
Marker selection depends on genome LD Daly et al. NG 2001
3. Genome-wide association mapping resources – the HapMap
The HapMap resource. Goal: to map out human allele and association structure at the kilobase scale. We wish to determine which SNPs provide the most information about human genetic diversity.
Linkage Disequilibrium (LD) structure in four human populations International HapMap Consortium, Nature 2005
Genome-wide scans for human diseases SNPs in Complement Factor H (CFH) gene are associated with Age-related Macular Degeneration (AMD) Klein et al, Science 2005
How successful have association studies been? 201 SNPs associated with height could explain about 16% of genetic variance, 142 SNPs associated with Crohn's disease could explain about 20%, and 67 SNPs could explain about 17% of genetic variance in each of three common cancers. Baker 2010. Nature. Much information about inheritance is still unaccounted for!
Where is the “missing heritability”? An alternative to the “common disease-common variant” idea is that common diseases are actually caused by variants that are rare, but that there is more than one way to cause the disease. If this is true, it is important to sequence every base in the genome to understand complex disease.
Most newly discovered SNPs are rare. [Figure: number of sites vs. frequency of alternate allele; credit: Ryan Poplin.]
Functional prediction vs. allele frequency Jin Yu, Fuli Yu, Baylor College of Medicine Marth et al. Genome Biology 2011
Disease causing alleles vs. allele frequency
Assessing the impact of rare variants. Manolio et al. Nature 2009. Traditional GWAS is unlikely to work for rare variants. New methods are under development.
VAAST: instead of individual variants, use a larger unit for comparison, e.g. a gene. Weight the predicted impact of each variant (e.g. non-synonymous change, large allele frequency difference, etc.).
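The idea of scoring a whole gene rather than single variants can be illustrated with a generic weighted burden test. This is not the statistic VAAST itself implements; it is a simplified sketch in which each person's score is the weighted count of alternate alleles across a gene's variants, and significance is assessed by permuting case/control labels. The weights, the permutation count, and all names are assumptions for the example.

```python
import random


def burden_scores(genotypes, weights):
    """Weighted count of alternate alleles per person across one gene's variants."""
    return [sum(w * g for w, g in zip(weights, person)) for person in genotypes]


def burden_permutation_test(case_genotypes, control_genotypes, weights,
                            n_perm=2000, seed=0):
    """Return (mean score difference, one-sided permutation p-value)."""
    cases = burden_scores(case_genotypes, weights)
    controls = burden_scores(control_genotypes, weights)
    n_cases = len(cases)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(cases) - mean(controls)
    pooled = cases + controls
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Three rare variants in one gene; rows are people, entries are allele counts.
    weights = [1.0, 2.0, 0.5]  # hypothetical predicted-impact weights
    cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
    controls = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0]]
    print(burden_permutation_test(cases, controls, weights))
```

No single variant here is frequent enough to test on its own, but pooled over the gene the cases carry a visibly heavier load.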
4. Somatic variants in disease
Somatic mutations. © Brian Stavely, Memorial University of Newfoundland. The detection of somatic mutations, and their distinction from inherited polymorphism, is important to separate pre-disposing variants from mutations that occur during disease progression, e.g. in cancer.
Detecting somatic mutations with comparative data: based on comparison of cancer and normal tissue from the same individual. Often cancer tissue is highly heterogeneous, and the somatic mutant allele may be present at low allele frequency.
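The tumor/normal comparison can be sketched at a single site as a test on read counts: is the alternate allele enriched in the tumor reads relative to the matched normal reads? The code below uses a one-sided Fisher's exact test and a cap on the alternate-allele fraction in the normal sample. It is a toy illustration, not the method of any particular somatic caller, and the thresholds and function names are assumptions.

```python
from math import comb


def fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref):
    """P(tumor gets >= tumor_alt of the alternate reads) under the hypergeometric null."""
    n_tumor = tumor_alt + tumor_ref
    total = n_tumor + normal_alt + normal_ref
    total_alt = tumor_alt + normal_alt
    upper = min(total_alt, n_tumor)
    favourable = sum(
        comb(total_alt, k) * comb(total - total_alt, n_tumor - k)
        for k in range(tumor_alt, upper + 1)
    )
    return favourable / comb(total, n_tumor)


def is_somatic_candidate(tumor_alt, tumor_ref, normal_alt, normal_ref,
                         alpha=0.01, max_normal_fraction=0.02):
    """True if the alternate allele is enriched in tumor and near-absent in normal."""
    normal_depth = normal_alt + normal_ref
    if normal_depth == 0 or tumor_alt + tumor_ref == 0:
        return False  # no coverage in one sample: cannot compare
    if normal_alt / normal_depth > max_normal_fraction:
        return False  # allele is clearly present in normal tissue: likely inherited
    return fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref) < alpha


if __name__ == "__main__":
    # Low-frequency allele seen only in the tumor (10% of tumor reads).
    print(is_somatic_candidate(10, 90, 0, 100))
    # Allele at ~50% in both samples: an inherited heterozygous site.
    print(is_somatic_candidate(50, 50, 48, 52))
```

Because the test works on read counts rather than genotypes, it can flag an allele present in only a small fraction of tumor reads, provided the sequencing depth is high enough.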