1 The International HapMap Project: a Rich Resource of Genetic Information
Julia Krushkal 4/9/2017
Department of Preventive Medicine, The University of Tennessee Health Science Center
jkrushka{at}

2 HapMap Population Samples
Project launched in 2002 to provide a public resource for accelerating medical genetic research
270 individuals from 4 geographically diverse populations:
- YRI: 90 Yoruba from Ibadan, Nigeria (30 parent-offspring trios)
- CEU: 90 residents of Utah, USA, with northern and western European ancestry, from the Centre d’Etude du Polymorphisme Humain (CEPH) collection
- CHB: 45 unrelated Han Chinese from Beijing, China
- JPT: 45 unrelated Japanese from Tokyo, Japan
HapMap, NHGRI

3 The International HapMap Project
“…Determine the common patterns of DNA sequence variation in the human genome, by characterizing sequence variants, their frequencies, and correlations between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe.” (Nature, 2003)
- Population-specific sequence variation
- Allele frequencies
- Linkage disequilibrium patterns
- Haplotype information
- Tag SNPs
- Structural genome variation
- Better understanding of human population dynamics and of the history of human populations
- Cell lines available from the Coriell Institute for Medical Research
A rich resource for biomedical genetic analysis

4 International HapMap Project Papers
- The Int. HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449.
- The Int. HapMap Consortium. A haplotype map of the human genome. Nature 437.
- The Int. HapMap Consortium. The International HapMap Project. Nature 426.
- The Int. HapMap Consortium. Integrating ethics and science in the International HapMap Project. Nature Reviews Genetics 5.
- Thorisson et al. The International HapMap Project Web site. Genome Res 15.
HapMap-related papers:
- Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449.
- Clark et al. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15.
- Clayton et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genet 37(11).
- de Bakker et al. Efficiency and power in genetic association studies. Nature Genet 37(11).
- Goldstein, Cavalleri. Genomics: Understanding human diversity. Nature 437.
- Hinds et al. Whole-genome patterns of common DNA variation in three human populations. Science 307.
- Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science 310.
- Nielsen et al. Genomic scans for selective sweeps using SNP data. Genome Res 15.
- Smith et al. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res 15.
- Weir et al. Measures of human population structure show heterogeneity among genomic regions. Genome Res 15.

5 Nature (2003)

6 Human Chromosomes
- Contain DNA: 22 pairs of autosomes + sex chromosomes (X and Y), plus the mitochondrial genome
- Contain functional units (genes) and other DNA
- The human genome sequence is available as a reference, as a result of the Human Genome Project
- A significant amount of inter-individual variation exists

7 Some Basic Definitions
Locus: a site in the genome
The DNA in the human genome is not a static entity; different copies can differ:
- Allele: a genetic variant, i.e., a form (state) of a locus
- Mutation: a genetic change
- An individual carries two copies of each autosomal locus
- Alleles are passed from parents to offspring, one from each parent
- Genotype: the set of alleles an individual carries at a given locus

8 Chromosomes are sets of continuously linked genetic loci
Example: Integrated map of chromosome 5 from the International HapMap Project,

- Some DNA loci vary among individuals
- Linked genetic loci are inherited non-independently
- Loci may change over time (mutation, selection, genetic drift)
- Some DNA changes lead to quantitative changes in RNA expression and to quantitative or qualitative changes in protein production
- Some genetic changes, even small ones, may lead to disease
- A large amount of natural variation occurs in healthy individuals, i.e., many changes are neutral
- Loci genetically linked to a disease-causing locus can be used as genetic markers to search for the disease locus (figure: SNP1, SNP2)
- There are many types of DNA variation, e.g.:
  - Sequence variation: AAA[C/T]GGCTA
  - Microsatellite repeats: …AATG AATG AATG AATG…

10 Polymorphic Site
A locus with common DNA variation:
- ≥ 2 alleles in a population
- Shows differences in DNA sequence among individuals
- In most definitions: the most common allele has frequency < 99%, or minor allele frequency (MAF) ≥ 1%, or MAF ≥ 2%, or at least two alleles have frequencies ≥ 1%
A locus whose rarer allele occurs in < 1% of the population is usually not considered polymorphic.
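The MAF-based rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it assumes a biallelic locus and genotypes written as two-letter strings, and the function names and sample data are hypothetical.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the MAF at one biallelic locus.

    genotypes: iterable of diploid genotypes, each a 2-character string such as "AC".
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    total = sum(counts.values())
    return min(counts.values()) / total

def is_polymorphic(genotypes, maf_threshold=0.01):
    """Apply the MAF >= threshold rule (1% by default; 2% or 5% are also used)."""
    return minor_allele_frequency(genotypes) >= maf_threshold

# Hypothetical samples of 100 individuals (200 chromosomes) each
common = ["AA"] * 88 + ["AC"] * 10 + ["CC"] * 2  # C on 14 of 200 chromosomes
rare = ["AA"] * 99 + ["AC"]                      # C on 1 of 200 chromosomes

print(minor_allele_frequency(common), is_polymorphic(common))  # 0.07 True
print(minor_allele_frequency(rare), is_polymorphic(rare))      # 0.005 False
```

Changing `maf_threshold` switches between the 1% and 2% conventions listed above.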

11 SNP=Single Nucleotide Polymorphism
A SNP locus on the distal end of the long arm of human chromosome 5 (data from Ensembl)
SNP locus rs
CAAATTCCATG[A or C]AGAAGGAAATACAT
A and C are alleles at SNP locus rs

12 A SNP locus on the distal end of the long arm of chromosome 5
SNP locus rs

13 Regulatory Interactions: The ENCODE Project
- 2003: pilot project launched (1% of the genome)
- 2007: pilot project completed; production phase launched on the entire genome
- High-throughput experimental and computational approaches to studies of DNA regulatory sites, regulatory interactions, and DNA modification
Project components (figure): Pilot Scale Effort, Production Scale Effort, Technology Development Effort, Data Coordination Center

14 HapMap SNP Density Coverage
Genome SNP variation:
- Size of the human genome is ≈ 3.2 × 10^9 bp
- Any two human genomes are about 99.9% identical
- 9-10 million SNPs may have MAF ≥ 5%
- ≈ 30,000 genes
HapMap SNP density coverage:
- Phase I (published in 2005): 1,007,329 SNPs that passed quality control (1 SNP / 3,000 bp); 11,500 nsSNPs
- 10 ENCODE regions, 500 kb each: 17,944 SNPs (1 SNP / 279 bp)
- Phase II (published in 2007): > 3,806,000 SNPs (1 SNP / 875 bp); 25-30% of all SNPs with MAF ≥ 5%
Figure: The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).

17 SNP Differences among Individuals Far Exceed Differences among Populations
Phase I:
- Autosomes: across the 1 million SNPs genotyped, only 11 have fixed differences between CEU and YRI, 21 between CEU and CHB/JPT, and 5 between YRI and CHB/JPT
- X chromosome: 123 SNPs were completely differentiated between YRI and CHB/JPT, but only 2 between CEU and YRI and 1 between CEU and CHB/JPT

18 Haplotypes
A haplotype is a set of alleles at multiple loci located on the same copy of a chromosome.
Genotype calls obtained from sequencing or DNA chip genotyping do not tell us which of the two chromosomal copies a particular allele belongs to (the phase).
E.g., genotypes for individual X:
SNP     Alleles   Genotype
SNP A   A1/A2     A/T
SNP B   B1/B2     T/C
SNP C   C1/C2     G/C
Haplotypes:
Haplotype 1: A1 B2 C2 = A C C
Haplotype 2: A2 B1 C1 = T T G
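Because genotypes alone do not reveal phase, several haplotype pairs can explain the same genotypes. The sketch below, with hypothetical function names, enumerates every pair consistent with individual X's genotypes; with k heterozygous SNPs there are 2^(k-1) candidate pairs, so X (three heterozygous SNPs) has four.

```python
from itertools import product

def haplotype_pairs(genotypes):
    """List every unordered pair of haplotypes consistent with unphased genotypes.

    genotypes: list of (allele_1, allele_2) tuples, one per SNP, in chromosome order.
    With k heterozygous SNPs there are 2**(k - 1) distinct pairs (one if k == 0).
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # Keep the first heterozygous site fixed so each pair is generated only once.
    for flips in product((False, True), repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return sorted(pairs)

# Individual X from the slide: SNP A = A/T, SNP B = T/C, SNP C = G/C
individual_x = [("A", "T"), ("T", "C"), ("G", "C")]
for pair in haplotype_pairs(individual_x):
    print(pair)
# ('ACC', 'TTG')
# ('ACG', 'TTC')
# ('ATC', 'TCG')
# ('ATG', 'TCC')
```

The slide's resolution (ACC / TTG) is only one of these four; extra information such as family data or population haplotype frequencies is needed to choose among them.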

19 Recombination
- A “random” event that occurs during meiosis
- The larger the distance between loci, or the more generations pass, the more likely recombination(s) will occur
Figure: parental haplotypes A1-B1 and A2-B2 undergo recombination (crossing-over) between loci A and B, giving nonrecombinant haplotypes (A1-B1, A2-B2) and recombinant haplotypes (A1-B2, A2-B1).
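The two-locus picture above can be simulated as a sketch. It assumes a single crossover between loci A and B that happens with a per-meiosis recombination fraction θ (`theta`); θ, the function names, and the seed are illustrative assumptions, and the model ignores interference and multiple crossovers.

```python
import random

def crossover(hap1, hap2, point):
    """Products of a single crossover between locus point-1 and locus point."""
    return hap1[:point] + hap2[point:], hap2[:point] + hap1[point:]

def sample_gamete(hap1, hap2, theta, rng):
    """Sample one gamete at two loci (A, B) with recombination fraction theta."""
    haps = (hap1, hap2)
    i = rng.randrange(2)  # parental chromosome that transmits the locus A allele
    if rng.random() < theta:
        return crossover(haps[i], haps[1 - i], 1)[0]  # recombinant
    return haps[i]  # nonrecombinant

parent = (("A1", "B1"), ("A2", "B2"))
print(crossover(*parent, 1))  # (('A1', 'B2'), ('A2', 'B1'))

rng = random.Random(1)
gametes = [sample_gamete(*parent, theta=0.1, rng=rng) for _ in range(10_000)]
recombinant_fraction = sum(g not in parent for g in gametes) / len(gametes)
print(round(recombinant_fraction, 3))  # close to theta = 0.1
```

Raising `theta` (loci farther apart) raises the share of recombinant gametes, matching the slide's point about distance.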

20 Figure: Two ancestral chromosomes are scrambled through recombination over many generations to yield different descendant chromosomes. If an A allele on the ancestral chromosome increases the risk of a disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Source: the International HapMap Project

21 Linkage Disequilibrium
Associations among alleles at different loci (locus A with alleles A1, A2; locus B with alleles B1, B2):
- Linkage disequilibrium coefficient (coefficient of association): $D = p_{A_1B_1} - p_{A_1}p_{B_1}$
- Normalized disequilibrium coefficient: $D' = D/|D|_{\max}$, where $|D|_{\max} = \min(p_{A_1}p_{B_2},\ p_{A_2}p_{B_1})$ if $D > 0$ and $|D|_{\max} = \min(p_{A_1}p_{B_1},\ p_{A_2}p_{B_2})$ if $D < 0$, so that $-1 \le D' \le 1$
- Correlation coefficient: $r = D/\sqrt{p_{A_1}p_{A_2}p_{B_1}p_{B_2}}$
- In case of no association, $D = 0$ (linkage equilibrium)
Practical implication in fine gene mapping: search for locus B by testing marker loci for association with disease
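The three LD measures can be computed directly from the four haplotype frequencies. This is a minimal sketch assuming both loci are polymorphic and the haplotype frequencies are already known (e.g., from phased data); the function name and the example frequencies are hypothetical.

```python
def ld_stats(p11, p12, p21, p22):
    """Return (D, D', r^2) for two biallelic loci A and B.

    p11, p12, p21, p22: frequencies of haplotypes A1B1, A1B2, A2B1, A2B2 (sum to 1).
    Both loci are assumed polymorphic, so r^2 is defined.
    """
    pA1, pA2 = p11 + p12, p21 + p22
    pB1, pB2 = p11 + p21, p12 + p22
    D = p11 - pA1 * pB1
    # |D|max depends on the sign of D
    if D > 0:
        d_max = min(pA1 * pB2, pA2 * pB1)
    else:
        d_max = min(pA1 * pB1, pA2 * pB2)
    d_prime = D / d_max
    r2 = D ** 2 / (pA1 * pA2 * pB1 * pB2)
    return D, d_prime, r2

# Hypothetical haplotype frequencies in which A1B2 is never observed
D, d_prime, r2 = ld_stats(0.5, 0.0, 0.1, 0.4)
print(f"D={D:.3f} D'={d_prime:.3f} r2={r2:.3f}")  # D=0.200 D'=1.000 r2=0.667
```

The example shows why D' and r² are not interchangeable: one missing haplotype already gives D' = 1, while r² stays below the 0.8 level often used for tagging.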

22 The value of D decreases geometrically with each generation
For two loci A and B with recombination fraction θ:
$D_t = (1-\theta)\,D_{t-1}$, hence $D_t = (1-\theta)^t D_0$
Unless the two loci are closely linked (small θ), the value of D should rapidly decrease to 0. The presence of association between two loci therefore typically implies that they are closely linked.
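The decay formula can be evaluated directly to see how strongly linkage matters. A sketch assuming random mating and a constant recombination fraction θ; the chosen θ values and D₀ are illustrative only.

```python
import math

def ld_after(d0, theta, t):
    """D after t generations of random mating: D_t = (1 - theta)**t * D_0."""
    return (1 - theta) ** t * d0

def half_life(theta):
    """Generations needed for D to fall to half its value (requires 0 < theta < 1)."""
    return math.log(0.5) / math.log(1 - theta)

d0 = 0.25
for theta in (0.5, 0.01, 0.0001):  # unlinked loci vs. increasingly tight linkage
    print(theta, ld_after(d0, theta, 100), round(half_life(theta), 1))
```

For unlinked loci (θ = 0.5) D halves every generation, whereas for θ = 0.0001 it takes thousands of generations, which is why surviving LD points to tight linkage.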

23 Haplotype Maps Generated by The International HapMap Project
Three steps of the HapMap construction:
(a) SNPs are identified in DNA samples from multiple individuals.
(b) Adjacent SNPs that are inherited together are compiled into haplotypes.
(c) “Tag” SNPs that uniquely describe those haplotypes are identified within them.
Source: The International HapMap Project

24 Haplotype Maps of the Human Genome
Find correlations among groups of SNPs (figure: Helmuth 2001, Science 293)
Haplotypes for the HapMap project were inferred from trio data and from unrelated individuals using PHASE (Stephens et al. 2001; Stephens and Donnelly 2003)
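Trio data help because parental genotypes often reveal which allele each parent transmitted. The sketch below is a simple rule-based illustration of that idea, not the statistical model used by PHASE; the function names and the trio genotypes are hypothetical.

```python
def transmitted_alleles(child, father, mother):
    """Resolve which of the child's alleles came from each parent at one SNP.

    Returns (paternal, maternal), or None when the trio cannot resolve phase
    (e.g., all three individuals heterozygous). Genotypes are 2-tuples of alleles.
    """
    a, b = child
    options = {(p, m) for p, m in ((a, b), (b, a)) if p in father and m in mother}
    if not options:
        raise ValueError("Mendelian inconsistency: check for a genotyping error")
    return options.pop() if len(options) == 1 else None

def phase_trio(child, father, mother):
    """Build the child's paternal and maternal haplotypes; '?' marks unresolved SNPs."""
    pat, mat = [], []
    for c, f, m in zip(child, father, mother):
        resolved = transmitted_alleles(c, f, m)
        pat.append(resolved[0] if resolved else "?")
        mat.append(resolved[1] if resolved else "?")
    return "".join(pat), "".join(mat)

# Hypothetical trio genotyped at three SNPs
child = [("A", "T"), ("T", "C"), ("G", "C")]
father = [("A", "A"), ("T", "C"), ("C", "C")]
mother = [("A", "T"), ("T", "T"), ("G", "C")]
print(phase_trio(child, father, mother))  # ('ACC', 'TTG')
```

Sites where all three trio members are heterozygous stay unresolved (`?`); a statistical method such as PHASE is then needed, as it is for unrelated individuals.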

25 Haplotype Maps of the Human Genome
Genome regions can be decomposed into discrete haplotype blocks, which capture similarity in haplotype organization
Patil et al. 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294(5547)

26 Haplotype Block Partition Results for Three Populations
1,586,383 SNPs genotyped in 71 Americans of European, African, and Asian ancestry
Population          Blocks    Average size, kb*   Required SNPs†
African-American    235,663   8.8                 570,886
European-American   109,913   20.7                275,960
Han Chinese         89,994    25.2                220,809
* Average distance spanned by segregating sites in each block.
† Minimum number of SNPs required to distinguish common haplotype patterns with frequencies of 5% or higher.
Source: Hinds et al., Science

27 Population differences in local bin structure
Figure (Hinds et al. 2005): Extended LD bin and haplotype block structure around the CFTR gene. LD bins, in which each bin has at least one SNP with r² > 0.8 with every other SNP, are depicted as light horizontal bars, with the positions of constituent SNPs indicated by vertical tick marks and by the extreme ends of the bars. Isolated SNPs are indicated by plain tick marks. Haplotype blocks, within which at least 80% of observed haplotypes could be grouped into common patterns with frequencies of at least 5%, are depicted as dark horizontal bars. Unlike haplotype blocks, which are by design sequential and nonoverlapping, SNPs in one LD bin can be interdigitated with SNPs in multiple other overlapping bins.
- Population differences in local bin structure
- Differences in allele and haplotype frequencies
“Although analysis panels are characterized both by different haplotype frequencies and, to some extent, different combinations of alleles, both common and rare haplotypes are often shared across populations” (The Int. HapMap Consortium, Nature, 2005)

28 Tag SNP (htSNP) selection
Pairwise LD-based and haploblock-based tagging methods:
- Partition haplotypes into blocks; partitioning can be haplotype-based (haploblocks) or genotype-based (LD blocks)
- Select representative htSNPs from each block
Latest DNA microarrays aim to capture SNPs with r² ≥ 0.8
“Tags are the subset of variants genotyped in a disease study. SNPs that are not typed in the study but whose effect can be studied through LD with a tag are termed proxies. A tag with perfect correlation (r² = 1) to an untyped putative causal allele is termed a perfect proxy.” (de Bakker et al., 2005)
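Pairwise LD-based tagging can be sketched as a greedy loop: repeatedly choose the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. This is one simple variant under stated assumptions (pairwise r² values are already estimated; missing pairs count as r² = 0); the SNP IDs and r² values are hypothetical, not HapMap data.

```python
def greedy_tags(snps, r2, threshold=0.8):
    """Greedy pairwise tag SNP selection.

    snps: SNP identifiers; r2: dict mapping frozenset({snp1, snp2}) -> r^2
    (missing pairs are treated as r^2 = 0). Each chosen tag captures itself plus
    every remaining SNP with r^2 >= threshold to it.
    """
    def captured(tag, pool):
        return {s for s in pool if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs (ties: first in sort order).
        best = max(sorted(untagged), key=lambda s: len(captured(s, untagged)))
        tags.append(best)
        untagged -= captured(best, untagged)
    return tags

# Hypothetical pairwise r^2 values for five SNPs
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs1", "rs3")): 0.85,
    frozenset(("rs2", "rs3")): 0.90,
    frozenset(("rs3", "rs4")): 0.30,
    frozenset(("rs4", "rs5")): 0.82,
}
print(greedy_tags(["rs1", "rs2", "rs3", "rs4", "rs5"], r2))  # ['rs1', 'rs4']
```

Each selected tag and the SNPs it captures form an LD bin in the sense used on the previous slide; the untagged SNPs in a bin are the tag's proxies.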

30 Tag SNP, Haplotypes, and LD
Figure source: The Int. HapMap Consortium, Nature, 2005

31 Use of Haplotypes in Association Analysis
Testing one marker at a time:
- Testing markers one at a time for association is very time-consuming
- Problem of multiple testing
- When testing individual SNPs, information from other markers is not used
Benefits of using haplotypes:
- Haplotypes allow us to use information from multiple loci simultaneously
- LD information between loci is captured
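The scale of the multiple-testing problem can be shown with the family-wise error rate and the Bonferroni correction. A sketch assuming independent tests; SNPs in LD are correlated, so in practice Bonferroni is conservative, which is one reason LD and haplotype information is valuable. The panel size is an illustrative assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def familywise_error(alpha, n_tests):
    """P(at least one false positive) if n_tests independent tests are each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests

n_snps = 1_000_000  # roughly the scale of a genome-wide SNP panel
print(familywise_error(0.05, n_snps))     # essentially 1.0 without correction
print(bonferroni_threshold(0.05, n_snps))  # about 5e-08 per SNP
```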

32 Benefits of Haplotype Analysis
- Construct a single highly informative “mega-locus” from a number of less informative but closely linked loci
- Identify genotyping or data entry errors: likelihood ratio tests indicate which typings are more likely to be errors
- Find boundaries of conserved haplotypes associated with a trait, using recombinations from the entire history of a population

33 Amount of Captured Sequence Variation in HapMap Phase II
- For common variants (MAF ≥ 0.05), the mean maximum r² between a SNP and any typed SNP is 0.90 in YRI, 0.96 in CEU and 0.95 in CHB/JPT
- 1.09 million SNPs capture all common Phase II SNPs with r² ≥ 0.8 in YRI
- Very common SNPs (MAF ≥ 0.25) are captured extremely well (mean maximum r² from 0.93 in YRI to 0.97 in CEU)
- Rarer SNPs (MAF < 0.05) are less well covered (mean maximum r² from 0.74 in CHB/JPT to 0.76 in YRI)
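The “mean maximum r²” statistic can be read as follows: for each SNP, take its highest r² with any typed SNP, then average over SNPs. The sketch below computes that, plus the fraction of SNPs captured at r² ≥ 0.8. This reading of the statistic, the function name, and the r² values are assumptions for illustration.

```python
def coverage(snps, tags, r2, threshold=0.8):
    """Return (mean maximum r^2 to any tag, fraction of SNPs captured at r^2 >= threshold).

    r2: dict mapping frozenset({snp1, snp2}) -> r^2; a tag has r^2 = 1 with itself.
    """
    best = [
        max(1.0 if s == t else r2.get(frozenset((s, t)), 0.0) for t in tags)
        for s in snps
    ]
    mean_max = sum(best) / len(best)
    captured = sum(b >= threshold for b in best) / len(best)
    return mean_max, captured

# Hypothetical r^2 values; rs1 and rs4 are the genotyped (tag) SNPs
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs1", "rs3")): 0.85,
    frozenset(("rs4", "rs5")): 0.82,
    frozenset(("rs1", "rs6")): 0.40,
}
snps = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6"]
mean_max, captured = coverage(snps, ["rs1", "rs4"], r2)
print(round(mean_max, 3), round(captured, 3))  # 0.837 0.833
```

A single poorly tagged SNP (rs6) pulls the mean down, mirroring how rarer SNPs lower coverage in the Phase II figures.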

35 Recombination Hot Spots

36 Structural Genome Variation
- HapMap samples are also used as a resource for copy number variant (CNV) analysis
- A large number of CNVs and other genome rearrangements are found among individuals
- Some of this variation is presumed normal; some may cause disease
- Genome databases, e.g., the Database of Genomic Variants at The Centre for Applied Genomics (TCAG), The Hospital for Sick Children, Toronto, and the Copy Number Variation Project map at the Sanger Institute

37 Segmental duplications are recombination hotspots, causing global genome rearrangements

39 HapMap Genome Browser

44 Perlegen Genotype Browser

46 UCSC Genome Browser

47 DNA Chips and Resequencing: High-Throughput Analysis of Sequence Variation
- An easy way to access genome-wide variation
- Both Affymetrix and Illumina DNA chips contain representative SNP and CNV probes
- Affymetrix GeneChip 6.0: 1.8 million markers for genetic variation, including 906,000 SNPs and 946,000 copy number probes
- Illumina 1M BeadChip and 1M-Duo BeadChip: ~950,000 genome-spanning tag SNPs; ~100,000 additional non-HapMap SNPs; > 565,000 SNPs in and near coding regions (nsSNPs, promoter regions, 3’ and 5’ UTRs); dense coverage in ADME and MHC regions; ~260,000 markers located in novel and reported copy number polymorphic regions
- Sequenom mass arrays (based on MALDI-TOF)

48 Genome-Wide Association
- Select representative htSNPs from low-diversity haplotype blocks
- Adjust for multiple comparisons
- LD values are highly variable: a smoothing function is needed
- Analyze haplotypes in a sliding window, OR screen for top SNPs, likely functional SNPs, and SNPs in genes involved in pathways of interest
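A sliding-window haplotype analysis starts by tabulating the sub-haplotypes seen in each run of adjacent SNPs; each window's counts would then be tested for case-control association, with adjustment for multiple comparisons. A minimal sketch, assuming phased haplotypes stored as equal-length strings; the window size and data are hypothetical.

```python
from collections import Counter

def sliding_window_haplotypes(haplotypes, window):
    """Yield (start, counts) for each run of `window` adjacent SNPs.

    haplotypes: phased haplotypes as equal-length strings, one allele per SNP.
    counts: Counter of the sub-haplotypes observed in that window.
    """
    n_snps = len(haplotypes[0])
    for start in range(n_snps - window + 1):
        yield start, Counter(h[start:start + window] for h in haplotypes)

# Hypothetical phased haplotypes over four SNPs
haplotypes = ["ACCT", "TTGT", "ACCA", "ACGT"]
for start, counts in sliding_window_haplotypes(haplotypes, window=2):
    print(start, dict(counts))
```

Larger windows capture more LD information but split the sample among more distinct haplotypes, so window size is a trade-off.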

49 Use of Phase-Resolved Data in Association Analysis
- Test for association with haplotypes, similar to analyses of individual SNP alleles; multiple testing must be considered
- Test for a tendency of cases to ‘cluster’ around groups of ‘similar’ haplotypes
- Extend the log-linear approach to take haplotype structure into account
- Modifications are also used when phase is ambiguous
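With phase-resolved data, a haplotype can be tested just like a single SNP allele: count copies of the haplotype of interest versus all others in cases and controls, then apply a 1-degree-of-freedom chi-square test. A sketch assuming known phase and treating each chromosome as an independent observation; it does not cover the clustering or log-linear approaches, and the counts are hypothetical.

```python
import math

def chi2_2x2(case_with, case_without, ctrl_with, ctrl_without):
    """Pearson chi-square test (1 df) for a 2x2 table of haplotype counts.

    Rows: cases, controls. Columns: copies of the haplotype of interest, all other haplotypes.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_with, case_without], [ctrl_with, ctrl_without]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: 200 cases and 200 controls (400 chromosomes each)
chi2, p = chi2_2x2(case_with=120, case_without=280, ctrl_with=80, ctrl_without=320)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Repeating this test for every haplotype or window is exactly where the multiple-testing adjustment above comes in.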

50 As of 04/14/2008, GWAS of 150 traits had been posted

51 Microarray analysis

52 Special thanks to Ken Manly, whose presentation ideas for the 2006 HapMap module inspired and helped organize this presentation
