The International HapMap Project: a Rich Resource of Genetic Information Julia Krushkal Department of Preventive Medicine The University of Tennessee Health.

1 The International HapMap Project: a Rich Resource of Genetic Information Julia Krushkal Department of Preventive Medicine The University of Tennessee Health Science Center jkrushka{at}

2 HapMap Population Samples
Project launched in 2002 to provide a public resource for accelerating medical genetic research.
270 individuals from 4 geographically diverse populations:
- YRI: 90 Yorubans from Ibadan, Nigeria (30 parent-offspring trios)
- CEU: 90 Utah (USA) residents of northern and western European descent, from the Centre d’Etude du Polymorphisme Humain (CEPH) collection (30 parent-offspring trios)
- CHB: 45 unrelated Han Chinese from Beijing, China
- JPT: 45 unrelated Japanese from Tokyo, Japan
HapMap NHGRI

3 The International HapMap Project
“…Determine the common patterns of DNA sequence variation in the human genome, by characterizing sequence variants, their frequencies, and correlations between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe.” Nature (2003)
A rich resource for biomedical genetic analysis:
- Population-specific sequence variation
- Allele frequencies
- Linkage disequilibrium patterns
- Haplotype information
- Tag SNPs
- Structural genome variation
- Better understanding of human population dynamics and of the history of human populations
Cell lines are available from the Coriell Institute for Medical Research.

4 International HapMap Project Papers
- The Int. HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449,
- The Int. HapMap Consortium. A Haplotype Map of the Human Genome. Nature 437,
- The Int. HapMap Consortium. The International HapMap Project. Nature 426,
- The Int. HapMap Consortium. Integrating Ethics and Science in the International HapMap Project. Nature Reviews Genet 5,
- Thorisson et al. The International HapMap Project Web site. Genome Res 15:
HapMap-related papers
- Sabeti et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449,
- Clark et al. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 15:
- Clayton et al. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genet 37(11):
- de Bakker et al. Efficiency and power in genetic association studies. Nature Genet 37(11):
- Goldstein, Cavalleri. Genomics: Understanding human diversity. Nature 437:
- Hinds et al. Whole genome patterns of common DNA variation in three human populations. Science 307:
- Myers et al. A fine-scale map of recombination rates and hotspots across the human genome. Science 310:
- Nielsen et al. Genomic scans for selective sweeps using SNP data. Genome Res 15:
- Smith et al. Sequence features in regions of weak and strong linkage disequilibrium. Genome Res 15:
- Weir et al. Measures of human population structure show heterogeneity among genomic regions. Genome Res 15:

5 Nature (2003)

6 Human Chromosomes
- Contain DNA
- 22 pairs of autosomes + sex chromosomes (X and Y) + mitochondrial genome
- Contain functional units (genes) and other DNA
- Human genome sequence is available as a reference, as a result of the Human Genome Project
- A significant amount of inter-individual variation exists

7 Some Basic Definitions
The DNA in the human genome is not a static entity. There are differences between different copies.
- Locus: a site in the genome
- Allele: a genetic variant, i.e., a form (state) of a locus
- Mutation: a genetic change
- Genotype: the set of alleles an individual carries at a given locus
An individual carries two copies of each locus on autosomes.
Alleles are passed from parents to offspring (one from each parent).

8 Chromosomes are sets of continuously linked genetic loci Example: Integrated map of chromosome 5 from the International HapMap Project,

9 Genetic Variation
- Some DNA loci vary among individuals
- Linked genetic loci are inherited non-independently
- Loci may change with time (mutation, selection, genetic drift)
- Some DNA changes lead to quantitative changes in RNA expression and to quantitative or qualitative changes in protein production
- Some genetic changes, even small ones, may lead to disease
- A large amount of natural variation occurs in healthy individuals, i.e., many changes are neutral
- Loci genetically linked to the disease-causing locus can be used as genetic markers to search for the disease locus
There are many types of DNA variation, e.g.:
- Sequence variation (figure labels: SNP1, SNP2): AAAC/TGGCTA
- Microsatellite repeats: …AATG AATG AATG AATG…

10 Polymorphic Site
A locus with common DNA variation: ≥ 2 alleles in a population.
Shows differences in DNA sequence among individuals.
In most definitions: the most common allele has frequency < 99%, or minor allele frequency (MAF) ≥ 1%, or MAF ≥ 2%, or at least two alleles have frequencies ≥ 1%.
A locus whose rare allele occurs in < 1% of the population is usually not considered a polymorphic site.
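The MAF-based definition on this slide can be illustrated with a minimal Python sketch. The function names, the 1% default threshold, and the sample genotypes are illustrative assumptions, not HapMap data or code.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotype strings such as 'AC'."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    freqs = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return freqs[1] if len(freqs) > 1 else 0.0

def is_polymorphic(genotypes, threshold=0.01):
    """Apply the 'MAF >= threshold' definition (1% by default)."""
    return minor_allele_frequency(genotypes) >= threshold

# Hypothetical sample of 100 individuals: 178 copies of A and 22 copies of C.
sample = ["AA"] * 80 + ["AC"] * 18 + ["CC"] * 2
maf = minor_allele_frequency(sample)
```

Here `maf` is 22/200 = 0.11, so the locus counts as polymorphic under either the 1% or the 2% cutoff.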

11 SNP=Single Nucleotide Polymorphism A and C are alleles at SNP locus rs SNP locus rs CAAATTCCATG[A or C]AGAAGGAAATACAT A SNP locus on the distal end of the long arm of human chromosome 5 (data from Ensembl)

12 A SNP locus on the distal end of the long arm of chromosome 5 SNP locus rs

13 Regulatory Interactions: The ENCODE Project
2003: Pilot project launched (1% of the genome)
Pilot project completed; production phase launched on the entire genome
Components: Production Scale Effort, Pilot Scale Effort, Data Coordination Center, Technology Development Effort
High-throughput experimental and computational approaches to studies of DNA regulatory sites, regulatory interactions, and DNA modification

14 Genome SNP Variation
Size of the human genome is $\approx 3.2 \times 10^{9}$ bp; 99.9% identical; 9-10 million SNPs may have MAF ≥ 5%; ≈ 30,000 genes
HapMap SNP Density Coverage
- Phase I (published in 2005): 1,007,329 SNPs that passed quality control; 1 SNP / 3000 bp; 11,500 nsSNPs
- 10 ENCODE regions, 500 kb each: 17,944 SNPs; 1 SNP / 279 bp
- Phase II (published in 2007): >3,806,000 SNPs; 1 SNP / 875 bp; 25-30% of all SNPs with MAF ≥ 5%
Figure: The cumulative number of non-redundant SNPs (each mapped to a single location in the genome) is shown as a solid line, as well as the number of SNPs validated by genotyping (dotted line) and double-hit status (dashed line). Years are divided into quarters (Q1–Q4).



17 SNP Differences among Individuals Far Exceed Differences among Populations
Phase I, autosomes: across the 1 million SNPs genotyped, only 11 have fixed differences between CEU and YRI, 21 between CEU and CHB/JPT, and 5 between YRI and CHB/JPT.
Phase I, X chromosome: 123 SNPs were completely differentiated between YRI and CHB/JPT, but only 2 between CEU and YRI and 1 between CEU and CHB/JPT.

18 Haplotypes
A haplotype is a set of alleles at multiple loci located on the same copy of the chromosome, e.g., A1 B2 C2 on one copy and A2 B1 C1 on the other.
Genotype calls obtained from sequencing or DNA chip genotyping do not show which of the two chromosomal copies a particular allele belongs to.
E.g., genotypes for individual X:
SNP#     Alleles    Genotype
SNP A    A1, A2     A T
SNP B    B1, B2     T C
SNP C    C1, C2     G C
Haplotypes:
Haplotype 1: A C C
Haplotype 2: T T G
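The phase ambiguity described on this slide can be made concrete with a short Python sketch. It enumerates every haplotype pair compatible with the unphased genotypes of individual X; the function name and input layout are illustrative assumptions.

```python
from itertools import product

def possible_phasings(genotypes):
    """Enumerate unordered haplotype pairs compatible with unphased genotypes.

    genotypes: one 2-tuple of alleles per SNP, e.g. [("A", "T"), ("T", "C")].
    """
    pairs = set()
    # At each SNP, either allele may sit on the first chromosome copy.
    for choice in product(*[(g, g[::-1]) for g in genotypes]):
        hap1 = "".join(first for first, _ in choice)
        hap2 = "".join(second for _, second in choice)
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Genotypes of individual X from the slide: SNP A = A/T, SNP B = T/C, SNP C = G/C.
individual_x = [("A", "T"), ("T", "C"), ("G", "C")]
phasings = possible_phasings(individual_x)
```

With three heterozygous SNPs there are four candidate haplotype pairs, and the pair shown on the slide (A C C and T T G) is only one of them. This is why phase has to be inferred from trios or statistically.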

19 Recombination (crossing-over)
- “Random” event
- Occurs during meiosis
- The larger the distance between loci, or the more generations pass, the more likely it is that recombination(s) will occur
Parental chromosomes: A1 B1 and A2 B2
Nonrecombinant haplotypes: A1 B1, A2 B2
Recombinant haplotypes: A1 B2, A2 B1

20 Two ancestral chromosomes being scrambled through recombination over many generations to yield different descendant chromosomes. If an A allele on the ancestral chromosome increases the risk of a disease, the two individuals in the current generation who inherit that part of the ancestral chromosome will be at increased risk. Source: the International HapMap Project

21 Linkage Disequilibrium
Associations among alleles at different loci (Locus A with alleles A1, A2; Locus B with alleles B1, B2)
Linkage disequilibrium coefficient (coefficient of association): $D = p_{A_1B_1} - p_{A_1} p_{B_1}$
In case of no association, $D = 0$ (linkage equilibrium)
Normalized disequilibrium coefficient: $D' = D / |D|_{\max}$, where $|D|_{\max} = |\min(p_{A_1} p_{B_2},\ p_{A_2} p_{B_1})|$ and $-1 \le D' \le 1$
Correlation coefficient: $r = D / \sqrt{p_{A_1} p_{A_2} p_{B_1} p_{B_2}}$
Practical implications in fine gene mapping: search for locus B using association of marker loci with disease
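As a worked illustration of the coefficients on this slide, the following Python sketch computes D, D' and the correlation coefficient from one haplotype frequency and two allele frequencies. The function name and the example frequencies are assumptions. The bound used when D is negative is the standard counterpart of the slide's formula and is not stated on the slide itself. Both loci are assumed to be polymorphic, so the square root is non-zero.

```python
from math import sqrt

def ld_stats(p_a1b1, p_a1, p_b1):
    """Return (D, D', r) for two biallelic loci.

    p_a1b1: frequency of the A1-B1 haplotype
    p_a1, p_b1: frequencies of alleles A1 and B1
    """
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1
    d = p_a1b1 - p_a1 * p_b1
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)  # bound given on the slide
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)  # assumed bound for negative D
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / sqrt(p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r

# Hypothetical frequencies: both alleles at 0.5, A1-B1 haplotype at 0.4.
d, d_prime, r = ld_stats(0.4, 0.5, 0.5)
```

For these inputs D = 0.4 - 0.25 = 0.15, the bound is 0.25, and both D' and r equal 0.6. Setting the haplotype frequency to 0.25 gives D = 0, i.e., linkage equilibrium.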

22 The value of D decreases geometrically with each generation
Loci A/a and B/b are separated by recombination fraction $\theta$:
$D^{(t)} = (1 - \theta)\, D^{(t-1)}$
$D^{(t)} = (1 - \theta)^{t}\, D^{(0)}$
Unless the two loci are closely linked, the value of D should rapidly decrease to 0. The occurrence of association between two loci implies that they are closely linked.
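The geometric decay can be checked numerically with a small Python sketch. It writes the recombination fraction between the two loci as `theta`; that name, the function names, and the example values are assumptions chosen for illustration.

```python
from math import log

def ld_after(d0, theta, generations):
    """D after t generations: D(t) = (1 - theta)**t * D(0)."""
    return (1 - theta) ** generations * d0

def generations_to_fraction(theta, fraction):
    """Generations needed for D to fall to `fraction` of its starting value."""
    return log(fraction) / log(1 - theta)

# Unlinked loci (theta = 0.5): D halves every generation.
unlinked = ld_after(0.25, 0.5, 5)

# Closely linked loci (theta = 0.01): D decays far more slowly.
linked = ld_after(0.25, 0.01, 5)
half_life_linked = generations_to_fraction(0.01, 0.5)
```

With `theta = 0.5`, a starting D of 0.25 drops to 0.0078125 after five generations. With `theta = 0.01`, it takes roughly 69 generations just to halve, which is the sense in which persistent association points to close linkage.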

23 Haplotype Maps Generated by The International HapMap Project
3 steps of the HapMap construction:
(a) SNPs are identified in DNA samples from multiple individuals.
(b) Adjacent SNPs that are inherited together are compiled into haplotypes.
(c) "Tag" SNPs are identified within haplotypes that uniquely describe those haplotypes.
Source: The International HapMap Project

24 Haplotype Maps of the Human Genome
Haplotypes were inferred for the HapMap project from trio data and from unrelated individuals using PHASE (Stephens 01; Stephens and Donnelly 03)
Find correlations among groups of SNPs
Helmuth 2001, Science 293:

25 Haplotype Maps of the Human Genome
Genome regions decomposed into discrete haplotype blocks, which capture similarity in haplotype organization
Patil et al. 2001, Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21. Science 294(5547):

26 Haplotype Block Partition Results for Three Populations
Population           Blocks    Average size, kb *    Required SNPs †
African-American     235,…     …                     …,886
European-American    109,…     …                     …,960
Han Chinese          89,…      …                     …,809
* Average distance spanned by segregating sites in each block.
† Minimum number of SNPs required to distinguish common haplotype patterns with frequencies of 5% or higher.
Hinds et al., Science: 1,586,383 SNPs genotyped in 71 Americans of European, African, and Asian ancestry

27 Population differences in local bin structure; differences in allele and haplotype frequencies
Figure (Hinds et al. 2005): Extended LD bin and haplotype block structure around the CFTR gene. LD bins, where each bin has at least one SNP with $r^2 > 0.8$ with every other SNP, are depicted as light horizontal bars, with the positions of constituent SNPs indicated by vertical tick marks as well as the extreme ends of the bars. Isolated SNPs are indicated by plain tick marks. Haplotype blocks, within which at least 80% of observed haplotypes could be grouped into common patterns with frequencies of at least 5%, are depicted as dark horizontal bars. Unlike haplotype blocks, which are by design sequential and nonoverlapping, SNPs in one LD bin can be interdigitated with SNPs in multiple other overlapping bins.
“Although analysis panels are characterized both by different haplotype frequencies and, to some extent, different combinations of alleles, both common and rare haplotypes are often shared across populations” (The Int. HapMap Consortium, Nature, 2005)

28 Tag SNP (htSNP) selection
- Pairwise LD-based and haploblock-based tagging methods
- Partition haplotypes into blocks; can use haplotype-based (haploblocks) or genotype-based (LD-blocks) partitioning
- Select representative htSNPs from each block
- Latest DNA microarrays aim to capture SNPs with $r^2 \ge 0.8$
“Tags are the subset of variants genotyped in a disease study. SNPs that are not typed in the study but whose effect can be studied through LD with a tag are termed proxies. A tag with perfect correlation ( r 2 = 1) to an untyped putative causal allele is termed a perfect proxy.” De Bakker et al., 2005
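One way to picture pairwise LD-based tagging is a greedy selection over a small haplotype panel, sketched below in Python. This is a simplified illustration under stated assumptions, not the algorithm used by HapMap or by any array vendor: SNPs are coded 0/1 per haplotype, the threshold of 0.8 is taken from the slide, and the function names and toy data are hypothetical.

```python
def r_squared(x, y):
    """Squared correlation between two biallelic SNPs coded 0/1 per haplotype."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = pxy - px * py
    return d * d / denom

def greedy_tags(snps, threshold=0.8):
    """Pick tags until every SNP is a tag or has r^2 >= threshold with a tag.

    snps: dict mapping SNP name -> list of 0/1 alleles across haplotypes.
    """
    names = list(snps)
    covers = {
        a: {b for b in names if a == b or r_squared(snps[a], snps[b]) >= threshold}
        for a in names
    }
    untagged, tags = set(names), []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(names, key=lambda a: len(covers[a] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Toy panel of 8 haplotypes (hypothetical data).
panel = {
    "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
    "snp2": [0, 0, 0, 0, 1, 1, 1, 1],  # perfect proxy of snp1
    "snp3": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly correlated, alleles flipped
    "snp4": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the others
}
tags = greedy_tags(panel)
```

On this toy panel two tags suffice: `snp1` stands in for `snp2` and `snp3`, while `snp4` has to be typed directly because nothing else is correlated with it.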


30 The Int. HapMap Consortium, Nature, 2005 Tag SNP, Haplotypes, and LD

31 Use of Haplotypes in Association Analysis
- Testing one marker at a time for associations is very time-consuming
- Problem of multiple testing
- When testing individual SNPs, we are not utilizing information from other markers
Benefits of Using Haplotypes
- Haplotypes allow us to use information from multiple loci simultaneously
- LD information between loci is captured

32 Benefits of Haplotype Analysis
- Construct a single highly informative mega-locus from a number of less informative but closely linked loci
- Identify genotyping or data entry errors: likelihood ratio tests indicate which typings are more likely to be errors
- Find boundaries of conserved haplotypes associated with a trait
- Employs recombinations from the entire history of a population

33 Amount of Captured Sequence Variation in HapMap Phase II
- For common variants (MAF ≥ 0.05), the mean maximum $r^2$ of any SNP to a typed one is 0.90 in YRI, 0.96 in CEU and 0.95 in CHB/JPT.
- … million SNPs capture all common Phase II SNPs with $r^2 \ge 0.8$ in YRI.
- Very common SNPs with MAF ≥ 0.25 are captured extremely well (mean maximum $r^2$ of 0.93 in YRI to 0.97 in CEU).
- Rarer SNPs with MAF < 0.05 are less well covered (mean maximum $r^2$ of 0.74 in CHB/JPT to 0.76 in YRI).


35 Recombination Hot Spots

36 Structural Genome Variation
- A large number of copy number variants (CNVs) and other genome rearrangements are found among individuals
- Some variation is assumed to be normal; other variation may cause disease
- Genome databases, e.g., the Database of Genomic Variants at The Centre for Applied Genomics (TCAG) of The Hospital for Sick Children, Toronto, and the Copy Number Variation Project Map at the Sanger Institute
- HapMap samples are also used as a resource for CNV analysis

37 Segmental duplications are recombination hotspots, causing global genome rearrangements


39 HapMap Genome Browser





44 Perlegen Genotype Browser


46 UCSC Genome Browser

47 DNA Chips and Resequencing: High-throughput Analysis of Sequence Variation
An easy way to access genome-wide variation. Both Affymetrix and Illumina DNA chips contain representative SNP and CNV probes.
- Affymetrix GeneChip 6.0: 1.8 million markers for genetic variation, including 906,000 SNPs and 946,000 copy number probes.
- Illumina 1M BeadChip and 1M-Duo BeadChip: ~950,000 genome-spanning tag SNPs; ~100,000 additional non-HapMap SNPs; >565,000 SNPs in and near coding regions, such as nsSNPs, promoter regions, 3’ and 5’ UTRs; dense coverage in ADME and MHC regions; ~260,000 markers located in novel and reported copy number polymorphic regions.
- Sequenom mass arrays (based on MALDI-TOF)

48 Genome-Wide Association
- Select representative htSNPs from low-diversity haplotype blocks
- Adjustment for multiple comparisons
- LD values are highly variable: smoothing function needed
- Haplotypes in a sliding window
OR screen for top SNPs:
- Likely functional SNPs
- SNPs in genes involved in pathways of interest
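The slide does not say which multiple-comparison adjustment is meant. As one simple and conservative possibility, a Bonferroni correction can be sketched in Python as follows; the function names, the 0.05 significance level, and the p-values are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def significant_snps(p_values, alpha=0.05):
    """Return names of SNPs whose p-value passes the Bonferroni threshold.

    p_values: dict mapping SNP name -> unadjusted p-value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)

# With one million tested SNPs, each test must reach p < 5e-8.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical p-values for a three-SNP screen (threshold is 0.05 / 3).
hits = significant_snps({"snpA": 1e-9, "snpB": 0.01, "snpC": 0.04})
```

Because SNPs in LD are correlated, Bonferroni tends to over-correct. Testing only tag SNPs or haplotypes reduces the number of tests and therefore the penalty.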

49 Use of Phase-Resolved Data in Association Analysis
- Find association with haplotypes, similar to analyses of individual SNP alleles; need to consider multiple testing
- Test for tendency of cases to ‘cluster’ around groups of ‘similar’ haplotypes
- Extend log-linear approach to take haplotype structure into account
- Modifications also used for ambiguous phase

50 As of 04/14/2008, GWAS of 150 traits posted


52 Special thanks to Ken Manly, whose presentation ideas for the HapMap module 2006 inspired and helped organize this presentation
