Using genetics to study human history and natural selection. David Reich, Harvard Medical School Department of Genetics, Broad Institute.

tttctccatttgtcgtgacacctttgttgacaccttcatttctgcattctcaattctatttcactggtctatgg cagagaacacaaaatatggccagtggcctaaatccagcctactaccttttttttttttttgtaacattttacta acatagccattcccatgtgtttccatgtgtctgggctgcttttgcactctaatggcagagttaagaaattgtag cagagaccacaatgcctcaaatatttactctacagccctttataaaaacagtgtgccaactcctgatttatgaa cttatcattatgtcaataccatactgtctttattactgtagttttataagtcatgacatcagataatgtaaatc ctccaactttgtttttaatcaaaagtgttttggccatcctagatatactttgtattgccacataaatttgaaga tcagcctgtcagtgtctacaaaatagcatgctaggattttgatagggattgtgtagaatctatagattaattag aggagaatgactatcttgacaatactgctgcccctctgtattcgtgggggattggttccacaacaacacccacc ccccactcggcaacccctgaaacccccacatcccccagcttttttcccctgctaccaaaatccatggatgctca agtccatataaaatgccatactatttgcatataacctctgcaatcctcccctatagtttagatcatctctagat tacttataatactaataaaatctaaatgctatgtaaatagttgctatactgtgttgagggttttttgttttgtt ttgttttatttgtttgtttgtttgtattttaagagatggtgtcttgctttgttgcccaggctggagtgcagtgg tgagatcatagcttactgcagcctcaaactcctggactcaaacagtcctcccacctcagcctcccaaagtgctg ggatacaggtgtgacccactgtgcccagttattattttttatttgtattattttactgttgtattatttttaat tattttttctgaatattttccatctatagttggttgaatcatggatgtggaacaggcaaatatggagggctaac tgtattgcatcttccagttcatgagtatgcagtctctctgtttatttaaagttttagtttttctcaaccatgtt tacttttcagtatacaagactttgacgttttttgttaaatgtatttgtaagtattttattatttgtgatgttat ttaaaaagaaattgttgactgggcacagtggctcacgcctgtaatcccagcactttgggaggctgaggcgggca gatcacgaggtcaggagatcaagaccatcctggctaacatggtaaaaccccgtctctactaaaaatagaaaaaa attagccaggcgtggtggcgagtgcctgtagtcccagctactcgggaggctgaggcaggagaatggtgtgaacc tgggaggcggagcttgcagtgagctgagatcgtgccactgcattccagcctgcgtgacagagcgagactctgtc aaaaaaataaataaaatttaaaaaaagaagaagaaattattttcttaatttcattttcaggttttttatttatt tctactatatggatacatgattgatttttgtatattgatcatgtatcctgcaaactagctaacatagtttatta tttctctttttttgtggattttaaaggattttctacatagataaataaacacacataaacagttttacttcttt cttttcaacctagactggatgcattttttgtttttgtttgtttgtttgctttttaacttgctgcagtgactaga gaatgtattgaagaatatattgttgaacaaaagcagtgagagtggacatccctgctttccccctgattttaggg 
ggaatgttttcagtctttcactatttaatatgattttagctataggtttatcctagatccctgttatcatgttg aggaaattcccttctatttctagtttgttgagattttttaattcatgtgattgcgctatctggctttgctctca
Overlaid fragments on the slide: tctc, gaga, gcgc.

A 2-part talk. Section 1: How human history affects human genetic variation. Section 2: Detecting selection by the pattern of genetic variation, and finding disease genes.

How does human history affect genetic variation? A genome-wide survey of linkage disequilibrium. Linkage disequilibrium (LD) is a phenomenon whereby genetic variants are associated: people who have one variant tend to have a second as well. Section 1
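The association described on this slide is usually quantified from haplotype counts. The sketch below is not taken from the talk: it assumes two biallelic sites, haplotypes coded as pairs of 0/1 alleles, and the standard textbook measures D, D' and r². The function name `ld_stats` is hypothetical.

```python
def ld_stats(haplotypes):
    """Return (D, D_prime, r_squared) for two biallelic sites.

    `haplotypes` is a list of (a, b) pairs, one per chromosome, where each
    allele is coded 0 or 1 (an assumed encoding for this illustration).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                           # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0      # D scaled by its maximum
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Two sites that always travel together versus two independent sites.
linked = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_stats(linked))
print(ld_stats(independent))
```

In the first data set, carrying allele 1 at one site predicts allele 1 at the other, which is the "people who have one tend to have a second" situation; in the second there is no association.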

Linkage Disequilibrium Explained. Diagram labels: variations in chromosomes within a population; common ancestor; emergence of variations over time; time axis ending at the present; disease mutation. Section 1

What Determines Extent of LD? Diagram labels: disease-causing mutation; 2,000 generations ago; 1,000 generations ago; time = present. Section 1
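One standard way to see why the age of a mutation matters: in the textbook model, the association between two sites shrinks by a factor of (1 - c) each generation, where c is the recombination fraction between them. The sketch below assumes a uniform recombination rate of 1e-8 per base pair per generation, which is an illustrative value and not a figure from the talk; `ld_fraction_remaining` is a hypothetical name.

```python
def ld_fraction_remaining(distance_bp, generations, rate_per_bp=1e-8):
    """Fraction of the initial association (D) expected to survive.

    Assumes D_t = D_0 * (1 - c) ** t with c = distance * rate, capped at 0.5.
    The default rate is an assumption made for this illustration only.
    """
    c = min(distance_bp * rate_per_bp, 0.5)
    return (1.0 - c) ** generations


# Compare a mutation that arose 1,000 generations ago with one from 2,000.
for generations in (1000, 2000):
    row = [round(ld_fraction_remaining(kb * 1000, generations), 3)
           for kb in (5, 20, 80, 160)]
    print(generations, row)
```

At every distance the older mutation retains less association, so the region of strong LD around it is shorter.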

How Far Does Association (LD) Extend Between Neighboring Common Sites? Theoretical: 3-8 kb. Figure: distance scale marked at 0, 5, 10, 20, 40, 80 and 160 kb, with a range of uncertainty. Section 1

Strategy for Assessing Extent of LD: 19 regions; 44 Caucasian samples from Utah; a great deal of DNA sequencing per sample. Figure axis: distance from core single nucleotide polymorphism (SNP), marked at 0, 5, 10, 20, 40, 80 and 160 kb. Section 1

Section 1

A Genome-Wide Assessment of Linkage Disequilibrium: disease gene mapping; human history. Section 1

MYSTERY: What explains the long-range LD? An important event in population history? Section 1

Positive Control: 48 Swedes. Identical pattern to Utah. Section 1

96 Nigerians (Yoruba): much less LD. Associations in Africans are a SUBSET of those in Caucasians. LD MUST be influenced by population history. Section 1

Confirmation of less LD in Africans from direct DNA sequencing. Anna Di Rienzo also shows this pattern. Section 1

More evidence from genotyping ~5,000 SNPs (Gabriel et al. 2002). K. Kidd, J. Kidd and Sarah Tishkoff also show this. Section 1

Explanation: Bottleneck or ‘Founder Effect’ in History of North Europeans. What was this event? (1) Out of Africa? (2) Founding of Europe? Diagram labels: ancestral population; ~100,000 years ago; Yoruba; ancestors; North Europeans; likely <10 founding chromosomes. Section 1

Open Mysteries. What caused the bottleneck event? The “Out of Africa” migration? How many people were involved? When did it occur? Can we better understand when the founder event occurred, and how many people were involved? Section 1

Acknowledgements for Section 1. Collaborators: Michele Cargill, Stacey Bolk, James Ireland, Pardis C. Sabeti, Daniel J. Richter, Thomas Lavery, Rose Kouyoumjian, Shelli F. Farhadian, Ryk Ward, Eric S. Lander. Samples: Leif Groop, Richard Cooper, Charles Rotimi.

Using Long-Range Linkage Disequilibrium to Detect Positive Selection in the Genome Section 2

Overview: 1. The difficulty of detecting genomic regions affected by natural selection. 2. The long-range haplotype test. 3. Results for two genes: G6PD and CD40 ligand. Section 2

Existing formal tests for selection. DNA sequence analysis (Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio): weak. Genotyping-based tests: not general at present. Section 2

Our test is based on the relationship between allele frequency and extent of linkage disequilibrium. No selection: young alleles have low frequency and long-range LD; old alleles have low or high frequency and short-range LD. Positive selection: young alleles have high frequency and long-range LD. Section 2

The signal of selection. Figure axes: frequency; linkage disequilibrium (homozygosity). Labels: neutrality; positive selection. Section 2

Paradigm of the Core Region. Diagram labels: gene; core haplotypes. Section 2

Long-range multi-SNP haplotypes. Diagram labels: gene; core markers; long-range markers; SNPs C/T, A/G, A/G, C/T, C/T, C/T; decay of LD. Section 2

Long-range multi-SNP haplotypes. Decay of homozygosity: the probability, at any distance, that any two haplotypes that start out the same have all the same SNP genotypes. Diagram labels: gene; core markers; long-range markers; SNPs C/T, A/G, C/T; homozygosity values of 100%, 75%, 35% and 18%. Section 2
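The homozygosity defined on this slide can be computed directly by counting pairs. The sketch below is an illustration under stated assumptions, not the authors' code: every chromosome passed in is assumed to carry the same core haplotype, each string lists that chromosome's alleles at the long-range markers in order of increasing distance from the core, and the name `ehh` and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(chromosomes, n_markers):
    """Probability that two randomly chosen chromosomes are identical
    at the first `n_markers` long-range markers.

    `chromosomes` holds one allele string per chromosome; all are assumed
    to share the same core haplotype.
    """
    n = len(chromosomes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    # Group chromosomes by their extended haplotype out to n_markers.
    counts = Counter(c[:n_markers] for c in chromosomes)
    identical_pairs = sum(comb(k, 2) for k in counts.values())
    return identical_pairs / comb(n, 2)


# Toy data: four chromosomes, three long-range markers each.
toy = ["CAT", "CAT", "CAC", "TAC"]
for m in range(4):
    print(m, ehh(toy, m))
```

With no long-range markers included the value is 1 by construction, and it can only stay level or fall as more distant markers are added, which is the decay the slide illustrates.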

Two genes associated with malaria resistance. G6PD (1960’s): well-established association to malaria resistance; selection demonstrated in 2001 by Tishkoff et al. CD40 ligand (2002): recent association by Sabeti et al.; involved in immune regulation. Section 2

Experimental Design. CD40 ligand (TNFSF5): 7 SNPs in core, 14 at long distances, spanning -180 kb to +520 kb around the gene. G6PD: 11 SNPs in core, 14 at long distances, spanning -480 kb to +220 kb around the gene. The telomere is marked on each diagram. Section 2

Experimental Design. DNA samples from 231 African men: Yoruba (Nigeria), Beni (Nigeria), Shona (Zimbabwe). Perfect phase (X chromosome): men carry a single X chromosome, so each haplotype is observed directly. Section 2

Core haplotypes. G6PD: Africans (230), non-Africans (95); the “A-” protective haplotype is marked. CD40 ligand: Africans (231), non-Africans (91). Section 2

G6PD: long-range haplotype diversity. Panels: G6PD-corehap1, G6PD-corehap3, G6PD-corehap4, G6PD-corehap5, G6PD-corehap6, G6PD-corehap7, G6PD-corehap8, and one panel whose number is missing in the transcript. G6PD-corehap8 is labelled as the “A-” protective haplotype. Section 2

G6PD: homozygosity vs. distance. Figure axes: EHH (extended haplotype homozygosity); distance from the core region (kb). Section 2

G6PD: computer simulation vs. data. Figure axes: core haplotype frequency; relative EHH. Core haplotype 8 is highlighted with P << (value missing in the transcript). Section 2
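The slide does not say how the comparison with simulations was scored. One plausible reading, sketched here under explicit assumptions, is that relative EHH is the EHH of the tested core haplotype divided by the EHH of the other core haplotypes combined, and that P is the share of simulated neutral values at least as large as the observed one. Both function names and all numbers are hypothetical, and the matching of simulated haplotypes by frequency is left out.

```python
def relative_ehh(ehh_tested, ehh_others):
    """Ratio of EHH on the tested core haplotype to EHH on the others.

    Assumed definition for this illustration; the slide gives none.
    """
    if ehh_others <= 0:
        raise ValueError("EHH of the comparison haplotypes must be positive")
    return ehh_tested / ehh_others


def empirical_p_value(observed, simulated):
    """Share of simulated values at least as large as the observed one.

    Adds one to numerator and denominator so the estimate is never zero.
    """
    hits = sum(1 for s in simulated if s >= observed)
    return (hits + 1) / (len(simulated) + 1)


# Made-up numbers purely to show the calculation.
observed = relative_ehh(0.8, 0.2)
neutral = [0.9, 1.1, 1.4, 2.0, 0.7, 1.0, 1.3, 0.8, 1.6]
print(observed, empirical_p_value(observed, neutral))
```

In practice the simulated values would be restricted to core haplotypes of similar frequency to the one being tested, since the slide plots relative EHH against core haplotype frequency.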

G6PD: P-values from simulation. Figure axes: P-value; distance from the core region (kb). Section 2

G6PD also stands out in comparison to 7 control regions. Figure axes: core haplotype frequency; relative EHH. Section 2

CD40 ligand: long-range haplotype diversity. Panels: corehap1, corehap2, corehap3, corehap4, corehap5; corehap4 is labelled a second time. Section 2

CD40 ligand: homozygosity vs. distance. Figure axes: EHH; distance from the core region (kb). Section 2

CD40 ligand: computer simulation vs. data. Figure axes: core haplotype frequency; relative EHH. Core haplotype 4 is highlighted with P << (value missing in the transcript). Section 2

CD40 ligand: P-values from simulation. Figure axes: P-value; distance from the core region (kb). Section 2

CD40 ligand also stands out in comparison to 7 control regions. Figure axes: core haplotype frequency; relative EHH. Section 2

Long-range linkage disequilibrium also gives a direct estimate of the date. Malaria resistance arose in the last 10,000 years in Africa: ~2,500 years ago for G6PD and ~6,500 years ago for CD40 ligand. Section 2
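The slide does not give the formula behind these dates. A common approximation, assumed here, is that the chance two chromosomes carrying the allele are still identical out to genetic distance r after g generations is about e^(-2rg), which rearranges to g = -ln(EHH) / (2r). The function names, the 25-year generation time and the input values below are illustrative assumptions, not numbers from the talk.

```python
import math


def generations_since_origin(ehh, genetic_distance_morgans):
    """Solve EHH = exp(-2 * r * g) for g (an assumed decay model)."""
    if not 0.0 < ehh <= 1.0:
        raise ValueError("EHH must be in (0, 1]")
    if genetic_distance_morgans <= 0:
        raise ValueError("genetic distance must be positive")
    return -math.log(ehh) / (2.0 * genetic_distance_morgans)


def years_since_origin(ehh, genetic_distance_morgans, years_per_generation=25.0):
    """Convert the generation estimate to years (generation time is assumed)."""
    return generations_since_origin(ehh, genetic_distance_morgans) * years_per_generation


# Illustrative inputs: homozygosity has decayed to e^-1 at 0.5 cM from the core.
print(years_since_origin(math.exp(-1), 0.005))
```

Higher homozygosity at a given distance implies fewer generations of recombination and therefore a younger allele.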

Traditional tests fail to detect the effect. Tajima’s D, HKA test, McDonald and Kreitman, Fu and Li’s D, Ka/Ks ratio: not significant in our data. The long-range haplotype test is a powerful way to detect selection in the last 10,000 years. Section 2

Conclusions: Powerful general approach for detecting selection Section 2

Conclusions: Powerful general approach for detecting selection. Screen the genome for positive selection. Section 2

Conclusions: Genome-wide screen for natural selection. We can find disease genes without patients! Section 2

What’s coming… 1. Generalization of the long-range haplotype test. 2. Application of the approach genome-wide: haplotype map data set; disease gene screen data sets. Section 2

Acknowledgements for Section 2: Pardis C. Sabeti, John Higgins, Haninah Z.P. Levine, Daniel J. Richter, Stephen F. Schaffner, Stacey Gabriel, Jill V. Platko, Nicholas J. Patterson, Gavin J. McDonald, Hans C. Ackerman, Sarah J. Campbell, David Altshuler, Richard Cooper, Ryk Ward, Eric S. Lander.

Note: The 3rd section of the talk is not included here because it presents data that have not yet been published.