Understanding human admixture, and association mapping in admixed populations. Simon Myers.


1 Understanding human admixture, and association mapping in admixed populations. Simon Myers

2 Admixture
[Figure: admixed chromosomes at generations 1, 2, 7 and 70 after admixture]

3 Human admixture
Many human populations are thought to be admixed:
– African Americans
– Native Americans
– Melanesians, …
Using these groups in disease (association) mapping
How have admixture events shaped human genetic diversity?
– Identify admixed groups and their source populations
– Estimate times of admixture events

4 Association mapping
It is now economical to type, e.g., 1,000,000 SNPs
Association mapping looks for differences in allele frequency between unrelated disease cases and controls
– WTCCC, 2007
– Diabetes, coronary artery disease, Crohn’s disease, …
Admixed populations can offer more power
[Figure: patients vs. controls]
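As a minimal sketch of the single-marker case-control comparison described above, the function below computes a Pearson chi-square (1 df) on the 2×2 table of allele counts. The function name, the counts, and the choice of an allelic (rather than genotypic) test are illustrative assumptions, not necessarily what the WTCCC used.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) and p-value for a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 2000 case alleles and 2000 control alleles.
chi2, p = allelic_chi_square(600, 1400, 500, 1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```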

5 IDEA: cases oversample the population in which the causal allele is more frequent
In an admixed population, we can combine:
– the long-range “admixture signal”
– the short-range association signal
– This boosts power, but ancestry chunks must be inferred
– Existing methods (ANCESTRYMAP, STRUCTURE) cannot use dense data
[Figure: “Admixture LD” (from N. Patterson); patients vs. controls]
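The “cases oversample” idea can be illustrated with a simplified case-only scan statistic: at a locus, compare each case’s local count of alleles from one source population (0, 1 or 2) with the 2q expected from that case’s genome-wide proportion q from the same source. This is a hypothetical illustration under those assumptions, not the ANCESTRYMAP statistic, and it treats local ancestry as known rather than inferred.

```python
import math


def admixture_z_score(local_counts, global_props):
    """Case-only test for excess local ancestry from one source population at a locus.

    local_counts: each case's number of alleles (0, 1 or 2) from the source at the locus.
    global_props: each case's genome-wide ancestry proportion from the same source.
    Returns a one-sample z-like statistic; positive means cases carry more
    source-population ancestry at this locus than their genome-wide average predicts.
    """
    diffs = [c - 2 * q for c, q in zip(local_counts, global_props)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


# Hypothetical cases with about 80% African ancestry genome-wide.
counts = [2, 2, 1, 2, 2, 2, 1, 2]
props = [0.8] * len(counts)
print(round(admixture_z_score(counts, props), 2))
```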

6 African Americans
On average, 80% African and 20% European ancestry
– Well modelled as a mixture of West African and European American individuals (F_ST below 0.002)
– Genome-wide data exist for 60 unrelated Yorubans from Nigeria and 60 Europeans from Utah (HapMap project, 4 million SNPs)

7 Modelling admixed chromosomes
Ancestry changes occur as a Poisson (Markov) process along the genome (Falush et al. 2003)
Suppose we already have data for chromosomes from populations A and B
– Model variation data within an ancestry segment
– A new chromosome shares ancestry with other chromosomes
– Recombination means ancestry changes as we move along the genome
– Model this as a Markov process, adapted from Li and Stephens (2003)
[Figure: chromosomes from Pop A and Pop B, the true ancestry of a new chromosome, and the prior probabilities of copying from each population]
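A minimal sketch of the ancestry process, assuming a single admixture pulse: ancestry breakpoints arise as a Poisson process whose rate per Morgan equals the number of generations since admixture, and at each breakpoint the new segment’s ancestry is redrawn (population A with probability p_a). The function returns the 2×2 ancestry transition matrix between two markers d Morgans apart; its name and parameters are hypothetical. HAPMIX also tracks which reference haplotype is copied (the Li and Stephens layer), which this sketch omits.

```python
import math


def ancestry_transition(d_morgans, generations, p_a):
    """2x2 ancestry transition matrix between markers d_morgans apart.

    Assumes a single admixture pulse `generations` ago: breakpoints form a Poisson
    process with rate `generations` per Morgan, and each new segment has
    ancestry A with probability p_a (state 0 = A, state 1 = B).
    """
    no_break = math.exp(-generations * d_morgans)  # P(no breakpoint in the interval)
    redraw = 1.0 - no_break
    return [
        [no_break + redraw * p_a, redraw * (1.0 - p_a)],
        [redraw * p_a, no_break + redraw * (1.0 - p_a)],
    ]


# 1 cM apart, 7 generations since admixture, 80% ancestry from population A.
for row in ancestry_transition(0.01, 7, 0.8):
    print([round(x, 4) for x in row])
```

With d = 0 the matrix is the identity; for large d both rows approach (p_a, 1 - p_a), so ancestry at distant markers becomes independent.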

8 HMM inference framework
We wish to infer ancestry
– Ancestry is the hidden state in an HMM
– The usual forward-backward approaches can be used to infer ancestry and estimate parameters
– Implemented as HAPMIX
[Figure: unknown ancestry (hidden state) of a chromosome, with reference chromosomes from Pop A and Pop B]
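Given per-interval transition matrices and the likelihood of each marker’s data under each ancestry, the posterior ancestry at every marker follows from the standard scaled forward-backward recursions. This sketch works on the two ancestry states only, with emission likelihoods assumed to be supplied by the caller; it is a generic illustration, not HAPMIX’s implementation.

```python
def forward_backward(init, transitions, emissions):
    """Posterior state probabilities in a discrete HMM via scaled forward-backward.

    init: initial state probabilities, e.g. [P(A), P(B)].
    transitions: len(emissions) - 1 matrices; transitions[t][r][s] = P(s at t+1 | r at t).
    emissions: emissions[t][s] = P(data at marker t | state s).
    """
    n, k = len(emissions), len(init)
    # Forward pass, normalising at each marker to avoid underflow.
    alpha = []
    a = [init[s] * emissions[0][s] for s in range(k)]
    for t in range(n):
        if t > 0:
            prev, trans = alpha[-1], transitions[t - 1]
            a = [sum(prev[r] * trans[r][s] for r in range(k)) * emissions[t][s] for s in range(k)]
        z = sum(a)
        alpha.append([x / z for x in a])
    # Backward pass, also normalised (the constants cancel in the posterior).
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        trans = transitions[t]
        b = [sum(trans[r][s] * emissions[t + 1][s] * beta[t + 1][s] for s in range(k))
             for r in range(k)]
        z = sum(b)
        beta[t] = [x / z for x in b]
    posteriors = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(g)
        posteriors.append([x / z for x in g])
    return posteriors


# Hypothetical example: 5 markers, constant transitions, data favouring A at the middle markers.
T = [[0.95, 0.05], [0.2, 0.8]]
emis = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
for p_a, _ in forward_backward([0.8, 0.2], [T] * 4, emis):
    print(round(p_a, 3))
```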

9 Simulated and real data
98% squared correlation with truth
Simulated African American data (chromosome 1 shown): the simulation stitched together segments of real Yoruban and French chromosomes (from the HGDP dataset)
Real African American data: chromosome 1
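The “squared correlation with truth” can be computed as the squared Pearson correlation between inferred and true ancestry along the chromosome. The sketch assumes the inferred values are per-marker posterior probabilities of African ancestry and the true values are 0/1 labels; the function name and data are hypothetical.

```python
def squared_correlation(inferred, truth):
    """Squared Pearson correlation between inferred and true ancestry values."""
    n = len(inferred)
    mx, my = sum(inferred) / n, sum(truth) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(inferred, truth))
    sxx = sum((x - mx) ** 2 for x in inferred)
    syy = sum((y - my) ** 2 for y in truth)
    return sxy * sxy / (sxx * syy)


# Hypothetical posterior P(African) at 8 markers vs. true 0/1 ancestry labels.
inferred = [0.97, 0.95, 0.90, 0.20, 0.05, 0.08, 0.85, 0.99]
truth = [1, 1, 1, 0, 0, 0, 1, 1]
print(round(squared_correlation(inferred, truth), 3))
```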

10 Moving to mapping
Bayesian mapping approach
Additional features:
– SNP imputation (Marchini et al. 2007)
– Unknown phase; computational speedups via algorithmic approximation
No false positives in 1000 AA cases + controls, simulated under the null
– Genome-wide Bayes factor: 1.09
Simulations use HAPGEN (Marchini et al. (2007); The Wellcome Trust Case Control Consortium (2007))

11 8q24 prostate cancer study
8q24 region:
– African American “admixture LD” signal (Freedman et al. 2006; from 1600 prostate cancer cases)
– European association signal: DG8S737 -8 and rs1447295 associated with disease status (Amundadottir et al. 2006)
Are the same variants responsible?
African American data:
– 654 early-onset cases (<72), 581 controls
– 1600 mutations typed, in a 3-million-base region
Haiman et al. NG (2007)

12 Strong association signal
[Figure: markers rs6983561, rs10090154, rs13254738, rs7000448, Broad11934905, rs6983267 and DG8S737-8]

13 Population-specific mutations
[Figure: markers rs6983561, rs10090154, rs13254738, rs7000448, Broad11934905, rs6983267, DG8S737-8 and rs979200]
Haiman et al. NG (2007); Gudmundsson et al. NG (2007); Yeager et al. NG (2007); Salinas et al. Cancer Epidemiol. Biomarkers Prev. (2008)

14 How have admixture events shaped human genetic diversity?
Worldwide variation data from the HGDP (Li et al., Science 2008)
– 53 worldwide populations
– ~1000 unrelated individuals
– Typed at ~650,000 SNP loci
[Map: Cavalli-Sforza, Nat. Rev. Genet. 2005]

15 Chunk sizes date admixture events
[Figure: admixed chromosomes at generations 1, 2, 7 and 70; chunks go from big to small]
Chunk sizes act as a clock on the number of generations since admixture
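To see why chunk size works as a clock, assume a single admixture pulse T generations ago, with ancestry breakpoints at rate T per Morgan and each new segment drawn from a given source with probability p. A chunk from that source then ends at rate T(1 - p), so chunk lengths are roughly exponential with mean 1/(T(1 - p)) Morgans. The sketch converts this to centimorgans; the model and function name are assumptions, and the approximation is poor in the first generation or two, when whole chromosomes come from one source.

```python
def mean_chunk_cm(generations, p_source):
    """Approximate mean length (cM) of a chunk from a source contributing p_source of ancestry.

    Assumes a single admixture pulse: breakpoints at rate `generations` per Morgan,
    each new segment drawn from this source with probability p_source, so a chunk
    ends at rate generations * (1 - p_source) per Morgan.
    """
    return 100.0 / (generations * (1.0 - p_source))


# Chunks from a source contributing 20% of ancestry (e.g. European ancestry in African Americans).
for t in (2, 7, 70):
    print(t, round(mean_chunk_cm(t, 0.2), 1))
```

Chunk length scales as 1/T, so ten times as many generations gives chunks one tenth as long.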

16 Dating admixture with accurate painting
[Figure: curves against distance d for p = 0.2 and admixture times of 7, 20 and 50, compared with “No admixture!”]
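One way to make the dating idea quantitative, assuming a single pulse of admixture T generations ago contributing a fraction p of ancestry from one source: the probability that two loci d Morgans apart both carry that ancestry is p² + p(1 - p)e^(-Td). It decays from p at d = 0 to p², the level with no admixture LD, at large d, and faster for older events. Treating the plotted curves as this quantity is an assumption; the function name is hypothetical.

```python
import math


def same_source_curve(d_morgans, p, generations):
    """P(both loci d Morgans apart come from the source with proportion p), single pulse."""
    return p * p + p * (1.0 - p) * math.exp(-generations * d_morgans)


distances = [0.0, 0.01, 0.05, 0.1, 0.3]  # Morgans
for t in (7, 20, 50):
    print(t, [round(same_source_curve(d, 0.2, t), 4) for d in distances])
print("no admixture LD level:", 0.2 * 0.2)
```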

17 Sloppy painting Thanks to Garrett Hellenthal

18 Dating admixture with sloppy painting
Use the empirical decay curve
Fit the admixture time via least squares
Bootstrap CIs
Technical features provide robustness to:
– unphased data
– SNP ascertainment
– local LD
– mislabelling of chunks
Multiple events can be dealt with
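A minimal sketch of fitting an admixture time to an empirical decay curve by least squares, assuming the curve has the form amp·e^(-Td) + c. For each candidate T on a grid, amp and c follow from ordinary linear regression, and the T with the smallest residual sum of squares is kept. A percentile bootstrap over chromosomes then gives a confidence interval. The grid, the resampling unit (chromosomes), the toy data and the function names are illustrative assumptions, not the method’s actual code.

```python
import math
import random


def fit_decay(d, y, grid=None):
    """Least-squares fit of y ~ amp * exp(-T * d) + c, with T searched on a grid."""
    if grid is None:
        grid = [0.5 * k for k in range(2, 201)]  # T from 1 to 100 generations
    n = len(d)
    best = None
    for t in grid:
        # For fixed T the model is linear in (amp, c): regress y on x = exp(-T d).
        x = [math.exp(-t * di) for di in d]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        amp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        c = my - amp * mx
        rss = sum((yi - amp * xi - c) ** 2 for xi, yi in zip(x, y))
        if best is None or rss < best[0]:
            best = (rss, t, amp, c)
    return best[1], best[2], best[3]


def bootstrap_ci(d, curves_by_chrom, reps=100, seed=1):
    """Percentile bootstrap 95% CI for T, resampling chromosomes with replacement."""
    rng = random.Random(seed)
    k = len(curves_by_chrom)
    estimates = []
    for _ in range(reps):
        sample = [curves_by_chrom[rng.randrange(k)] for _ in range(k)]
        mean_curve = [sum(curve[i] for curve in sample) / k for i in range(len(d))]
        estimates.append(fit_decay(d, mean_curve)[0])
    estimates.sort()
    return estimates[int(0.025 * (reps - 1))], estimates[int(0.975 * (reps - 1))]


# Hypothetical noisy per-chromosome curves generated with T = 7.
rng = random.Random(0)
d = [0.005 * i for i in range(1, 41)]  # 0.5 to 20 cM, in Morgans
curves = [[0.04 + 0.16 * math.exp(-7 * di) + rng.gauss(0, 0.005) for di in d] for _ in range(22)]
mean_curve = [sum(c[i] for c in curves) / len(curves) for i in range(len(d))]
print("T estimate:", fit_decay(d, mean_curve)[0])
print("95% CI:", bootstrap_ci(d, curves, reps=50))
```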

19 Maya
Results reveal an admixture event 8.2 generations ago (95% bootstrap CI 1746-1803, assuming 28 years per generation)
Native American (Colombian), European (French) and sub-Saharan African (Yoruba) ancestry

20 Older admixture: Hazara
Admixture 22.2 generations ago, around 1385 (1342-1429)

21 Older admixture: Uygur
Admixture 23.4 generations ago, around 1352 (1269-1421)
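Converting generations to calendar dates is simple arithmetic. The slides give 28 years per generation but not the reference year. Counting back from 2007 reproduces the quoted dates (2007 - 22.2 × 28 ≈ 1385 for the Hazara, 2007 - 23.4 × 28 ≈ 1352 for the Uygur, and 2007 - 8.2 × 28 ≈ 1777 for the Maya, inside 1746-1803), so 2007 is used below as an assumption inferred from that consistency.

```python
def admixture_year(generations, years_per_generation=28.0, reference_year=2007):
    """Calendar year of an admixture event dated `generations` ago.

    reference_year=2007 is an assumption inferred from the quoted dates,
    not a value stated on the slides.
    """
    return reference_year - generations * years_per_generation


for name, g in [("Maya", 8.2), ("Hazara", 22.2), ("Uygur", 23.4)]:
    print(name, round(admixture_year(g)))
```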

22 Mongol-era admixture
[Map: estimated admixture dates of 1374, 1386, 1326, 1352, 1210, 1305, 1314 and 1288]
Mongol empire 1206-1368 (peak 1279)

23 Older still: Kalash
680 BC (933 BC-345 BC)

24 Acknowledgements
NIH; The Broad Institute; Oxford University
Collaborators:
– Garrett Hellenthal
– Daniel Falush
– Alkes Price
– David Reich
– Nick Patterson
– Jonathan Marchini
– Joe Pickrell (phasing)
Broad/HMS:
– Alicja Waliszewska
– Julia Neubauer
– David Altshuler
– Matt Freedman
– Christine Schirmer
USC:
– Chris Haiman
– Brian Henderson

25 Multiple dates: Mozabite
The Mozabite show admixture between west sub-Saharan African and Middle Eastern/south-east European sources over millennia.
2 admixture events (very approximate dates): 38 BC, 1634 AD
1 admixture event: 1318 AD (1157-1450)

26 Multiple sources, multiple times: Cambodian

27 Association mapping in admixed groups
In an admixed population, we can combine:
– the short-range association signal
– the long-range “admixture signal”
– This boosts power, but ancestry chunks must be inferred
– Existing methods (ANCESTRYMAP, STRUCTURE) cannot use dense data
[Figure: a common European mutation (25%) vs. a rare European mutation (5%)]

28 African Americans vs. Europeans
Ancestry chunks must be inferred to gain full power
No existing approaches do this from dense data
Assuming an average of 80% African ancestry in African Americans, 1000 African American cases are worth ~2000 European cases

29 Approximate ancestry labelling
We have samples from multiple possible “parental” populations, typed at many loci
Use an extension of an HMM for genetic variation
Model a new chromosome, from a different population, as a mosaic of other sample members
Li and Stephens (2003), Hellenthal et al. (2008)
[Figure: true ancestry]

30 Inferring local ancestry
We wish to apply the most powerful approach
Must infer ancestry chunks
– Required to obtain the full power shown on the previous slide
Imputation of SNPs typed in HapMap further increases power
– Do this conditional on ancestry chunks
High accuracy, and reliable measures of uncertainty, are needed to avoid false-positive “hits”

31 Bayesian mapping approach
Prior model probabilities: no locus 1/2; association signal 1/4; ancestry signal only 1/4
Association signal model (uniform priors):
– disease variant location
– allele frequencies uniform in (0, 1)
– relative risk uniform in (1.2, 2.0)
Ancestry signal only model (uniform priors):
– disease variant location
– population relative risk uniform in (1.2, 2.0)
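If each alternative model’s evidence at a locus is summarized as a Bayes factor against the no-locus model, the posterior model probabilities are the prior weights reweighted by those Bayes factors. The weights 1/2 and 1/4 are read from the slide, with the remaining 1/4 assumed for the ancestry-only model. The helper that averages a likelihood over a uniform prior on relative risk in (1.2, 2.0) uses a midpoint rule; it and all names here are hypothetical illustrations, not the author’s integration scheme.

```python
def posterior_models(bayes_factors, priors):
    """Posterior model probabilities from priors and Bayes factors vs. the no-locus model."""
    weights = {m: priors[m] * bayes_factors[m] for m in priors}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def average_over_uniform(likelihood, lo=1.2, hi=2.0, n=400):
    """Midpoint-rule average of likelihood(r) over a uniform prior on r in (lo, hi)."""
    step = (hi - lo) / n
    return sum(likelihood(lo + (i + 0.5) * step) for i in range(n)) / n


PRIORS = {"no_locus": 0.5, "association": 0.25, "ancestry_only": 0.25}

# Hypothetical Bayes factors at one locus (the no-locus model has BF 1 by definition).
bf = {"no_locus": 1.0, "association": 100.0, "ancestry_only": 3.0}
print({m: round(p, 3) for m, p in posterior_models(bf, PRIORS).items()})
```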

32 Testing performance
Simulations of realistic data to:
– examine the accuracy of inference
– test the method for false-positive inferences
1000 simulated African American case individuals, 1000 controls:
– Took 60 CEU and 60 YRI HapMap chromosomes
– Used HAPGEN (Marchini et al. 2007) to simulate 2000 chromosomes separately for each population
– Created “mosaic” African American (AA) samples with 60-100% African ancestry and 5-7 generations of admixture
– Randomly assigned case-control status
– Typed on the Affy 500k chip, with data thinned to match the number of SNPs typed successfully by the WTCCC
– Inferred population and imputed 3,000,000 additional HapMap markers for each individual
– Used the remaining 60 CEU and 60 YRI chromosomes as “parentals”; this is more difficult than the real situation of 120 parentals
Examine the “admixture” and “association” signals
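The mosaic construction can be sketched as follows, assuming breakpoints arise at a rate per Morgan equal to the number of generations since admixture and that each new segment copies a randomly chosen haplotype from the relevant simulated panel (A = African/YRI, B = European/CEU here). Drawing each individual’s African proportion uniformly from 0.6 to 1.0 and the number of generations uniformly from 5 to 7 is an assumption; the slide gives only the ranges. The toy panels and function name are hypothetical.

```python
import math
import random


def make_mosaic(panel_a, panel_b, positions, p_a, generations, rng):
    """Simulate one admixed haplotype as a mosaic of reference haplotypes.

    panel_a, panel_b: lists of haplotypes (lists of alleles) for populations A and B.
    positions: increasing marker positions in Morgans.
    Breakpoints occur at rate `generations` per Morgan; each new segment copies a
    random haplotype from A with probability p_a, else from B.
    Returns (alleles, ancestry labels).
    """
    alleles, labels = [], []
    source, label = None, None
    prev = positions[0]
    for i, pos in enumerate(positions):
        # A breakpoint occurs in (prev, pos] with probability 1 - exp(-generations * distance).
        if source is None or rng.random() > math.exp(-generations * (pos - prev)):
            label = "A" if rng.random() < p_a else "B"
            panel = panel_a if label == "A" else panel_b
            source = panel[rng.randrange(len(panel))]
        alleles.append(source[i])
        labels.append(label)
        prev = pos
    return alleles, labels


# Hypothetical toy panels: 4 haplotypes per population, 200 markers over 2 Morgans.
rng = random.Random(42)
n_markers = 200
positions = [0.01 * i for i in range(n_markers)]
yri = [[rng.randint(0, 1) for _ in range(n_markers)] for _ in range(4)]
ceu = [[rng.randint(0, 1) for _ in range(n_markers)] for _ in range(4)]
p_african = rng.uniform(0.6, 1.0)  # assumed uniform over the slide's 60-100% range
gens = rng.randint(5, 7)           # assumed uniform over the slide's 5-7 generations
hap, anc = make_mosaic(yri, ceu, positions, p_african, gens, rng)
print(round(p_african, 2), gens, round(anc.count("A") / n_markers, 2))
```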

33 SNP imputation on 500K data
1. Most SNPs are imputed with strong certainty
2. SNP imputation is well calibrated
[Figure: empirical vs. imputed probability of the variant allele, and the frequency of imputed probabilities across imputed SNPs]
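A calibration check of the kind plotted here can be sketched by binning imputed probabilities of the variant allele and comparing each bin’s mean imputed probability with the observed frequency of the variant among the true genotypes. If imputation is well calibrated the two agree in every bin. The bin count, toy data and function name are illustrative assumptions.

```python
def calibration_table(imputed_probs, true_alleles, n_bins=10):
    """Compare imputed probabilities of the variant allele with observed frequencies.

    imputed_probs: imputed P(variant) per imputed allele.
    true_alleles: 1 if the true allele is the variant, else 0.
    Returns (mean imputed probability, observed frequency, count) for each non-empty bin.
    """
    bins = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of probabilities, sum of observed
    for p, obs in zip(imputed_probs, true_alleles):
        k = min(int(p * n_bins), n_bins - 1)
        bins[k][0] += 1
        bins[k][1] += p
        bins[k][2] += obs
    return [(s_p / c, s_o / c, c) for c, s_p, s_o in bins if c > 0]


# Hypothetical, well-calibrated toy data: 20 alleles at P = 0.05 and 20 at P = 0.95.
probs = [0.05] * 20 + [0.95] * 20
truth = [1] + [0] * 19 + [1] * 19 + [0]
for mean_p, freq, count in calibration_table(probs, truth):
    print(round(mean_p, 2), round(freq, 2), count)
```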

34 Accurately measure ancestry
[Figure: chromosome 6; mean proportion of African ancestry by position, inferred (smoothed) vs. true]
2.5% error, the highest in the genome

35 Imputed vs. actual ancestry
[Figure: imputed vs. actual ancestry for chromosomes 1, 4, 7, 10, 13, 16, 19 and 22]
Almost all ancestry information is still captured (r² = 97%)
Use CEU for both parentals

36 Results - admixture MYC

