Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry  Oscar.

1 Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry  Oscar Lao, Kate van Duijn, Paula Kersbergen, Peter de Knijff, Manfred Kayser  The American Journal of Human Genetics  Volume 78, Issue 4, Pages (April 2006) DOI: /501531 Copyright © 2006 The American Society of Human Genetics Terms and Conditions

2 Figure 1 Percentage of information explained when the number of markers that are ascertained from 8,491 SNPs by use of the genetic algorithm based on the informativeness of assignment index (In) is increased from 1 to 10, given four continental groups and the YCC panel (see main text for details). The 95% CI of each SNP combination was computed by resampling the same number of chromosomes from the populations and computing In 1,000 times.
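As a rough illustration of the quantity in Figure 1, the informativeness of assignment for one locus is commonly written as In = Σ_j ( −p_j ln p_j + Σ_i (p_ij / K) ln p_ij ), where p_ij is the frequency of allele j in population i and p_j is its mean over the K populations. The sketch below computes it for a single SNP and attaches a percentile interval obtained by resampling chromosomes. The function names, the 0/1 allele coding, and the choice of resampling with replacement are assumptions made for this example; the caption does not specify the authors' exact procedure.

```python
import math
import random


def informativeness(freqs):
    """In for one locus; freqs[i][j] is the frequency of allele j in population i."""
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        p_bar = sum(pop[j] for pop in freqs) / k  # mean frequency of allele j
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        for pop in freqs:
            if pop[j] > 0:  # 0 * ln(0) is taken as 0
                total += (pop[j] / k) * math.log(pop[j])
    return total


def in_from_chromosomes(groups):
    """groups[i] is a list of 0/1 allele codes, one per chromosome in population i."""
    freqs = []
    for g in groups:
        p = sum(g) / len(g)
        freqs.append([1.0 - p, p])
    return informativeness(freqs)


def resampled_interval(groups, n_rep=1000, alpha=0.05, seed=0):
    """Percentile interval from resampling each population to its original size
    (with replacement: an assumption of this sketch)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        resampled = [[rng.choice(g) for _ in g] for g in groups]
        values.append(in_from_chromosomes(resampled))
    values.sort()
    lower = values[int((alpha / 2) * n_rep)]
    upper = values[int((1 - alpha / 2) * n_rep) - 1]
    return in_from_chromosomes(groups), lower, upper
```

With two populations fixed for different alleles, In reaches its maximum of ln K = ln 2; with identical frequencies everywhere it is 0.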

3 Figure 2 STRUCTURE analysis of the YCC samples, with K=2, 3, or 4 groups, performed using genotypes of the 10 most informative SNPs ascertained using the genetic algorithm with the total YCC data. STRUCTURE analyses were computed using a model without admixture (A) and a model with admixture (B). Each analysis was repeated five times, after a Markov chain–Monte Carlo (MCMC) burn-in period of 50,000 and considering the next 200,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The natural logarithm of the estimated probability of the data (lnp) is as follows. In panel A, for K=2, lnp=−762.2; for K=3, lnp=−629.2; and, for K=4, lnp=− In panel B, for K=2, lnp=−764.9; for K=3, lnp=−631.2; and, for K=4, lnp=−559.5.
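One common way to read lnp values like these is the ad hoc rule described in the STRUCTURE documentation: under a uniform prior on K, the approximate posterior probability of each K is proportional to exp(lnp). The sketch below applies that rule to the three panel B values quoted above. Whether the authors selected K this way is not stated in the caption, and the function name is hypothetical.

```python
import math


def posterior_over_k(lnp_by_k):
    """Normalize exp(lnp) over the candidate K values (uniform prior on K assumed)."""
    shift = max(lnp_by_k.values())  # subtract the maximum to avoid underflow
    weights = {k: math.exp(v - shift) for k, v in lnp_by_k.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# lnp values from Figure 2, panel B (model with admixture)
panel_b = {2: -764.9, 3: -631.2, 4: -559.5}
posterior = posterior_over_k(panel_b)
best_k = max(posterior, key=posterior.get)
```

Because the lnp values differ by tens of log units, essentially all of the weight falls on the K with the highest lnp, so the rule is only informative when candidate values of K are close.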

4 Figure 3 MDS plot based on the In matrix computed between pairs of populations by use of the genotypes of the 10 most informative SNPs in the 51 population samples from CEPH-HGDP. Four clusters of population can be identified: (i) sub-Saharan African populations, (ii) American populations, (iii) Eastern Asian and Oceanian populations, and (iv) European, Middle Eastern, North African, and Central/South Asian populations.

5 Figure 4 STRUCTURE analysis of the CEPH-HGDP samples, with K=2, 3, 4, or 5 groups, performed using genotypes of the 10 most informative SNPs ascertained using the genetic algorithm with the total YCC data. Two different STRUCTURE analyses were computed: a population model without admixture (A) and a population model with admixture (B). Each analysis was repeated five times after an MCMC burn-in period of 100,000 and considering the next 10,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The lnp, assuming K groups, is as follows. In panel A, for K=2, lnp=−11,801.2; for K=3, lnp=−10,977.3; for K=4, lnp=−10,279.2; and, for K=5, lnp=−10, In panel B, for K=2, lnp=−11,886.2; for K=3, lnp=−11,070.6; for K=4, lnp=−10,345.5; and, for K=5, lnp=−10, Cen. Af. Rep. = Central African Republic; S. Afr. = South Africa.

6 Figure 5 STRUCTURE analysis of each of the four groups detected in the HGDP-CEPH populations by previous STRUCTURE analysis (see main text) that considers models without admixture (A) and with admixture (B) and assumes K=2. A certain degree of population (sub)structure can be observed only in the case of American populations, but it disappears when three groups are considered (data not shown). Each analysis was repeated five times, after an MCMC burn-in period of 200,000 and considering the next 200,000 MCMC iterations. In all five runs, good mixing was observed, and similar results were found in accordance with the model used. The lnp, assuming K=2, is as follows. In panel A, for sub-Saharan Africa, lnp=−958.3; for America, lnp=−1,048.1; for East Asia and Oceania, lnp=−3,262.0; and, for Europe, the Middle East, Central/South Asia, and North Africa, lnp=−5, In panel B, for sub-Saharan Africa, lnp=−946.7; for America, lnp=−1,057.4; for East Asia and Oceania, lnp=−3,263.5; and, for Europe, the Middle East, Central/South Asia, and North Africa, lnp=−5,433.1.

7 Figure 6 BAPS 3.2 clustering results for K=2, 3, 4, and 5 groups in the HGDP-CEPH panel by use of the 10 most informative SNPs ascertained using the genetic algorithm with the YCC data. Each column represents an individual. The log (marginal likelihood) for K=2 groups is −11,687.5; for K=3, −10,832.6; for K=4, −10,164.8, and, for K=5, −10,

8 Figure 7 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs and the ABCA12 gene. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs to rs ; see main text for details).
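Panel D in this and the following figures plots extended haplotype homozygosity (EHH) against distance from the core. As usually defined, EHH is the probability that two randomly chosen chromosomes carrying the same core haplotype are identical across the whole interval from the core out to a given marker. The sketch below is a minimal version of that calculation on phased haplotypes stored as strings; the data layout, function name, and parameters are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_start, core_end, core_allele, extend_to):
    """EHH of one core haplotype out to the marker at index extend_to (inclusive).

    haplotypes: phased chromosomes as equal-length strings, one character per marker.
    The core occupies haplotypes[k][core_start:core_end].
    """
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core haplotype")
    lo = min(core_start, extend_to)
    hi = max(core_end, extend_to + 1)
    # Group carriers by their extended haplotype over the interval [lo, hi)
    groups = Counter(h[lo:hi] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


haps = ["AAB", "AAB", "AAC", "AAC", "BAC"]
at_core = ehh(haps, 0, 1, "A", 0)
at_far_marker = ehh(haps, 0, 1, "A", 2)
```

EHH equals 1 at the core itself and can only stay level or fall as the interval grows, which is why a slow decay for one haplotype, as noted for Figure 11, stands out.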

9 Figure 8 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs and the VRK1 gene. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs to rs ; see main text for details).

10 Figure 9 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs A, Sliding-window plot of the mean value observed for each window. B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs to rs ; see main text for details).

11 Figure 10 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs A, Sliding-window plot of the mean value observed for each window. B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs to rs ; see main text for details).

12 Figure 11 Sliding-window and haplotype analyses performed on the genomic region that includes SNP rs (1 of the 10 most informative SNPs identified), which is located in the LOC gene, by use of Perlegen data. A, Sliding-window plot of the mean value observed for each window (the gene is represented by a black bar). B, Associated P value for comparison with an empirical distribution based on >10,000 genes (see main text). The P=.05 cutoff is represented by a black line. C, Bifurcation plots of the main core haplotypes in the three populations considered. D, Extended homozygosity versus genomic distance to the core haplotype. The region of the core haplotype was selected on the basis of the largest region that was statistically significant in the sliding-window analysis (from rs to rs ; see main text for details). Note the high frequency of the third haplotype in the case of Asian populations and the slow decay of the EHH of that haplotype compared with the other haplotypes both within and between populations.

