Lecture 13: Population Structure October 5, 2015.

Last Time
- Effective population size calculations
- Historical importance of drift: shifting balance or noise?
- Population structure

Today
- Calculation of F_ST
- Defining populations on genetic criteria

F-Coefficients
- Quantification of the structure of genetic variation in populations: population structure
- Partition variation among the Total Population (T), Subpopulations (S), and Individuals (I)

F-Statistics Can Measure Departures from Expected Heterozygosity Due to Wahlund Effect

F_IS = (H_S - H_I) / H_S
F_ST = (H_T - H_S) / H_T
F_IT = (H_T - H_I) / H_T

where
- H_T is the average expected heterozygosity in the total population
- H_I is the observed heterozygosity within a subpopulation
- H_S is the average expected heterozygosity in subpopulations

F_ST: What does it tell us?
- Degree of differentiation of subpopulations
- Rules of thumb:
  - 0.05 to 0.15 is weak to moderate differentiation
  - 0.15 to 0.25 is strong differentiation
  - >0.25 is very strong differentiation
- Related to the historical level of gene exchange between populations
- May not represent current conditions

F_ST is related to life history (Loveless and Hamrick, 1984)

Seed Dispersal
  Gravity             [value missing]
  Explosive/capsule   0.262
  Winged/Plumose      [value missing]
Successional Stage
  Early               0.411
  Middle              0.184
  Late                [value missing]
Life Cycle
  Annual              0.430
  Short-lived         0.262
  Long-lived          0.077

Calculating F_ST

Locus with codominant alleles for flower color: B1B1 = Red; B2B2 = White; B1B2 = Pink

Subpopulation 1: Red: 10, White: 5, Pink: 15
  p1 = 35/60 = 0.583, q1 = 25/60 = 0.417
  H_S1 = 2(0.583)(0.417) = 0.486

Subpopulation 2: Red: 18, White: 2, Pink: 10
  p2 = 46/60 = 0.767, q2 = 14/60 = 0.233
  H_S2 = 2(0.767)(0.233) = 0.357

Calculating F_ST (continued)

Subpopulation 1: Red: 10, White: 5, Pink: 15
Subpopulation 2: Red: 18, White: 2, Pink: 10

Calculate the average H_E of the subpopulations (H_S). For 2 subpopulations:
  H_S = (H_S1 + H_S2)/2 = (0.486 + 0.357)/2 = 0.422

Calculate H_E for the merged subpopulations (H_T):
  p = 81/120 = 0.675, q = 39/120 = 0.325
  H_T = 2(0.675)(0.325) = 0.439

Bottom Line: F_ST = (H_T - H_S)/H_T = (0.439 - 0.422)/0.439 = 0.039
- 3.9% of the total variation in flower color alleles is due to variation among populations
- Expected heterozygosity is increased 3.9% when subpopulations are merged (Wahlund Effect)
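The flower-color calculation can be reproduced in a few lines. This is a minimal sketch that recomputes the allele frequencies, H_S, H_T, and F_ST from the genotype counts given in the example; the function and variable names are illustrative and not part of the lecture.

```python
def allele_freqs(n_red, n_white, n_pink):
    """Return (p, q): frequencies of B1 and B2 from genotype counts.

    Red = B1B1, White = B2B2, Pink = B1B2 (codominant), as in the example.
    """
    n_alleles = 2 * (n_red + n_white + n_pink)
    p = (2 * n_red + n_pink) / n_alleles
    q = (2 * n_white + n_pink) / n_alleles
    return p, q


def expected_het(p, q):
    """Expected heterozygosity 2pq for a biallelic locus."""
    return 2 * p * q


# Genotype counts (red, white, pink) for the two subpopulations.
subpops = [(10, 5, 15), (18, 2, 10)]

# H_S: average expected heterozygosity within subpopulations.
h_each = [expected_het(*allele_freqs(*counts)) for counts in subpops]
h_s = sum(h_each) / len(h_each)

# H_T: expected heterozygosity after merging the subpopulations.
merged = tuple(sum(col) for col in zip(*subpops))
h_t = expected_het(*allele_freqs(*merged))

f_st = (h_t - h_s) / h_t
print(f"H_S = {h_s:.4f}, H_T = {h_t:.4f}, F_ST = {f_st:.4f}")
```

Without rounding the intermediate values, F_ST comes out near 0.038; the 0.039 in the hand calculation reflects rounding H_S and H_T to three decimals first.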

Nei's Gene Diversity: G_ST
- Nei's generalization of F_ST to multiple, multiallelic loci: G_ST = (H_T - H_S)/H_T
- H_S = (1/m) Σ_i (1 - Σ_j p_ij^2) is the mean H_E of m subpopulations, calculated for n alleles with frequencies p_j
- H_T = 1 - Σ_j p̄_j^2, where p̄_j is the mean frequency of allele j over all subpopulations
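A sketch of how G_ST might be computed for several multiallelic loci. It assumes the standard gene-diversity form H = 1 - Σ p_j^2, equal weighting of subpopulations, and averaging of H_S and H_T over loci before taking the ratio. The input layout and the three-allele locus are hypothetical, chosen only for illustration.

```python
def gene_diversity(freqs):
    """Expected heterozygosity 1 - sum(p_j^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(loci):
    """G_ST = (H_T - H_S) / H_T, with H_S and H_T averaged over loci.

    `loci` is a list of loci; each locus is a list of subpopulations;
    each subpopulation is a list of allele frequencies summing to 1.
    """
    h_s_total = 0.0
    h_t_total = 0.0
    for subpop_freqs in loci:
        m = len(subpop_freqs)
        # H_S: mean within-subpopulation gene diversity.
        h_s_total += sum(gene_diversity(f) for f in subpop_freqs) / m
        # H_T: gene diversity of the mean allele frequencies.
        mean_freqs = [sum(col) / m for col in zip(*subpop_freqs)]
        h_t_total += gene_diversity(mean_freqs)
    h_s = h_s_total / len(loci)
    h_t = h_t_total / len(loci)
    return (h_t - h_s) / h_t


# The flower-color locus from the F_ST example (two alleles).
flower = [[35 / 60, 25 / 60], [46 / 60, 14 / 60]]
# A hypothetical three-allele locus, for illustration only.
three_allele = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]

print(g_st([flower]))
print(g_st([flower, three_allele]))
```

With a single biallelic locus the function reduces to the F_ST calculation, so the first call gives the same value as the flower-color example computed without rounding.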

Unbiased Estimate of F_ST
- Weir and Cockerham's (1984) theta (θ)
- Compensates for sampling error, which can cause large biases in F_ST or G_ST (e.g., if the sample represents different proportions of populations)
- Calculated in terms of correlation coefficients
- Calculated by FSTAT software
- Often simply referred to as F_ST in the literature

Goudet, J. (1995). "FSTAT (Version 1.2): A computer program to calculate F-statistics." Journal of Heredity 86(6).
Weir, B.S. and C.C. Cockerham (1984). "Estimating F-statistics for the analysis of population structure." Evolution 38.

Hierarchical F-Statistics
Can consider differentiation at both regional and subpopulation levels
- F_RT: Proportion of genetic variation that is due to differentiation among regions
- F_SR: Differentiation among subpopulations within regions
- F_ST: Overall differentiation among subpopulations (without regard to region)
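The three levels are linked multiplicatively. A short derivation, assuming that H_R (a symbol introduced here, not on the slide) denotes the average expected heterozygosity at the regional level and that each coefficient is defined as a heterozygosity ratio in the same way as F_ST:

```latex
\begin{align*}
F_{SR} &= \frac{H_R - H_S}{H_R}, \qquad
F_{RT} = \frac{H_T - H_R}{H_T}, \qquad
F_{ST} = \frac{H_T - H_S}{H_T} \\
(1 - F_{SR})(1 - F_{RT}) &= \frac{H_S}{H_R} \cdot \frac{H_R}{H_T}
  = \frac{H_S}{H_T} = 1 - F_{ST}
\end{align*}
```

Under these definitions, overall differentiation can never be smaller than either of its two components.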

F_ST as Variance Partitioning
- Think of F_ST as the proportion of genetic variation partitioned among populations:
  F_ST = V(q) / (p̄ q̄)
  where V(q) is the variance of q across subpopulations and p̄, q̄ are the mean allele frequencies
- The denominator is the maximum amount of variance that could occur among subpopulations
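A quick numerical check, using the flower-color allele frequencies from the F_ST example, that the variance form V(q)/(p̄ q̄) agrees with the heterozygosity form (H_T - H_S)/H_T. The sketch assumes equal subpopulation weights and the population variance (dividing by the number of subpopulations, not by one less).

```python
from statistics import mean, pvariance

# Frequency of allele B2 (q) in each subpopulation of the flower example.
q = [25 / 60, 14 / 60]

q_bar = mean(q)
p_bar = 1 - q_bar

# Variance form: variance of q among subpopulations over its maximum.
fst_variance = pvariance(q) / (p_bar * q_bar)

# Heterozygosity form, for comparison.
h_s = mean([2 * (1 - qi) * qi for qi in q])
h_t = 2 * p_bar * q_bar
fst_het = (h_t - h_s) / h_t

print(fst_variance, fst_het)
```

The two forms agree because H_T - H_S = 2 V(q) when subpopulations are weighted equally.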

Analysis of Molecular Variance (AMOVA)
- Analogous to Analysis of Variance (ANOVA)
  - Use pairwise genetic distances as 'response'
  - Test significance using permutations
- Partition genetic diversity into different hierarchical levels, including regions, subpopulations, and individuals
- Many types of marker data can be used
  - Method of choice for dominant markers, sequence data, and SNPs

Phi Statistics from AMOVA
- Φ_RT: Correlation of random pairs of haplotypes drawn from a region relative to pairs drawn from the whole population
- Φ_SR: Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from a region
- Φ_ST: Correlation of random pairs of haplotypes drawn from an individual subpopulation relative to pairs drawn from the whole population
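The permutation idea behind AMOVA significance testing can be illustrated on a simpler statistic. The sketch below is not AMOVA itself: it shuffles individuals between the two flower-color subpopulations from the F_ST example and asks how often the permuted F_ST is at least as large as the observed one. Function names, the number of permutations, and the seed are arbitrary choices for illustration.

```python
import random


def fst_from_genotypes(groups):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    `groups` is a list of subpopulations; each is a list of genotypes
    coded as the number of B1 alleles carried (0, 1, or 2).
    """
    def het(genos):
        p = sum(genos) / (2 * len(genos))
        return 2 * p * (1 - p)

    h_s = sum(het(g) for g in groups) / len(groups)
    pooled = [x for g in groups for x in g]
    h_t = het(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def permutation_p_value(groups, n_perm=999, seed=1):
    """Share of label permutations with F_ST >= the observed value."""
    rng = random.Random(seed)
    observed = fst_from_genotypes(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst_from_genotypes(shuffled) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Red = 2 copies of B1, White = 0, Pink = 1.
sub1 = [2] * 10 + [0] * 5 + [1] * 15
sub2 = [2] * 18 + [0] * 2 + [1] * 10
observed, p_value = permutation_p_value([sub1, sub2])
print(observed, p_value)
```

A small p-value indicates that the observed differentiation is unlikely to arise from random assignment of individuals to subpopulations.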

What if you don’t know how your samples are organized into populations (i.e., you don’t know how many source populations you have)? What if reference samples aren’t from a single population? What if they are offspring from parents coming from different source populations (admixture)? More fundamentally, what is a population?

Defining populations on genetic criteria
- Assume subpopulations are at Hardy-Weinberg Equilibrium and linkage equilibrium
- Probabilistically 'assign' individuals to populations to minimize departures from equilibrium
- Can allow for admixture (individuals with different proportions of each population) and geographic information
- Bayesian approach using Markov chain Monte Carlo (MCMC) to explore parameter space
- Implemented in the STRUCTURE program

Londo and Schaal 2007 Mol Ecol 16:4523

Structure Program
- One of the most widely used programs in population genetics (original paper cited >15,000 times since 2000)
- Very flexible model can determine:
  - The most likely number of uniform groups (populations, K)
  - The genomic composition of each individual (admixture coefficients)
  - Possible population of origin

A simple model of population structure
- Individuals in our sample represent a mixture of K (unknown) ancestral populations.
- Each population is characterized by (unknown) allele frequencies at each locus.
- Within populations, markers are in Hardy-Weinberg and linkage equilibrium.
- Roughly speaking, the model sorts individuals into K clusters so as to minimize departures from HWE and linkage equilibrium.

Slide adapted from Jonathan Pritchard, 2007 presentation to Conservation Genetics meeting

MCMC algorithm (for fixed K)
- Start with random assignment of individuals to populations
- Step 1: Gene frequencies in each population are estimated based on the individuals that are assigned to it.
- Step 2: Individuals are assigned to populations based on gene frequencies in each population.
- Continue this process many times to maximize likelihood of the arrangement
- Estimation of K is performed separately.

Slide adapted from Jonathan Pritchard, 2007 presentation to Conservation Genetics meeting
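The two alternating steps can be sketched as a toy sampler. This is an illustration under simplifying assumptions, not the STRUCTURE implementation: biallelic loci, the no-admixture model, a Beta(1, 1) prior on allele frequencies, equal prior weight on each cluster, and simulated data. All function names and parameter values are hypothetical.

```python
import math
import random


def log_geno_prob(g, p):
    """Log HWE probability of genotype g (copies of allele 1) given freq p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
    het_term = math.log(2) if g == 1 else 0.0
    return het_term + g * math.log(p) + (2 - g) * math.log(1 - p)


def gibbs_no_admixture(genotypes, k, n_iter=50, seed=0):
    """Alternate Step 1 and Step 2 for a fixed K; return final labels.

    `genotypes[i][l]` is the count (0, 1, 2) of allele 1 carried by
    individual i at biallelic locus l.
    """
    rng = random.Random(seed)
    n_loci = len(genotypes[0])
    labels = [rng.randrange(k) for _ in genotypes]  # random start
    for _ in range(n_iter):
        # Step 1: sample allele frequencies for each cluster from a Beta
        # posterior given the individuals currently assigned to it.
        freqs = []
        for c in range(k):
            members = [g for g, z in zip(genotypes, labels) if z == c]
            row = []
            for l in range(n_loci):
                ones = sum(g[l] for g in members)
                zeros = 2 * len(members) - ones
                row.append(rng.betavariate(1 + ones, 1 + zeros))
            freqs.append(row)
        # Step 2: reassign each individual in proportion to its genotype
        # likelihood under each cluster's allele frequencies.
        for i, g in enumerate(genotypes):
            logw = [sum(log_geno_prob(g[l], freqs[c][l]) for l in range(n_loci))
                    for c in range(k)]
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            labels[i] = rng.choices(range(k), weights=weights)[0]
    return labels


def simulate(n, freq, n_loci, rng):
    """Draw n diploid genotypes at n_loci loci with allele-1 frequency freq."""
    return [[sum(rng.random() < freq for _ in range(2)) for _ in range(n_loci)]
            for _ in range(n)]


rng = random.Random(42)
# Two strongly differentiated simulated populations of 20 individuals each.
data = simulate(20, 0.95, 20, rng) + simulate(20, 0.05, 20, rng)
labels = gibbs_no_admixture(data, k=2)
print(labels)
```

Cluster labels are arbitrary, so only the grouping matters: which of the two simulated populations ends up as cluster 0 depends on the random start.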

Admixed individuals are mosaics of ancestry from the original populations
[Figure label: Ancestral Populations]

Slide adapted from Jonathan Pritchard, 2007 presentation to Conservation Genetics meeting

The two basic ancestry models used by Structure
- No Admixture: each individual is derived completely from a single subpopulation
- Admixture: individuals may have mixed ancestry: some fraction q_k of the genome of individual i is derived from subpopulation k.
- The admixture model allows for hybrids, and it is more flexible and often provides a better fit for complicated structure. This is what we will use in lab.

Slide adapted from Jonathan Pritchard, 2007 presentation to Conservation Genetics meeting

Notes on Estimating the Number of Subpopulations (K)
- Likelihood-based method is the simplest, but likelihood often increases continuously with K
- More variability at values of K beyond the "natural" value
- Evanno et al. (2005) method measures change in likelihood and discounts for variation
- Use biological reasoning in arriving at the final value
- Can also incorporate prior expectations based on population locations and other information (e.g., Geneland package)
- Often need to do hierarchical analyses: break into subregions and run Structure separately for each

Estimating K
- Structure is run separately at different values of K.
- The program computes a statistic that measures the fit of each value of K (sort of a penalized likelihood); this can be used to help select K.
- For each assumed value of K, the reported statistic is Ln Pr(D|K).
- Convert to posterior probability using Bayes' Theorem.
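The Bayes' Theorem conversion can be sketched as follows, assuming a uniform prior over the candidate values of K, so that Pr(K|D) is proportional to exp(Ln Pr(D|K)). The log-probabilities below are hypothetical numbers for illustration, not results from the lecture.

```python
import math


def posterior_over_k(log_probs):
    """Posterior Pr(K | D) from ln Pr(D | K), assuming a uniform prior on K.

    `log_probs` maps each candidate K to its estimated ln Pr(D | K).
    Subtracting the maximum first avoids underflow in exp().
    """
    top = max(log_probs.values())
    weights = {k: math.exp(v - top) for k, v in log_probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical values for illustration only.
ln_pr = {1: -4356.0, 2: -3983.0, 3: -3982.0, 4: -3983.0, 5: -4006.0}
posterior = posterior_over_k(ln_pr)
for k, prob in sorted(posterior.items()):
    print(k, round(prob, 3))
```

Because the values are exponentiated, a difference of only a few log units between neighbouring K values already concentrates most of the posterior on one of them.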

Another method for inference of K
- The ΔK method of Evanno et al. (2005, Mol. Ecol. 14)

Eckert, Population Structure, 5-Aug
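A sketch of the ΔK calculation, assuming the usual reading of Evanno et al. (2005): the absolute second-order change in Ln Pr(D|K) between successive K values, divided by the standard deviation of Ln Pr(D|K) across replicate runs. Here the second difference is taken on the replicate means, which is one common way to implement it. The replicate values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(runs):
    """Delta K for each K that has both neighbouring K values.

    `runs` maps K to a list of ln Pr(D | K) values, one per replicate
    run. Requires at least two replicates with nonzero spread per K.
    """
    avg = {k: mean(v) for k, v in runs.items()}
    out = {}
    for k in sorted(runs):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            out[k] = second_diff / stdev(runs[k])
    return out


# Hypothetical replicate log-likelihoods (illustration only).
runs = {
    1: [-4400.0, -4402.0, -4398.0],
    2: [-4000.0, -4003.0, -3997.0],
    3: [-3990.0, -4010.0, -3970.0],
    4: [-3985.0, -4025.0, -3945.0],
}
result = delta_k(runs)
best_k = max(result, key=result.get)
print(result, best_k)
```

ΔK cannot be computed for the smallest or largest K tested, so it cannot by itself identify K = 1 as the best value.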

Inferred human population structure
Each individual is a thin vertical line that is partitioned into K colored segments according to its membership coefficients in K clusters.
Groups shown: Africans, Europeans, MidEast, Cent/S Asia, Asia, Oceania, America
Rosenberg et al., Science 298

Structure is Hierarchical: Groups reveal more substructure when examined separately
Rosenberg et al., Science 298