Human Migrations Anjalee Sujanani CS 374 – Algorithms in Biology Tuesday, October 24, 2006.


1 Human Migrations Anjalee Sujanani CS 374 – Algorithms in Biology Tuesday, October 24, 2006

2 Papers
• Review: The application of molecular genetic approaches to the study of human evolution. L. Luca Cavalli-Sforza & Marcus W. Feldman, Nature Genetics 33, 266-275 (2003).
• Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations. Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Genome Res., Aug 2005; 15: 1161-1167.
• Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Sohini Ramachandran, Omkar Deshpande, Charles C. Roseman, Noah A. Rosenberg, Marcus W. Feldman, and L. Luca Cavalli-Sforza, PNAS, November 1, 2005, Vol. 102, No. 44.

3 Presentation Overview
• History of genetic variation
• Overview of areas of study
• Human Migration Simulations

4 Timeline: Study of Genetic Variation
• 1919: Existence of human genetic variation first demonstrated in a study of the ABO gene
• 1966: Studies showed that almost every protein has genetic variants
  - These variants became useful markers for population studies
• 1980: A new method to construct a genetic linkage map of the human genome by using radioisotopes generated more new markers
• 1986: Polymerase Chain Reaction (PCR) developed
  - Allows a small amount of the DNA molecule to be amplified exponentially
  - Expanded the number of studies that could work directly with DNA
• 1990s: Development of automated DNA sequencing
Marker: A gene or other segment of DNA whose position on a chromosome is known

5 Shaping of Genetic Variation
• Human evolution
• Genome structure
• Population history
• Human migration
• Dating origin of our species
• Tracking migrations of our species using DNA
• Relationship of separated human populations

6 Gene Variation
• All genetic variation is caused by mutations
• Most common: Single Nucleotide Polymorphisms
  - A single base change occurring in a population at a frequency of >1% is termed a single nucleotide polymorphism (SNP)
  - A single base change occurring at <1% is considered a mutation
SNP: DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species.
  e.g. AAGCCTA
       AAGCTTA
  Here, we say that there are two alleles: C and T
Allele frequency: The proportion of copies of a site in a population that carry a given allele; it quantifies variation at a single nucleotide position
Polymorphism: Variation in DNA sequences between individuals
Source: http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/Variation/var17.html
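The allele-frequency and >1% definitions above can be illustrated with a short sketch. The function names, the toy sample of four sequences, and the reading of "frequency" as the frequency of the second most common allele are assumptions made here for illustration; they are not taken from the slides.

```python
from collections import Counter


def allele_frequencies(sequences, position):
    """Return {allele: frequency} at one 0-based position of aligned sequences."""
    alleles = [seq[position] for seq in sequences]
    counts = Counter(alleles)
    total = len(alleles)
    return {allele: count / total for allele, count in counts.items()}


def is_snp(freqs, threshold=0.01):
    """Apply the slide's >1% rule to the second most common allele (an assumption)."""
    if len(freqs) < 2:
        return False  # only one allele observed: the site is not variable
    second_most_common = sorted(freqs.values(), reverse=True)[1]
    return second_most_common > threshold


# Toy sample (hypothetical): three copies of AAGCCTA and one of AAGCTTA.
sample = ["AAGCCTA", "AAGCCTA", "AAGCCTA", "AAGCTTA"]
site_freqs = allele_frequencies(sample, 4)  # the two sequences differ at index 4
```

With this toy sample the C and T alleles at index 4 have frequencies 0.75 and 0.25, so the site passes the >1% rule; a real study would need far more than four chromosomes to estimate a 1% frequency.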

7 Allele Frequency Change (1)
• Natural Selection
  - Tendency of beneficial alleles to become more common over time and detrimental ones less common
• Random Genetic Drift
  - Fundamental tendency of any allele to vary randomly in frequency over time due to statistical variation alone
=> Both can lead to elimination or fixation of an allele
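Random genetic drift can be sketched with a Wright-Fisher style resampling loop. This is a standard textbook model swapped in here for illustration, not something the slides specify; `wright_fisher`, its parameters, and the example population size are hypothetical.

```python
import random


def wright_fisher(p0, n_copies, generations, rng=None):
    """Track one allele's frequency under drift alone (no selection, no migration).

    Each generation, n_copies gene copies are drawn at random from the
    previous generation, so the new frequency is a binomial sample.
    """
    rng = rng or random.Random()
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Count how many of the n_copies draws carry the allele.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        freqs.append(p)
    return freqs


# Example run: 100 gene copies (50 diploid individuals), 200 generations.
trajectory = wright_fisher(0.5, 100, 200, random.Random(0))
```

Once the frequency reaches 0 (elimination) or 1 (fixation) it stays there, which is the absorbing behaviour the slide's last line refers to.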

8 Allele Frequency Change (2)
• Migration of individuals
  - An average of 1 immigrant per generation into a population
  - Sufficient to keep drift partially in check
  - Avoids complete fixation of alleles
• Migration of whole populations, which settle elsewhere
  - If the founding group is initially small and then expands:
    - Founder allele frequencies differ (Δ) from those of the original population
    - Founder allele frequencies differ (Δ) from those of the population at the new location
  - Creates more chances for drift
  - Causes divergence and intergroup variation in allele frequencies
=> Group migration has the opposite effect to individual migration

9 Population History (1)
• Large populations that are geographically and genetically distant: history can be inferred by Population Trees
• Assumptions:
  - Fissions occur randomly in time
  - Constant rate of neutral evolution in each population between fissions
Neutral Evolution:
• Many of the changes that occur during evolution are selectively neutral
• The frequency of a selectively neutral gene is as likely to decrease as to increase by genetic drift
• On average, the frequencies of neutral alleles remain unchanged from one generation to the next

10 Summary Tree of World Population
• Polymorphisms of 120 protein genes
• 1,915 populations
Figure source: Nature Genetics supplement, Volume 33, March 2003, pg 267

11 High Resolution Population History (1)
• mtDNA: maternally inherited

12 High Resolution Population History (2)
• Y chromosome: paternally inherited

13 Population History (2)
• Geographically close populations
  - Genetic and geographic distances are often highly correlated
  - Genetic distance of population pairs is measured by F_ST
  - F_ST is a function of the geographic distance between the members of each population pair
  - Genetic distance reaches an asymptote at about 1,000-2,600 miles on average
  - World and Asia curves are higher because they are not at equilibrium
Figure source: Nature Genetics supplement, Volume 33, March 2003, pg 268
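As a rough illustration of what F_ST measures, the sketch below uses one common heterozygosity-based definition, F_ST = (H_T - H_S) / H_T, for a single biallelic locus and two equally weighted populations. The slides do not say which estimator the review used, so this choice and the function name are assumptions.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic locus, given the allele frequency in each population.

    H_S is the mean within-population expected heterozygosity and H_T the
    expected heterozygosity of the pooled population (equal weights assumed).
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the locus is monomorphic overall, so there is no differentiation
    return (h_t - h_s) / h_t
```

Identical frequencies give 0, populations fixed for different alleles give 1, and frequencies of 0.2 and 0.8 give 0.36 under this definition.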

14 Population History (3)
• Statistical Methods:
  - Principal Components
    - Good for when migration is frequent between neighbors
    - Gives similar results to trees (when assumptions hold)
  - Cluster Detection
    - Good for analyzing large population genetic datasets
    - Produces the same primary continental clusters as earlier methods

15 Tracking migrations using DNA
• 'Standard model of modern human evolution'
• Expansion from East Africa
  - First expansion: from East Africa into the rest of Africa by ~1,000 individuals
  - Second expansion: into Asia
Figure source: Nature Genetics supplement, Volume 33, March 2003, pg 270

16 Serial Founder Effect Originating in Africa
• Found a linear relationship between genetic and geographic distances in a world-wide sample of human populations (Ramachandran et al., PNAS 102(44))
Figure legend (plot symbols not preserved): within-region comparison; Africa vs. Eurasia; America vs. Oceania
Waypoints:
• Fixed locations used in the estimation of between-continent distances
• Make estimates more reflective of human migration patterns
• Based on the belief that humans did not cross large bodies of water while migrating (until recently)
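The waypoint idea can be sketched as a great-circle path forced through intermediate stops. The haversine formula is standard; the function names, the Earth radius constant, and all coordinates below (rounded positions for Addis Ababa, Cairo, and Paris) are illustrative assumptions rather than values from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def path_km(start, end, waypoints=()):
    """Sum of great-circle legs from start to end through the waypoints, in order."""
    stops = [start, *waypoints, end]
    return sum(great_circle_km(p, q) for p, q in zip(stops, stops[1:]))


# Hypothetical example: an East African start, a European end, one waypoint.
addis_ababa = (9.0, 38.7)
cairo = (30.0, 31.0)   # used here as an illustrative land-route waypoint
paris = (48.9, 2.3)
direct = path_km(addis_ababa, paris)
via_waypoint = path_km(addis_ababa, paris, [cairo])
```

The waypoint route is never shorter than the direct great-circle distance; it trades the shortest path for one that stays on land.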

17 Alternate Hypothesis
• 'Multiregional model'
  - Human populations originated in each continent and evolved in parallel
• Study: Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations
  - Quantifies the likelihood of the Unique Origin (UO) model relative to the Multiregional Evolution (ME) model
Figure source: Nature Genetics supplement, Volume 33, March 2003, pg 270

18 Method (1)
• Simulations designed to match a previous study that used 377 Short Tandem Repeats (STRs)
  - Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., and Feldman, M.W. 2002. Genetic structure of human populations. Science 298: 2381-2385
• 1052 individuals, 52 populations
STR: A common class of polymorphism, consisting of a pattern of two or more nucleotides repeating in tandem. Repeat unit: 2-10 base pairs, e.g. GATAGATAGATAGATAGATAGATA
Figure labels: individuals; lines separating different populations; populations (e.g. Han); regions (e.g. East Asia)
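To make the STR definition concrete, the sketch below counts the longest run of back-to-back copies of a repeat unit in a sequence. The function name and the flanked test sequence are hypothetical; real STR genotyping works from fragment lengths or reads rather than a string scan like this.

```python
def max_tandem_repeats(sequence, motif):
    """Return the largest number of consecutive copies of motif found in sequence."""
    best = 0
    m = len(motif)
    for start in range(len(sequence) - m + 1):
        run = 0
        pos = start
        # Extend the run while the next m characters are another copy of the motif.
        while sequence[pos:pos + m] == motif:
            run += 1
            pos += m
        best = max(best, run)
    return best


# The slide's example is six copies of the 4-base unit GATA.
slide_example = "GATA" * 6
repeat_count = max_tandem_repeats(slide_example, "GATA")
```

Alleles at an STR locus differ in this repeat count, which is why STR data are usually summarized as allele sizes.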

19 Method (2)
• Considered populations with sample size > 20 individuals
  - Called the Rosenberg22 data set
• Land surface of the Old World was divided into a grid of 9,226 demes, each 100 km × 100 km
Carrying Capacity: Population level that can be supported for an organism, given the quantity of food, habitat, water and other life infrastructure present.
Friction: The relative difficulty of moving through a deme.
Deme: A sub-population.
Source: Nicolas Ray, Mathias Currat, Pierre Berthier, and Laurent Excoffier, Recovering the geographic origin of early modern humans by realistic and spatially explicit simulations, Genome Res., Aug 2005; 15: 1161-1167.

20 Simulations
• Used software called SPLATCHE
• Step 1: Forward-time simulation of demographic and spatial expansion using environmental information
  - For each generation, record:
    - Number of individual genes per deme
    - Number of immigrant genes from the 4 nearest-neighboring demes
• Step 2: Backward-time simulation of genealogy and gene diversity sampled at given locations, using the information generated in Step 1

21 Step 1
• Origin of each expansion: single deme with 50 diploid individuals (100 nuclear genes)
• 25 origins of expansion
• Onset of expansion: 4,000 generations ago (roughly 100,000-120,000 years at 25-30 years per generation)
• Each generation, occupied demes are subject to:
  - Growth phase
    - Constant growth rate r = 0.3
    - Carrying capacity K depends on the environment of the deme
  - Emigration phase
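The growth phase can be sketched with a discrete logistic update, N(t+1) = N(t) · (1 + r · (K - N(t)) / K). The slides give only r = 0.3 and state that K depends on the deme, so the exact update rule, the function names, and the value K = 1000 are assumptions for illustration.

```python
def logistic_step(n, r=0.3, k=1000.0):
    """One generation of discrete logistic growth toward carrying capacity k."""
    return n * (1.0 + r * (k - n) / k)


def grow(n0, generations, r=0.3, k=1000.0):
    """Return the deme size at each generation, starting from n0."""
    sizes = [float(n0)]
    for _ in range(generations):
        sizes.append(logistic_step(sizes[-1], r, k))
    return sizes


# A deme founded by 50 individuals (the slide's founding size); K is hypothetical.
sizes = grow(50, 100)
```

With r = 0.3 the deme grows by almost 30% per generation while it is small and then levels off smoothly at K without overshooting.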

22 Step 1 - Emigration Phase
• Distributed 0.05 N_t emigrants to the 4 nearest neighboring demes
  - N_t = size of deme at time t
• Exact number of emigrants sent to each neighboring deme (E_i) controlled through friction values (F_i), one per deme
  - Friction = relative difficulty of moving through a deme
  - F_i values kept within the range 0.1 to 1
  - E_i computed from a multinomial distribution (formula not preserved in this transcript)
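The emigration phase can be sketched as a multinomial draw over the four neighbors. Because the slide's formula is missing, the weighting used below (probability proportional to 1 / F_i, so that low-friction demes receive more emigrants) is purely an assumption, as are the function name and the example friction values.

```python
import random


def emigrants_per_neighbor(n_t, frictions, rate=0.05, rng=None):
    """Split rate * n_t emigrants among neighbors with a multinomial draw.

    ASSUMPTION: neighbor i is chosen with probability proportional to
    1 / frictions[i]; the weighting actually used in the study is not
    given in the slides.
    """
    rng = rng or random.Random()
    total = int(round(rate * n_t))
    weights = [1.0 / f for f in frictions]
    counts = [0] * len(frictions)
    # Drawing each emigrant's destination independently is a multinomial sample.
    for idx in rng.choices(range(len(frictions)), weights=weights, k=total):
        counts[idx] += 1
    return counts


# Hypothetical deme of 1,000 individuals with one easy and three harder neighbors.
example_counts = emigrants_per_neighbor(1000, [0.1, 0.5, 1.0, 1.0], rng=random.Random(1))
```

Keeping F_i at or above 0.1 means the inverse weights stay bounded, so no single neighbor can absorb all emigrants by construction.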

23 Step 2
• For each of 25 geographic origins, performed 10,000 simulations of genetic diversity
• Simulations generated molecular diversity data at a given number of STR loci for each of the 22 population samples
• Tested 10 evolutionary scenarios
  - ME model: 9 combinations of population size and migration rates
    - Scenarios 26-28: Same population size, different migration rates
    - Scenarios 29-31: Africa population size > Asia and Europe population sizes; migration rate adjusted so the number of emigrants stayed the same
    - Scenarios 32-34: Africa population size > Asia and Europe population sizes; same migration rates
• For each data set:
  - Index of population differentiation (R_ST) was computed between all pairs of populations
  - Provided a measure of genetic divergence between populations

24 Timeline
Unique Origin (UO) model:
1. 30,000 generations ago: Demographic expansion following first speciation event
2. For 26,000 generations: Large, subdivided population exists
3. 4,000 generations ago: Bottleneck of 10 generations followed by range expansion
Multiregional Evolution (ME) model:
1. 30,000 generations ago: Small population went through speciation & instantaneously colonized 3 continents
2. For 26,000 generations: Continents had large populations & exchanged occasional migrants
3. 4,000 generations ago: Three range expansions from three different origins (shown in panel C of the figure)

25 Results - Analysis (1)
• What they wanted to do:
  1. Assume a given genetic data set is the product of a specific evolutionary scenario
  2. Estimate the likelihood of all scenarios that can generate that data
  3. Choose the scenario maximizing the likelihood
• What they did:
  1. Select a restricted number of ME and UO scenarios
  2. For each, replace the likelihood by a measure of goodness-of-fit of the genetic data to the model
  3. Choose the scenario maximizing goodness of fit

26 Results - Analysis (2)
• Goodness-of-fit between observed data and model determined using simulations:
  - Computed a correlation coefficient using the index of population differentiation (R_ST) values calculated earlier
  - Repeated this for many simulations per model to get a probability distribution of the coefficient under each model
  - 90% quantile value of the distribution used as goodness-of-fit index
    - Value chosen as a result of previous extensive simulation experience
  - Called the index the R90 statistic
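The R90 idea can be sketched as: correlate the observed pairwise R_ST values with each simulation's pairwise R_ST values, then take the 90% quantile of those correlations. The use of Pearson correlation, the linear-interpolation quantile, and all names below are assumptions; the slides do not specify these details.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def quantile(values, q):
    """Quantile with linear interpolation between order statistics (an assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def r90(observed_rst, simulated_rst_sets):
    """90% quantile of the correlations between observed and simulated R_ST vectors."""
    correlations = [pearson(observed_rst, sim) for sim in simulated_rst_sets]
    return quantile(correlations, 0.9)
```

A scenario whose simulations often reproduce the observed pattern of pairwise R_ST values ends up with a high R90, which is why the statistic can stand in for a likelihood when ranking scenarios.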

27 Results - Validation
• Simulations were used to evaluate how correctly the geographic origin of an expansion could be recovered from STR data
• 10,000 simulations performed per scenario
  - Divided into 2 sets of 5,000 simulations
  - First 5,000 runs under each scenario used as pseudo-observed data
  - Compared to the second 5,000 generated under all scenarios
• Geographic origin of expansion assigned to the scenario with the largest R90 statistic
• Tallied how many times they correctly assigned pseudo-observed simulations to the true origin for each model
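The assign-and-tally step above can be sketched as follows. The scenario labels, the R90 values, and the function names are made up for illustration; only the rule "assign to the scenario with the largest R90" comes from the slide.

```python
def assign_origin(r90_by_scenario):
    """Return the scenario label with the largest R90 statistic."""
    return max(r90_by_scenario, key=r90_by_scenario.get)


def recovery_rate(true_origin, scores_per_run):
    """Fraction of pseudo-observed runs assigned back to their true origin."""
    hits = sum(1 for scores in scores_per_run if assign_origin(scores) == true_origin)
    return hits / len(scores_per_run)


# Hypothetical R90 scores for three pseudo-observed runs simulated from "origin_A".
runs = [
    {"origin_A": 0.12, "origin_B": 0.05},
    {"origin_A": 0.08, "origin_B": 0.09},
    {"origin_A": 0.15, "origin_B": 0.02},
]
rate = recovery_rate("origin_A", runs)
```

Here two of the three runs are assigned to the true origin, so the recovery rate is 2/3; the study reports the same kind of tally over 5,000 pseudo-observed runs per scenario.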

28 Results - Distinguishing UO vs. ME models (1)
• Used a similar validation method to differentiate data sets generated under the two models
• A data set is correctly assigned if the chosen scenario belongs to the same evolutionary model that generated it, i.e.
  - Is a data set generated under a UO scenario assigned to any geographic origin under the UO model?
  - Is a data set generated under any ME scenario assigned to any ME scenario?
• Counted as correct regardless of the location of origin or the specific ME scenario

29 Results - Distinguishing UO vs. ME models (2)
• Results showed the evolutionary models are well discriminated with a single locus:
  - UO correct assignment frequency >> ME recovery rate

30 Results - Unknown Origins
• How is the probability of recovering the source of expansion affected if we assume an incorrect geographical origin?
• Performed 10,000 simulations on 14 alternative 'true' origins
• Genetic data sets from the 25 simulated potential origins compared with the 'true' origins
• Measured the probability of recovering the correct geographic region of origin (4 regions)
• Result: Correct assignment per region was much higher when the origin was known:
  - 20 loci: 0.771 (unknown) vs. 0.882 (known)
  - 377 loci: 0.852 (unknown) vs. 0.999 (known)

31 Results - Human Nuclear STRs
• R90 goodness-of-fit statistics between the original Rosenberg22 data and scenario-generated data showed:
  - UO model fit better overall
  - BUT pointed to a North African origin
• Suspected this was caused by the 377 STRs having a European ascertainment bias
  - Ran tests to see if results would be affected by an ascertainment bias
  - Tests showed the probability of inferring the correct origin drops when using biased data
• Re-computed R90 statistics after correcting for the bias, and found an East African origin was now more favored
  - Agrees with other recent studies

32 Summary
• Attempted to find the geographic origin of modern humans from patterns in current world-wide genetic diversity
• Explicitly accounted for physical constraints to dispersion
• Results showed that the origin can be well recovered if:
  - We have a large number of markers
  - Markers do not suffer ascertainment bias
  - Simulated origin is close to true origin
• Simulations showed UO and ME models could be clearly distinguished
• UO model favored
  - R90 statistic about four times higher for best UO scenario (0.1) vs. best ME scenario (0.023)

33 Thank you

