Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.


1 Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008

2 Overview Given a set of DNA sequences, how do we know when natural selection has occurred? Different methods of answering this question. How does having the entire genome available change this?

3 Natural Selection Introduction

4 Natural Selection Introduction

5 Natural Selection Introduction

6 Natural Selection What sort of artifacts would this leave within the genome? Introduction

7 Natural Selection Introduction The frequency of the long gene increases from one generation to the next. It eventually reaches 100%, or fixation.

8 Natural Selection Gene Perspective Introduction Same process at the gene level Let the yellow dot represent the advantageous allele It begins at a small frequency (.125 in this case)

9 Natural Selection Gene Perspective Introduction During selection The allele has risen in frequency! Because of linkage, the nearby alleles have also risen in frequency

10 Natural Selection Gene Perspective Introduction The allele has reached fixation! As time goes on the nearby genes will slowly begin to reach fixation as well Diversity has been lost
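The rise to fixation described on the last three slides can be sketched with a simple deterministic model. This is a minimal illustration, assuming haploid selection where the advantageous allele has fitness 1 + s; the selection coefficient s = 0.1 and the function names are assumptions, and only the starting frequency (.125) comes from the slides.

```python
def next_frequency(p, s):
    """One generation of haploid selection; the favoured allele has fitness 1 + s."""
    return p * (1 + s) / (1 + p * s)


def generations_to_reach(p0, s, target=0.99):
    """Count generations until the allele frequency first reaches `target`."""
    if s <= 0 or not 0 < p0 < 1:
        raise ValueError("need s > 0 and 0 < p0 < 1, otherwise the target is never reached")
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


# Starting frequency from the slide (0.125); s = 0.1 is an assumed value.
trajectory = [0.125]
for _ in range(50):
    trajectory.append(next_frequency(trajectory[-1], 0.1))
```

In this model the frequency approaches 1 but never reaches it exactly, which is why the helper counts generations to a threshold rather than to literal fixation.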

11 Natural Selection Gene Perspective Introduction Effect of Selection on the Genome Next Challenge: How does this effect differ from what happens without selection?

12 Neutral Theory (N.T.) Problem: Need to distinguish natural selection Therefore: Need a null hypothesis Solution: Create model that approximates neutral evolution Introduction Kimura, 1960s

13 N.T. & Genetic Drift Most variation is neutral with respect to selection Therefore most changes in frequency are due to genetic drift Introduction

14 N.T. & Genetic Drift A neutral gene has an equal probability of increasing or decreasing in frequency in the next generation Introduction
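Genetic drift of a neutral allele can be sketched with a Wright-Fisher style simulation. This is an illustrative model chosen here, not something the slides specify: each generation, every gene copy is drawn at random from the previous generation, so the frequency is as likely to go up as down. The function name, population size, and seed are assumptions.

```python
import random


def wright_fisher(p0, n_copies, generations, seed=0):
    """Simulate neutral drift: resample n_copies gene copies each generation."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    p = p0
    frequencies = [p]
    for _ in range(generations):
        # Each copy in the next generation carries the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        frequencies.append(p)
    return frequencies


# Example run: 100 gene copies, starting at 50% frequency (assumed values).
drift = wright_fisher(0.5, 100, 200)
```

Once the frequency hits 0 or 1 it stays there, which is loss or fixation by chance alone.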

15 N.T. & Mutation New alleles are introduced at a constant rate (at a particular point) To think about: How will this help us search for selection? Introduction

16 N.T. & Mutation Introduction

17 N.T. & Mutation Introduction

18 N.T. & Mutation Introduction

19 N.T. & Recombination Recombination occurs at a near-constant rate at a given position Introduction

20 Testing the N. T. How would natural selection differ from these assumptions? Introduction

21 “Positive Natural Selection in the Human Lineage” P. C. Sabeti, S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly, O. Shamovsky, A. Palma, T. S. Mikkelsen, D. Altshuler, E. S. Lander

22 Testing for Selection Sabeti et al. Review of current state of genomic selection Five statistical tests which use divergence from neutral theory to test for selection Ideas? Functional Alteration, Decreased Diversity, High Derived Alleles, Population Differences, Long Haplotypes

23 Sabeti et al. I. Functional Alteration Get a section of genome, and compare synonymous vs. non-synonymous mutations between two species Definition of synonymous mutation

24 I. Functional Alteration Sabeti et al. Silent/ Synonymous Non-Synonymous

25 I. Functional Alteration Sabeti et al. Long time scale, because it is an interspecies metric Limited value--only finds ongoing or recurrent selection Use a Ka/Ks statistical test, or the McDonald-Kreitman test
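The synonymous vs. non-synonymous comparison can be sketched as a codon-by-codon count. This is a simplified illustration, assuming two aligned, in-frame coding sequences with no gaps; it returns raw counts only, whereas a real Ka/Ks estimate also normalizes by the number of synonymous and non-synonymous sites. The function name is an assumption; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid: silent change
        else:
            non_synonymous += 1  # amino acid altered
    return synonymous, non_synonymous


# CTT -> CTC keeps leucine; GAA -> GAT changes glutamate to aspartate.
example = count_codon_differences("ATGCTTGAA", "ATGCTCGAT")
```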

26 II. Decreased Diversity Sabeti et al. Way of detecting a selective sweep Requires that you know the ancestral gene and the derived genes A derived gene is one that is a descendant of the ancestral one; it can be inferred by comparison to other species

27 II. Decreased Diversity Sabeti et al. The two small bars represent mutations. They are derived genes of the blue ancestor gene.

28 II. Decreased Diversity Sabeti et al. After the selective sweep the frequency of the derived alleles has jumped vis-a-vis the ancestral gene

29 II. Decreased Diversity Sabeti et al. A real example: derived alleles in red

30 II. Decreased Diversity Sabeti et al. Key idea: need to have ancestral genes present The genes must not have reached fixation! The pattern will be that of normal diversity of alleles but with skewed distribution of variation Statistical Tests: Tajima’s D, Fu and Li’s D*
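Tajima's D, named on this slide, compares two estimates of diversity: the average number of pairwise differences and the number of segregating sites. The sketch below uses the standard published formula, assuming equal-length aligned sequences with no missing data; the function name and the toy sequences are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    length = len(sequences[0])
    # S: number of sites where the sample is not all identical.
    s = sum(1 for j in range(length) if len({seq[j] for seq in sequences}) > 1)
    if s == 0:
        raise ValueError("no segregating sites, D is undefined")
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples: every variant is a singleton vs. every variant at 50%.
rare_variants = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
common_variants = ["AAAAA", "AAAAA", "AAAAA", "TTTTT", "TTTTT", "TTTTT"]
```

An excess of rare variants pushes D negative, and an excess of intermediate-frequency variants pushes it positive.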

31 III. New Alleles (AKA High Frequency of Derived Alleles) Another technique for detecting selective sweep Gene ‘hitch-hiking’ Limited diversity because of fixation Key idea: low frequency of new genes, but high diversity of rare alleles Sabeti et al.

32 III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Gene has reached fixation Low diversity in this region compared to other regions

33 III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Next mutations slowly increase the diversity Because they are all new the frequency remains low

34 III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. As more time progresses, any pre-sweep alleles die out, and diversity is replaced by many derived alleles

35 III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Real world example: Red dots indicate rare alleles

36 III. New Alleles (AKA High Frequency of Derived Alleles) Sabeti et al. Key Idea: The genes will have reached fixation and decreased diversity The diversity will all be in the form of rare alleles (because they are new) Statistical Test: Fay and Wu’s H
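Fay and Wu's H can be sketched from the derived-allele count at each variable site. This is a minimal version of the standard formula (H is pairwise diversity minus a diversity estimate that weights high-frequency derived alleles), assuming the ancestral state of every site is already known; the function name and inputs are assumptions.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H from derived-allele counts in a sample of n sequences.

    derived_counts: for each segregating site, how many of the n sequences
    carry the derived allele (each value between 1 and n - 1).
    """
    if n < 2 or any(not 0 < k < n for k in derived_counts):
        raise ValueError("counts must be between 1 and n - 1")
    pairs = n * (n - 1) / 2
    # pi: average pairwise differences.
    pi = sum(k * (n - k) for k in derived_counts) / pairs
    # theta_H: weights each site by the square of its derived-allele count.
    theta_h = sum(k * k for k in derived_counts) / pairs
    return pi - theta_h


# Three sites in a sample of 6: derived alleles nearly fixed vs. all rare.
h_high = fay_wu_h([5, 5, 5], 6)
h_low = fay_wu_h([1, 1, 1], 6)
```

Sites where the derived allele is at high frequency make H strongly negative.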

37 Comparing Methods The difference between decreased diversity and increased frequency of new alleles? Sabeti et al. Vs.

38 IV. Population Differences Requires population split Disproportionate shift in gene frequencies Limited utility Sabeti et al.

39 IV. Population Differences Sabeti et al.

40 IV. Population Differences Sabeti et al. Tall Tree Island

41 IV. Population Differences Sabeti et al.

42 IV. Population Differences Sabeti et al. Two separated populations--specific gene will show disproportionate shift in frequency with respect to the other genes Limited to cases where there are two populations Statistical Test: F(st), P(excess)
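F(st) for a single two-allele site can be sketched as the share of total heterozygosity that lies between populations. This is a simplified version, assuming two populations of equal size and known allele frequencies; the function name is an assumption, and real estimators also correct for sample size.

```python
def fst(p1, p2):
    """F(st) for one biallelic site, given the allele frequency in each population."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # heterozygosity if the populations were pooled
    if h_total == 0:
        return 0.0                      # site is not variable at all
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


no_difference = fst(0.5, 0.5)
fixed_difference = fst(0.0, 1.0)
```

A gene under selection in only one population would show an F(st) well above the values seen at most other sites.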

43 V. Long Haplotypes Based on Linkage Disequilibrium (LD) Long haplotype block and high frequency Sabeti et al.

44 V. Long Haplotypes Under neutral conditions, a new allele has low frequency and high linkage disequilibrium Sabeti et al.

45 V. Long Haplotypes As time goes on and the neutral allele increases in frequency, recombination erodes the L.D. Sabeti et al.
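Linkage disequilibrium between two sites, and its erosion by recombination, can be sketched as follows. This is a minimal illustration, assuming phased two-site haplotypes coded as 0/1 pairs and a constant recombination fraction r per generation; the function names are assumptions.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    haplotypes: list of (a, b) pairs, each value 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # excess of the 1-1 haplotype over random association
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


def expected_d(d0, r, generations):
    """Expected D after recombination at rate r for a number of generations."""
    return d0 * (1 - r) ** generations


# The two alleles always travel together vs. are randomly associated.
perfect = linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)])
independent = linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)])
```

An allele that is both common and still in strong L.D. over a long distance has risen faster than recombination could break it down.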

46 V. Long Haplotypes Sabeti et al.

47 Genome-Wide Scanning Better estimation of background rate Helps to confirm previous studies Suggests future areas of research MORE POWER Sabeti et al.

48 Genome-Wide Scanning SNP: Single Nucleotide Polymorphisms (excludes other types of mutations) that occur at > 1% frequency SNPs are the basis of many genome-wide analyses Sabeti et al.

49 “Forces Shaping the Fastest Evolving Regions in the Human Genome” K. S. Pollard, S. R. Salama, B. King, A. D. Kern, T. Dreszer, S. Katzman, A. Siepel, J. S. Pedersen, G. Bejerano, R. Baertsch, K. R. Rosenbloom, J. Kent, D. Haussler

50 Background Exploits the very recent sequencing of the chimp and human genome Uses the rate of allele replacement as test for selection Assumption is that highly changing parts of the genome have been under selective pressure Pollard et al.

51 Idea Take chimp and mouse genome, find common regions Compare these regions to human genome Pollard et al.

52 Method Part I First half: Find conserved regions. Use sequence tests to look for regions of 100 bp with 96% similarity Pollard et al.
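The search for 100 bp regions with 96% similarity can be sketched as a sliding window over a pairwise alignment. This is a simplified stand-in for the sequence tests mentioned on the slide, assuming two already-aligned sequences of equal length with "-" for gaps; the function name is an assumption.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.96):
    """Return start positions of windows whose identity is at least min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # A column counts as a match only if both bases agree and it is not a gap.
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    hits = []
    if len(matches) < window:
        return hits
    count = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one column: add the new column, drop the old one.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            hits.append(start)
    return hits


# Toy alignment: the first 10 columns differ, the remaining 110 are identical.
toy_hits = conserved_windows("A" * 120, "C" * 10 + "A" * 110)
```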

53 Results Part I

54 Conclusion: These areas represent genes with deep functionality

55 Method Part II Pollard et al. Search human genome for conserved regions

56 Method Part II Pollard et al. For every region that doesn’t match up, label it a Human Accelerated Region (HAR)
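The idea of flagging a conserved region that does not match in human can be sketched as a count of human-specific changes. This is a rough illustration only, assuming three aligned sequences of equal length; it does not reproduce the paper's formal test, and the function name and toy sequences are assumptions.

```python
def human_specific_substitutions(human, chimp, mouse):
    """Count aligned sites where chimp and mouse agree but human differs."""
    if not (len(human) == len(chimp) == len(mouse)):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for h, c, m in zip(human, chimp, mouse)
        if c == m and h != c
    )


# Toy region: chimp and mouse are identical, human differs at two sites.
changes = human_specific_substitutions("ACGTGCGT", "ACGTACGA", "ACGTACGA")
```

Regions with many such changes relative to their length would be the candidates worth testing further.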

57 Formal Description Pollard et al.

58 Results Part II Found 202 Human Accelerated Regions in total These were regions where there had been rapid evolution in the past 5 million years But evolution doesn’t mean selection Pollard et al.

59 Possible Explanations Relaxation of negative selection -- ruled out because the rate of neutral evolution is slower for 201/202 HARs Natural selection Sudden change in mutation rate Pollard et al.

60 But was it Selection? Pollard et al.

61 A Digression Biased Gene Conversion: Tendency to replace mismatched nucleotides with GC In all but two of the HARs there was no evidence of a selective sweep but significant evidence of GC-favored replacement Pollard et al.
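GC-favored replacement can be checked by tallying the direction of each substitution. This is a minimal sketch, assuming an inferred ancestral sequence aligned to the derived (human) sequence; the function name and the toy sequences are assumptions.

```python
WEAK = set("AT")    # bases forming two hydrogen bonds
STRONG = set("GC")  # bases forming three hydrogen bonds


def substitution_bias(ancestral, derived):
    """Return (weak_to_strong, strong_to_weak) substitution counts."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    weak_to_strong = strong_to_weak = 0
    for old, new in zip(ancestral, derived):
        if old == new:
            continue
        if old in WEAK and new in STRONG:
            weak_to_strong += 1   # A/T replaced by G/C
        elif old in STRONG and new in WEAK:
            strong_to_weak += 1   # G/C replaced by A/T
    return weak_to_strong, strong_to_weak


# A->G and A->C are weak-to-strong; G->A is strong-to-weak.
bias = substitution_bias("AATTGC", "GCTTAC")
```

A large excess of weak-to-strong changes in a region is the pattern expected from biased gene conversion rather than from a selective sweep.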

62 A Digression A newer paper suggests that BGC hotspots differ between species Conserved areas may suddenly become a BGC hotspot, explaining the HARs’ high BGC rates “Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution”, Galtier & Duret 2007 Pollard et al.

63 General Implications Illustrates utility of genome-wide approaches--by using the full genome to establish a background rate, signals stand out from noise Weaknesses: approach did not take into account failure to meet the assumptions of neutral theory (mutation rate) Pollard et al.

64 “Global Landscape of Recent Inferred Darwinian Selection for Homo Sapiens” E. Wang, G. Kodama, P. Baldi, and R. K. Moyzis

65 Background Ever-growing catalog of SNPs for human populations SNP data can be used to construct haplotype maps Can screen whole genome for haplotype outliers Wang et al.

66 Idea Take only homozygotes Bin the alleles together Calculate the L.D. for each allele Wang et al.

67 Idea Wang et al.

68 Formalized Description Wang et al.

69 Description of the Formalized Description Wang et al. Expected decay of LD for an allele of a specific frequency

70 Description of the Formalized Description Wang et al.

71 Description of the Formalized Description Wang et al. Selective sweep will be more resistant to decay

72 Description of the Formalized Description Wang et al. Normalize with respect to the sigmoidal curve

73 Advantages of Method By using the whole genome, can track not only L.D. but also the exponential decay of L.D. over distance. This helps to distinguish selective sweeps from other demographic shifts such as bottlenecks Wang et al.
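Tracking the exponential decay of L.D. over distance can be sketched as a curve fit. This is not the test used by Wang et al.; it is a generic illustration, assuming L.D. falls off as ld0 * exp(-rate * distance) and fitting that by least squares on the logarithm. The function name and the toy data are assumptions.

```python
from math import exp, log


def fit_ld_decay(distances, ld_values):
    """Fit ld = ld0 * exp(-rate * distance); return (ld0, rate).

    Uses ordinary least squares on log(ld), so every ld value must be > 0.
    """
    if len(distances) != len(ld_values) or len(distances) < 2:
        raise ValueError("need at least two (distance, ld) pairs")
    log_ld = [log(v) for v in ld_values]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(log_ld) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, log_ld))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope


# Toy data generated from a known curve (ld0 = 0.8, rate = 0.02 per kb).
toy_distances = [0, 10, 20, 40, 80]
toy_ld = [0.8 * exp(-0.02 * d) for d in toy_distances]
ld0, rate = fit_ld_decay(toy_distances, toy_ld)
```

A region whose fitted decay rate is much slower than the genome-wide background would stand out as a candidate sweep.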

74 Results Wang et al.

75 Results Wang et al. “Darwin’s Fingerprint”: Using different datasets from different populations, certain areas show consistent evidence of selection

76 Discussion Wang et al. Compare regions to known gene functions Six groups predominate Test was well designed Limited detection: Genes can’t be at fixation

77 Overall Conclusions It all comes down to statistics. What are the null assumptions? What are the alternate assumptions? Genome-wide scans improve by allowing us to exploit this elegant statistical method in new ways Improved data for null hypothesis Increased volume of potential candidates Wang et al.

78 Thank You!

