Comparative Genomics and Evolution Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. McLean,


1 Comparative Genomics and Evolution Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions in the Human Genome. PLoS Genetics 2(10), 2006. McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008). Image source: http://mbbnet.umn.edu Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA. Genome Research 18, 1743-1751 (2008).

2 “Forces shaping the fastest evolving regions in the human genome” by Katherine S. Pollard et al.

3 What’s the difference? Image sources: http://pro.corbis.com, http://www.science.psu.edu

4 What’s the difference? Humans have higher “brainpower” Examples: creativity, problem solving, language What part of the genome is the cause? Image source: http://www.spaceflight.esa.int

5 What’s the difference? Human and chimpanzee DNA is 98% similar. The 2% difference is 29 million bases (mostly in non-coding DNA). Image source: http://en.wikipedia.org

6 Comparative Genomics Human and rodent genomes are often compared to identify conserved (presumably functional) elements. Humans and chimpanzees are compared to understand what is uniquely human about our genome. Image source: http://genome.ucsc.edu

7 Comparative Genomics Look at HARs in the human genome. HAR: human accelerated region, with a high rate of nucleotide substitution in humans and a low rate in other vertebrates. The fastest-evolving is HAR1, a novel RNA gene expressed during development of the neocortex (language, conscious thought).

8 HARs ~ 100 bp, mostly non-coding Function is likely to be gene regulation. Seem to have been under strong negative selection up to common ancestor of chimp and human. Rapid positive selection then started in humans only. Image source: http://www.shutterstock.com

9 Finding HARs Evolutionary tree based on the comparison of conserved regions in whole-genome alignments between species. Branch lengths are given in substitutions per base or in millions of years. [Figure: evolution of vertebrates.] Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.

10 Finding HARs Find HARs by using the likelihood ratio test (LRT). In statistical hypothesis testing, the likelihood ratio (Λ) compares the maximum likelihood of the data under the null hypothesis with the maximum likelihood under the alternative hypothesis. The LRT decides between the two hypotheses based on the value of the likelihood ratio.
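
Written out, and following the orientation this deck uses later (alternative over null, on a log scale), the statistic can be sketched as below. The symbols θ, Θ₀, Θ₁ and D are notation introduced here for illustration and do not appear on the slides; P1 and P2 are the deck's own names for the two maximized probabilities.

```latex
\Lambda \;=\; \log \frac{P_2}{P_1},
\qquad
P_1 = \max_{\theta \in \Theta_0} \Pr(D \mid \theta),
\qquad
P_2 = \max_{\theta \in \Theta_1} \Pr(D \mid \theta)
```

Here D is the alignment of one conserved region, Θ₀ is the set of parameter values allowed under the null model, and Θ₁ is the set allowed under the alternative model. When the null model is a special case of the alternative, P2 ≥ P1 and so Λ ≥ 0.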

11 Finding HARs Two models were used for the genomic LRT. Model 1: the human substitution rate is held proportional to the other substitution rates in the evolutionary tree. Model 2: the human substitution rate can be accelerated relative to the rates in the rest of the tree.

12 [Diagram: all the conserved alignments, each pairing human with other vertebrates.]

13 Finding HARs [Diagram, Model 1: a set of rates is determined for each conserved alignment (1st, 2nd, 3rd, ...); all branches, human included, are scaled by the same amount.]

14 Finding HARs [Diagram, Model 2: the non-human branches are scaled by the same amount; the human rates are scaled separately.]

15 Identify regions conserved between human and other vertebrates (34,498 of them)

16 For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree Obtain P1 (max probability 1)

17 Identify regions conserved between human and other vertebrates (34,498 of them) For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree Loop over all conserved regions. For each region, do: Obtain P1 (max probability 1)

18 Identify regions conserved between human and other vertebrates (34,498 of them). For all regions, fit model 1 and determine the proportional rates that maximize the likelihood of the tree. Loop over all conserved regions. For each region: obtain P1 (max probability under model 1); fit model 2 to the region in human and find the acceleration for that region that maximizes the likelihood of the tree; obtain P2 (max probability under model 2); calculate the LRT for the region as Λ = log(P2 / P1).
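
The per-region calculation can be illustrated with a deliberately simplified toy. The sketch below is not the phylogenetic model used by Pollard et al.: it assumes substitution counts on the human branch and on the rest of the tree are Poisson-distributed, and the names `k_human`, `k_other`, `len_human` and `len_other` are hypothetical. It only shows how a shared scale factor (model 1) and a separate human scale factor (model 2) lead to Λ = log(P2 / P1).

```python
import math


def poisson_loglik(count, mean):
    """Log-probability of `count` events under a Poisson with the given mean."""
    if mean == 0.0:
        return 0.0 if count == 0 else float("-inf")
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def lrt_statistic(k_human, k_other, len_human, len_other):
    """Toy log likelihood ratio, log(P2 / P1), for one conserved region.

    k_human, k_other: observed substitution counts (hypothetical inputs).
    len_human, len_other: expected counts on each part of the tree at
    scale 1, i.e. the genome-wide branch lengths times the region length.
    """
    # Model 1: a single scale factor shared by every branch.
    shared = (k_human + k_other) / (len_human + len_other)
    log_p1 = (poisson_loglik(k_human, shared * len_human)
              + poisson_loglik(k_other, shared * len_other))

    # Model 2: the human branch gets its own scale factor, but only
    # acceleration is allowed, so a slower human rate falls back to model 1.
    human_rate = k_human / len_human
    other_rate = k_other / len_other
    if human_rate <= other_rate:
        return 0.0
    log_p2 = (poisson_loglik(k_human, human_rate * len_human)
              + poisson_loglik(k_other, other_rate * len_other))
    return log_p2 - log_p1
```

With these assumptions, a region whose human branch carries many more substitutions than its length predicts gets a large positive Λ, while a region that evolves proportionally to the rest of the tree gets Λ = 0.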

19 Finding HARs A large LRT value indicates an HAR. How large is large? Do 1 million simulations of the 34,498 conserved alignments. To create each simulation, use the model 1 proportional rates. Repeat the LRT calculation for each simulation. Then, for each region, find the proportion of simulated LRTs that are bigger than its original LRT. That proportion is an empirical p-value indicating whether the region is an HAR.
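
The last step, turning simulated LRT values into a p-value, is simple enough to sketch directly. The function name and arguments below are hypothetical, and the simulated values are assumed to have been generated elsewhere under model 1.

```python
def empirical_p_value(observed_lrt, simulated_lrts):
    """Proportion of simulated LRT values that exceed the observed one."""
    if not simulated_lrts:
        raise ValueError("need at least one simulated LRT value")
    larger = sum(1 for value in simulated_lrts if value > observed_lrt)
    return larger / len(simulated_lrts)
```

A region whose observed LRT is rarely exceeded by the null simulations gets a small p-value and is a candidate HAR.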

20 Finding HARs Note on methods: the vertebrates that were used in selecting the conserved regions (chimp, macaque, mouse, rat, rabbit) were omitted from any LRT analysis. This ensured that the LRT is independent of the method used to select the conserved regions.

21 Finding HARs Result: 202 HARs were found in the human genome. Image source: http://www.3dscience.com

22 Results for Conserved Elements 80.4% of the 34,498 conserved regions are non-coding. 45.4% of non-coding regions are intronic and 31% are intergenic. Non-coding regions are enriched for transcription factors, DNA-binding proteins, and regulators of nucleic acid metabolism.

23 Results for HARs 202 HARs have p < 0.1, 49 of them have p < 0.05 HAR1 through HAR5 have p < 4.5e-4, very accelerated Most HARs are non-coding 66.3% are intergenic, 31.7% are intronic, only 1.5% are coding Results support the hypothesis (King and Wilson) that most chimp-human differences are regulatory.

24 Results: Confirming Accelerated Selection in HARs Are the HARs just due to relaxation of negative selection? No. Compare to neutral rate for 4D sites to see. Negative selection Positive selection Image source: http://cs273a.stanford.edu [Bejerano Aut 08/09]

25 The chimp rates in all five elements fall well below the human rates, which exceed the background rates by as much as an order of magnitude. H, human; C, chimp. Genome-wide neutral rate for 4D sites in human and chimp Genome-wide neutral rate for 4D sites in human and chimp in chromosome end bands Image from: K.S. Pollard et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.

26 Results: W → S Bias in HARs A dramatic AT → GC bias was observed in HARs. [Figure: AT → GC versus GC → AT substitution bias in HAR1 – HAR5, HAR6 – HAR49, HAR50 – HAR202, and the rest of the ~34,000 conserved elements.] Image from: Pollard, K.S., et al., Forces Shaping the Fastest Evolving Regions of the Human Genome.

27 Results: W → S Bias in HARs The top 49 HARs are 2.7 times as likely to be located near final chromosomal bands as the other conserved elements. Interestingly, HAR1 and HAR5 are also in end regions in other mammals, but are not accelerated there. Image source: http://www.intelihealth.com

28 Results: W → S Bias in HARs HARs tend to be located in regions of high recombination in humans. All of this evidence points to biased gene conversion (BGC) as the driving force behind HARs.
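
To make the weak-to-strong (W → S) bias concrete, the sketch below counts AT → GC and GC → AT differences between two aligned sequences. It assumes an inferred ancestral sequence is available for the region; the function name and the toy sequences are hypothetical and are not taken from the paper.

```python
WEAK = {"A", "T"}    # weak pairing: A-T
STRONG = {"G", "C"}  # strong pairing: G-C


def count_ws_substitutions(ancestral, derived):
    """Count weak-to-strong and strong-to-weak changes between aligned sequences."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"W_to_S": 0, "S_to_W": 0}
    for old, new in zip(ancestral.upper(), derived.upper()):
        if old in WEAK and new in STRONG:
            counts["W_to_S"] += 1
        elif old in STRONG and new in WEAK:
            counts["S_to_W"] += 1
    return counts
```

For example, `count_ws_substitutions("AATTGC", "GATCGT")` finds two W → S changes and one S → W change; a region shaped by biased gene conversion would be expected to show a clear excess of the former.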

29 Genetic Recombination Paired chromosomes can exchange homologous pieces Typically occurs during meiosis

30 maternal chromosome A paternal chromosome A diploid germ cell Meiosis

31 maternal chromosome A paternal chromosome A centromere sister chromatids DNA replication diploid germ cell Meiosis

32 maternal chromosome A paternal chromosome A centromere sister chromatids DNA replication Recombination diploid germ cell Meiosis

33 maternal chromosome A paternal chromosome A centromere sister chromatids DNA replication Recombination Segregation diploid germ cell Meiosis

34 maternal chromosome A paternal chromosome A centromere sister chromatids DNA replication Recombination Segregation haploid gametes diploid germ cell Meiosis

35 Recombination Recombination hotspot

36 Genetic Recombination [Diagram: duplex 1 and duplex 2 form a Holliday junction intermediate, which is resolved either vertically (with crossover) or horizontally (with gene conversion, followed by mismatch repair).] Image source: http://www.sanger.ac.uk

37 Genetic Recombination: Chromosomal Crossover Chromosomal crossover results in exchange of DNA pieces Homologous chromosomes Recombinant chromatids Image source: http://www.emc.maricopa.edu

38 Genetic Recombination: Gene Conversion Gene conversion results in nonreciprocal transfer of DNA Mismatch repair causes DNA to revert back to its original form Recombinant chromatids Image source: http://www.emc.maricopa.edu

39 Genetic Recombination: Gene Conversion The result is a nonstandard ratio of alleles, such as 3:1 This causes homogenization of a species’ gene pool haploid gametes Image source: http://www.emc.maricopa.edu

40 Biased Gene Conversion DNA repair machinery likes to replace weak pairings with strong pairings during gene conversion. A - T is a weak pairing G - C is a strong pairing Image source: http://commons.wikimedia.org

41 Biased Gene Conversion A – T is replaced by G – C during mismatch repair. Biased gene conversion results in G – C enrichment of a species’ gene pool (in addition to causing homogenization). Recombinant chromatids
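
How a small repair bias enriches G – C in a gene pool can be illustrated with a textbook-style toy model, which is an assumption introduced here and not something the slides present. Suppose a site has an AT allele and a GC allele, mating is random, and a heterozygote transmits the GC allele with probability (1 + bias) / 2 instead of 1/2. The GC allele frequency p then changes each generation to p + bias · p · (1 − p). The parameter `bias` and the function names are hypothetical.

```python
def next_gc_frequency(p, bias):
    """GC allele frequency after one generation of biased gene conversion.

    Homozygotes transmit their own allele; heterozygotes (frequency
    2p(1-p)) transmit GC with probability (1 + bias) / 2.
    """
    return p + bias * p * (1.0 - p)


def gc_trajectory(p0, bias, generations):
    """Frequencies from generation 0 up to `generations`, inclusive."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_gc_frequency(freqs[-1], bias))
    return freqs
```

With `bias = 0` the frequency stays put; with any positive bias the GC allele rises steadily, which is why biased gene conversion can mimic positive selection.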

42 HARs and Recombination Hotspots HARs tend to be located near recombination hotspots in humans

43 Recombination Hotspots Mysterious Extremely different between chimps and humans (change rapidly during evolution) Not caused by the local DNA sequence (it is the same in human and chimp)

44 Some HARs Recombination hotspots ?

45 Possible Conclusion Recombination-caused BGC (often seen negatively) played a big role in the development of our species.

46 Alternative Explanation Isochore – DNA region (~100 kb) with high gene concentration Isochores are stabilized by many strong (GC) pairings HAR Isochore

47 Alternative Explanation Theory (Bernardi et al.) that weakly deleterious changes drive an isochore to a critical point of destabilization. At the critical point, GC content cannot decrease further; otherwise the isochore becomes unstable. An AT → GC substitution in the isochore suddenly gains a selective advantage and sweeps through the population.

48 Alternative Explanation Isochore selective sweep theory vs. the BGC theory. An isochore sweep has a different DNA signature than BGC. [Diagram: the GC signature of an isochore selective sweep spans ~100 kb; that of biased gene conversion spans ~100 bases.]

49 Alternative Explanation Evidence so far favors the BGC explanation for HARs. However, the results are not yet conclusive.

50 “Dispensability of Mammalian DNA” by Gill Bejerano and Cory McLean

51 Are mammalian CNEs dispensable? CNE – conserved non-exonic element Examples: cis-regulatory DNA, ultraconserved DNA ? Image source: http://apps.co.marion.or.us

52 Cis-regulatory DNA elements promoter or inhibitor Image source: http://cnx.org

53 Cis-regulatory DNA elements Image source: http://cnx.org

54 Ultraconserved elements 200 bp and up, many seem to be regulatory “100% identity with no insertions or deletions between orthologous regions of the human, rat, and mouse genomes.” “Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish.” (quotes from “Ultraconserved elements in the human genome” by Bejerano et al.)

55 Are mammalian CNEs dispensable? About 20% of gene knockout experiments, including cis-regulatory and ultraconserved knockouts, produce no phenotype measurable in lab settings. Image source: http://www.sciencedaily.com

56 Are mammalian CNEs dispensable? Do CNEs have functional redundancy? OR Are CNEs indispensable, but in a way that cannot be observed in the lab? Approach: look at CNEs lost in rodents due to evolution

57 Finding CNEs lost by rodents Computational pipeline: identify conserved mammalian sequences; pick out the ones absent in rodents; remove artifacts due to assembly, alignment, and structural RNA migration.

58 Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

60 Use UCSC chains and nets to avoid assembly artifacts. Ignore multi-level nets. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

61 Identify lost DNA Validate quality of results Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

62 Identifying DNA lost by rodents Look at the aligned orthologous sequences in primates (human, macaque), dog, and rodents (mouse, rat). [Diagram: alignment rows for primates, dog, and rodents; columns where primates and dog carry different bases, e.g. A vs. G, are marked.]

63 Identifying DNA lost by rodents Within a 100 bp window, compute the primate-dog %id (percentage of identical alignment columns).

64 Identifying DNA lost by rodents Compute the primate-dog %id.

65 Identifying DNA lost by rodents Compute the primate-dog %id. A deletion in rodents is flagged (!).

66 Identifying DNA lost by rodents Ultraconserved-like element between primates and dog.

67 Identifying DNA lost by rodents Ultraconserved-like element that was lost in rodents (!).
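
The windowed %id idea can be sketched in a few lines. This is a minimal illustration, not the McLean and Bejerano pipeline: it assumes three pre-aligned rows of equal length with `-` for gaps, treats a window as lost in rodents only when the rodent row is entirely gaps, and uses a hypothetical `min_pct_id` cutoff. All function and parameter names are assumptions.

```python
def percent_identity(row_a, row_b):
    """Percentage of alignment columns where both rows carry the same base."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same aligned length")
    if not row_a:
        return 0.0
    same = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * same / len(row_a)


def windows_lost_in_rodents(primate, dog, rodent, window=100, min_pct_id=80.0):
    """Start positions (with %id) of conserved windows deleted in rodents."""
    if not (len(primate) == len(dog) == len(rodent)):
        raise ValueError("rows must have the same aligned length")
    hits = []
    for start in range(len(primate) - window + 1):
        end = start + window
        pct_id = percent_identity(primate[start:end], dog[start:end])
        rodent_deleted = set(rodent[start:end]) == {"-"}
        if pct_id >= min_pct_id and rodent_deleted:
            hits.append((start, pct_id))
    return hits
```

On a toy alignment such as `windows_lost_in_rodents("ACGTACGTAC", "ACGTACGTAA", "AC--------", window=4)`, the windows starting at positions 2 to 5 are reported, while the window at position 6 is skipped because its primate-dog identity is only 75%.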

68 Results for non-exonic ultras 1,691,090 bp of ultraconserved-like sequences were found. 1147 bp of these sequences were lost in rodents. Thus only 0.086% of ultras is lost in rodents. In comparison, ¼ of neutrally-evolving DNA (50%id – 65%id) is lost in rodents. Thus ultraconserved-like sequences are about 300 times less likely to be lost than neutrally-evolving DNA.
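
The comparison on this slide is a ratio of two loss fractions, which can be recomputed from the quoted base counts. Taking those counts at face value, the lost fraction works out to roughly 0.068% rather than the 0.086% quoted, so one of the slide's figures may have been mistranscribed; the fold difference against the one-quarter neutral loss rate lands in the same few-hundred-fold range either way. The variable names below are illustrative.

```python
# Figures quoted on the slide.
ultra_bp = 1_691_090          # ultraconserved-like sequence found
lost_bp = 1_147               # portion of it lost in rodents
neutral_lost_fraction = 0.25  # "1/4 of neutrally-evolving DNA"

# Fraction of ultraconserved-like sequence lost in rodents.
ultra_lost_fraction = lost_bp / ultra_bp

# How many times more often neutral DNA is lost than ultraconserved-like DNA.
fold_difference = neutral_lost_fraction / ultra_lost_fraction

print(f"lost fraction: {ultra_lost_fraction:.4%}")
print(f"fold difference: {fold_difference:.0f}")
```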

69 Results for neutral DNA Expected uniform rate of lost neutrally-evolving DNA Observed that less conserved sequences are more retained Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

70 Results for neutral DNA Phenomenon due to poorly conserved sequences being adjacent to exons, and thus shielded from being lost Larger deletions are biased away from gene structures Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

71 Moving away from 100%id, there is a mixing of DNA under purifying selection and neutrally evolving DNA Separating DNA under selection from neutral DNA Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

72 To distinguish neutral DNA from conserved DNA in the mix, use longer evolutionary tree branch lengths Separating DNA under selection from neutral DNA Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

73 Example: human-dog-horse alignment has longer cumulative branch length than human-macaque-dog Separating DNA under selection from neutral DNA Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

75 Separating DNA under selection from neutral DNA Thus the human-dog-horse alignment has a lower %id for neutral DNA than human-macaque-dog. This shifts the neutral DNA curve to the right. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

76 Results for DNA under purifying selection Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

77 Results for DNA under purifying selection 80%id to 100%id identified as DNA under purifying selection As is visible from the figure, practically none of this DNA is lost in the primates (only 0.154% of bases are lost) Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

78 Results for DNA under purifying selection The previous results were for CNEs Those results compare to the numbers for lost coding DNA: Fraction of lost CNEs: 0 at 100%id, 0.00122 at 80%id Fraction of lost exons: 0 at 100%id, 0.0000861 at 80%id Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

79 Results for DNA under purifying selection Thus CNEs under purifying selection are indispensable, similarly to coding elements.

80 CNE dispensability ranking [Figure: two plots, one for losses in rodents and one for losses in primates. Annotations: the species deepest in the vertebrate tree corresponds to the most indispensable CNEs; region of high conservation (CNEs).] Left plot explanation (right plot is similar): take the human-macaque-dog (h-m-d) alignments and find their conservation %id in each of the shown species. Then, for each of those species, plot the fraction of DNA lost in rodents vs. the %id. Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

81 CNE dispensability ranking Image from: McLean, C., and Bejerano, G., Dispensability of Mammalian DNA.

82 Conclusion Many mammalian CNE knockouts produce no observable phenotype in the lab, suggesting great functional redundancy. However, evolutionary analysis shows that CNEs, and particularly ultraconserved regions, are indispensable. It seems that the knockout phenotype is subtle but very important. Image source: http://apps.co.marion.or.us

