Eukaryotic Comparative Genomics

1 Eukaryotic Comparative Genomics
June 2018 GEP Alumni Workshop Barak Cohen

2 Detecting Conserved Sequences
Charles Darwin Motoo Kimura

3 Evolution of Neutral DNA
C T A A T T G C T G T G A T T C A G A G T A G C A G T G A T A A G T C T T T G A T G T T G T T G C A G G A G T A G T C G T A * * * * * * * * * * * * * * * * * * * * * * * * *

4 Evolution of Non-Neutral DNA
C T T A G T C C G A T G T G C G T A C C G A C C A T A A G G A T G C A C A C G T A T A C C A T G T G G T A T C C G T A C C A T A A G C A T A C T * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

5 Multi-Species Alignment
ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA
ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA
GTACATCGATAGCTTAGAATGCTGGACGATCTC
GTACGTCGATAGCATAGAATGCTGGACGATCTC
 *    *     *   *   ***********

6 How to do Comparative Genomics
Choose species to analyze
Align sequences
Identify stretches of highly conserved nucleotides
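The third step can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own tooling: it reuses the four already-aligned sequences from the Multi-Species Alignment slide, and the function names and the `min_len` threshold are assumptions made for the example.

```python
from itertools import groupby

# The four aligned sequences from the Multi-Species Alignment slide (no gaps).
ALIGNMENT = [
    "ATGTGGCGCAGCCTGTGCCAGCTGGACGATCGA",
    "ATGTAGCCTAGCCAGTGCCAGCTGGACGATCGA",
    "GTACATCGATAGCTTAGAATGCTGGACGATCTC",
    "GTACGTCGATAGCATAGAATGCTGGACGATCTC",
]


def conserved_mask(seqs):
    """Return '*' for columns where every sequence has the same base, ' ' otherwise."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if len(set(col)) == 1 and "-" not in col else " "
        for col in zip(*seqs)
    )


def conserved_runs(mask, min_len=6):
    """Return 0-based half-open (start, end) intervals of '*' runs of at least min_len.

    min_len is a hypothetical threshold chosen for this example.
    """
    runs, pos = [], 0
    for char, group in groupby(mask):
        length = len(list(group))
        if char == "*" and length >= min_len:
            runs.append((pos, pos + length))
        pos += length
    return runs


mask = conserved_mask(ALIGNMENT)
for start, end in conserved_runs(mask):
    print(start, end, ALIGNMENT[0][start:end])
```

On these four sequences the only long run is the block `GCTGGACGATC` near the 3' end, which matches the run of asterisks on the slide.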

7 Choose species
Closely Related Species: align well, not many changes
Distantly Related Species: hard to align, lots of changes

8 Schizosaccharomyces pombe
S. cerevisiae ~10Mya S. cariocanus S. paradoxus S. mikatae S. kudriavzevii ~20Mya S. bayanus S. pastorianus S. servazzii S. unisporus S. exiguus S. dairenensis S. castellii S. kluyveri ~150Mya Kluyveromyces lactis >350Mya Schizosaccharomyces pombe

9 Case Study: Coding vs. Non-Coding
ATG…. ORF …TAA
Coding DNA
-codes for protein
-triplet code
-open reading frame (ORF)
-tend to be long ( bp)
-highly constrained
Non-Coding DNA
-regulatory functions
-short (5-15 bp)
-degenerate
-variable spacing

10 CASE 1: Non-Coding ATG… …TAA GAL4

12 Closely-related sequences are uninformative
ATG… GAL4
paradoxus  TCTTCTGAGACAGCATCACTTCTTCTTNTTTTTTACATAACTTATTCTTCTATAATTTTC
cerevisiae TCCTTTGAGACAGCATTCGCCCAGTATTTTTTTTATTCTACA-AACCTTCTATAATTT-C
           ** * *********** * * ******* ** * ************ *
paradoxus  AACGTATTTACATAGTTCTGTATCAGTTTAATCACCATAATATTGTTTTCCCTCAACTAA
cerevisiae AAAGTATTTACATAATTCTGTATCAGTTTAATCACCATAATATCGTTTTCT-----TTGT
           ** *********** **************************** ****** *
paradoxus  TGAATGCAATTAGATTTTCTTATTGTTCCCTCGCGGCTTTTTTTTGTTTTATAATCTATT
cerevisiae TTAGTGCAATTAATTTTTCCTATTGTTACTTCG-GGCCTTTTTCTGTTTTATGAGCTATT
           * * ******** ***** ******* * *** *** ***** ******** * *****
paradoxus  TTTTCCGTCATTTCTTCCCCAGATTTCCAACTTCATCTCCAGATTGTGTCTATGTAATGC
cerevisiae TTTTCCGTCATC-CTTCCCCAGATTTTCAGCTTCATCTCCAGATTGTGTCTACGTAATGC
           *********** ************* ** ********************** *******
paradoxus  ATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGCTACTGTCT
cerevisiae ACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGCTACTGTCT
           * ** ***** ** *** * ** ****** *** ********** ***************

14 Distantly-related sequences do not align
ATG… GAL4 Noncoding (Promoter)
cerevisiae ACTTACCAT-CAAC-CATAGATGGGTAAAC---GGTTAGTAACTAGGAACACGAT
castellii  AGA-GTCAAACTTTTCGT-ATA--TATATATAATATGTCTGATTGCTGGTT---T
           * ** * * * * * * * * *

16 Multiple sequence alignments reveal conserved elements
ATG… GAL4
cerevisiae   TGAGACAGCAT-CACTTCTT-CTTNTTTTTTACATAACTTATTCTTCTATAATTTTCAAC
mikatae      TGAGACAGCATTCACTTCTTTCTTTTTTTTTACATATCTTATTCTTCTATAATTTTCAAC
bayanus      TGAGACAGCATTCGCCCAGT--ATTTTTTTTAT-TCTACAAACCTTCTATAATTT-CAAA
kudriavzevii TGAGACTGCACTCCC TCTTCCTTTC TCCATAACTT---AC
             ****** *** * * * ** ** ** **** ** *
paradoxus    GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
kluyveri     GTATTTACATAGTTCTGTATCAGTTTAATCACCATAAT------ATTGTTTTCCCTCAAC
cerevisiae   GTATTTACATAATTCTGTATCAGTTTAATCACCATAAT------ATCGTTTTCTTTGT--
bayanus      TTATTTACATAGTTTTGTATCAGTTTAATCACCATAATCGTAACACCGTTTTACCTCACC
             ********** ** *********************** * ***** *
paradoxus    TAATGAATGCAATTAGATTTTC-TTATTGTTCCC-TCGCGGCTTTTTTTTGTTTTATAAT
kluyveri     TAATGAATGCAATTAGATTTTCCTTATTGTTCCCCTCGCGGCTTTTTTTTGTTTTATAAT
cerevisiae   ---TTAGTGCAATTAATTTTTC-CTATTGTTACT-TCG-GGCCTTTTTCTGTTTTATGAG
bayanus      TGATGCGGG--A---ATCCTTC-AGACCGTTCTC-TCGCGC
             * * * *** * *** *** *
paradoxus    -CTATTTTTTCCGTCATTTCTTCCCC-AGATTTCCAACTTCAT-CTCCAGATTGTGTCTA
kluyveri     ACTATTTTTTCCGTCATTTCTTCCCCCAGATTTCCAACTTCATACTCCAGATTGTGTCTA
cerevisiae   -CTATTTTTTCCGTCATC-CTTCCCC-AGATTTTCAGCTTCAT-CTCCAGATTGTGTCTA
bayanus      CTTTTTTTTTCGTCATTTCTTCCCC-AGATCTACAACTTTAA-CTCCAGACGGTGTATA
             ** ****** ****** ******* **** * ** *** * ******* **** **
paradoxus    TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
kluyveri     TGTAATGCATGCTATCATATTGAGAAAAGATAGAGAAACAACCCTCCTGAAAAATGAAGC
cerevisiae   CGTAATGCACGCCATCATTTTAAGAGAGGACAGAGAAGCAAGCCTCCTGAAAGATGAAGC
bayanus      GGCAGTACAAGCAGTGCTTTTGGGAAGAGGCAAAGCTGCAGACCTCGAGAACAATGAAGC
             * * * ** ** * * ** ** * * ** ** **** *** *******
UAS1 UAS2 UES MIG1

17 CASE 2: Coding ATG… …TAA CLN3

19 Closely-related sequences are uninformative

21 Less distantly related species not informative either

23 Distantly related species reveal functional protein domains

24 Identification of Multi-Species Conserved Regions (MCS)
Human cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Chimp cccattcttttccaagtgtctccg--cctgcagcgattaggttagaaagcatttctctct
Mouse ttcagtcgtttcccagtgtctctga-cattcagagactactttagtaagcattt-tctct
Rat   tcagtccttccctggcatctccag-cactcaa-gactactttagtaagcattt-tctctg
Dog   tcaatgactttcccagtctcttctactgggaagagattaggttgcaaatcatttttctct
      * * * * * * **
How can we decide if this region is “conserved?”
Margulies et al (2003) Gen. Res. 13:

25 It's like flipping coins (really)

26 Binomial-Based Method for Detecting Conserved Sequences
Human: AATGG
Mouse: AATCG
Status: CCCDC
p = probability that a site is the same between human and mouse by chance alone (Kimura), q = 1-p
For an alignment N base pairs long with n identities calculate the cumulative binomial probability as:
Margulies et al (2003) Gen. Res. 13:
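The formula itself is not present in this transcript. A minimal sketch, assuming the intended quantity is the standard upper tail of the binomial distribution, P(X >= n) = sum over k from n to N of C(N, k) p^k q^(N-k), is shown below. The function name and the value of `p_neutral` are hypothetical; in practice p would come from an estimate of the neutral substitution rate.

```python
from math import comb


def binomial_tail(N, n, p):
    """P(X >= n) for X ~ Binomial(N, p): chance of seeing n or more
    identical sites in an N-column alignment if each site matches
    independently with probability p."""
    q = 1.0 - p
    return sum(comb(N, k) * p**k * q**(N - k) for k in range(n, N + 1))


# The two-species example from the slide.
human, mouse = "AATGG", "AATCG"

# 'C' marks a conserved (identical) column, 'D' a differing one.
status = "".join("C" if h == m else "D" for h, m in zip(human, mouse))

N = len(status)        # alignment length
n = status.count("C")  # number of identities

# Hypothetical per-site match probability, chosen only for illustration.
p_neutral = 0.5

print(status, N, n, binomial_tail(N, n, p_neutral))
```

A small tail probability means that so many identities would rarely arise by chance, which is the sense in which a window can be called conserved. A five-column window like this one is far too short to give a convincing result.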

27

28

29 Large sequencing projects are underway

30 Tree Topology Influences Power
Star Phylogeny vs. Actual Phylogeny (tree diagrams over species A, B, C, D, E, F)

31 Challenges in larger genomes
Deciding on the neutral rate of substitution
Local differences in neutral rate of substitutions
Multiple hypothesis testing
Repeat sequences and uneven base composition

32 PhastCons and the UCSC Browser
Olig2 100 Kb upstream of Olig2

33 Motif Searching Across Several Multiple Alignments
Gene 1 Gene 2 Gene 3 Gene N Species 1 Species 2 Species 3

34 Information Content
EcoRI Random Rap1
GAATTC GCCTAC ACATTC TCATTC CGACTC GAATTC ATATCG GAAATG
TGTATGGGTG TGTTCGGATT TGCATGGGTG TGTACAGGTG TGTATGGATG TGTTCGGGTT
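Information content can be computed per alignment column. The sketch below assumes the six 10-bp sequences above are the Rap1 sites, a uniform background of 0.25 per base, and no small-sample correction. Under those assumptions the information at a position is 2 + sum over bases of f(b) log2 f(b) bits. The function name is an assumption made for the example.

```python
from collections import Counter
from math import log2

# The six 10-bp sites from the slide (assumed here to be the Rap1 column).
SITES = [
    "TGTATGGGTG",
    "TGTTCGGATT",
    "TGCATGGGTG",
    "TGTACAGGTG",
    "TGTATGGATG",
    "TGTTCGGGTT",
]


def information_content(sites):
    """Return per-position information in bits, assuming a uniform background.

    A fully conserved column scores 2 bits; a column with all four
    bases equally frequent scores 0 bits.
    """
    bits = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


for position, value in enumerate(information_content(SITES), start=1):
    print(position, round(value, 2))
```

Positions where all six sites agree, such as the leading `TG`, reach the 2-bit maximum, while variable positions score lower. A restriction site that tolerates no mismatches would score 2 bits at every position, and random sequence would score close to 0.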

35 Weight Matrix Model of TATA Box
A: -8 10 -1 2 1
C: -10 -9 -3 -2 -12
G: -7 -4
T: -6 9 11
G. Stormo

36 Weight Matrix Model of TATA Box
Score = -24
….A C T A T A A T G T …
A: -8 10 -1 2 1
C: -10 -9 -3 -2 -12
G: -7 -4
T: -6 9 11
G. Stormo

37 Weight Matrix Model of TATA Box
Score = 43
….A C T A T A A T G T …
A: -8 10 -1 2 1
C: -10 -9 -3 -2 -12
G: -7 -4
T: -6 9 11
G. Stormo

38 Weight Matrix Model of TATA Box
N(b,i) F(b,i) S(b,i) = log[F(b,i)/P(b)] G. Stormo
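The chain from counts N(b,i) to frequencies F(b,i) to scores S(b,i) can be sketched as follows. Several choices here are assumptions rather than anything stated on the slides: the logarithm is taken base 2, a pseudocount of 1 keeps unseen bases from producing log(0), and the background P(b) is uniform. The transcript does not include the TATA-box sites, so the example reuses the six 10-bp sites from the Information Content slide, and all function names are hypothetical.

```python
from math import log2

BASES = "ACGT"

# Example sites taken from the Information Content slide, not TATA-box data.
SITES = [
    "TGTATGGGTG",
    "TGTTCGGATT",
    "TGCATGGGTG",
    "TGTACAGGTG",
    "TGTATGGATG",
    "TGTTCGGGTT",
]


def count_matrix(sites):
    """N(b,i): how often base b occurs at position i of the aligned sites."""
    width = len(sites[0])
    return {b: [sum(s[i] == b for s in sites) for i in range(width)] for b in BASES}


def weight_matrix(sites, background=None, pseudocount=1.0):
    """S(b,i) = log2(F(b,i) / P(b)), with F(b,i) from pseudocount-smoothed counts."""
    background = background or {b: 0.25 for b in BASES}
    counts = count_matrix(sites)
    total = len(sites) + pseudocount * len(BASES)
    return {
        b: [log2(((n + pseudocount) / total) / background[b]) for n in counts[b]]
        for b in BASES
    }


def score(matrix, window):
    """Sum the matrix entries selected by each base of the window."""
    return sum(matrix[b][i] for i, b in enumerate(window))


def best_window(matrix, sequence):
    """Slide the matrix along the sequence; return (best score, 0-based start)."""
    width = len(matrix["A"])
    return max(
        (score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


matrix = weight_matrix(SITES)
print(best_window(matrix, "ACGT" + "TGTATGGGTG" + "ACGT"))
```

Scoring every window of a sequence and keeping the highest value is what produces the contrast between a low-scoring and a high-scoring placement of the matrix on the same sequence.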

39 Now we can compare motifs to each other
4 -3 5 -6 -2 -5 2 -1 11 -10 8 -4 1 15 3 -2 2 1 -1 7 -8 6 4 -3 9 C C G G T T

40 MAGMA unaligned motif finding in multispecies conserved regions
Gene 1 Gene 2 Gene 3 Gene N Species 1 Species 2 Species 3 *Ihuegbu, Stormo, & Buhler, JCB 19:139, 2012

