1 An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure. Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin. NCBI, NLM, National Institutes of Health

2 Outline: Background and Related Work; Data Components; The Model; The Algorithm; Results – Homogeneous Evolution; Results – Heterogeneous Evolution; Summary

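3 What are Exons and Introns? [Figure: exon1, intron, exon2, with the intron bounded by GU at its 5′ end and AG at its 3′ end; splicing removes the intron and joins exon1 and exon2 in the mRNA.]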
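To make the splicing picture concrete, here is a minimal Python sketch that cuts introns out of a pre-mRNA string. It assumes, for illustration only, that introns are given as hypothetical 0-based, end-exclusive intervals and that every intron follows the canonical GU…AG rule; real splice sites are more varied than this.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals (0-based, end-exclusive) from a pre-mRNA.

    Assumes canonical GU...AG introns and raises ValueError otherwise.
    """
    pieces = []
    prev_end = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Canonical introns start with GU and end with AG.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron!r}")
        pieces.append(pre_mrna[prev_end:start])  # exon before this intron
        prev_end = end
    pieces.append(pre_mrna[prev_end:])  # last exon
    return "".join(pieces)


# Hypothetical exon1 = "AUGGCC", intron = "GUAAGUAG", exon2 = "UUCUAA".
pre = "AUGGCC" + "GUAAGUAG" + "UUCUAA"
print(splice(pre, [(6, 14)]))  # AUGGCCUUCUAA
```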

4 Related work: Gilbert 2005 [hybrid; branch-specific]; Koonin 2003 [Dollo parsimony]; Csuros 2005 [ML; branch-specific]; Kenmochi 2005 [ML; branch-specific]; Stoltzfus 2004 [Bayes; gene-specific]. [Figure: the studies arranged by how they treat intron gain and loss; extracted labels, in order: gain, Stoltzfus, Koonin, Kenmochi, Csuros, Gilbert, Koonin, Kenmochi, Csuros, loss.]

5 Outline (repeated; next section: Data Components)

6 Phylogenetic tree

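7 Multiple alignment (HS, Homo sapiens; DM, Drosophila melanogaster; CE, Caenorhabditis elegans; AT, Arabidopsis thaliana; SC, Saccharomyces cerevisiae; SP, Schizosaccharomyces pombe)
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…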

8 Presence/absence maps (proteasome component C3): strong phyletic signal. Rows are species, columns are intron positions; 1 = intron present, 0 = absent.
      41 118 222 230 251 309 377 453 465 539 597 602 713
SC     0   0   0   0   0   0   0   0   0   0   0   0   0
SP     0   1   1   0   0   0   1   0   0   0   0   0   0
CE     1   1   0   0   0   0   1   0   0   0   0   0   0
DM     1   1   0   0   0   0   1   0   0   1   0   0   0
HS     1   1   0   0   1   0   1   0   1   1   1   0   0
AT     1   1   0   1   0   1   1   1   0   1   0   1   1
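A map like this can be assembled from per-species sets of intron positions. The sketch below assumes, hypothetically, that each species' introns have already been projected onto common alignment coordinates; it takes the union of all positions as the columns and fills in 1 or 0 per species.

```python
def presence_absence(
    introns_by_species: dict[str, set[int]],
) -> tuple[list[int], dict[str, list[int]]]:
    """Build a 0/1 intron presence/absence map.

    Columns are the sorted union of all intron positions; a species' row
    has 1 where it has an intron at that position and 0 elsewhere.
    """
    positions = sorted(set().union(*introns_by_species.values()))
    matrix = {
        species: [1 if pos in introns else 0 for pos in positions]
        for species, introns in introns_by_species.items()
    }
    return positions, matrix


# Intron positions for three species, read off the C3 map above.
introns = {
    "SC": set(),
    "SP": {118, 222, 377},
    "CE": {41, 118, 377},
}
positions, matrix = presence_absence(introns)
print(positions)     # [41, 118, 222, 377]
print(matrix["SP"])  # [0, 1, 1, 1]
```

Only positions where at least one listed species has an intron become columns, which matches the all-zero SC row contributing no columns of its own.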

9 Missing data
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…
[Figure: the same alignment, with a ? marking the region around the SP gap.]

10 Missing data (proteasome component C3). Rows are species, columns are intron positions; 1 = present, 0 = absent, ? = missing data.
      41 118 222 230 251 309 377 401 453 465 539 597 602 713
SC     0   0   0   0   0   0   0   0   0   0   0   0   0   0
SP     0   1   1   0   0   0   1   ?   0   0   0   0   0   0
CE     1   1   0   0   0   0   1   0   0   0   0   0   0   0
DM     1   1   0   0   0   0   1   0   0   0   1   0   0   0
HS     1   1   0   0   1   0   1   ?   0   1   1   1   0   0
AT     1   1   0   1   0   1   1   1   1   0   1   0   1   1
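One way to produce the ? entries is to treat a species as uninformative at an intron position whenever its sequence is gapped near that position, as with the SP gap in the slide 9 alignment. The rule below, which checks a window of hypothetical width `flank` on either side of the site, is an illustrative assumption and not necessarily the criterion the authors used.

```python
def encode_sites(alignment: dict[str, str],
                 introns_by_species: dict[str, set[int]],
                 sites: list[int],
                 flank: int = 3) -> dict[str, list[str]]:
    """Encode each intron site as '1', '0' or '?' per species.

    Site k is the boundary between alignment columns k-1 and k. If a
    species has a gap ('-') within `flank` columns on either side of the
    boundary, its intron state there is treated as missing ('?').
    """
    encoded = {}
    for species, row in alignment.items():
        states = []
        for k in sites:
            window = row[max(0, k - flank):k + flank]
            if "-" in window:
                states.append("?")
            elif k in introns_by_species.get(species, set()):
                states.append("1")
            else:
                states.append("0")
        encoded[species] = states
    return encoded


# Fragments of the HS and SP rows; the intron sites are made up.
alignment = {
    "HS": "ATGTCGATCGTGCTCGTC",
    "SP": "ATGTTGATT---CTCGTC",
}
introns = {"HS": {10}, "SP": {3}}
print(encode_sites(alignment, introns, sites=[3, 10]))
# {'HS': ['0', '1'], 'SP': ['1', '?']}
```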

11 Bayesian Network

12 Outline (repeated; next section: The Model)

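13 Probability structure. Root prior probability: the probability that an intron is absent (state 0) or present (state 1) at the root. Transition probability, for a gene g and a branch b of length t: a 2×2 matrix whose rows are the parent state (0 or 1) and whose columns are the descendant state (0 or 1); its entries depend on a branch-specific gain coefficient, a gene-specific gain rate, a branch-specific loss coefficient and a gene-specific loss rate.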
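The slide's formulas did not survive extraction. As a hedged illustration only, the sketch below assumes the standard two-state continuous-time Markov chain, with the gain rate equal to the product of the branch-specific gain coefficient and the gene-specific gain rate (and likewise for loss); the paper's exact parameterization may differ.

```python
import math


def transition_matrix(gain_coef: float, gene_gain: float,
                      loss_coef: float, gene_loss: float,
                      t: float) -> list[list[float]]:
    """2x2 matrix P[parent][child] for a two-state Markov chain.

    Assumption: gain rate = gain_coef * gene_gain (0 -> 1) and
    loss rate = loss_coef * gene_loss (1 -> 0) along a branch of length t.
    """
    gain = gain_coef * gene_gain
    loss = loss_coef * gene_loss
    total = gain + loss
    if total == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no change is possible
    decay = 1.0 - math.exp(-total * t)
    p01 = gain / total * decay  # absent at parent, present at descendant
    p10 = loss / total * decay  # present at parent, absent at descendant
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


print(transition_matrix(1.0, 0.1, 1.0, 0.3, 1.0))
```

On a very long branch the rows approach the stationary distribution (loss/(gain + loss), gain/(gain + loss)), and on a zero-length branch the matrix is the identity.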

14 Rate variation across sites: gain and loss rates vary across intron sites. Gain variation is governed by a gamma shape parameter (gain), loss variation by a gamma shape parameter (loss), and a fraction of sites is invariant.
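Rate variation of this kind is usually handled by averaging each site's likelihood over rate classes. The sketch below assumes, hypothetically, independent and equally weighted gain and loss multipliers (for example, category means of discretized gamma distributions with the two shape parameters) plus a fraction of invariant sites. How invariant sites enter the intron model is not spelled out on the slide, so here they simply contribute a separately supplied likelihood; `site_likelihood` is a hypothetical callback.

```python
import math
from typing import Callable, Sequence


def mixture_likelihood(site_likelihood: Callable[[float, float], float],
                       invariant_likelihood: float,
                       invariant_fraction: float,
                       gain_multipliers: Sequence[float],
                       loss_multipliers: Sequence[float]) -> float:
    """Average one site's likelihood over rate classes.

    With probability `invariant_fraction` the site is invariant and
    contributes `invariant_likelihood`. Otherwise its gain and loss rates
    are scaled by multipliers drawn independently and uniformly from the
    two category lists.
    """
    if not 0.0 <= invariant_fraction <= 1.0:
        raise ValueError("invariant_fraction must lie in [0, 1]")
    total = sum(site_likelihood(rg, rl)
                for rg in gain_multipliers for rl in loss_multipliers)
    variable = total / (len(gain_multipliers) * len(loss_multipliers))
    return (invariant_fraction * invariant_likelihood
            + (1.0 - invariant_fraction) * variable)


# Toy per-site likelihood as a function of the two multipliers (hypothetical).
def toy(rg: float, rl: float) -> float:
    return math.exp(-rg) * math.exp(-2.0 * rl)


print(mixture_likelihood(toy, 0.2, 0.1, [0.5, 1.5], [0.8, 1.2]))
```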

15 Parameter Summary
Global parameters: probability of intron absence at the root; fraction of invariant sites; shape parameters of the gamma distribution.
Gene-specific parameters: gain rate; loss rate.
Branch-specific parameters: gain coefficient; loss coefficient.

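16 Homogeneous vs. Heterogeneous Evolution. The number of parameters in the model depends on the number of extant species and on the number of genes G. Homogeneous evolution: set G = 1, so that all genes share one gain rate and one loss rate. Heterogeneous evolution: fix the global parameters and the branch-specific parameters, and estimate the gene-specific parameters.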
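A rough parameter count shows why fixing the global and branch-specific parameters makes the heterogeneous case tractable. This naive count is an assumption, not the slide's lost formula: it takes a rooted binary tree (2n - 2 branches for n extant species), the four global parameters listed on slide 15, and one gain and one loss parameter per gene and per branch, and it ignores any constraint used to remove the scale redundancy between branch coefficients and gene rates.

```python
def parameter_count(n_species: int, n_genes: int) -> int:
    """Naive number of free parameters (hypothetical count).

    4 global parameters (root absence probability, invariant fraction,
    gain and loss gamma shapes) + 2 per gene + 2 per branch, where a
    rooted binary tree with n leaves has 2n - 2 branches.
    """
    n_branches = 2 * n_species - 2
    return 4 + 2 * n_genes + 2 * n_branches


n = 6  # species in the alignment example
print(parameter_count(n, 1))    # homogeneous (G = 1): 26
print(parameter_count(n, 700))  # 700 hypothetical genes fitted jointly: 1424
```

With the global and branch-specific values held fixed (presumably taken from a homogeneous fit), each gene leaves only its own two rates free, so genes can be fitted one at a time.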

17 Outline (repeated; next section: The Algorithm)

18 Likelihood maximization via Expectation Maximization (EM). E-step: inward-outward recursions on the tree; a member of the junction-tree family of algorithms; missing data are naturally embedded.

19 Inward (gamma) recursion [Figure: tree with unknown states marked ? and a node labeled q.]

20 Inward (gamma) recursion - Initialization

21 Inward (gamma) recursion - Recursion [Figure: node labeled q.]
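The inward pass can be sketched as Felsenstein's pruning algorithm for one intron site. The sketch assumes a hypothetical nested-tuple tree format, per-branch 2×2 transition matrices supplied by the caller, and leaf observations '0', '1' or '?'. Initializing a missing leaf to (1, 1) sums over both of its states, which is one way missing data become "naturally embedded".

```python
from typing import Optional

Matrix = list[list[float]]
# A node is (name, P, children): P is the 2x2 transition matrix of the branch
# above the node (None at the root); leaves have an empty children list.
Node = tuple[str, Optional[Matrix], list]


def leaf_vector(state: str) -> list[float]:
    """Initialization: indicator of the observed state, or (1, 1) if missing."""
    if state == "?":
        return [1.0, 1.0]
    return [float(state == "0"), float(state == "1")]


def inward(node: Node, leaf_states: dict[str, str]) -> list[float]:
    """gamma[x] = P(observed leaves below node | node in state x)."""
    name, _, children = node
    if not children:
        return leaf_vector(leaf_states[name])
    gamma = [1.0, 1.0]
    for child in children:
        P = child[1]
        g = inward(child, leaf_states)
        for x in (0, 1):
            # Sum over the unobserved state y of the child.
            gamma[x] *= P[x][0] * g[0] + P[x][1] * g[1]
    return gamma


def site_likelihood(root: Node, leaf_states: dict[str, str],
                    prior: list[float]) -> float:
    """Combine the root's gamma with the root prior probabilities."""
    g = inward(root, leaf_states)
    return prior[0] * g[0] + prior[1] * g[1]


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical branch transition matrix
tree: Node = ("root", None, [("A", P, []), ("B", P, [])])
print(site_likelihood(tree, {"A": "1", "B": "?"}, [0.5, 0.5]))  # approximately 0.45
```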

22 Outward (alpha) recursion
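The outward pass carries information from the root down, and combining it with the inward pass gives the posterior state probabilities at every node that an E-step needs. This self-contained sketch reuses the same hypothetical nested-tuple tree format (node names assumed unique), caches the inward vectors, and computes alpha[x] = P(data outside the node's subtree, node in state x); the posterior at a node is proportional to alpha[x] * gamma[x].

```python
from typing import Optional

Matrix = list[list[float]]
# A node is (name, P, children): P is the 2x2 transition matrix of the branch
# above the node (None at the root); leaves have an empty children list.
Node = tuple[str, Optional[Matrix], list]


def branch_factor(P: Matrix, g: list[float], x: int) -> float:
    """Sum over the child's state y of P[x][y] * gamma_child[y]."""
    return P[x][0] * g[0] + P[x][1] * g[1]


def inward(node: Node, leaf_states: dict[str, str], gamma: dict) -> list[float]:
    """Fill gamma[name][x] = P(data below node | node in state x), bottom-up."""
    name, _, children = node
    if not children:
        s = leaf_states[name]
        gamma[name] = [1.0, 1.0] if s == "?" else [float(s == "0"), float(s == "1")]
        return gamma[name]
    g = [1.0, 1.0]
    for child in children:
        gc = inward(child, leaf_states, gamma)
        for x in (0, 1):
            g[x] *= branch_factor(child[1], gc, x)
    gamma[name] = g
    return g


def outward(node: Node, alpha_node: list[float], gamma: dict, alpha: dict) -> None:
    """Fill alpha[name][x] = P(data outside the subtree, node in state x), top-down."""
    name, _, children = node
    alpha[name] = alpha_node
    for child in children:
        # Parent's outside term times the contributions of the siblings.
        partial = list(alpha_node)
        for sib in children:
            if sib is not child:
                for x in (0, 1):
                    partial[x] *= branch_factor(sib[1], gamma[sib[0]], x)
        # Push the parent's state x through the branch to the child's state y.
        P = child[1]
        child_alpha = [partial[0] * P[0][y] + partial[1] * P[1][y] for y in (0, 1)]
        outward(child, child_alpha, gamma, alpha)


def posteriors(root: Node, leaf_states: dict[str, str], prior: list[float]) -> dict:
    """Posterior P(node state | data) at every node, from alpha * gamma."""
    gamma: dict = {}
    alpha: dict = {}
    inward(root, leaf_states, gamma)
    outward(root, list(prior), gamma, alpha)
    post = {}
    for name, g in gamma.items():
        joint = [alpha[name][x] * g[x] for x in (0, 1)]
        z = sum(joint)  # the site likelihood; the same at every node
        post[name] = [j / z for j in joint]
    return post


P = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical branch transition matrix
tree: Node = ("root", None, [("A", P, []), ("B", P, [])])
print(posteriors(tree, {"A": "1", "B": "?"}, [0.5, 0.5]))
```

Expected gain and loss counts per branch would additionally need the joint posterior of each parent-child state pair, which follows from the same alpha and gamma vectors.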

23 Likelihood maximization via EM. E-step: inward-outward recursions on the tree; a member of the junction-tree family of algorithms; missing data are naturally embedded. M-step: low-tolerance variable-by-variable maximization using Newton-Raphson.
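The M-step's variable-by-variable maximization can be sketched as coordinate ascent in which each one-dimensional update is a Newton-Raphson iteration run to a small tolerance. The toy objective, its derivative callbacks `d1` and `d2`, and the tolerance values are hypothetical stand-ins for the expected complete-data log-likelihood, not the authors' code.

```python
from typing import Callable, Sequence

# d1(x, i) and d2(x, i): first and second partial derivatives of the
# objective with respect to variable i, evaluated at point x.
Deriv = Callable[[Sequence[float], int], float]


def newton_1d(x: list[float], i: int, d1: Deriv, d2: Deriv,
              tol: float = 1e-10, max_iter: int = 100) -> None:
    """Maximize over variable i (others fixed) by Newton-Raphson on d1 = 0."""
    for _ in range(max_iter):
        h = d2(x, i)
        if h >= 0.0:  # not locally concave in x[i]; stop instead of stepping
            break
        step = d1(x, i) / h
        x[i] -= step
        if abs(step) < tol:
            break


def coordinate_newton(x0: Sequence[float], d1: Deriv, d2: Deriv,
                      tol: float = 1e-10, max_sweeps: int = 500) -> list[float]:
    """Cycle through the variables until a full sweep changes none by >= tol."""
    x = list(x0)
    for _ in range(max_sweeps):
        before = list(x)
        for i in range(len(x)):
            newton_1d(x, i, d1, d2, tol)
        if max(abs(a - b) for a, b in zip(x, before)) < tol:
            break
    return x


# Toy concave objective f(x, y) = -(x - 1)^2 - (y - 2)^2 - 0.5 * (x - y)^2,
# whose maximum is at (1.25, 1.75).
def d1(x: Sequence[float], i: int) -> float:
    if i == 0:
        return -2.0 * (x[0] - 1.0) - (x[0] - x[1])
    return -2.0 * (x[1] - 2.0) + (x[0] - x[1])


def d2(x: Sequence[float], i: int) -> float:
    return -3.0


print(coordinate_newton([0.0, 0.0], d1, d2))  # close to [1.25, 1.75]
```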

24 Outline (repeated; next section: Results – Homogeneous Evolution)

25 Intron density in ancient eukaryotes [Figure: comparison with Gilbert 2005, Koonin 2003, Csuros 2005, Kenmochi 2005 and Stoltzfus 2004.]

26 Evolutionary Landscape [Figure: regions labeled loser, gainer, stable and dynamic.]

27 Modes of Evolution

28 [Figure: regions labeled loser, gainer, stable and dynamic.]

29 Outline (repeated; next section: Results – Heterogeneous Evolution) [Figure labels: 234 genes, 295 genes, 187 genes.]

30 Gene Characteristics
New features of genes: intron gain rate; intron loss rate.
Old features of genes: expression level; evolutionary rate; lethality; connectivity in protein-protein interactions; connectivity in genetic interactions.

31 Combined Features

32

33 Important genes gain introns
             Status  Adaptability  Reactivity
Gain rate      0.33          0.06        0.37
Loss rate      0.05         -0.03        0.07

34 Outline (repeated; next section: Summary)

35 Conclusions
Disparate landscape: both gain and loss play a role in intron evolution.
The common ancestor of the crown group had an intron content comparable to those of fungi, apicomplexans and dipterans.
Three modes of evolution: more than one mechanism?
Important genes tend to gain introns.

