An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin NCBI, NLM, National Institutes of Health
Outline: Background and Related Work; Data Components; The Model; The Algorithm; Results – Homogeneous Evolution; Results – Heterogeneous Evolution; Summary
What are Exons and Introns? [Figure: a gene with exon1, an intron bounded by GU...AG, and exon2; splicing removes the intron, leaving an mRNA with exon1 joined to exon2]
Related work
Gilbert 2005 [hybrid; branch-specific]
Koonin 2003 [Dollo parsimony]
Csuros 2005 [ML; branch-specific]
Kenmochi 2005 [ML; branch-specific]
Stoltzfus 2004 [Bayes; gene-specific]
[Figure: the studies arranged along an axis running from gain to loss; labels in slide order: Stoltzfus; Koonin, Kenmochi, Csuros; Gilbert; Koonin, Kenmochi, Csuros]
Phylogenetic tree
Multiple alignment
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…
Presence/absence maps (proteasome component C3): strong phyletic signal. [Figure: map with rows SC, SP, CE, DM, HS, AT]
Missing data
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC… ?
Missing data (proteasome component C3) SC SP ? CE DM HS ? AT
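The presence/absence maps with missing entries can be pictured as small vectors, one per intron position. The following is a minimal sketch only: the function names, the `None` encoding of "?", and the toy data are all assumptions made for illustration, not the authors' data format.

```python
from collections import Counter

# Species codes as they appear on the slides; the ordering is an assumption.
SPECIES = ("SC", "SP", "CE", "DM", "HS", "AT")
MISSING = None  # stands for the "?" entries in the map


def encode_site(calls):
    """Turn per-species calls ('1', '0', '?') into a tuple of 1 / 0 / None."""
    table = {"1": 1, "0": 0, "?": MISSING}
    return tuple(table[calls[sp]] for sp in SPECIES)


def count_patterns(sites):
    """Collapse identical presence/absence columns into pattern -> count."""
    return Counter(encode_site(site) for site in sites)


# Toy data, invented for illustration only.
sites = [
    {"SC": "0", "SP": "0", "CE": "1", "DM": "1", "HS": "1", "AT": "1"},
    {"SC": "0", "SP": "?", "CE": "1", "DM": "1", "HS": "1", "AT": "?"},
    {"SC": "0", "SP": "0", "CE": "1", "DM": "1", "HS": "1", "AT": "1"},
]
patterns = count_patterns(sites)
```

Collapsing identical columns is a common trick in phylogenetic likelihood code, since each distinct pattern needs to be evaluated only once.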
Bayesian Network
Probability structure. Root prior probability. Transition probability for a gene and a branch of given length, as a 2×2 table: rows are parent in state 0 and parent in state 1; columns are descendant in state 0 and descendant in state 1. The entries combine branch-specific gain, gene-specific gain, branch-specific loss and gene-specific loss.
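The formulas on this slide did not survive extraction. As an illustration only, the sketch below builds a 2×2 transition table from one assumed parameterization: a gene-specific rate enters through an exponential in the branch length, and a branch-specific coefficient scales the result. The functional form and all parameter names are assumptions, not a transcription of the slide.

```python
import math


def transition_matrix(gain_rate, loss_rate, gain_coef, loss_coef, branch_length):
    """Assumed two-state transition table for one gene on one branch.

    Rows: parent state (0 = intron absent, 1 = intron present).
    Columns: descendant state (0, 1).
    """
    # Probability that an absent intron is gained along the branch (assumed form).
    p_gain = gain_coef * (1.0 - math.exp(-gain_rate * branch_length))
    # Probability that a present intron is retained along the branch (assumed form).
    p_keep = (1.0 - loss_coef) * math.exp(-loss_rate * branch_length)
    return [
        [1.0 - p_gain, p_gain],
        [1.0 - p_keep, p_keep],
    ]


T = transition_matrix(gain_rate=0.2, loss_rate=0.5, gain_coef=0.3,
                      loss_coef=0.1, branch_length=1.0)
```

Whatever the exact form, each row must sum to one, and a zero-length branch with zero loss coefficient should leave the state unchanged.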
Rate variation across sites: gain variation and loss variation. Parameters: shape parameter (gain), fraction of invariant sites, shape parameter (loss).
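One way to write such a rate-variation model is sketched below. It is an assumption, not the slide's formula: the symbols $\nu$, $\lambda_{\text{gain}}$, $\lambda_{\text{loss}}$ and the density $g$ are introduced here for illustration, and attaching the invariant fraction to the gain rate (rather than the loss rate) is a guess.

```latex
% Unit-mean gamma density with shape \lambda (assumed form)
g(r;\lambda) = \frac{\lambda^{\lambda}}{\Gamma(\lambda)}\, r^{\lambda-1} e^{-\lambda r}, \qquad r > 0

% Site-specific rate multipliers: a point mass at zero for invariant sites
% plus a gamma component for gain, and a plain gamma for loss
r_{\text{gain}} \sim \nu\,\delta(r) + (1-\nu)\, g(r;\lambda_{\text{gain}}),
\qquad
r_{\text{loss}} \sim g(r;\lambda_{\text{loss}})
```

Under this sketch, a fraction $\nu$ of sites can never gain an intron, and the two shape parameters control how strongly the remaining rates vary from site to site.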
Parameter Summary. Global parameters: probability of intron absence in the root; fraction of invariant sites; shape parameters of the gamma distribution. Gene-specific parameters: gain rate; loss rate. Branch-specific parameters: gain coefficient; loss coefficient.
Homogeneous vs. Heterogeneous Evolution. The number of parameters in the model depends on the number of extant species and the number of genes, G. Homogeneous evolution: setting G = 1. Heterogeneous evolution: fixing the global parameters and the branch-specific parameters, leaving the gene-specific gain and loss rates to be estimated.
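The parameter-count formula on this slide was lost, but a count can be sketched from the parameter summary. The function below is a hypothetical helper that assumes a rooted binary tree (so 2S - 2 branches for S extant species), two shape parameters, and that every listed parameter is free; the authors' own count may differ.

```python
def count_parameters(n_species, n_genes):
    """Count model parameters under the assumptions stated in the lead-in."""
    n_global = 1 + 1 + 2              # root prior, invariant fraction, two gamma shapes
    n_gene = 2 * n_genes              # gain rate and loss rate per gene
    n_branches = 2 * n_species - 2    # branches of a rooted binary tree
    n_branch = 2 * n_branches         # gain and loss coefficient per branch
    return n_global + n_gene + n_branch


homogeneous = count_parameters(n_species=6, n_genes=1)
```

With six extant species, as in the alignment slides, the homogeneous setting (G = 1) would give 26 parameters under these assumptions, and each additional gene adds two more.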
Likelihood maximization via Expectation Maximization. E-step: inward-outward recursions on the tree; a member of the junction-tree family of algorithms; missing data are naturally embedded.
Inward (gamma) recursion [Figure: tree with a node q and nodes marked "?"]
Inward (gamma) recursion - Initialization
Inward (gamma) recursion - Recursion q
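The slide's equations are not recoverable, so the following is a minimal sketch of a standard inward (leaf-to-root) recursion on a two-state tree, in the spirit of the initialization and recursion steps named above. The data structures (`children`, `observed`, `trans`) and the toy numbers are assumptions for illustration. A missing leaf contributes a vector of ones, which is how missing data can be absorbed without special handling.

```python
def inward(node, children, observed, trans):
    """Return [P(data below node | node = 0), P(data below node | node = 1)]."""
    kids = children.get(node, [])
    if not kids:
        # Initialization at a leaf: indicator of the observed state,
        # or all ones when the observation is missing (None).
        obs = observed.get(node)
        if obs is None:
            return [1.0, 1.0]
        return [1.0 if obs == s else 0.0 for s in (0, 1)]
    # Recursion: multiply the messages coming up from each child.
    gamma = [1.0, 1.0]
    for kid in kids:
        g_kid = inward(kid, children, observed, trans)
        T = trans[kid]  # T[parent_state][child_state] for the branch above kid
        for s in (0, 1):
            gamma[s] *= T[s][0] * g_kid[0] + T[s][1] * g_kid[1]
    return gamma


# Toy tree: a root with leaves A (intron present) and B (missing).
children = {"root": ["A", "B"]}
observed = {"A": 1, "B": None}
T = [[0.9, 0.1], [0.2, 0.8]]
trans = {"A": T, "B": T}

gamma_root = inward("root", children, observed, trans)
root_prior = [0.5, 0.5]
likelihood = sum(p * g for p, g in zip(root_prior, gamma_root))
```

In this toy case the missing leaf B drops out of the product, so the root message depends only on the branch leading to A.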
Outward (alpha) recursion
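A matching outward (root-to-leaf) pass can be sketched as follows; combined with the inward messages it gives posterior state probabilities at every node, which is what an E-step needs. This is a generic two-state sketch under assumed data structures and toy numbers, not the authors' implementation.

```python
def inward(node, children, observed, trans, gamma):
    """Fill gamma[node][s] = P(data below node | node = s)."""
    kids = children.get(node, [])
    if not kids:
        obs = observed.get(node)
        gamma[node] = [1.0, 1.0] if obs is None else [float(obs == 0), float(obs == 1)]
        return
    g = [1.0, 1.0]
    for kid in kids:
        inward(kid, children, observed, trans, gamma)
        for s in (0, 1):
            g[s] *= sum(trans[kid][s][c] * gamma[kid][c] for c in (0, 1))
    gamma[node] = g


def outward(node, children, trans, gamma, alpha):
    """Fill alpha[kid][s] = P(kid = s, data outside kid's subtree)."""
    kids = children.get(node, [])
    for kid in kids:
        a = [0.0, 0.0]
        for sp in (0, 1):  # state of the parent
            w = alpha[node][sp]
            for sib in kids:  # fold in the inward messages of the siblings
                if sib != kid:
                    w *= sum(trans[sib][sp][c] * gamma[sib][c] for c in (0, 1))
            for s in (0, 1):
                a[s] += w * trans[kid][sp][s]
        alpha[kid] = a
        outward(kid, children, trans, gamma, alpha)


def posteriors(root, children, observed, trans, root_prior):
    """Posterior state probabilities at every node given all observations."""
    gamma, alpha = {}, {root: list(root_prior)}
    inward(root, children, observed, trans, gamma)
    outward(root, children, trans, gamma, alpha)
    post = {}
    for n in gamma:
        joint = [alpha[n][s] * gamma[n][s] for s in (0, 1)]
        z = sum(joint)  # equals the likelihood of the data at every node
        post[n] = [j / z for j in joint]
    return post


# Toy tree: a root with leaves A (intron present) and B (missing).
children = {"root": ["A", "B"]}
observed = {"A": 1, "B": None}
T = [[0.9, 0.1], [0.2, 0.8]]
post = posteriors("root", children, observed, {"A": T, "B": T}, [0.5, 0.5])
```

Here the posterior at the missing leaf B is inferred purely from its sibling A through the root.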
Likelihood maximization via EM. E-step: inward-outward recursions on the tree; a member of the junction-tree family of algorithms; missing data are naturally embedded. M-step: low-tolerance variable-by-variable maximization using Newton-Raphson.
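A single-variable Newton-Raphson update of the kind an M-step might use can be sketched as below. The objective is an assumed toy case: expected counts of "stayed absent" and "gained" transitions on one branch, with a gain probability of 1 - exp(-rate × length). The function name, the step-halving safeguard and the numbers are illustrative assumptions.

```python
import math


def newton_gain_rate(n_stay, n_gain, branch_length, rate=0.5, tol=1e-10, max_iter=100):
    """Maximize Q(rate) = n_stay*log(u) + n_gain*log(1 - u), u = exp(-rate*length).

    Requires n_stay > 0 and n_gain > 0 so that the maximum is interior.
    """
    for _ in range(max_iter):
        u = math.exp(-rate * branch_length)
        grad = branch_length * (-n_stay + n_gain * u / (1.0 - u))
        hess = -n_gain * branch_length ** 2 * u / (1.0 - u) ** 2
        step = -grad / hess  # Newton-Raphson step for a concave objective
        while rate + step <= 0.0:
            step *= 0.5      # safeguard: keep the rate strictly positive
        rate += step
        if abs(step) < tol:
            break
    return rate


# Expected counts as an E-step might supply them (toy values).
estimate = newton_gain_rate(n_stay=80.0, n_gain=20.0, branch_length=1.0)
closed_form = -math.log(1.0 - 20.0 / 100.0)
```

This toy objective has a closed-form maximizer, which makes it a convenient check; in a variable-by-variable scheme, the same kind of update would be cycled over each parameter in turn while the others are held fixed.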
Intron density in ancient eukaryotes [Figure: estimates compared across Gilbert 2005, Koonin 2003, Csuros 2005, Kenmochi 2005, Stoltzfus 2004]
Evolutionary Landscape [Figure labels: loser, gainer, stable, dynamic]
Modes of Evolution
[Figure labels: loser, gainer, stable, dynamic]
[Figure labels: 234 genes, 295 genes, 187 genes]
Gene Characteristics. New features of genes: intron gain rate; intron loss rate. Old features of genes: expression level; evolutionary rate; lethality; connectivity in protein-protein interactions; connectivity in genetic interactions.
Combined Features
Important genes gain introns [Figure labels: Status, Adaptability, Reactivity; Gain rate, Loss rate]
Conclusions. Disparate landscape: both gain and loss play a role in intron evolution. The common ancestor of the crown group had an intron content comparable to that of fungi, apicomplexans and dipterans. Three modes of evolution: more than one mechanism? Important genes tend to gain introns.