Published by Marco Morriss. Modified over 9 years ago.
1
An EM Algorithm for Inferring the Evolution of Eukaryotic Gene Structure
Liran Carmel, Igor B. Rogozin, Yuri I. Wolf and Eugene V. Koonin
NCBI, NLM, National Institutes of Health
2
Outline
- Background and Related Work
- Data Components
- The Model
- The Algorithm
- Results – Homogeneous Evolution
- Results – Heterogeneous Evolution
- Summary
3
What are Exons and Introns?
[Figure: GU-AG splicing removes the intron between exon1 and exon2, producing the mRNA exon1exon2]
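To make the splicing step concrete, here is a minimal Python sketch. The function name `splice` and the 0-based, half-open intron coordinates are our own assumptions. It checks each intron for the canonical GU...AG boundaries and joins the flanking exons.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns from a pre-mRNA string.

    `introns` holds hypothetical 0-based, half-open (start, end) coordinates.
    Each intron must begin with GU and end with AG (the canonical boundaries).
    """
    exons = []
    prev_end = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        exons.append(pre_mrna[prev_end:start])  # exon preceding this intron
        prev_end = end
    exons.append(pre_mrna[prev_end:])  # final exon
    return "".join(exons)


pre = "AUGCCC" + "GUACGUAG" + "UUUGAA"  # exon1 + intron + exon2
print(splice(pre, [(6, 14)]))  # AUGCCCUUUGAA
```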
4
Related work
- Gilbert 2005 [hybrid; branch-specific]
- Koonin 2003 [Dollo parsimony]
- Csuros 2005 [ML; branch-specific]
- Kenmochi 2005 [ML; branch-specific]
- Stoltzfus 2004 [Bayes; gene-specific]
[Figure: the studies arranged along axes labeled "gain" and "loss"]
6
Phylogenetic tree
7
Multiple alignment
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…
8
Presence/absence maps (proteasome component C3)
Position  41 118 222 230 251 309 377 453 465 539 597 602 713
SC         0   0   0   0   0   0   0   0   0   0   0   0   0
SP         0   1   1   0   0   0   1   0   0   0   0   0   0
CE         1   1   0   0   0   0   1   0   0   0   0   0   0
DM         1   1   0   0   0   0   1   0   0   1   0   0   0
HS         1   1   0   0   1   0   1   0   1   1   1   0   0
AT         1   1   0   1   0   1   1   1   0   1   0   1   1
Strong phyletic signal
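As a quick illustration of the phyletic signal, the map can be transcribed and queried directly. The helper names `introns_of` and `shared` are our own; the data are exactly the table above.

```python
# Intron positions (alignment coordinates) and per-species presence (1) /
# absence (0), transcribed from the proteasome component C3 map above.
POSITIONS = [41, 118, 222, 230, 251, 309, 377, 453, 465, 539, 597, 602, 713]
PRESENCE = {
    "SC": "0000000000000",
    "SP": "0110001000000",
    "CE": "1100001000000",
    "DM": "1100001001000",
    "HS": "1100101011100",
    "AT": "1101011101011",
}


def introns_of(species: str) -> list:
    """Alignment positions at which `species` has an intron."""
    return [pos for pos, bit in zip(POSITIONS, PRESENCE[species]) if bit == "1"]


def shared(a: str, b: str) -> list:
    """Positions at which both species have an intron."""
    return sorted(set(introns_of(a)) & set(introns_of(b)))


for sp in PRESENCE:
    print(sp, len(introns_of(sp)))     # SC 0, SP 3, CE 3, DM 4, HS 7, AT 9
print("HS & AT:", shared("HS", "AT"))  # [41, 118, 377, 539]
```

In this gene every DM intron is also present in HS, and positions 118 and 377 are occupied in every species except SC.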
9
Missing data
HS …ATGTCGATCGTGCTCGTCGTACTCTCGTAC…
DM …ATGTGGATCGTGCTCGTCGTACTCTCGTAC…
CE …ATGTGGATTGTGCTCGTCGTACTCTCGTAC…
AT …ATGTTGATGGTGCTCGTCGTACTCTCGTAC…
SC …ATGTTGATTGTGCTCGTCGTACTCTCGTAC…
SP …ATGTTGATT---CTCGTCGTACTCTCGTAC…
[Figure annotation: "?" marking the gap in SP]
10
Missing data (proteasome component C3)
Position  41 118 222 230 251 309 377 401 453 465 539 597 602 713
SC         0   0   0   0   0   0   0   0   0   0   0   0   0   0
SP         0   1   1   0   0   0   1   ?   0   0   0   0   0   0
CE         1   1   0   0   0   0   1   0   0   0   0   0   0   0
DM         1   1   0   0   0   0   1   0   0   0   1   0   0   0
HS         1   1   0   0   1   0   1   ?   0   1   1   1   0   0
AT         1   1   0   1   0   1   1   1   1   0   1   0   1   1
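In tree likelihood computations, each leaf contributes a vector giving the probability of its observation under each possible true state. A common convention is to give a "?" leaf the vector (1, 1), and this is presumably how missing data are "naturally embedded" in the E-step recursions. The minimal Python sketch below uses column 401 of the table. The transition matrix `P` is hypothetical.

```python
# Probability of the observation given the leaf's true state
# (index 0 = intron absent, 1 = intron present).
LEAF_VECTOR = {
    "0": (1.0, 0.0),
    "1": (0.0, 1.0),
    "?": (1.0, 1.0),  # missing: the observation is compatible with both states
}

# Column 401 of the map above.
COLUMN_401 = {"SC": "0", "SP": "?", "CE": "0", "DM": "0", "HS": "?", "AT": "1"}


def message_to_parent(P, leaf):
    """Likelihood of a leaf's data for each state of its parent.

    P[a][s] is a (hypothetical) transition probability parent a -> leaf s.
    """
    return tuple(sum(P[a][s] * leaf[s] for s in (0, 1)) for a in (0, 1))


P = [[0.9, 0.1], [0.3, 0.7]]  # hypothetical branch transition matrix
for sp, obs in COLUMN_401.items():
    print(sp, obs, message_to_parent(P, LEAF_VECTOR[obs]))
```

For SP and HS the message equals each row sum of `P`, i.e. (1, 1) up to rounding. A missing leaf therefore adds no information and needs no special-case code.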
11
Bayesian Network
13
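Probability structure
- Root prior probability: the probability that the root is in state 0 (intron absent) or state 1 (intron present).
- Transition probability, for a given gene along a branch of a given length: a 2×2 matrix whose rows are the parent state (0 or 1) and whose columns are the descendant state (0 or 1).
- The gain term combines a branch-specific gain factor with a gene-specific gain factor, and the loss term combines a branch-specific loss factor with a gene-specific loss factor.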
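The exact functional form of the transition probabilities is not given in this text. As an assumption, a common choice is a two-state continuous-time Markov chain in which the gain rate is the product of a branch-specific gain coefficient and a gene-specific gain rate, and likewise for loss. This sketch may differ from the authors' own parameterization, and the function name is hypothetical.

```python
import math
from typing import List


def transition_matrix(gain_coef: float, gain_rate: float,
                      loss_coef: float, loss_rate: float,
                      t: float) -> List[List[float]]:
    """2x2 matrix P[parent][descendant]; state 0 = absent, 1 = present.

    Assumed model: a two-state continuous-time Markov chain whose gain rate is
    (branch gain coefficient) * (gene gain rate) and whose loss rate is
    (branch loss coefficient) * (gene loss rate).
    """
    lam = gain_coef * gain_rate  # effective gain rate for this gene on this branch
    mu = loss_coef * loss_rate   # effective loss rate
    s = lam + mu
    if s == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no change is possible
    decay = math.exp(-s * t)
    return [
        [(mu + lam * decay) / s, lam * (1.0 - decay) / s],
        [mu * (1.0 - decay) / s, (lam + mu * decay) / s],
    ]


P = transition_matrix(gain_coef=1.2, gain_rate=0.05,
                      loss_coef=0.8, loss_rate=0.1, t=2.0)
for row in P:
    print(row)
```

Each row sums to 1. At t = 0 the matrix is the identity, and for long branches both rows approach the stationary distribution (mu/s, lam/s).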
14
Rate variation across sites
- Gain variation: gamma-distributed, with its own shape parameter (gain)
- Loss variation: gamma-distributed, with its own shape parameter (loss)
- Fraction of invariant sites
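The slides do not say how the gamma distributions are discretized. A standard option is Yang's discrete gamma: split the mean-1 gamma into k equal-probability categories, take each category's median, and rescale so the rates average to 1. The Python sketch below uses only the standard library. The mixture in `site_likelihood` rests on our assumptions that an invariant site has zero gain and loss rates and that gain and loss categories are independent. The callback `lik` is hypothetical.

```python
import math
from typing import Callable, List


def gamma_cdf(x: float, shape: float) -> float:
    """CDF of a gamma distribution with the given shape and mean 1 (rate = shape),
    via the series for the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = shape * x
    term = total = 1.0 / shape
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def gamma_quantile(p: float, shape: float) -> float:
    """Invert gamma_cdf by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(shape: float, k: int = 4) -> List[float]:
    """Median of each of k equal-probability categories, rescaled to mean 1."""
    medians = [gamma_quantile((2 * i + 1) / (2 * k), shape) for i in range(k)]
    mean = sum(medians) / k
    return [m / mean for m in medians]


def site_likelihood(lik: Callable[[float, float], float], p_inv: float,
                    gain_shape: float, loss_shape: float, k: int = 4) -> float:
    """Mix a per-site likelihood over invariant and variable rate categories.

    `lik(gain_mult, loss_mult)` is a hypothetical callback returning the site
    likelihood with gain rates scaled by gain_mult and loss rates by loss_mult.
    """
    gains = discrete_gamma_rates(gain_shape, k)
    losses = discrete_gamma_rates(loss_shape, k)
    variable = sum(lik(g, l) for g in gains for l in losses) / (k * k)
    return p_inv * lik(0.0, 0.0) + (1.0 - p_inv) * variable


print(discrete_gamma_rates(0.5, 4))
```

A small shape parameter spreads the category rates widely, meaning strong rate heterogeneity. A large shape pulls all rates toward 1.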
15
Parameter Summary
Global parameters
- probability of intron absence at the root
- fraction of invariant sites
- shape parameters of the gamma distribution (gain and loss)
Gene-specific parameters
- gain rate
- loss rate
Branch-specific parameters
- gain coefficient
- loss coefficient
16
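Homogeneous vs. Heterogeneous Evolution
The number of parameters in the model depends on the number of extant species and on the number of genes, G.
- Homogeneous evolution: setting G = 1, so that all genes share a single pair of gene-specific gain and loss rates
- Heterogeneous evolution: fixing the global and branch-specific parameters, then estimating the gene-specific rates for each gene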
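The exact parameter-count formula is not given in this text. Here is a rough tally under our own assumptions:
- a rooted, fully bifurcating tree with 2n - 2 branches for n extant species
- branch lengths taken as known
- no constraint removing the scale redundancy between branch coefficients and gene rates

The function name is hypothetical.

```python
def parameter_count(n_species: int, n_genes: int) -> dict:
    """Rough parameter count under the assumptions stated in the comments."""
    n_branches = 2 * n_species - 2  # assumption: rooted, fully bifurcating tree
    counts = {
        # root absence probability, invariant fraction, gain and loss gamma shapes
        "global": 4,
        # one gain rate and one loss rate per gene
        "gene_specific": 2 * n_genes,
        # one gain coefficient and one loss coefficient per branch
        "branch_specific": 2 * n_branches,
    }
    counts["total"] = sum(counts.values())
    return counts


print(parameter_count(n_species=6, n_genes=1))    # homogeneous setting, G = 1
print(parameter_count(n_species=6, n_genes=100))  # per-gene rates for 100 genes
```

With the global and branch-specific parameters fixed, each gene in the heterogeneous setting contributes only its own two rates. This keeps every per-gene fit small.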
18
Likelihood maximization via Expectation Maximization
E-step
- inward-outward recursions on the tree
- a member of the junction-tree family of algorithms
- missing data are naturally embedded
19
Inward (gamma) recursion
[Figure: tree with unknown (?) node states; node q marked]
20
Inward (gamma) recursion - Initialization
21
Inward (gamma) recursion - Recursion
[Figure: recursion step at node q]
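The recursion formulas on these slides are not reproduced in this text. Below is a standard inward (post-order, Felsenstein-style pruning) pass. It assumes γ_v(s) denotes the probability of the leaf data below node v given that v is in state s. The tree, matrices and function names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Probability of a leaf's observation given its true state (0 = absent,
# 1 = present); "?" marks missing data.
LEAF = {"0": [1.0, 0.0], "1": [0.0, 1.0], "?": [1.0, 1.0]}


def inward(node: str, children: Dict[str, List[str]],
           P: Dict[str, Matrix], obs: Dict[str, str]) -> List[float]:
    """gamma(node)[s] = P(leaf data below node | node in state s).

    P[c][s][t] is the transition probability from parent state s to state t
    along the branch leading to child c.
    """
    kids = children.get(node, [])
    if not kids:  # initialization at the leaves
        return list(LEAF[obs[node]])
    gamma = [1.0, 1.0]
    for c in kids:  # recursion: multiply the messages coming up from each child
        g_c = inward(c, children, P, obs)
        for s in (0, 1):
            gamma[s] *= sum(P[c][s][t] * g_c[t] for t in (0, 1))
    return gamma


def likelihood(root: str, children, P, obs, p_absent_root: float) -> float:
    """Combine the root's gamma with the root prior."""
    g = inward(root, children, P, obs)
    return p_absent_root * g[0] + (1.0 - p_absent_root) * g[1]


# Hypothetical toy tree ((a, b), c) with made-up transition matrices.
children = {"root": ["n1", "c"], "n1": ["a", "b"]}
P = {
    "n1": [[0.9, 0.1], [0.2, 0.8]],
    "c": [[0.8, 0.2], [0.3, 0.7]],
    "a": [[0.95, 0.05], [0.1, 0.9]],
    "b": [[0.85, 0.15], [0.25, 0.75]],
}
obs = {"a": "1", "b": "?", "c": "0"}
print(likelihood("root", children, P, obs, p_absent_root=0.6))
```

The pass visits each node once, so its cost grows linearly with tree size. Enumerating all internal-state assignments instead grows exponentially.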
22
Outward (alpha) recursion
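A standard outward pass can be sketched as follows, assuming α_v(s) is the probability of all data outside the subtree of v jointly with v being in state s. Then α_v(s)·γ_v(s) divided by the likelihood gives the posterior of each node state. The outward pass needs the inward messages, so this self-contained sketch includes a compact inward pass. The toy tree, matrices and names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]
STATES = (0, 1)  # 0 = intron absent, 1 = intron present
LEAF = {"0": [1.0, 0.0], "1": [0.0, 1.0], "?": [1.0, 1.0]}


def inward(node: str, children: Dict[str, List[str]], P: Dict[str, Matrix],
           obs: Dict[str, str], gamma: dict, msg: dict) -> None:
    """Post-order pass filling gamma[v] and msg[c] (from child c to its parent)."""
    kids = children.get(node, [])
    if not kids:
        gamma[node] = list(LEAF[obs[node]])
        return
    vec = [1.0, 1.0]
    for c in kids:
        inward(c, children, P, obs, gamma, msg)
        msg[c] = [sum(P[c][s][t] * gamma[c][t] for t in STATES) for s in STATES]
        vec = [vec[s] * msg[c][s] for s in STATES]
    gamma[node] = vec


def outward(node: str, children, P, alpha: dict, msg: dict) -> None:
    """Pre-order pass: alpha[v][s] = P(data outside v's subtree, v in state s)."""
    kids = children.get(node, [])
    for c in kids:
        rest = list(alpha[node])  # alpha at the parent ...
        for d in kids:
            if d != c:  # ... times the messages from c's siblings only
                rest = [rest[s] * msg[d][s] for s in STATES]
        alpha[c] = [sum(rest[s] * P[c][s][t] for s in STATES) for t in STATES]
        outward(c, children, P, alpha, msg)


def posteriors(root: str, children, P, obs, p_absent_root: float):
    """Return the likelihood and P(v in state s | data) for every node v."""
    gamma, msg = {}, {}
    inward(root, children, P, obs, gamma, msg)
    alpha = {root: [p_absent_root, 1.0 - p_absent_root]}  # root prior
    outward(root, children, P, alpha, msg)
    lik = sum(alpha[root][s] * gamma[root][s] for s in STATES)
    post = {v: [alpha[v][s] * gamma[v][s] / lik for s in STATES] for v in gamma}
    return lik, post


children = {"root": ["n1", "c"], "n1": ["a", "b"]}
P = {
    "n1": [[0.9, 0.1], [0.2, 0.8]],
    "c": [[0.8, 0.2], [0.3, 0.7]],
    "a": [[0.95, 0.05], [0.1, 0.9]],
    "b": [[0.85, 0.15], [0.25, 0.75]],
}
obs = {"a": "1", "b": "?", "c": "0"}
lik, post = posteriors("root", children, P, obs, p_absent_root=0.6)
print(lik, post["n1"], post["b"])
```

The same quantities also give joint parent/child posteriors for a branch leading to child c. These are α at the parent times the sibling messages, times P[c][s][t], times γ_c(t), all divided by the likelihood. Expected gain and loss counts for the M-step are typically accumulated from such joint posteriors.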
23
Likelihood maximization via EM
E-step
- inward-outward recursions on the tree
- a member of the junction-tree family of algorithms
- missing data are naturally embedded
M-step
- low-tolerance, variable-by-variable maximization
- Newton-Raphson
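The slides describe the M-step only as low-tolerance, variable-by-variable Newton-Raphson maximization. The sketch below shows one such coordinate update, for a single gene's gain rate. It assumes a model in which the probability of gaining an intron on branch b is 1 - exp(-rate · a_b), where a_b is the fixed branch gain coefficient times the branch length. The inputs n00_b and n01_b are the expected numbers of absent→absent and absent→present transitions from the E-step. The objective and the names are hypothetical.

```python
import math
from typing import List, Tuple


def newton_gain_rate(branches: List[Tuple[float, float, float]],
                     rate: float = 1.0, tol: float = 1e-12,
                     max_iter: int = 100) -> float:
    """Maximize Q(rate) = sum_b [n01_b * log(1 - exp(-rate * a_b)) - n00_b * rate * a_b].

    Each entry of `branches` is (n00_b, n01_b, a_b). Q is concave in `rate`;
    at least one n01_b must be positive, otherwise the Hessian is zero.
    """
    for _ in range(max_iter):
        grad = hess = 0.0
        for n00, n01, a in branches:
            e = math.exp(rate * a)
            grad += n01 * a / (e - 1.0) - n00 * a
            hess -= n01 * a * a * e / (e - 1.0) ** 2
        step = -grad / hess
        while rate + step <= 0.0:  # halve the step to keep the rate positive
            step /= 2.0
        rate += step
        if abs(step) < tol:
            break
    return rate


# Single branch: the closed-form optimum is -log(n00 / (n00 + n01)) / a.
print(newton_gain_rate([(30.0, 10.0, 1.0)]), -math.log(30.0 / 40.0))
```

A closed form exists only in the single-branch case. Once gene rates are shared across branches with different coefficients, an iterative one-dimensional solver such as Newton-Raphson becomes the natural tool.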
25
Intron density in ancient eukaryotes
[Figure: comparison with Gilbert 2005, Koonin 2003, Csuros 2005, Kenmochi 2005 and Stoltzfus 2004]
26
Evolutionary Landscape
[Figure: regions labeled loser, gainer, stable and dynamic]
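One way to read the four labels is as regions of a plane spanned by a gene's intron gain rate and intron loss rate. The sketch below encodes that reading. Both the reading and the cutoffs `gain_cut` and `loss_cut` are our assumptions, not the authors' stated rule.

```python
def landscape_label(gain_rate: float, loss_rate: float,
                    gain_cut: float, loss_cut: float) -> str:
    """Assign one of the four landscape labels (hypothetical threshold rule)."""
    high_gain = gain_rate >= gain_cut
    high_loss = loss_rate >= loss_cut
    if high_gain and high_loss:
        return "dynamic"  # frequent gains and frequent losses
    if high_gain:
        return "gainer"
    if high_loss:
        return "loser"
    return "stable"       # few gains and few losses


print(landscape_label(0.8, 0.1, gain_cut=0.5, loss_cut=0.5))  # gainer
```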
27
Modes of Evolution
28
[Figure: regions labeled loser, gainer, stable and dynamic]
29
[Figure labels: 234 genes, 295 genes, 187 genes]
30
Gene Characteristics
New features of genes:
- Intron gain rate
- Intron loss rate
Old features of genes:
- Expression level
- Evolutionary rate
- Lethality
- Connectivity in protein-protein interactions
- Connectivity in genetic interactions
31
Combined Features
33
Important genes gain introns
             Status   Adaptability   Reactivity
Gain rate     0.33        0.06          0.37
Loss rate     0.05       -0.03          0.07
35
Conclusions
- Disparate landscape: both gain and loss play a role in intron evolution
- The common ancestor of the crown group had an intron content comparable to that of fungi, apicomplexans and dipterans
- Three modes of evolution: more than one mechanism?
- Important genes tend to gain introns