Prokaryotic Gene Structure


Presentation on theme: "Prokaryotic Gene Structure"— Presentation transcript:

1 Prokaryotic Gene Structure
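Prokaryotic genes have a simple one-dimensional structure.
[Figure: genes on both strands, each read from its own 5’ end (5’→ on one strand, ←5’ on the other); example gene ATG AAA ATG GCA . . . GCA TTG CTA TAG, with the leading ATG labelled "Start codon" and the final TAG labelled "Stop codon"]
Note that the ATG codon encodes both start and methionine.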

2 Prokaryotic Gene Structure
Prokaryotic gene prediction begins with ORF finding.
[Figure: ATG AAA GCA . . . GCA TTG CTA TAG, with labels "Possible start", "Alternate start" and "Stop codon"]
Because of the possibility of alternate start sites, it’s not unusual for several ORFs to share a common stop codon.
An ORF finder needs to be able to find overlapping ORFs, whether they end with the same stop codon or overlap in a different frame.

3 Prokaryotic Gene Structure
Prokaryotic gene prediction begins with ORF finding.
'ATG(...)*?(TAA|TAG|TGA)'
A regular expression crafted to find ORFs must also exhibit “non-greedy” behaviour, so that each match ends at the first in-frame stop codon rather than running on to a later one.
Note that many bacteria also employ rarer alternate start codons, most commonly GTG and TTG. But we’ll pretend this doesn’t happen!
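One possible way to get overlapping ORFs out of the non-greedy expression above is to wrap it in a lookahead, so the regex engine does not consume the sequence it has just matched. The sketch below assumes an uppercase DNA string containing only A, C, G and T, and considers only ATG starts, as on the slide. The names `ORF_PATTERN`, `find_orfs` and `reverse_complement` are illustrative and not part of the original slides.

```python
import re

# The slide's non-greedy ORF expression, wrapped in a zero-width lookahead
# so that ORFs sharing a stop codon, or lying in different frames, are all reported.
ORF_PATTERN = re.compile(r'(?=(ATG(?:...)*?(?:TAA|TAG|TGA)))')


def reverse_complement(seq):
    """Return the reverse complement, so the opposite strand can be scanned too."""
    return seq.translate(str.maketrans('ACGT', 'TGCA'))[::-1]


def find_orfs(seq):
    """Return (start_index, orf_sequence) for every ATG-initiated ORF in seq."""
    return [(m.start(), m.group(1)) for m in ORF_PATTERN.finditer(seq)]


if __name__ == "__main__":
    example = "ATGAAAATGGCATAG"  # two starts sharing one stop codon
    for start, orf in find_orfs(example):
        print(start, orf)
    # Scan the other strand by searching the reverse complement.
    for start, orf in find_orfs(reverse_complement(example)):
        print("reverse strand:", start, orf)
```

Each ATG yields at most one ORF here, ending at its first in-frame stop codon. Indices reported for the reverse strand are relative to the reverse-complemented string.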

4 Higher Order Markov Chains
We don’t always need to consider just the most recent state.
An nth order Markov process is a stochastic process where the probabilities associated with an event depend on the previous n events in the state path:
$P(x_i \mid x_{i-1}, x_{i-2}, \ldots, x_1) = P(x_i \mid x_{i-1}, \ldots, x_{i-n})$
All the Markov models we have seen so far have been of order 1.
In the case of a first order process this statement reduces to our statement of the Markov property.

5 Higher Order Markov Chains
Higher order models have an equivalent first order model.
An nth order Markov chain over alphabet A is equivalent to a first order Markov chain over the alphabet $A^n$ of n-tuples…
$P(x_i \mid x_{i-1}, \ldots, x_{i-n}) = P(x_i, x_{i-1}, \ldots, x_{i-n+1} \mid x_{i-1}, \ldots, x_{i-n})$
Practically, this says we can implement a higher order model just by expanding the alphabet size of a first order model.
This follows trivially from P(X,Y|Y) = P(X|Y).
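For readers who want the intermediate step, the identity P(X,Y|Y) = P(X|Y) can be spelled out from the definition of conditional probability. Here X and Y are generic events with P(Y) > 0; the step relies only on the fact that the event "Y and Y" is the same as the event "Y".

```latex
\begin{align*}
P(X, Y \mid Y) &= \frac{P(X, Y, Y)}{P(Y)}
                = \frac{P(X, Y)}{P(Y)}
                = P(X \mid Y).
\end{align*}
```

Taking X to be $x_i$ and Y to be the context $x_{i-1}, \ldots, x_{i-n}$ gives the n-tuple form: the left-hand tuple $(x_i, x_{i-1}, \ldots, x_{i-n+1})$ simply repeats part of the context that is already conditioned on.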

6 Higher Order Markov Chains
Consider this first order Markov process.
A = {A, B}
[Figure: state diagram with nodes S, A, B and e]
Here the alphabet A (our set of states) consists of just A and B.
How would we convert this to a second order Markov process?

7 Higher Order Markov Chains
Now reconfigured as a second order model.
[Figure: state diagram with nodes AA, AB, BB and BA]
A = {AA, AB, BA, BB}
Note how we have disallowed certain transitions (i.e. set their probability to zero).
Start and End omitted for clarity.
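The tuple-state construction can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `expand_to_first_order` and the probabilities in `cond_prob` are made up for the example. A transition between two tuple states is allowed only when the second tuple is the first shifted along by one symbol; every other transition gets probability zero, as in the diagram.

```python
from itertools import product


def expand_to_first_order(alphabet, cond_prob, n):
    """Build a first order chain over n-tuples from nth order probabilities.

    cond_prob[(context, symbol)] is P(symbol | context), where context is a
    tuple of the previous n symbols, oldest first.
    """
    states = list(product(alphabet, repeat=n))
    trans = {}
    for s in states:
        for t in states:
            if s[1:] == t[:-1]:
                # t is s shifted by one symbol: a legal move.
                trans[(s, t)] = cond_prob[(s, t[-1])]
            else:
                # Inconsistent overlap (e.g. AB -> AA): disallowed.
                trans[(s, t)] = 0.0
    return states, trans


# Hypothetical second order parameters over {A, B}: repeat the last symbol
# with probability 0.75, switch with probability 0.25.
cond_prob = {(ctx, sym): (0.75 if sym == ctx[-1] else 0.25)
             for ctx in product("AB", repeat=2) for sym in "AB"}
states, trans = expand_to_first_order("AB", cond_prob, 2)
```

With n = 2 and the alphabet {A, B} this produces the four states AA, AB, BA and BB, and half of the sixteen possible transitions end up with probability zero.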

8 Inhomogeneous Markov Chains
A Markov model of genes should model codon statistics.
[Figure: codons ATG GTC AAA GCA, with the three positions within a codon labelled 1, 2, 3]
In true coding genes, each of the three positions within a codon will be statistically distinct.
This can be accomplished in a few different ways…

9 Inhomogeneous Markov Chains
A Markov model of genes should model codon statistics.
[Figure: codons ATG GTC AAA GCA, with the three positions within a codon labelled 1, 2, 3]
$a^{1}_{x_1 x_2}\, a^{2}_{x_2 x_3}\, a^{3}_{x_3 x_4}\, a^{1}_{x_4 x_5}\, a^{2}_{x_5 x_6}\, a^{3}_{x_6 x_7}$
One idea is to intersperse three different Markov chains in alternating fashion.
This can also be recast as an HMM with additional states in the obvious way.
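The alternating product above can be evaluated with a short loop that cycles through the three transition tables. This is only a sketch: the function name `log_prob_inhomogeneous`, the helper `make_chain` and all parameter values are hypothetical, the sequence is assumed to start at codon position 1, and the initial probability of the first nucleotide is left out, as in the product on the slide.

```python
import math

NTS = "ACGT"


def log_prob_inhomogeneous(seq, chains):
    """Log of a^1_{x1x2} a^2_{x2x3} a^3_{x3x4} a^1_{x4x5} ... for seq.

    chains[k][(a, b)] is P(b | a) for a transition leaving codon position k + 1.
    """
    period = len(chains)
    return sum(math.log(chains[i % period][(seq[i], seq[i + 1])])
               for i in range(len(seq) - 1))


def make_chain(p_same):
    """Toy transition table: stay on the same nucleotide with p_same."""
    p_diff = (1.0 - p_same) / 3.0
    return {(a, b): (p_same if a == b else p_diff) for a in NTS for b in NTS}


# Three made-up chains, one per codon position.
chains = [make_chain(0.4), make_chain(0.7), make_chain(0.25)]

if __name__ == "__main__":
    print(log_prob_inhomogeneous("ATGGTCAAAGCA", chains))
```

Working in log space avoids underflow on gene-length sequences, and the result can be divided by the sequence length to get a per-nucleotide score.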

10 Histograms with matplotlib
We’ll use this to look at distributions of log-odds per nucleotide (NT).

    import numpy as np
    import pylab as P
    . . .
    # probs here should be your list of probabilities
    # 50 here corresponds to the number of desired bins
    # density=True replaces normed=1, which recent matplotlib releases no longer accept
    n, bins, patches = P.hist(probs, 50, density=True, histtype='stepfilled')
    P.setp(patches, 'facecolor', 'g', 'alpha', 0.75)
    P.show()

More histogram examples may be found at: matplotlib.org/examples/pylab_examples/histogram_demo_extended.html

