Prokaryotic Gene Structure Note that the ATG codon encodes both start and methionine Prokaryotic genes have a simple one-dimensional structure 5’→ ← 5’

Presentation transcript:

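1 Prokaryotic Gene Structure
Prokaryotic genes have a simple one-dimensional structure. Note that the ATG codon encodes both start and methionine.
[Figure: a gene drawn on double-stranded DNA, with the strands labelled 5’→ and ← 5’ and the start codon (ATG) and stop codon (TAG) marked; sequence fragments shown: ATGAAAGCA TTGCTA...]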

2 Prokaryotic Gene Structure
Prokaryotic gene prediction begins with ORF finding. Because of the possibility of alternate start sites, it’s not unusual for several ORFs to share a common stop codon. An ORF finder needs to be able to find overlapping ORFs, whether they end with the same stop codon or overlap in a different frame.
[Figure: sequence with a possible start (ATG), an alternate start, and a shared stop codon (TAG) marked; sequence fragments shown: AAAGCA, GCATTGCTA...]
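A minimal sketch of an ORF finder with the property described above: it reports every ATG-to-stop ORF in each of the three forward frames, so several ORFs may share one stop codon and ORFs in different frames may overlap. The function name `find_orfs` and the (start, end) output convention are assumptions made for illustration; only ATG starts are considered, and the reverse strand would need the same scan on the reverse complement.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq):
    """Return sorted (start, end) pairs, end exclusive, for every
    ATG...stop ORF in the three forward reading frames of seq."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        starts = []  # ATG positions seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG":
                starts.append(i)
            elif codon in STOPS:
                end = i + 3
                # Every pending start shares this stop codon.
                orfs.extend((s, end) for s in starts)
                starts = []
    return sorted(orfs)
```

For example, `find_orfs("ATGAAAATGCCCTAG")` reports two ORFs, one starting at each ATG, both ending at the same TAG.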

3 Prokaryotic Gene Structure
Prokaryotic gene prediction begins with ORF finding. A regular expression crafted to find ORFs must also exhibit “non-greedy” behaviour: 'ATG(...)*?(TAA|TAG|TGA)'
Note that many bacteria also employ rarer alternate start codons, most commonly GTG and TTG. But we’ll pretend this doesn’t happen!
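The effect of the non-greedy quantifier can be seen with a short sketch using Python's `re` module. The pattern is the one from the slide, written with non-capturing groups `(?:...)` so that the whole match is easy to retrieve; the lookahead variant for recovering overlapping ORFs and the helper name `overlapping_orfs` are additions for illustration, not part of the slides.

```python
import re

# Greedy: (?:...)* grabs as many triplets as it can, so it runs to the LAST in-frame stop.
greedy = re.compile(r"ATG(?:...)*(?:TAA|TAG|TGA)")
# Non-greedy: (?:...)*? grabs as few triplets as it can, so it ends at the FIRST in-frame stop.
lazy = re.compile(r"ATG(?:...)*?(?:TAA|TAG|TGA)")

seq = "ATGAAATAGCCCTGA"
greedy_hit = greedy.search(seq).group()  # reads through the TAG
lazy_hit = lazy.search(seq).group()      # stops at the TAG

# A zero-width lookahead lets matches overlap, so ORFs sharing a stop are all reported.
overlap = re.compile(r"(?=(ATG(?:...)*?(?:TAA|TAG|TGA)))")

def overlapping_orfs(s):
    """Return every ATG...first-in-frame-stop substring, overlaps included."""
    return [m.group(1) for m in overlap.finditer(s)]
```

Here `greedy_hit` is the whole of `seq`, while `lazy_hit` is `"ATGAAATAG"`, which is the ORF we actually want.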

4 Higher Order Markov Chains
We don’t always need to consider just the most recent state. All the Markov models we have seen so far have been of order 1. An nth order Markov process is a stochastic process where the probabilities associated with an event depend on the previous n events in the state path. In the case of a first order process, this statement reduces to our statement of the Markov property.

5 Higher Order Markov Chains
Higher order models have an equivalent first order model. An nth order Markov chain over alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples. This follows trivially from P(X,Y|Y) = P(X|Y). Practically, this says we can implement a higher order model just by expanding the alphabet size of a first order model.
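As a rough sketch of the alphabet-expansion idea, the code below rewrites a sequence as its overlapping n-tuples and then counts ordinary first order transitions between those tuples. The helper names `to_tuples` and `first_order_counts` are hypothetical, chosen for this example.

```python
from collections import Counter, defaultdict

def to_tuples(seq, n):
    """Rewrite seq over the expanded alphabet of overlapping n-tuples."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def first_order_counts(states):
    """Count first order transitions prev -> next over any state list."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    return counts

# A second order model of "AABAB" is a first order model over its 2-tuples.
tuple_path = to_tuples("AABAB", 2)
tuple_counts = first_order_counts(tuple_path)
```

Each counted transition, such as AA to AB, records the same information as "B followed the pair AA" in the second order view.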

6 Higher Order Markov Chains
Consider this first order Markov process. Here the alphabet A (our set of states) consists of just A and B: A = {A, B}. How would we convert this to a second order Markov process?
[Figure: state diagram with start state S and states A and B]

7 Higher Order Markov Chains
Now reconfigured as a second order model: A = {AA, AB, BA, BB}. Note how we have disallowed certain transitions (i.e. set their probability to zero). Start and End omitted for clarity.
[Figure: state diagram over the states AA, AB, BA and BB]
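Which transitions are disallowed can be worked out mechanically: a tuple state can only be followed by a tuple whose leading symbols equal its own trailing symbols. The sketch below enumerates the permitted transitions; the function name `allowed_transitions` is an assumption for illustration.

```python
from itertools import product

def allowed_transitions(alphabet, n):
    """Map each n-tuple state to the n-tuple states that may follow it.

    A transition s -> t is allowed only if the last n-1 symbols of s
    equal the first n-1 symbols of t; all others have probability zero.
    """
    states = ["".join(t) for t in product(alphabet, repeat=n)]
    return {s: [t for t in states if s[1:] == t[:-1]] for s in states}

second_order = allowed_transitions("AB", 2)
```

For the two-letter alphabet this leaves 8 of the 16 conceivable transitions, for example AB may go to BA or BB but never to AA or AB.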

8 Inhomogeneous Markov Chains
A Markov model of genes should model codon statistics. In true coding genes, each of the three positions (1, 2, 3) within a codon will be statistically distinct. This can be accomplished in a few different ways…
Example sequence: ATG GTC AAA GCA

9 Inhomogeneous Markov Chains
A Markov model of genes should model codon statistics. One idea is to intersperse three different Markov chains in alternating fashion. This can also be recast as an HMM with additional states in the obvious way.
Example sequence: ATG GTC AAA GCA
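One way to sketch the three interspersed chains: keep a separate first order transition table for each codon position and pick the table by the position of the nucleotide being emitted. Everything here is an illustrative assumption rather than the course's implementation: the names `train` and `log_prob`, the add-one pseudocount, and the requirement that training sequences are in frame and contain only A, C, G and T.

```python
import math
from collections import Counter, defaultdict

NTS = "ACGT"

def train(seqs, pseudocount=1.0):
    """Estimate three transition tables, one per codon position (index % 3)
    of the nucleotide being emitted. Sequences are assumed to be in frame."""
    counts = [defaultdict(Counter) for _ in range(3)]
    for seq in seqs:
        for i in range(1, len(seq)):
            counts[i % 3][seq[i - 1]][seq[i]] += 1
    probs = []
    for table in counts:
        p = {}
        for prev in NTS:
            total = sum(table[prev].values()) + len(NTS) * pseudocount
            p[prev] = {nt: (table[prev][nt] + pseudocount) / total for nt in NTS}
        probs.append(p)
    return probs

def log_prob(seq, probs):
    """Log-probability of seq given its first nucleotide, cycling the chains."""
    return sum(math.log(probs[i % 3][seq[i - 1]][seq[i]])
               for i in range(1, len(seq)))

codon_model = train(["ATGGTCAAAGCA"])
```

Even on this single training sequence the three tables differ, which is the point of keeping them separate.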

10 Histograms with matplotlib
We’ll use this to look at log-odds per NT distributions. More histogram examples may be found at: matplotlib.org/examples/pylab_examples/histogram_demo_extended.html

    import numpy as np
    import pylab as P
    ...
    # probs here should be your list of probabilities
    # 50 here corresponds to the number of desired bins
    # density=True normalises the histogram (older matplotlib spelled this normed=1)
    n, bins, patches = P.hist(probs, 50, density=True, histtype='stepfilled')
    P.setp(patches, 'facecolor', 'g', 'alpha', 0.75)
    P.show()
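The slides do not define "log-odds per NT", so the following is only one plausible reading: the log-likelihood ratio of a sequence under a coding model versus a background model, divided by the sequence length. The names `log_odds_per_nt` and `uniform_logp` are hypothetical, and the uniform background is an assumption chosen to keep the sketch self-contained.

```python
import math

def log_odds_per_nt(seq, coding_logp, background_logp):
    """Length-normalised log-odds (natural log) of seq under two models.

    coding_logp and background_logp are functions mapping a sequence
    to its log-probability under the respective model.
    """
    return (coding_logp(seq) - background_logp(seq)) / len(seq)

def uniform_logp(seq):
    # Hypothetical background: every nucleotide has probability 1/4.
    return len(seq) * math.log(0.25)
```

Computing this value for each candidate ORF would give a list of numbers that can be passed to a histogram call in place of `probs`.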

