4. HMMs for gene finding HMM Ability to model grammar


1 4. HMMs for gene finding
An HMM's ability to model grammar.
Many biological sequence problems have a grammatical structure; the example here is eukaryotic gene structure.
Exons and introns play the role of words.
The constraints on gene structure are simply the grammar: a sentence (a complete gene) can never end with an intron, and an exon can never directly follow another exon without an intron in between.
Formal language theory has been applied to biological problems, for example by David Searls to gene finding.
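The two grammar rules above can be sketched as a small finite-state automaton. This is only an illustration: the one-letter words "E" (exon) and "I" (intron), the state names, and the function `is_grammatical` are assumptions made for the example, not part of the slides.

```python
# Hypothetical two-word alphabet: "E" = exon, "I" = intron.
# Allowed transitions of a small finite-state automaton for complete genes.
TRANSITIONS = {
    ("start", "E"): "exon",    # a gene begins with an exon
    ("exon", "I"): "intron",   # an exon may be followed by an intron
    ("intron", "E"): "exon",   # an intron must be followed by an exon
}
ACCEPTING = {"exon"}  # a complete gene never ends with an intron


def is_grammatical(words):
    """Return True if the exon/intron sequence obeys the gene grammar."""
    state = "start"
    for word in words:
        state = TRANSITIONS.get((state, word))
        if state is None:  # no allowed transition, e.g. exon directly after exon
            return False
    return state in ACCEPTING


print(is_grammatical("EIEIE"))  # exon-intron-exon-intron-exon
print(is_grammatical("EE"))     # exon directly after exon
print(is_grammatical("EI"))     # ends with an intron
```

An HMM adds probabilities to exactly this kind of state graph, which is why the same rules can be built into the model structure.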

2 4. HMMs for gene finding
HMMs can represent only the simplest of grammars, the regular grammars.
This is good enough for the gene finding problem; the RNA folding problem needs a more complicated grammar.
Figure 8.

3 4.1 Signal sensors
HMMs can be applied to many of the signals in a gene structure: acceptor and donor sites, and the regions around the start and stop codons.
Figure 9. An HMM with 19 states, equivalent to a weight matrix.
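A left-to-right HMM in which each state emits one base and always moves on to the next state assigns a window the product of its per-position emission probabilities, which is what a weight matrix does. A minimal sketch of that equivalence follows; the toy sites (6 positions rather than 19), the pseudocount, the uniform background, and the function names are all assumptions for illustration.

```python
import math

ALPHABET = "ACGT"


def train_weight_matrix(aligned_sites, pseudocount=1.0):
    """Estimate one emission distribution per position (one per HMM state)."""
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def log_odds_score(matrix, window, background=0.25):
    """Sum of per-position log-odds against a uniform background."""
    return sum(math.log(matrix[i][b] / background) for i, b in enumerate(window))


# Made-up toy sites, for illustration only (not real splice-site data).
toy_sites = ["CAGGTA", "AAGGTG", "CAGGTA", "TAGGTC"]
pwm = train_weight_matrix(toy_sites)

# A window resembling the training sites scores higher than an unrelated one.
print(log_odds_score(pwm, "CAGGTA") > log_odds_score(pwm, "TTTTTT"))
```

Sliding such a scorer along a sequence gives a simple signal sensor; in the HMM setting the same numbers are the emission probabilities of the chained states.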

4 4.1 Signal sensors
Dinucleotide preferences in DNA can be modeled with first-order states.
Each state then has 16 probability parameters instead of 4: the conditional probability of each base given the previous base.
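The 16 parameters of one first-order state can be pictured as a 4 x 4 table of p(current base | previous base). The sketch below estimates such a table from counts; the observation pairs, the pseudocount, and the function name are hypothetical and only serve the illustration.

```python
ALPHABET = "ACGT"


def train_first_order_state(pairs, pseudocount=1.0):
    """Estimate p(current | previous) for one state: 4 x 4 = 16 parameters.

    `pairs` is a list of (previous_base, current_base) observations
    collected at one position of aligned signal sites.
    """
    counts = {prev: {cur: pseudocount for cur in ALPHABET} for prev in ALPHABET}
    for prev, cur in pairs:
        counts[prev][cur] += 1
    table = {}
    for prev in ALPHABET:
        total = sum(counts[prev].values())
        # Each row is a separate distribution over the current base.
        table[prev] = {cur: counts[prev][cur] / total for cur in ALPHABET}
    return table


# Made-up observations: at this position, G usually follows A.
toy_pairs = [("A", "G"), ("A", "G"), ("A", "G"), ("C", "A"), ("T", "G")]
state = train_first_order_state(toy_pairs)
print(state["A"]["G"])  # probability of G given that the previous base was A
```

A zero-order state would keep only one row of this table, so it cannot express that a base is more or less likely depending on its left neighbor.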

5 4.2 Coding regions
Codon structure is an important feature of coding regions.
Figure 10. A model of bases in triplets.

6 4.2 Coding regions
Codon model: modeling codon statistics.
First state: order zero. Second state: order one.
Last state: order two, with 64 probabilities.
The lack of stop codons is modeled by setting p(A|TA) = p(G|TA) = p(A|TG) = 0.
Higher-order states can capture dependencies between neighboring codons.
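The second-order last state can be sketched as a table of p(third base | first two bases): 16 contexts times 4 bases gives the 64 probabilities, with the three stop codons TAA, TAG and TGA forced to zero. The codon counts, the pseudocount, and the function name below are assumptions; with no counts supplied the table is simply uniform over the remaining sense codons.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def third_base_table(codon_counts, pseudocount=1.0):
    """Build p(third base | first two bases) with stop codons set to zero."""
    table = {}
    for first, second in product(BASES, repeat=2):
        context = first + second
        weights = {}
        for third in BASES:
            codon = context + third
            if codon in STOP_CODONS:
                weights[third] = 0.0  # no stop codon inside a coding region
            else:
                weights[third] = codon_counts.get(codon, 0) + pseudocount
        total = sum(weights.values())
        table[context] = {third: w / total for third, w in weights.items()}
    return table


# No real codon counts here: every sense codon gets the same pseudocount.
table = third_base_table({})
print(table["TA"]["A"], table["TA"]["G"], table["TG"]["A"])  # the three zeros
```

Because these entries are exactly zero, no path through the coding model can emit an in-frame stop codon.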

7 4.3 Combining the models
To discover genes, combine the models in a way that satisfies the grammar of genes.
Figure 11. One big HMM made of: a state for intergenic regions; a model for the region around the start codon, of the same type as the acceptor model; the model for the coding region; and a model for the region around the stop codon.
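Stripped of its probabilities, the state graph of an HMM is a finite automaton, so the layout of the simplest unspliced gene can also be written as a regular expression. The sketch below is a deliberate simplification and not the model of Figure 11: it assumes ATG as the only start codon, ignores the sequence context around the start and stop codons, and uses a hypothetical function name.

```python
import re

# Start codon, any number of non-stop codons, then one stop codon.
# The negative lookahead rejects a stop codon at each in-frame position.
UNSPLICED_GENE = re.compile(r"ATG(?:(?!TAA|TAG|TGA)[ACGT]{3})*(?:TAA|TAG|TGA)")


def is_unspliced_gene_candidate(dna):
    """Return True if `dna` has the start/codons/stop layout, nothing more."""
    return UNSPLICED_GENE.fullmatch(dna) is not None


print(is_unspliced_gene_candidate("ATGAAACCCTAA"))  # well-formed layout
print(is_unspliced_gene_candidate("ATGAACCCTAA"))   # length not divisible by 3
print(is_unspliced_gene_candidate("ATGTAACCCTAA"))  # in-frame stop codon
```

The regular expression only accepts or rejects; the HMM uses the same structure but additionally scores how gene-like each accepted sequence is.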

8 4.3 Combining the models
Gene prediction in a sequence of DNA uses the Viterbi algorithm to find the most probable path through the model.
The model predicts only sensible genes that obey the grammatical rules.
Minimum requirements for unspliced gene candidates: the gene always starts with a start codon and ends with a stop codon, its length is always divisible by 3, and it never contains a stop codon in the reading frame before the final one.
Splicing model: splicing can happen in three different reading frames, and the reading frame in one exon has to fit the one in the next. This is handled by using three different models of introns, one for each frame.
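A minimal sketch of Viterbi decoding in log space is given below. The two-state toy model (noncoding "N" favoring A/T, coding "C" favoring G/C) and all of its probabilities are invented for illustration and are far simpler than a gene-finding HMM; the sketch also assumes a non-empty sequence and strictly positive probabilities.

```python
import math


def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `sequence` (log space)."""
    # best[s] = log probability of the best path so far that ends in state s
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]]) for s in states}
    backpointers = []
    for symbol in sequence[1:]:
        pointers, new_best = {}, {}
        for s in states:
            # Best predecessor of state s at this position.
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
        backpointers.append(pointers)
        best = new_best
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: "N" = noncoding, "C" = coding.
STATES = ["N", "C"]
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "C": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

print("".join(viterbi("AAAAAGGGGGGGGAAAAA", STATES, START, TRANS, EMIT)))
```

In a full gene finder the states would be those of the combined model, so the decoded path directly labels each base as intergenic, coding, intron, and so on, and any path it returns obeys the grammar built into the transitions.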

9 4.3 Combining the models
Figure 12.
Many variations of the model are possible: more states in the signal sensors, and models of promoter elements and of the untranslated regions of the gene.

