4. HMMs for gene finding


An HMM has the ability to model grammar, and many biological sequence problems have a grammatical structure. The example here is eukaryotic gene structure: exons and introns are the words, and the constraints on gene structure are the grammar. A sentence can never end with an intron, and an exon can never follow another exon. Formal language theory has been applied to biological problems in this way, notably by David Searls for gene finding.

HMMs can represent only the simplest of grammars, regular grammars. That is good enough for the gene finding problem. Other problems, such as RNA folding, need a more complicated grammar. See Figure 8.
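As an illustration of why a regular grammar suffices, the two rules above can be written as a regular expression. This is a minimal sketch, not part of the original slides. The single-letter encoding ('E' for exon, 'I' for intron) and the function name `is_valid_gene` are hypothetical, and the sketch additionally assumes that a gene starts with an exon.

```python
import re

# Hypothetical encoding: 'E' = exon, 'I' = intron.
# Assumed grammar: one exon, then zero or more (intron, exon) pairs.
# This forbids ending with an intron and forbids two adjacent exons.
GENE_GRAMMAR = re.compile(r"E(IE)*")


def is_valid_gene(structure: str) -> bool:
    """Return True if the exon/intron string obeys the grammar."""
    return GENE_GRAMMAR.fullmatch(structure) is not None


if __name__ == "__main__":
    for candidate in ["E", "EIE", "EIEIE", "EI", "EE"]:
        print(candidate, is_valid_gene(candidate))
```

Any language that a regular expression can recognize can also be recognized by a finite-state machine, which is the kind of structure an HMM provides.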

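4.1 Signal sensors

An HMM can be applied to many of the signals in a gene structure: acceptor and donor sites, and the regions around the start and stop codons. See Figure 9. An HMM with 19 states, one per position and without insert or delete states, is equivalent to a weight matrix.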
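The equivalence with a weight matrix can be sketched as follows: each state emits exactly one base, so the probability of a window is the product of one emission probability per column. The matrix below has only 3 columns instead of 19, and all of its numbers are made up for illustration. The names `WEIGHT_MATRIX`, `log_prob`, and `best_site` are hypothetical.

```python
import math

# Hypothetical 3-column weight matrix (the model in the text has 19 columns).
# Each column is the emission distribution of one HMM state.
WEIGHT_MATRIX = [
    {"A": 0.10, "C": 0.40, "G": 0.10, "T": 0.40},
    {"A": 0.90, "C": 0.03, "G": 0.04, "T": 0.03},
    {"A": 0.03, "C": 0.03, "G": 0.90, "T": 0.04},
]


def log_prob(window: str) -> float:
    """Log probability of a window: one emission per state, no gaps."""
    if len(window) != len(WEIGHT_MATRIX):
        raise ValueError("window length must equal the number of states")
    return sum(math.log(col[base]) for col, base in zip(WEIGHT_MATRIX, window))


def best_site(sequence: str) -> int:
    """Return the start index of the highest-scoring window."""
    width = len(WEIGHT_MATRIX)
    scores = [
        (log_prob(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)[1]


if __name__ == "__main__":
    print(best_site("TTCAGGT"))
```

Working in log space avoids numerical underflow when the number of columns grows.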

DNA shows dinucleotide preferences. To capture them, each state can hold 16 probability parameters instead of 4: the conditional probability of each base given the previous base. Such states are called first-order states.
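One way to obtain the 16 parameters of a first-order state is to count base pairs in two adjacent columns of an alignment. The sketch below is an assumption about how this might be done, not the method of the slides. The function name `first_order_emissions` and the pseudocount of 1 are hypothetical choices.

```python
from collections import Counter

BASES = "ACGT"


def first_order_emissions(aligned, column, pseudocount=1.0):
    """Estimate p(base at `column` | base at `column - 1`).

    Returns a dict of 4 rows with 4 entries each, 16 parameters in total.
    """
    if column < 1:
        raise ValueError("a first-order state needs a preceding column")
    pair_counts = Counter((seq[column - 1], seq[column]) for seq in aligned)
    table = {}
    for prev in BASES:
        total = sum(pair_counts[(prev, cur)] for cur in BASES) + 4 * pseudocount
        table[prev] = {
            cur: (pair_counts[(prev, cur)] + pseudocount) / total
            for cur in BASES
        }
    return table


if __name__ == "__main__":
    # Toy alignment, made up for illustration.
    table = first_order_emissions(["CAG", "TAG", "CAG", "AGG"], column=2)
    print(table["A"])
```

Each row of the table is a separate distribution over the four bases, so each row sums to 1.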

4.2 Coding regions

Codon structure is an important feature of coding regions. Figure 10 shows a model of bases in triplets.

In the codon model, the last state is of order two to model codon statistics, which gives it 64 probabilities. The lack of stop codons is encoded by setting p(A|TA) = p(G|TA) = p(A|TG) = 0. The first state is of order zero and the second state of order one. Higher-order states capture dependencies between neighboring codons.
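The order-two last state can be sketched as a table of 16 contexts times 4 bases, with the three stop-codon entries forced to zero. This is an illustrative sketch under assumptions: the function name `third_position_table`, the pseudocount, and the use of raw codon counts are hypothetical and not taken from the slides.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def third_position_table(codon_counts, pseudocount=1.0):
    """Build p(third base | first two bases): 16 contexts x 4 bases = 64 values.

    Entries that would complete a stop codon are set to exactly zero,
    so p(A|TA) = p(G|TA) = p(A|TG) = 0.
    """
    table = {}
    for first, second in product(BASES, repeat=2):
        context = first + second
        weights = {}
        for third in BASES:
            codon = context + third
            if codon in STOP_CODONS:
                weights[third] = 0.0
            else:
                weights[third] = codon_counts.get(codon, 0) + pseudocount
        total = sum(weights.values())
        table[context] = {base: w / total for base, w in weights.items()}
    return table


if __name__ == "__main__":
    table = third_position_table({"ATG": 10, "TGG": 5})
    print(table["TA"], table["TG"])
```

Because the zero entries are excluded before normalization, the remaining bases in the TA and TG contexts still sum to 1.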

4.3 Combining the models

To discover genes, the models must be combined in a way that satisfies the grammar of genes (Figure 11). The combined model has a state for intergenic regions, a model for the region around the start codon (a signal sensor like the acceptor model), the model for the coding region, and a model for the region around the stop codon. Together they form one big HMM.
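The way the submodels are wired together can be sketched as a table of allowed transitions. In this simplified sketch each submodel is collapsed into a single named block, which is an assumption made for brevity. The block names and the function `obeys_gene_grammar` are hypothetical.

```python
# Each key is a block of the combined model; each value is the set of
# blocks that may follow it. Real submodels contain many states; here
# every submodel is collapsed into one block for illustration.
ALLOWED = {
    "intergenic": {"intergenic", "start"},
    "start": {"coding"},
    "coding": {"coding", "stop"},
    "stop": {"intergenic"},
}


def obeys_gene_grammar(path):
    """Check that a block path begins and ends in intergenic DNA and
    uses only allowed transitions."""
    if not path or path[0] != "intergenic" or path[-1] != "intergenic":
        return False
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    good = ["intergenic", "start", "coding", "coding", "stop", "intergenic"]
    bad = ["intergenic", "coding", "stop", "intergenic"]
    print(obeys_gene_grammar(good), obeys_gene_grammar(bad))
```

Transitions that are absent from the table have probability zero in the HMM, which is how the grammar is enforced.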

Given a sequence of DNA, genes are predicted with the Viterbi algorithm, which finds the most probable path through the model. The predicted genes are sensible in that they obey the grammatical rules. The minimum requirements for unspliced gene candidates are that a gene always starts with a start codon and ends with a stop codon, that its length is always divisible by 3, and that it never contains a stop codon in the reading frame before the final one.

In the splicing model, splicing can happen in three different reading frames, and the reading frame in one exon has to fit the one in the next. This is handled by using three different models of introns, one for each frame.
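The Viterbi step can be sketched on a toy model. The two-state HMM below ('N' for non-coding, 'C' for coding) and all of its probabilities are made up for illustration and are far simpler than the gene model in the slides. The function name `viterbi` and its argument layout are assumptions.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path and its log probability."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    # best[t][s]: log probability of the best path that ends in s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {key: math.log(value) for key, value in table.items()}


# Hypothetical toy parameters: 'N' is AT-rich, 'C' is GC-rich.
STATES = ["N", "C"]
LOG_START = logs({"N": 0.5, "C": 0.5})
LOG_TRANS = {"N": logs({"N": 0.9, "C": 0.1}), "C": logs({"N": 0.1, "C": 0.9})}
LOG_EMIT = {
    "N": logs({"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}),
    "C": logs({"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}),
}

if __name__ == "__main__":
    path, score = viterbi("AATTGCGCGC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print("".join(path))  # prints NNNNCCCCCC
```

In a full gene finder the same recursion runs over the combined model, so the returned path labels every base as intergenic, coding, or part of a signal.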

Many variations of the model are possible (Figure 12), such as adding more states to the signal sensors, or adding models of promoter elements and of the untranslated regions of the gene.