1 DNA Analysis Amir Golnabi ENGS 112 Spring 2008.

Presentation transcript:

1 DNA Analysis Amir Golnabi ENGS 112 Spring 2008

2 Outline:
1. Markov Chain
2. DNA and Modeling
3. Markovian Models for DNA Sequences
4. Hidden Markov Models (HMM)
5. HMM for DNA Sequences
6. Future Work
7. References

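3 1. Markov Chain
– Alphabet: $S = \{s_1, \dots, s_N\}$; the elements $s_i$ are called states, and $S$ is the state space.
– Notation: a sequence of random variables $X_0, X_1, \dots, X_n, \dots$ taking values in $S$.
– A sequence of random variables is called a Markov chain (MC) if, for all $n \ge 1$ and all $j_0, \dots, j_{n+1} \in S$,
  $$P(X_{n+1} = j_{n+1} \mid X_n = j_n, \dots, X_0 = j_0) = P(X_{n+1} = j_{n+1} \mid X_n = j_n).$$
– In words: the conditional probability of a future event depends only on the immediate past event.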
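
As a rough illustration of the Markov property, the following Python sketch generates a base sequence in which each new base is drawn using only the current base. The transition probabilities in `P`, the function name `simulate_chain`, and the default seed are illustrative assumptions, not values from the slides.

```python
import random

STATES = ["A", "C", "G", "T"]

# Hypothetical transition probabilities: P[i][j] = P(X_{n+1} = j | X_n = i).
# Each row sums to 1; the values are made up for illustration.
P = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}


def simulate_chain(length, start="A", seed=0):
    """Generate `length` bases (length >= 1) from a first-order Markov chain."""
    rng = random.Random(seed)
    seq = [start]
    for _ in range(length - 1):
        row = P[seq[-1]]  # only the current base matters
        weights = [row[s] for s in STATES]
        seq.append(rng.choices(STATES, weights=weights)[0])
    return "".join(seq)
```

Calling `simulate_chain(20)` returns a 20-base string that starts with `A`; a different `seed` gives a different realisation of the same chain.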

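4 1. Markov Chain (cont.)
– Conditional probability: $p_{ij} = P(X_{n+1} = j \mid X_n = i)$
– Transition matrix: $P = (p_{ij})_{i,j \in S}$
– Property: $p_{ij} \ge 0$ and $\sum_{j \in S} p_{ij} = 1$ for every $i \in S$
– Higher-order Markov chains:
  – Second-order MC: $P(X_{n+1} = j_{n+1} \mid X_n = j_n, \dots, X_0 = j_0) = P(X_{n+1} = j_{n+1} \mid X_n = j_n, X_{n-1} = j_{n-1})$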
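
Given an observed sequence, the entries of a first-order transition matrix can be estimated by counting adjacent pairs and normalising each row. The sketch below is one minimal way to do this; `transition_matrix` is a hypothetical helper name, and rows for symbols that never appear as a predecessor are left as zeros rather than smoothed.

```python
from collections import Counter


def transition_matrix(seq, alphabet="ACGT"):
    """Estimate first-order transition probabilities from adjacent pairs."""
    counts = Counter(zip(seq, seq[1:]))  # (current, next) pair counts
    matrix = {}
    for a in alphabet:
        total = sum(counts[(a, b)] for b in alphabet)
        matrix[a] = {
            b: (counts[(a, b)] / total if total else 0.0) for b in alphabet
        }
    return matrix
```

For example, in `"AACGT"` the base `A` is followed once by `A` and once by `C`, so the estimated row for `A` puts probability 0.5 on each.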

5 2. DNA and Modeling
– Bases: {A, T, C, G}
– Complementary strands → a DNA molecule can be described by the sequence of bases in a single strand.
– Sequences are always read from the 5' end to the 3' end.
– DNA → mRNA → proteins (transcription and translation)
– Codons: triples of bases that code for amino acids; 'stop' codons mark the end of translation.
– Specific sequence of codons → gene → chromosome → genome
– Exons: coding portions of genes; introns: non-coding regions
– Goal: to determine the nucleotide sequence of entire genomes
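
The strand and codon conventions on this slide can be sketched in a few lines of Python. The helper names are hypothetical; the stop codons TAA, TAG and TGA are those of the standard genetic code written in DNA letters.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def reverse_complement(strand):
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def codons(strand, frame=0):
    """Split a strand into complete base triples, starting at offset `frame`."""
    return [strand[i:i + 3] for i in range(frame, len(strand) - 2, 3)]


def first_stop(strand, frame=0):
    """Index of the first stop codon in the given frame, or None."""
    for index, codon in enumerate(codons(strand, frame)):
        if codon in STOP_CODONS:
            return index
    return None
```

Because one strand determines the other, `reverse_complement` is enough to scan both strands for codons with the same functions.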

6 3. Markov Chains for DNA Sequences
– Nucleotides are chained linearly, one by one → local dependence between bases and their neighbors.
– Markov chains offer computationally effective ways of expressing the various frequencies and local dependencies.
– Alphabet of bases = {A, T, C, G} → the bases are not uniformly distributed in any sequence, and the composition varies within and between sequences.
– The probability of finding a particular base at one position can depend not only on the immediately adjacent bases but also on several more distant bases upstream or downstream → higher-order (heterogeneous) Markov models.
– Gene finding: Markov models of coding and non-coding regions are used to classify segments as either exons or introns.
– Segmentation: decomposing DNA sequences into homogeneous regions → Hidden Markov Models.
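
One way to make the gene-finding idea concrete is a log-odds score: train one first-order Markov model on coding examples and one on non-coding examples, then ask which model explains a new segment better. The sketch below assumes add-one pseudocounts to avoid zero probabilities, and the training strings in the usage note are toy data invented for illustration, not real exons or introns.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train(seqs, pseudocount=1.0):
    """Fit first-order transition probabilities with additive smoothing."""
    counts = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))
    model = {}
    for a in ALPHABET:
        total = sum(counts[(a, b)] + pseudocount for b in ALPHABET)
        model[a] = {b: (counts[(a, b)] + pseudocount) / total for b in ALPHABET}
    return model


def log_likelihood(seq, model):
    """Log-probability of the transitions in `seq` (the first base is ignored)."""
    return sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))


def log_odds(seq, coding, noncoding):
    """Positive values favour the coding model, negative the non-coding one."""
    return log_likelihood(seq, coding) - log_likelihood(seq, noncoding)
```

With `coding = train(["GCGCGCGCGC"])` and `noncoding = train(["ATATATATAT"])`, the segment `"GCGCGC"` scores above zero and `"ATATAT"` below zero. A realistic classifier would use large labelled training sets and usually higher-order models.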

7 4. Hidden Markov Models (HMM)
– A stochastic process generated by two interrelated probabilistic mechanisms:
  – an underlying Markov chain with a finite number of states, and
  – a set of random functions, each associated with its respective state.
– The state changes according to the transition matrix.
– Only the output of the random functions can be observed; the states themselves are hidden.
– Advantage: HMMs allow local characteristics of molecular sequences to be modeled and predicted within a rigorous statistical framework, and they allow knowledge from prior investigations to be incorporated into the analysis.
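
Because only the outputs are visible, the probability of an observed sequence has to account for every possible hidden state path. The forward algorithm, a standard technique that the slides do not name, computes this sum efficiently. The sketch below assumes a two-state model; the state names and all numbers in `START`, `TRANS` and `EMIT` are made up for illustration.

```python
STATES = ("N", "R")

# Hypothetical parameters (not taken from the slides).
START = {"N": 0.5, "R": 0.5}
TRANS = {"N": {"N": 0.9, "R": 0.1}, "R": {"N": 0.1, "R": 0.9}}
EMIT = {
    "N": {"A": 0.385, "C": 0.115, "G": 0.115, "T": 0.385},
    "R": {"A": 0.085, "C": 0.415, "G": 0.415, "T": 0.085},
}


def forward(obs, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Probability of `obs` under the HMM, summed over all hidden paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())
```

This plain-probability version underflows on long sequences; practical implementations rescale `alpha` at each step or work in log space.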

8 5. HMM for DNA Sequences
– Every nucleotide in a DNA sequence belongs to either a "normal" region (N) or a GC-rich region (R).
– The two region types are not randomly interspersed: they occur in patches, with larger stretches of N.
– Example of such a sequence of regions:
  NNNNNNNNNRRRRRNNNNNNNNNNNNNNNNNRRRRRRRNNNN
– States of the HMM: {N, R}
– A possible DNA sequence with this underlying state sequence:
  TTACTTGACGCCAGAAATCTATATTTGGTAACCCGACGCTAA
– This is not a typical random collection of nucleotides: the GC content is 83% in the R regions vs. 23% in the N regions.
– HMMs can identify these types of features in sequences: they capture both the patchiness of N and R and the different compositional frequencies within each category.
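
The GC percentages quoted on this slide can be checked directly from the example, and the hidden N/R labels can be estimated from the bases alone. The sketch below uses the Viterbi algorithm for the second part, a standard decoding method that the slides do not name. The start, transition and emission values are assumptions chosen only to mimic a GC-poor and a GC-rich state, and the state path is written as run lengths (9 N, 5 R, 17 N, 7 R, 4 N).

```python
import math

SEQ = "TTACTTGACGCCAGAAATCTATATTTGGTAACCCGACGCTAA"
PATH = "N" * 9 + "R" * 5 + "N" * 17 + "R" * 7 + "N" * 4

STATES = ("N", "R")
# Hypothetical parameters (not taken from the slides).
START = {"N": 0.5, "R": 0.5}
TRANS = {"N": {"N": 0.9, "R": 0.1}, "R": {"N": 0.1, "R": 0.9}}
EMIT = {
    "N": {"A": 0.385, "C": 0.115, "G": 0.115, "T": 0.385},
    "R": {"A": 0.085, "C": 0.415, "G": 0.415, "T": 0.085},
}


def gc_fraction(seq, path, state):
    """Fraction of G or C among the bases labelled with `state`."""
    bases = [b for b, s in zip(seq, path) if s == state]
    return sum(b in "GC" for b in bases) / len(bases)


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Most probable hidden state path for `seq`, computed in log space."""
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[k][t] = best predecessor of state t at position k + 1
    for symbol in seq[1:]:
        prev, score, pointers = score, {}, {}
        for t in states:
            best = max(states, key=lambda s: prev[s] + math.log(trans[s][t]))
            pointers[t] = best
            score[t] = (prev[best] + math.log(trans[best][t])
                        + math.log(emit[t][symbol]))
        back.append(pointers)
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

`gc_fraction(SEQ, PATH, "R")` evaluates to 10/12 and `gc_fraction(SEQ, PATH, "N")` to 7/30, which round to the 83% and 23% stated on the slide. The path returned by `viterbi(SEQ)` depends on the assumed parameters and need not match the slide's labelling exactly.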

9 6. Future Work
– Better and deeper understanding of HMMs
– Different applications of HMMs, such as segmentation of DNA sequences and gene finding
– Build an automaton for a simple case

7. References
– Koski, Timo. Hidden Markov Models for Bioinformatics. Sweden: Kluwer Academic.
– Birney, E. "Hidden Markov models in biological sequence analysis". July 2001.
– Haussler, David, David Kulp, Martin Reese, and Frank Eeckman. "A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA".
– Boufounos, Petros, Sameh El-Difrawy, and Dan Ehrlich. "Hidden Markov Models for DNA Sequencing".