Identification of Coding Sequences Bert Gold, Ph.D., F.A.C.M.G.



In Vitro Approaches
–Transcription
–Translation
–Linked
–Site Specific Mutagenesis
–Promoter Fusions

Runoff Protocol and Controls

Ribosome Binding Sequences

In vitro translation

96-well (high-throughput) translation

Linked In Vitro Transcription-Translation

APC Protein Truncation Test

In Vivo Approaches
Prokaryotic expression
–E. coli maxicells
–E. coli minicells
Metazoan expression
–Yeast Overexpression
–Baculovirus Expression for Rapid Analysis
–X. laevis oocytes
Expression in Mammalian Cells
–Transient Transfection
–Stable Transfection
–ES Cells
–Transgenic Mice
  Knock in
  Knock out

Background Definitions
Working Draft – A genomic sequence that has not yet been finished. Working draft sequences contain multiple gaps, underrepresented areas and misassemblies. In addition, their error rate is higher than the 1 in 10,000 error rate that is standard for finished sequences.
FASTA file – A common file format used for the storage and transfer of sequence data. It contains raw DNA or protein sequence, but no annotation information.
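As a rough illustration of the FASTA format defined above, the sketch below reads FASTA text into a dictionary keyed by header line. The function name parse_fasta and the two sample records are invented for this example and do not come from the slides.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # header line, without the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # a sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical records, for illustration only.
example = """>seq1 hypothetical example record
ATGGCC
TTAA
>seq2
MKV
"""
records = parse_fasta(example)
```

Each record is a ">" header line followed by one or more lines of raw sequence, which the parser joins into a single string.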

SENSORS An algorithm specialized to identify a feature of a sequence, such as a possible splice site.
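A minimal sketch of a sensor, assuming the simplest possible signal: most introns begin with GT and end with AG, so the function below reports every position where those dinucleotides occur. The function name and the test sequence are hypothetical. A real splice-site sensor would score the surrounding bases as well rather than match two letters.

```python
def find_splice_site_candidates(seq):
    """Return start positions of GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Hypothetical sequence, for illustration only.
donors, acceptors = find_splice_site_candidates("CCAGGTAAGTCCTTTCAGGA")
```

Matching on two bases alone produces many false candidates, which is one reason a sensor like this is usually combined with other evidence.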

Neural Network Neural networks are analytical techniques modeled after the (proposed) processes of learning in cognitive systems and the neurological functions of the brain. Neural networks use a data ‘training set’ to build rules that can make predictions or classifications on data sets.

Rule-Based System A type of computer algorithm that uses an explicit set of rules to make decisions.
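The idea of an explicit rule set can be sketched as a list of named checks applied to a candidate coding sequence. The rules and names below are assumptions chosen for illustration and are not the rules of any particular program.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Each rule is a (description, predicate) pair; all rules here are illustrative.
RULES = [
    ("starts with ATG", lambda s: s.startswith("ATG")),
    ("length is a multiple of 3", lambda s: len(s) % 3 == 0),
    ("ends with a stop codon", lambda s: s[-3:] in STOP_CODONS),
    ("no internal in-frame stop",
     lambda s: not any(s[i:i + 3] in STOP_CODONS
                       for i in range(0, len(s) - 3, 3))),
]


def failed_rules(seq):
    """Return the descriptions of every rule the sequence violates."""
    seq = seq.upper()
    return [name for name, rule in RULES if not rule(seq)]


def looks_like_coding_sequence(seq):
    """A sequence passes only if it violates none of the rules."""
    return not failed_rules(seq)
```

Because every decision traces back to a named rule, the output of such a system is easy to explain, at the cost of missing genes that break any one rule.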

Hidden Markov Model A statistical model that represents a system as a set of discrete states and transitions between those states. Each transition has an associated probability. Markov models are ‘hidden’ when one or more of the states cannot be directly observed.
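To make the definition concrete, here is a toy two-state model in which the hidden state (coding or noncoding) cannot be observed, only the bases it emits. All probabilities are made-up numbers for illustration. The function uses the standard forward algorithm to sum over every possible hidden path.

```python
STATES = ("coding", "noncoding")

# All probabilities below are invented for illustration.
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward_probability(seq):
    """Probability of a non-empty observed sequence, summed over all state paths."""
    # alpha[s] = probability of the bases seen so far, ending in state s
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        alpha = {
            s: EMIT[s][symbol]
            * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())
```

For a single base G the result is 0.5 × 0.3 + 0.5 × 0.2 = 0.25, since either hidden state could have produced it.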

AB INITIO GENE PREDICTION A class of software that attempts to predict genes from sequence data without the use of prior knowledge about similarities to other genes.
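The simplest prediction that uses nothing but the sequence itself is an open reading frame scan, sketched below for the three forward frames. This is only an illustration with hypothetical names and input. The programs listed in this presentation combine many more signals, and a plain ORF scan does not handle introns.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs for ATG...stop ORFs in the forward frames.

    min_codons is the minimum number of codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return orfs


# Hypothetical sequence, for illustration only.
orfs = find_orfs("CCATGGCTTGACCATGAAATAG")
```

The returned coordinates are zero-based and end-exclusive, so each pair can be used directly as a Python slice of the input.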

In Silico Approaches
Sensors
–Single Feature Predictors
  HEXON
  MZEF
Neural Networks
–GRAIL
Rule Based Systems
–GeneFinder under construction by
Hidden Markov Models
–GenScan
–Genie
–Fgenes
–GeneMark.hmm
–HMMGene

Ab Initio Methods Comparative Genomics dbEST BLASTX TAP and PASS

Evaluation of In Silico Approaches

Scheme for an Ab Initio Approach

Diagrammatic Evaluation of an In Silico Approach

Hidden Markov Model A hidden Markov model explicitly models the probabilities of the transitions from one part of a gene to another. In this model, used by the GENSCAN algorithm, each circle or diamond represents a functional unit in the gene. For example, Einit is the initial exon and Eterm is the last. The arrows represent the probability of a transition from one part of a gene to another. The algorithm is ‘trained’ by running a set of known genes through the model and adjusting the weight of each transition to reflect realistic transition probabilities. Thereafter, test sequence data can be run through the model one base position at a time, and the model will read out the probability of a gene being present at that position. The states below the dashed line correspond to a gene on the reverse strand and are thus symmetric with those above the line. Abbreviations: E, exon; I, intron; UTR, untranslated region; pro, promoter.
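The training step can be sketched, under strong simplifying assumptions, as counting how often one gene part follows another in a set of annotated genes and turning the counts into probabilities. The state labels loosely follow the abbreviations above, and the two training paths are hypothetical. This is not GENSCAN's actual training procedure, which also has to estimate what each state emits and how long it lasts.

```python
from collections import Counter, defaultdict


def estimate_transitions(labeled_paths):
    """Estimate transition probabilities by counting state-to-state steps."""
    counts = defaultdict(Counter)
    for path in labeled_paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical annotations: the ordered parts of two known genes.
training_paths = [
    ["pro", "5UTR", "Einit", "I", "Eterm", "3UTR"],
    ["pro", "5UTR", "Einit", "I", "E", "I", "Eterm", "3UTR"],
]
transitions = estimate_transitions(training_paths)
```

In this toy set an intron is followed by a terminal exon two times out of three and by an internal exon once, so the estimated probabilities are 2/3 and 1/3.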

Evaluating Ab Initio Gene Predictions
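The slide does not state which measures were used. One common way to score a prediction, assumed here, is to compare it with the true annotation base by base and report sensitivity (the fraction of true coding bases that were predicted) and specificity in the gene-finding sense (the fraction of predicted coding bases that are truly coding). The function name and the example arrays are hypothetical.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Compare two equal-length 0/1 lists that mark coding positions."""
    if len(true_coding) != len(predicted_coding):
        raise ValueError("annotation and prediction must have the same length")
    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)      # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)  # coding, missed
    fp = sum(1 for t, p in pairs if not t and p)  # noncoding, predicted coding
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


# Hypothetical 8-base region: 1 marks a coding base, 0 a noncoding base.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 0, 0]
sn, sp = nucleotide_accuracy(truth, predicted)
```

Here the prediction is shifted by one base, so it misses one true coding base and adds one false one, giving 0.75 for both measures.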