Identification of Coding Sequences Bert Gold, Ph.D., F.A.C.M.G.


1 Identification of Coding Sequences Bert Gold, Ph.D., F.A.C.M.G.

2 In Vitro Approaches –Transcription –Translation –Linked Transcription-Translation –Site-Specific Mutagenesis –Promoter Fusions

3 Runoff Protocol and Controls

4 Ribosome Binding Sequences

5 In vitro translation

6 96-well (high-throughput) translation

7 Linked In Vitro Transcription-Translation

8 APC Protein Truncation Test

9 In Vivo Approaches
Prokaryotic expression
–E. coli maxicells
–E. coli minicells
Eukaryotic expression
–Yeast Overexpression
–Baculovirus Expression for Rapid Analysis
–X. laevis oocytes
Expression in Mammalian Cells
–Transient Transfection
–Stable Transfection
–ES Cells
–Transgenic Mice (Knock in, Knock out)

10 Background Definitions
Working Draft – A working draft sequence has come to mean a genomic sequence before it is finished. Working draft sequences contain multiple gaps, underrepresented areas and misassemblies. In addition, the error rate of a working draft sequence is higher than the 1 in 10,000 error rate that is standard for finished sequences.
FASTA file – A common file format used for the storage and transfer of sequence data. It contains raw DNA or protein sequence under a one-line header, but no further annotation information.
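
The FASTA definition above can be illustrated with a minimal parsing sketch. It assumes the usual layout (a header line starting with ">" followed by one or more sequence lines); the function name `parse_fasta` and the example records are made up for illustration and are not part of the presentation.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"record{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the current record, upper-cased.
            records[current_id] += line.upper()
    return records


# Hypothetical example data, not taken from any real database.
example = """>seq1 hypothetical example record
ATGGCC
TTTTAA
>seq2
atgaaatag
"""
records = parse_fasta(example.splitlines())
```

Everything after the ID on the header line is free text, which is why the format carries no structured annotation.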

11 SENSORS A sensor is an algorithm specialized to identify a single feature of a sequence, such as a possible splice site.
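
As a rough illustration of a sensor, the sketch below flags candidate splice sites using only the GT...AG rule (most introns begin with GT and end with AG). This is a deliberately naive assumption: real sensors such as those listed later in the deck score much more sequence context. The function name is hypothetical.

```python
def splice_site_candidates(seq):
    """Return (donor_positions, acceptor_positions) as 0-based indices.

    A donor candidate is any GT dinucleotide; an acceptor candidate is
    any AG dinucleotide. Most hits will be false positives.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Made-up test sequence.
donors, acceptors = splice_site_candidates("CAGGTAAGT")
```

Because the dinucleotides alone are so common, a sensor like this is only useful as one input to a larger gene-finding program.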

12 Neural Network Neural networks are analytical techniques modeled after the (proposed) processes of learning in cognitive systems and the neurological functions of the brain. Neural networks use a data ‘training set’ to build rules that can make predictions or classifications on data sets.
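
The idea of learning a rule from a training set can be sketched with a single perceptron, the simplest neural unit. This is not the architecture of GRAIL or any other tool named in this deck. The single input feature (GC fraction), the labels, and the training sequences are all invented for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def train_perceptron(examples, epochs=100, learning_rate=1.0):
    """Fit weight and bias so that weight * gc + bias > 0 means label 1."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        mistakes = 0
        for seq, label in examples:
            x = gc_fraction(seq)
            predicted = 1 if weight * x + bias > 0 else 0
            error = label - predicted
            if error:
                # Nudge the decision boundary toward the misclassified example.
                weight += learning_rate * error * x
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return weight, bias


def predict(model, seq):
    weight, bias = model
    return 1 if weight * gc_fraction(seq) + bias > 0 else 0


# Hypothetical training set: label 1 = "coding-like", 0 = "non-coding-like".
training_set = [
    ("GCGCGCGCAT", 1),
    ("GCGGCCGATA", 1),
    ("ATATATATGC", 0),
    ("AATTATGCGA", 0),
]
model = train_perceptron(training_set)
```

After training, `predict(model, seq)` applies the learned rule to sequences that were not in the training set.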

13 Rule-Based System A type of computer algorithm that uses an explicit set of rules to make decisions.
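
A small example of explicit rules applied to sequence is an open reading frame (ORF) finder. The rules assumed here (start at ATG, end at the first in-frame stop codon, enforce a minimum length, forward strand only) are a simplification chosen for illustration and are not the rule set of GeneFinder or any other named tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return forward-strand ORFs as (start, end) pairs, end exclusive.

    Rule 1: an ORF opens at the first ATG seen in a reading frame.
    Rule 2: it closes at the next in-frame stop codon (included in the span).
    Rule 3: it is reported only if it spans at least min_codons codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Made-up sequence containing one short ORF in the third reading frame.
sequence = "CCATGAAATTTTAGGG"
orfs = find_orfs(sequence)
```

Every decision here is traceable to one stated rule, which is the main contrast with trained methods.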

14 Hidden Markov Model A type of computer algorithm that represents a system as a set of discrete states and transitions between those states. Each transition has an associated probability. Markov models are ‘hidden’ when one or more of the states cannot be directly observed.
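
To make the states-and-transitions idea concrete, here is a sketch of a two-state hidden Markov model decoded with the Viterbi algorithm. The states ("N" for non-coding, "E" for exon) and every probability below are invented for illustration; real gene finders use many more states and parameters estimated from data.

```python
import math

# Hypothetical model parameters (assumptions, not from any published tool).
STATES = ("N", "E")
START = {"N": 0.5, "E": 0.5}
TRANS = {"N": {"N": 0.9, "E": 0.1}, "E": {"N": 0.1, "E": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},  # AT-rich
    "E": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},  # GC-rich
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq as a string."""
    seq = seq.upper()
    if not seq:
        return ""
    # Log-probability of the best path ending in each state at position 0.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(TRANS[best_prev][s])
                             + math.log(EMIT[s][base]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


path = viterbi("AAAAGGGGGG")
```

The observed bases are visible, but the state labels are not; decoding recovers the most likely labeling given the transition and emission probabilities.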

15 AB INITIO GENE PREDICTION A class of software that attempts to predict genes from sequence data without the use of prior knowledge about similarities to other genes.

16 In Silico Approaches
Sensors
–Single Feature Predictors
  HEXON http://searchlauncher.bcm.tmc.edu:9331/gene-finder/Help/hexon.html
  MZEF http://sciclio.cshl.org/genefinder/
Neural Networks
–GRAIL http://compbio.ornl.gov/Grail-1.3/
Rule Based Systems
–GeneFinder (under construction by phg@u.washington.edu)
Hidden Markov Models
–GenScan http://genes.mit.edu/GENSCAN.html
–Genie http://www.fruitfly.org/seq_tools/genie.html
–Fgenes http://searchlauncher.bcm.tmc.edu:9331/gene-finder/Help/fgenes.html
–GeneMark.hmm http://genemark.biology.gatech.edu/GeneMark/
–HMMGene http://www.cbs.dtu.dk/services/HMMgene/

17 Ab Initio Methods Comparative Genomics dbEST BLASTX TAP and PASS

18 Evaluation of In Silico Approaches

19 Scheme for an Ab Initio Approach

20 Diagrammatic Evaluation of an In Silico Approach

21 Hidden Markov Model A hidden Markov model explicitly models the probabilities for the transition from one part of a gene to another. In this model, used by the GENSCAN algorithm, each circle or diamond represents a functional unit in the gene. For example, Einit is the initial exon and Eterm is the last. The arrows represent the probability of a transition from one part of a gene to another. The algorithm is ‘trained’ by running a set of known genes through the model and adjusting the weights of each transition to reflect realistic transition probabilities. Thereafter, test sequence data can be run through the model one base position at a time, and the model will read out the probability of a gene being present at that position. The states that occur below the dashed line correspond to a gene on the reverse strand, and thus are symmetric with those above the line. Abbreviations: E, exon; I, intron; UTR, untranslated region; pro, promoter.
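
The training step described above can be sketched, in heavily simplified form, as counting how often one state follows another in annotated genes and normalizing the counts into probabilities. This is an assumption-laden illustration, not GENSCAN's actual training procedure. The state names are borrowed from the slide, with "E" standing in for an internal exon, and the two example gene structures are invented.

```python
from collections import Counter, defaultdict


def estimate_transitions(labeled_paths):
    """Estimate transition probabilities from known state paths.

    labeled_paths: iterable of lists of state names, one list per
    annotated gene, in 5' to 3' order.
    Returns {state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for path in labeled_paths:
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical training set: a two-exon gene and a three-exon gene.
known_genes = [
    ["Einit", "I", "Eterm"],
    ["Einit", "I", "E", "I", "Eterm"],
]
transitions = estimate_transitions(known_genes)
```

With these two made-up genes, an intron is followed by a terminal exon in two of three cases, so the estimated probability of the transition from I to Eterm is 2/3.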

22 Evaluating Ab Initio Gene Predictions

