Predicting Genes in Mycobacteriophages, December 8, 2014 In Silico Workshop Training, D. Jacobs-Sera.

Since the beginning of time, woman (being human) has tried to make order and sense out of her surroundings. Gene annotation and analysis is just a primal instinct to make order. Young children preparing to enter school are tested on whether they can recognize patterns, a form of making order. Where will the dot appear in the 4th box? Remember, everything you need to know you learned in kindergarten… It is all about finding the patterns…

Make-Believe or Putative? Remember, you are working in the putative gene world. All gene predictions are made with the best evidence to date. Most of that evidence is computational (bioinformatic), not experimental. Tomorrow’s data may give us better evidence, but your prediction today is the best it can be … today! Make good predictions by following a consistent approach. Let these predictions lead to experiments that can provide the evidence to improve future predictions.

How many ATCGs are in a typical mycobacteriophage genome? On average, 70,000 base pairs (range: 40,000 to 165,000 bp). What is the universal format for a sequence? FASTA.
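A FASTA record is a header line beginning with `>` followed by sequence lines; the line breaks inside the sequence carry no meaning. As a minimal sketch (the function name `read_fasta` and the toy record below are hypothetical, not part of any workshop tool), reading a genome file and reporting its length could look like this:

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}.

    Each record starts with a '>' header line; the sequence may be split
    across any number of following lines, which are joined here.
    """
    records = {}
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        elif header is not None:  # sequence lines before any header are ignored
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


# Hypothetical toy input; a real phage genome file holds one record of ~70,000 bp.
example = """>demo_phage toy fragment
ATGGACCTCT
CGCCC
"""
for header, seq in read_fasta(example).items():
    print(header, len(seq), "bp")  # prints: demo_phage toy fragment 15 bp
```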

Tricky question: How many bacteriophage genome sequences are in GenBank? How many mycobacteriophage genomes are sequenced? How many mycobacteriophage genomes are published? Number in GenBank: 422. Number announced: ~301. Number in an additional publication: pending!

How do you make sense of the ATCGs? Convert them to genes. How do you convert ATCGs to genes? Codons code for amino acids, starts, and stops.

Phages use the Bacterial, Archaeal and Plant Plastid Code (NCBI translation table 11). 3 starts: ATG (methionine), GTG (valine), TTG (leucine). 3 stops: TAA, TAG, TGA. The stretch between a start and a stop: an open reading frame (ORF).
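To make the start/stop definition concrete, here is a minimal sketch that scans one strand in its three frames and reports every stretch from a table 11 start to the next in-frame stop. The function `find_orfs` and its `min_codons` cutoff are hypothetical illustrations; real gene callers such as Glimmer and GeneMark do far more than this. The reverse strand would be scanned the same way after taking its reverse complement.

```python
# NCBI translation table 11 start and stop codons, as listed on the slide
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end, frame) tuples for ORFs on the given strand.

    Coordinates are 0-based and end-exclusive; 'end' includes the stop codon.
    Within each frame the most upstream start after the previous stop is used,
    so each ORF is the longest possible one for its stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:  # length counts the stop codon
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 2)]
```

Keeping the first start in each frame mirrors the usual starting point for annotation: the longest ORF for a given stop, whose start is then refined with other evidence.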

ATGGACCTCTCGCCC
Frame 1: ATG GAC CTC TCG CCC
Frame 2: TGG ACC TCT CGC …
Frame 3: GGA CCT CTC GCC …
If there are 3 choices (frames) in the forward direction, how many are in the reverse direction?

Six Frame Translations
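Three frames on each strand give six in total. As a minimal sketch (function names are hypothetical), a six-frame translation of the example sequence takes the reverse complement for the other strand and translates each frame with table 11. Two simplifications to note: every codon is translated with its internal amino acid, so a GTG or TTG start shows up as V or L rather than the methionine used at initiation, and the reverse frames here are numbered from the start of the reverse complement, one of several conventions tools use.

```python
BASES = "TCAG"
# Table 11 amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse, to read the other strand 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; a trailing partial codon is ignored, unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return {'+1'..'+3', '-1'..'-3': protein} for both strands."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frame_translation("ATGGACCTCTCGCCC").items():
    print(name, protein)
```

For the slide's sequence, frames +1 to +3 give MDLSP, WTSR and GPLA, and frames -1 to -3 give GREVH, GERS and ARGP.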

Glimmer and GeneMark: Both use Markov models to identify coding potential (Glimmer uses interpolated Markov models; GeneMark.hmm uses a hidden Markov model). They use a sample of the genome, identify the longest ORFs in that sample, and calculate patterns in the nucleotides: 2 at a time, 4 at a time. Concept: each organism has a codon usage ‘preference.’ Bottom line: codon usage is always skewed.
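The core idea, learning nucleotide patterns from likely coding sequence and asking whether a new stretch looks more like that than like background, can be illustrated with a plain fixed-order Markov chain. This is a simplified stand-in, not Glimmer's interpolated Markov model or GeneMark's algorithm; the function names, the order `K = 2`, and both training strings are hypothetical toy values chosen only to show the log-odds comparison.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, k):
    """Count how often each base follows each k-base context."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        seq = seq.upper()
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def log_likelihood(counts, seq, k):
    """Log P(seq | model) with add-one smoothing, so unseen contexts give 1/4 per base."""
    seq = seq.upper()
    total = 0.0
    for i in range(k, len(seq)):
        context_counts = counts.get(seq[i - k:i], {})  # .get does not insert new keys
        n = sum(context_counts.values())
        total += math.log((context_counts.get(seq[i], 0) + 1) / (n + len(ALPHABET)))
    return total


def coding_score(seq, coding_model, noncoding_model, k):
    """Log-odds: positive means the sequence resembles the coding training set more."""
    return log_likelihood(coding_model, seq, k) - log_likelihood(noncoding_model, seq, k)


K = 2
coding = train_markov(["GCCGACGCCGGCGCCGAC" * 3], K)  # toy stand-in for long ORFs
noncoding = train_markov(["ATTATAATTTAAATAT" * 3], K)  # toy stand-in for background
print(coding_score("GCCGACGCC", coding, noncoding, K))  # positive
print(coding_score("ATATTA", coding, noncoding, K))  # negative
```

Real gene finders train on thousands of codons and account for the three codon positions separately; the sketch only shows why skewed composition makes coding regions statistically recognizable.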

Codon Usage
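A codon usage table is a count of codons across in-frame coding sequences. As a minimal sketch (function names and the two toy "genes" are hypothetical), tallying codons and comparing synonymous choices for alanine shows the kind of skew the previous slide describes:

```python
from collections import Counter


def codon_usage(coding_seqs):
    """Count codons across coding sequences, each read in frame from its first base."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def synonymous_fractions(counts, codons):
    """Fraction of each codon among a group of synonymous codons."""
    total = sum(counts[c] for c in codons)
    return {c: (counts[c] / total if total else 0.0) for c in codons}


# Hypothetical toy genes (start codon ... stop codon), not real phage sequences
genes = ["ATGGCCGCGGCCGACTGA", "GTGGCCGCCGCAGACTAA"]
usage = codon_usage(genes)
alanine = synonymous_fractions(usage, ["GCT", "GCC", "GCA", "GCG"])
print(alanine)
```

In this toy set GCC accounts for four of the six alanine codons and GCT never appears; with a real genome the same counts reveal the organism's preferred codons.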

Gene Evaluations: We use two programs, Glimmer and GeneMark, to identify coding potential. We use Phamerator output for a visual representation of gene and nucleotide similarity. As we evaluate, we can add a gene, delete a gene, or change a gene start. We are always looking for the supporting data.
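Genes in the same frame that end at the same stop codon are the same gene even when the two programs pick different starts. One way to organize the add/delete/change-start decisions, sketched here under assumptions, is to key each program's calls by strand and stop coordinate. The dictionary layout, the function name `compare_calls`, and all coordinates below are hypothetical; they are not the output format of DNA Master, Glimmer, or GeneMark.

```python
def compare_calls(glimmer, genemark):
    """Compare two gene-call sets, each {(strand, stop): start}.

    Calls sharing a (strand, stop) key describe the same ORF; they may still
    disagree on the start coordinate.
    """
    report = {"agree": [], "different_start": [], "glimmer_only": [], "genemark_only": []}
    for key in sorted(set(glimmer) | set(genemark)):
        if key not in genemark:
            report["glimmer_only"].append(key)
        elif key not in glimmer:
            report["genemark_only"].append(key)
        elif glimmer[key] == genemark[key]:
            report["agree"].append(key)
        else:
            report["different_start"].append(key)
    return report


# Hypothetical calls: (strand, stop coordinate) -> start coordinate
glimmer_calls = {("+", 1200): 300, ("+", 2500): 1480, ("-", 3100): 4020}
genemark_calls = {("+", 1200): 300, ("+", 2500): 1525, ("+", 5200): 4700}
for category, keys in compare_calls(glimmer_calls, genemark_calls).items():
    print(category, keys)
```

Calls found by only one program are candidates to add or delete, and "different_start" entries are where start evidence (Phamerator, BLAST) matters most.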

Other features found in mycobacteriophage genomes: tRNAs ✓, tmRNAs, attP sites, terminators, frameshifts ✓, …

GLIMMER

GeneMark Output (trained on M. tuberculosis)

Comparisons with what we already know: Phamerator comparisons; BLAST comparisons at NCBI and at phagesDB.

Phamerator map

BLAST Comparisons

Things to do often: Save the .dnam5 file often. Save the .dnam5 file under a new name. (Then stop saving to the file with the old name.)

SEA-PHAGES In-Silico Workshop December 8, 2014 Getting Started

Let’s get started!
1. Gather data
2. Basic DNA Master functions
3. Gene assignments
4. Functional assignments

Annotation of Sheen: Found in Fort Kent, ME, by Devon Cote & Zach Daigle. Genome Length: defined physical ends, 10 bp overhang. GC content: 63.4%.
Figure: Sheen, Timshel, HINdeR
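GC content is the share of G and C among the bases in the genome. A minimal sketch (the function name is hypothetical) that ignores ambiguous bases such as N is shown below; applied to the full Sheen FASTA it should give a value comparable to the 63.4% on the slide, but the example here uses only the short toy sequence from the frames slide.

```python
def gc_content(seq):
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


toy = "ATGGACCTCTCGCCC"  # 15 bp toy sequence, not a real genome
print(len(toy), "bp,", round(gc_content(toy), 1), "% GC")  # 15 bp, 66.7 % GC
```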

Gathering Data: Obtain your genome (phagesdb.org). Use DNA Master to obtain Glimmer, GeneMark, and tRNA (Aragorn) data. Obtain GeneMark data on the web (trained on M. smegmatis). BLAST the genome. Obtain Phamerator data.