Computational Biology, Part 8 Protein Coding Regions Robert F. Murphy Copyright  1996-2009. All rights reserved.

Slides:



Advertisements
Similar presentations
Mutations. DNA mRNA Transcription Introduction of Molecular Biology Cell Polypeptide (protein) Translation Ribosome.
Advertisements

Transcription & Translation Worksheet
Computational Biology, Part 4 Protein Coding Regions Robert F. Murphy Copyright  All rights reserved.
Transcription and Translation
Nature and Action of the Gene
Protein Synthesis. DNA RNA Proteins (Transcription) (Translation) DNA (genetic information stored in genes) RNA (working copies of genes) Proteins (functional.
FEATURES OF GENETIC CODE AND NON SENSE CODONS
How Proteins are Produced
Sec 5.1 / 5.2. One Gene – One Polypeptide Hypothesis early 20 th century – Archibald Garrod physician that noticed that some metabolic errors were found.
More on translation. How DNA codes proteins The primary structure of each protein (the sequence of amino acids in the polypeptide chains that make up.
Protein Synthesis. Review! What are some functions of proteins? What are some functions of proteins? Enzymes, which speed up chemical reactions Enzymes,
How Proteins Are Made Mrs. Wolfe. DNA: instructions for making proteins Proteins are built by the cell according to your DNA What kinds of proteins are.
PowerPoint ® Lecture Slides prepared by Janice Meeking, Mount Royal College C H A P T E R Copyright © 2010 Pearson Education, Inc. 3 Cells: The Living.
Learning Targets “I Can...” -State how many nucleotides make up a codon. -Use a codon chart to find the corresponding amino acid.
Fig Second mRNA base First mRNA base (5 end of codon) Third mRNA base (3 end of codon)
General Biology Notes TRANSLATION A.K.A. PROTEIN SYNTHESIS
Translation Protein Synthesis Part 2 pp
Protein Synthesis-- Transcription and Translation.
7. Protein Synthesis and the Genetic Code a). Overview of translation i). Requirements for protein synthesis ii). messenger RNA iii). Ribosomes and polysomes.
Cell Division and Gene Expression
Chapter 14 Genetic Code and Transcription. You Must Know The differences between replication (from chapter 13), transcription and translation and the.
The Genetic Code and Translation. Codon – three mRNA bases in a row that code for an amino acid Divide this mRNA strand into codons: AUGGGCACUCUCGCAUGA.
Chapter 17 From Gene to Protein. Protein Synthesis  The information content of DNA  Is in the form of specific sequences of nucleotides along the DNA.
The Genetic Code Objective: F2 - Explain … process of transcription & translation …, describe the role of DNA & RNA during protein synthesis, & recognize.
Chapter 17 How to read a table of codons. These are two forms in which you might see a table of codons.
©1998 Timothy G. Standish From DNA To RNA To Protein Timothy G. Standish, Ph. D.
Parts is parts…. AMINO ACID building block of proteins contain an amino or NH 2 group and a carboxyl (acid) or COOH group PEPTIDE BOND covalent bond link.
Today 14.2 & 14.4 Transcription and Translation /student_view0/chapter3/animation__p rotein_synthesis__quiz_3_.html.
GENE EXPRESSION. Transcription 1. RNA polymerase unwinds DNA 2. RNA polymerase adds RNA nucleotides (A ↔ U, G ↔ C) 3. mRNA is formed! DNA reforms a double.
Genetic code From Wikipedia, the free encyclopedia Edited by Jungho Kim.
Example 1 DNA Triplet mRNA Codon tRNA anticodon A U A T A U G C G
Translation mRNA tRNA rRNA Amino Acids Proteins. Protein Synthesis & Translation.
Figure 17.4 DNA molecule Gene 1 Gene 2 Gene 3 DNA strand (template) TRANSCRIPTION mRNA Protein TRANSLATION Amino acid ACC AAACCGAG T UGG U UU G GC UC.
How Genes Work: From DNA to RNA to Protein Chapter 17.
Gene Translation:RNA -> Protein How does a particular sequence of nucleotides specify a particular sequence of amino acids?nucleotidesamino acids The answer:
F. PROTEIN SYNTHESIS [or translating the message]
Protein Synthesis- Translation
Translation PROTEIN SYNTHESIS.
Whole process Step by step- from chromosomes to proteins.
Please turn in your homework
Protein Synthesis Part 2 pp
The blueprint of life; from DNA to Protein
From gene to protein DNA mRNA protein trait nucleus cytoplasm
Where is Cytochrome C? What is the role? Where does it come from?
Mutations.
Introduction to Bioinformatics II
Warm-Up 3/12/13 After transcription, an mRNA molecule with the sequence A U A C G C A G U was created. What was the sequence of the original DNA strand?
What is Transcription and who is involved?
From Gene to Phenotype- part 2
Ch. 17 From Gene to Protein Thought Questions
Overview: The Flow of Genetic Information
Overview: The Flow of Genetic Information
Protein Synthesis.
PROTEIN SYNTHESIS RELAY
The genetic code © 2016 Paul Billiet ODWS.
More on translation.
Methods used to study mutations
Translation -The main purpose of translation is to create proteins from mRNA  -mRNA serves as a template during protein synthesis -this means that, ultimately,
TRANSLATION Protein Synthesis
Warm Up 3 2/5 Can DNA leave the nucleus?
Today’s notes from the student table Something to write with
Central Dogma and the Genetic Code
Translation.
Central Dogma of Genetics
DNA, RNA, Amino Acids, Proteins, and Genes!.
Ch Protein Synthesis Protein Synthesis Proteins are polypeptides
Gene Protein Genome Proteome Genomics Proteomics.
DNA to proteins.
Introduction to Bioinformatics II
Presentation transcript:

Computational Biology, Part 8 Protein Coding Regions Robert F. Murphy Copyright  All rights reserved.

Sequence Analysis Tasks  Finding protein coding regions

Goal Given a DNA or RNA sequence, find those regions that code for protein(s) Given a DNA or RNA sequence, find those regions that code for protein(s)  Direct approach: Look for stretches that can be interpreted as protein using the genetic code  Statistical approaches: Use other knowledge about likely coding regions

Direct Approach

Genetic codes The set of tRNAs that an organism possesses defines its genetic code(s) The set of tRNAs that an organism possesses defines its genetic code(s) The universal genetic code is common to all organisms The universal genetic code is common to all organisms Prokaryotes, mitochondria and chloroplasts often use slightly different genetic codes Prokaryotes, mitochondria and chloroplasts often use slightly different genetic codes More than one tRNA may be present for a given codon, allowing more than one possible translation product More than one tRNA may be present for a given codon, allowing more than one possible translation product

Genetic codes Differences in genetic codes occur in start and stop codons only Differences in genetic codes occur in start and stop codons only Alternate initiation codons: codons that encode amino acids but can also be used to start translation (GUG, UUG, AUA, UUA, CUG) Alternate initiation codons: codons that encode amino acids but can also be used to start translation (GUG, UUG, AUA, UUA, CUG) Suppressor tRNA codons: codons that normally stop translation but are translated as amino acids (UAG, UGA, UAA) Suppressor tRNA codons: codons that normally stop translation but are translated as amino acids (UAG, UGA, UAA)

Genetic codes

Note additional start codons: UUA, UUG, CUG Note additional start codons: UUA, UUG, CUG Note conversion of stop codon UGA (opal) to Trp Note conversion of stop codon UGA (opal) to Trp

Reading Frames Since nucleotide sequences are “read” three bases at a time, there are three possible “frames” in which a given nucleotide sequence can be “read” (in the forward direction) Since nucleotide sequences are “read” three bases at a time, there are three possible “frames” in which a given nucleotide sequence can be “read” (in the forward direction) Taking the complement of the sequence and reading in the reverse direction gives three more reading frames Taking the complement of the sequence and reading in the reverse direction gives three more reading frames

Reading frames TTC TCA TGT TTG ACA GCT TTC TCA TGT TTG ACA GCT RF1 Phe Ser Cys Leu Thr Ala> RF2 Ser His Val *** Gln Leu> RF3 Leu Met Phe Asp Ser> AAG AGT ACA AAC TGT CGA AAG AGT ACA AAC TGT CGA RF4 <Glu *** Thr Gln Cys Ser RF5 <Glu His Lys Val Ala RF6 <Arg Met Asn Ser Leu

Open Reading Frames (ORF) Concept: Region of DNA or RNA sequence that could be translated into a peptide sequence (open refers to absence of stop codons) Concept: Region of DNA or RNA sequence that could be translated into a peptide sequence (open refers to absence of stop codons) Prerequisite: A specific genetic code Prerequisite: A specific genetic code Definition: Definition:  (start codon) (amino acid coding codon) n (stop codon) Note: Not all ORFs are actually used Note: Not all ORFs are actually used

Open Reading Frames Click boxes for List ORFS and ORF map Click boxes for List ORFS and ORF map

Check reading frame: mod(696,3)=0 -> RF3 Check reading frame: mod(696,3)=0 -> RF3

EMBOSS plotorf

Splicing ORFs For eukaryotes, which have interrupted genes, ORFs in different reading frames may be spliced together to generate final product For eukaryotes, which have interrupted genes, ORFs in different reading frames may be spliced together to generate final product ORFs from forward and reverse directions cannot be combined ORFs from forward and reverse directions cannot be combined

Block Diagram for Search for ORFs Search Engine Sequence to be searched Genetic code List of ORF positions Both strands? Ends start/stop?

Statistical Approaches

Calculation Windows Many sequence analyses require calculating some statistic over a long sequence looking for regions where the statistic is unusually high or low Many sequence analyses require calculating some statistic over a long sequence looking for regions where the statistic is unusually high or low To do this, we define a window size to be the width of the region over which each calculation is to be done To do this, we define a window size to be the width of the region over which each calculation is to be done Example: %AT Example: %AT

Base Composition Bias For a protein with a roughly “normal” amino acid composition, the first 2 positions of all codons will be about 50% GC For a protein with a roughly “normal” amino acid composition, the first 2 positions of all codons will be about 50% GC If an organism has a high GC content overall, the third position of all codons must be mostly GC If an organism has a high GC content overall, the third position of all codons must be mostly GC Useful for prokaryotes Useful for prokaryotes Not useful for eukaryotes due to large amount of noncoding DNA Not useful for eukaryotes due to large amount of noncoding DNA

Fickett’s statistic Also called TestCode analysis Also called TestCode analysis Looks for asymmetry of base composition Looks for asymmetry of base composition Strong statistical basis for calculations Strong statistical basis for calculations Method: Method:  For each window on the sequence, calculate the base composition of nucleotides 1, 4, 7..., then of 2, 5, 8..., and then of 3, 6, 9...  Calculate statistic from resulting three numbers

Codon Bias (Codon Preference) Principle Principle  Different levels of expression of different tRNAs for a given amino acid lead to pressure on coding regions to “conform” to the preferred codon usage  Non-coding regions, on the other hand, feel no selective pressure and can drift

Codon Bias (Codon Preference) Starting point: Table of observed codon frequencies in known genes from a given organism Starting point: Table of observed codon frequencies in known genes from a given organism  best to use highly expressed genes Method Method  Calculate “coding potential” within a moving window for all three reading frames  Look for ORFs with high scores

Codon Bias (Codon Preference) Works best for prokaryotes or unicellular eukaryotes because for multicellular eukaryotes, different pools of tRNA may be expressed at different stages of development in different tissues Works best for prokaryotes or unicellular eukaryotes because for multicellular eukaryotes, different pools of tRNA may be expressed at different stages of development in different tissues  may have to group genes into sets Codon bias can also be used to estimate protein expression level Codon bias can also be used to estimate protein expression level

Portion of D. melanogaster codon frequency table

Comparison of Glycine codon frequencies

Illustration of Codon Bias Plots Use Entrez via MacVector to get sequence of lexA Use Entrez via MacVector to get sequence of lexA  under “Database” select “Internet Entrez Search”  Select gene=lexA AND organism=Escherichia  Pick one (e.g., region from 89.2 to 92.8) Under “Analyze” select “Codon Preference Plots” Under “Analyze” select “Codon Preference Plots”  Choose Escherichia coli codon bias file  Choose gene region corresponding to lacZ  Click on Staden codon bias and Gribskov codon bias

Codon Preference Algorithms The Staden method (from Staden & McLachlan, 1982) uses a codon usage table directly in identifying coding regions. The codon usage table is normalized so that the sum of all 64 codons is 1. The usages for each codon in each reading frame in each window are multiplied together and normalized by the sum of the probabilities in all three positions to generate a relative coding probability. The Staden method (from Staden & McLachlan, 1982) uses a codon usage table directly in identifying coding regions. The codon usage table is normalized so that the sum of all 64 codons is 1. The usages for each codon in each reading frame in each window are multiplied together and normalized by the sum of the probabilities in all three positions to generate a relative coding probability.

Codon Preference Algorithms The Gribskov method uses a codon usage table normalized so that the sum of the alternatives for each amino acid add to 1. The values for each codon for each reading frame in each window are multiplied together and normalized by the random probability expected for that codon given the mononucleotide frequencies of the target sequence. It is the most commonly used method. The Gribskov method uses a codon usage table normalized so that the sum of the alternatives for each amino acid add to 1. The values for each codon for each reading frame in each window are multiplied together and normalized by the random probability expected for that codon given the mononucleotide frequencies of the target sequence. It is the most commonly used method.

Plot from Plot from syco

Summary Translation of nucleic acid sequences into hypothetical protein sequences requires a genetic code Translation of nucleic acid sequences into hypothetical protein sequences requires a genetic code Translation can occur in three forward and three reverse reading frames Translation can occur in three forward and three reverse reading frames Open reading frames are regions that can be translated without encountering a stop codon Open reading frames are regions that can be translated without encountering a stop codon

Summary The likelihood that a particular open reading frames is in fact a coding region (actually made into protein) can be estimated using third-codon base composition or codon preference tables The likelihood that a particular open reading frames is in fact a coding region (actually made into protein) can be estimated using third-codon base composition or codon preference tables This can be used to scan long sequences for possible coding regions This can be used to scan long sequences for possible coding regions