An Introduction to Bioinformatics Finding genes in prokaryotes.

AIMS
- To establish the concept of ORFs and their relationship to genes
- To describe the features used by software to find ORFs/genes
- To become familiar with Web-based programmes used to find ORFs/genes

OBJECTIVES
- To be able to distinguish between the concepts of ORF and gene
- To use ORF Finder to find ORFs in prokaryotic nucleotide sequences

Usually the primary challenge that follows the sequencing of anything from a small segment of DNA to a complete genome is to establish the location of functional elements such as:
- genes (intron/exon boundaries)
- promoters, terminators, etc.

DNA sequences that may potentially encode proteins are called Open Reading Frames (ORFs).

The situation in prokaryotes is relatively straightforward, since scarcely any eubacterial or archaeal genes contain introns.

FINDING ORFs
The simplest method in prokaryotes is to scan the DNA for start and stop codons.

The DNA is double stranded, and each strand has three potential reading frames (codons are groups of 3 bases):

THE CAT ATE THE RAT    Frame 1
T HEC ATA TET HER AT   Frame 2
TH ECA TAT ETH ERA T   Frame 3

The scan must look at all 6 reading frames.
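
The idea of six reading frames can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the sequence is assumed to contain only the letters A, C, G and T (the slide's "THE CAT" sentence is used just to show the frame shift).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def codons_in_frame(seq, offset):
    """Split seq into complete triplets, starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Codons of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = codons_in_frame(seq, offset)
        frames["-%d" % (offset + 1)] = codons_in_frame(rev, offset)
    return frames


if __name__ == "__main__":
    # The slide's sentence, read in each of the three forward frames.
    for offset in range(3):
        print(offset + 1, codons_in_frame("THECATATETHERAT", offset))
```

Only complete triplets are kept, so the stray letters at the ends of frames 2 and 3 on the slide are dropped here.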

Any region of DNA between a start codon and a stop codon in the same reading frame could potentially code for a polypeptide and is therefore an ORF.

Start: AUG (methionine)
Stop: UAA, UAG, UGA

Small potential coding sequences like this will occur frequently by chance, so the longer an ORF is, the more likely it is to represent a real coding region (a gene).

Problems
- Small genes may be missed
- The actual start codon may be internal to the ORF
- There may be overlapping genes
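
Why longer ORFs are more convincing can be shown with a rough calculation. The sketch below assumes a random sequence in which all 64 codons are equally likely, so 3 of 64 codons are stops; real genomes deviate from this, so the numbers are only indicative. The function name is invented for this example.

```python
def prob_no_stop(n_codons, stop_fraction=3 / 64):
    """Chance that n_codons consecutive random codons contain no stop codon.

    Assumes every codon is drawn independently and that a fraction
    stop_fraction of all codons are stops (3 of 64 in the standard code).
    """
    return (1 - stop_fraction) ** n_codons


if __name__ == "__main__":
    for n in (20, 50, 100, 300):
        print(n, prob_no_stop(n))
```

Under these assumptions an open stretch of 100 codons already has a probability below 1% of arising by chance, which is why a minimum ORF length is a useful first filter.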

The simplest tool for finding ORFs is ORF Finder at NCBI. It simply scans all 6 reading frames and shows the positions of the ORFs that are longer than a user-defined minimum size.

The genetic code used for the analysis can be altered by the user. This would be important if, for example, mitochondrial or ciliate nuclear DNA were being analysed.
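
A scan of this kind can be sketched as follows. This is not ORF Finder's actual implementation, only a minimal illustration of the same idea: the function names, the `min_codons` parameter and the coordinate convention are assumptions made for this example, and the standard start and stop codons are hard-coded.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # DNA equivalents of UAA, UAG, UGA
START_CODON = "ATG"                   # DNA equivalent of AUG


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end, orf) tuples for all six frames.

    start and end are 0-based, end-exclusive positions on the strand being
    read. An ORF runs from the first ATG after the previous stop to the
    next in-frame stop codon; min_codons counts both of those codons.
    """
    seq = seq.upper()
    orfs = []
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(dna) - 2, 3):
                codon = dna[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, dna[start:i + 3]))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGCC"):
        print(orf)
```

To analyse DNA that uses a different genetic code, the `STOP_CODONS` set would need to change; in the vertebrate mitochondrial code, for instance, UGA encodes tryptophan rather than stop.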

To overcome the limitations of ORF Finder, more sophisticated programmes detect compositional biases to increase the reliability of gene detection. These compositional biases are regular, though very diffuse, and arise for a variety of reasons:
- in many organisms there is a detectable preference for G or C over A and T in the third ("wobble") position of a codon
- organisms do not use synonymous codons with equal frequency; consequently there is a codon bias
- amino acids are used unequally in proteins, enough to cause a bias in all three codon positions and increase the overall codon bias
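
Two of these biases are easy to measure directly. The sketch below computes the G+C fraction at each codon position and a simple codon count for a coding sequence; the function names are invented for this example, and the input is assumed to start in frame.

```python
from collections import Counter


def gc_by_codon_position(coding_seq):
    """Return the fraction of G or C at codon positions 1, 2 and 3."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("need at least one complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    return tuple(c / n_codons for c in counts)


def codon_usage(coding_seq):
    """Count how often each codon occurs in an in-frame coding sequence."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(gc_by_codon_position("ATGGCCAAC"))
    print(codon_usage("ATGGCCGCC"))
```

A third-position G+C fraction that differs strongly from the other two positions is one hint that a stretch of DNA is being read in a real coding frame.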

The %GC content of the first two codon positions of the universal genetic code is approximately 50%. Organisms which have a low or high %GC content will therefore exhibit a marked bias at the third position of codons to achieve their overall %GC content.

The most recent approaches to using compositional features to distinguish coding from non-coding regions employ 'Markov models'. Such approaches include the popular GENEMARK and GLIMMER programs.
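
The Markov-model idea can be illustrated with a toy example. The sketch below is not how GENEMARK or GLIMMER work internally (both use considerably richer models); it is a first-order chain, with invented function names and made-up training strings, that scores a sequence by how much better a "coding" model explains it than a "non-coding" one.

```python
import math

BASES = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next base | current base)."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in sequences:
        seq = seq.upper()
        for prev, nxt in zip(seq, seq[1:]):
            if prev in BASES and nxt in BASES:
                counts[prev][nxt] += 1
    model = {}
    for a in BASES:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in BASES}
    return model


def log_odds(seq, coding_model, noncoding_model):
    """Sum of log2(P_coding / P_noncoding) over the base transitions in seq."""
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in BASES and nxt in BASES:
            score += math.log2(coding_model[prev][nxt] / noncoding_model[prev][nxt])
    return score


if __name__ == "__main__":
    # Made-up training data: GC-rich "coding", AT-rich "non-coding".
    coding = train_markov(["GCGCGCGCGC"])
    noncoding = train_markov(["ATATATATAT"])
    print(log_odds("GCGCGC", coding, noncoding))
    print(log_odds("ATATAT", coding, noncoding))
```

A positive score means the sequence looks more like the coding training set, a negative score more like the non-coding one. With real data the two models would be trained on known genes and intergenic regions of the organism being annotated.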

Finding Genes in Eukaryotes An Introduction to Bioinformatics

AIMS
- To establish the concept of ORFs and their relationship to genes
- To describe the features used by software to find ORFs/genes
- To become familiar with Web-based programmes used to find ORFs/genes

OBJECTIVES
- To be able to distinguish between the concepts of ORF and gene
- To use ORF Finder to find ORFs in prokaryotic nucleotide sequences
- To describe the complications of the eukaryote "signals"
- To be aware of the Web-based programmes
- To be able to use the eukaryote programmes for a number of organisms

Eukaryotes are organisms whose cells have a membrane-bound nucleus and many specialised structures located within their cell boundary. In these organisms, genetic material is organized into chromosomes that reside in the nucleus.

Principles
- Content: codon usage, often species or class specific
- Signals: PWMs (position weight matrices); the principle is the same, the signals are different
- Complication of introns/exons
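
A position weight matrix can be built and applied in a few lines. The sketch below is a minimal illustration with invented function names, made-up example sites, a uniform background of 0.25 per base and a pseudocount of 1; it assumes the sequences contain only A, C, G and T.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the matrix entries selected by each base of the window."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return (position, score) of the highest-scoring window (leftmost on ties)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the matrix")
    positions = range(len(seq) - width + 1)
    best = max(positions, key=lambda i: score_window(pwm, seq[i:i + width]))
    return best, score_window(pwm, seq[best:best + width])


if __name__ == "__main__":
    # Made-up aligned sites, loosely resembling a TATA-like signal.
    pwm = build_pwm(["TATAAA", "TATAAA", "TATATA", "TACAAA"])
    print(best_hit(pwm, "GGCTATAAAGGC"))
```

The same scoring scheme applies to any short signal; only the training sites change from one signal or organism to another.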

Eukaryotic promoter
[Figure: eukaryotic promoter drawn 5' to 3', showing the TATA box, GC box and CAAT box upstream of the mRNA]
- In addition: transcription factor binding sites
- Genes can be enormous!
- Controlled by "distant" enhancers

Signals on the mRNA
[Figure: mRNA showing the Kozak sequence at the translational start (AUG), the STOP codon, and the polyadenylation sequence AAUAAA about 12 bp before the polyA tail (AAAAA...)]

Introns and Exons
- Chicken 1 α2 collagen gene: 38 kb, more than 50 introns
- Muscular Dystrophy gene is 2.5 Mb and has ? exons!

Splicing signals
5' exon ... (C/A)AG | GT(A/G)AGT ... intron ... (T/C)>11 N (C/T)AG | G ... 3' exon
GT-AG rule: introns begin with GT and end with AG
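
The GT-AG rule alone gives a very crude intron detector, sketched below. The function name, the length limits and the example sequence are assumptions made for this illustration; because GT and AG dinucleotides are common, most candidates found this way are false, and the surrounding consensus would be needed to rank them.

```python
def candidate_introns(seq, min_len=20, max_len=10000):
    """Return (start, end) spans that begin with GT and end with AG.

    Positions are 0-based and end-exclusive, so seq[start:end] is the
    candidate intron including its GT and AG.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    spans = []
    for donor in donors:
        for acceptor in acceptors:
            length = acceptor + 2 - donor
            if min_len <= length <= max_len:
                spans.append((donor, acceptor + 2))
    return spans


if __name__ == "__main__":
    # Made-up gene: two short exons around a 14-base GT...AG intron.
    gene = "ATGCCC" + "GT" + "C" * 10 + "AG" + "CCCTAA"
    for start, end in candidate_introns(gene, min_len=10):
        print(start, end, gene[:start] + gene[end:])
```

Removing the single candidate from the example joins the two exons into one uninterrupted reading frame.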

Exon finding
- Initial exons, from the initiation codon to the first splice site
- Internal exons, from splice site to splice site
- Terminal exons, from splice site to stop codon
- Single exons, corresponding to uninterrupted, intronless genes, i.e. running from initiation codon to stop codon

Integrated Gene Parsing
- Search for signals
- Perform a content analysis
- Define the intron/exon boundaries

Gene finding web sites
- More than 25 listed sites
- GENSCAN
- FGENES