Comparative genomics: functional characterization of new genes and regulatory interactions using computer analysis. Mikhail Gelfand, Institute for Information Transmission Problems (The Kharkevich Institute), RAS. Workshop at the Landau Institute of Theoretical Physics, RAS, September 27-28, 2007, Moscow

The genome is deciphered!

Is it? To intercept a message does not mean to understand it

Fragment of a genome (0.1% of E. coli)
A typical bacterial genome: several million nucleotides, ~600 to ~9,000 genes (~90% of the genome encodes proteins)

Propaganda sequences in GenBank (~genes) articles in PubMed (~experiments)

More propaganda
Most genes will never be studied in experiment
Even in E. coli: only new genes per year (hundreds are still uncharacterized)
“Universally missing genes”: not a single known gene even for ~10% of the reactions of the central metabolism. No genes for >40% of reactions overall.
“Conserved hypothetical genes” (5-15% of any bacterial genome): essential, but of unknown function.

The local goal: to characterize the genes
What? – function (rather, role)
When? – regulation (conditions): gene expression, lifetime (mRNA, protein)
Where? – localization: cellular/membrane/secreted
How? – mechanism of action: specificity, regulation (biochemistry)

2007: > 1200 bacterial genomes Propaganda-2: complete genomes

The global goal: to predict the organism’s properties given its genome (plus some additional information, e.g. the initial state after cell division) and “to understand” the evolution of genomes/organisms

Haemophilus influenzae, 1995

Vibrio cholerae, 2000

The metabolic map, the bird’s view

Metabolic pathways, the eagle’s view

A submap (metabolism of arginine and proline)

Approaches
Similarity => homology (common origin)
Homology => common function
“The Pearson Principle” (after Karl Pearson): important features are conserved
–functional sites in proteins
–regulatory (protein-binding) sites in DNA
–not necessarily sequences: structure of protein and RNA, gene localization on chromosomes, co-expression of genes
Allows one to annotate 50-75% of genes in a bacterial genome
Necessary first step, may be automated (to some extent)

… but not so simple
Similarity ≠ homology
–Low complexity regions, unstructured domains, transmembrane segments and other regions with non-standard amino acid composition
–The need for correct similarity measures
Does homology always follow from structural similarity?
–What is structural similarity? How can it be measured?
–Convergent evolution of structures? Independent emergence of folds?
Homology ≠ same function
–What is “the same function”? Biochemical details and cellular role

“The Fermi principle” (after Enrico Fermi) Purely homology-based annotation: boring (nothing radically new) It turns out, one can predict something completely new Comparative genomics

Positional clustering
Genes that are located in immediate proximity tend to be involved in the same metabolic pathway or functional subsystem
–caused by operon structure, but not only: horizontal transfer of loci containing several functionally linked operons; compartmentalisation of products in the cytoplasm
–very weak evidence on its own; stronger if observed in many unrelated genomes
May be measured
–e.g. the STRING database/server (P. Bork, EMBL)
–and other sources
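The idea that clustering becomes stronger evidence when it recurs in many genomes can be sketched in a few lines. This is only an illustration under stated assumptions: the genome names and gene orders are hypothetical, the chromosome is treated as linear, and the count is not the STRING association score.

```python
from collections import Counter
from itertools import combinations

# Hypothetical gene orders (ortholog labels along each chromosome); not real data.
genomes = {
    "genome_A": ["trpA", "trpB", "trpC", "x1", "x2"],
    "genome_B": ["x2", "trpB", "trpA", "x3", "trpC"],
    "genome_C": ["trpC", "x1", "x4", "trpA", "trpB"],
}

def neighbor_counts(genomes, max_gap=1):
    """Count, for each gene pair, the genomes in which the two genes
    lie at most `max_gap` positions apart."""
    counts = Counter()
    for order in genomes.values():
        close_pairs = set()
        for (i, g), (j, h) in combinations(enumerate(order), 2):
            if j - i <= max_gap:
                close_pairs.add(frozenset((g, h)))
        # Each genome contributes at most once per pair.
        counts.update(close_pairs)
    return counts

counts = neighbor_counts(genomes)
```

A pair seen side by side in all three toy genomes (trpA and trpB here) gets a count of 3, while a pair adjacent in a single genome gets 1, which is the "very weak evidence" case.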

STRING: trpB – positional clusters

Functionally dependent genes tend to cluster on chromosomes in many different organisms
Vertical axis: number of gene pairs with association score exceeding a threshold. Control: same graph, random re-labeling of vertices

More genomes (stronger links) => highly significant clustering

Fusions
If two (or more) proteins form a single multidomain protein in some organism, they all are likely to be tightly functionally related
Very useful for the analysis of eukaryotes
Sometimes useful for the analysis of prokaryotes

STRING: trpB – fusions

Phyletic patterns
Functionally linked genes tend to occur together
Enzymes with the same function (isozymes) have complementary phyletic profiles
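Both statements can be expressed as simple set operations on presence/absence profiles. The sketch below assumes a profile is the set of genomes in which a gene occurs; the gene and genome names are hypothetical, and the strict complementarity test is a simplification, since real isozyme profiles may overlap in a few genomes.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of genomes in which a gene is present."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def are_complementary(a, b, all_genomes):
    """True if the two profiles never overlap and together cover all genomes."""
    a, b = set(a), set(b)
    return not (a & b) and (a | b) == set(all_genomes)

# Hypothetical profiles, for illustration only.
all_genomes = {"g1", "g2", "g3", "g4", "g5"}
profiles = {
    "geneX": {"g1", "g2", "g3"},
    "geneY": {"g1", "g2", "g3"},   # co-occurs with geneX
    "isoA": {"g1", "g2"},
    "isoB": {"g3", "g4", "g5"},    # complementary to isoA
}

linked = jaccard(profiles["geneX"], profiles["geneY"])
isozymes = are_complementary(profiles["isoA"], profiles["isoB"], all_genomes)
```

High similarity suggests a functional link; a complementary pair suggests two alternative forms of the same function.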

STRING: trpB – co-occurrence (phyletic patterns)

Phyletic patterns in the Phe/Tyr pathway shikimate kinase

Archaeal shikimate-kinase Chorismate biosynthesis pathway (E. coli)

Arithmetic of phyletic patterns
3-dehydroquinate dehydratase (EC ):
Class I (AroD) COG0710 aompkzyq---lb-e----n---i--
Class II (AroQ) COG y-vdr-bcefghs-uj----
Two forms combined aompkzyqvdrlbcefghsnuj-i
Enolpyruvylshikimate 3-phosphate synthase (EC ):
AroA COG0128 aompkzyqvdrlbcefghsnuj-i--
Shikimate dehydrogenase (EC ):
AroE COG0169 aompkzyqvdrlbcefghsnuj-i--
+
Shikimate kinase (EC ):
Typical (AroK) COG yqvdrlbcefghsnuj-i--
Archaeal-type COG1685 aompkz
Two forms combined aompkzyqvdrlbcefghsnuj-i--
Chorismate synthase (EC ):
AroC COG0082 aompkzyqvdrlbcefghsnuj-i--
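The "arithmetic" amounts to a position-wise OR of the two patterns. A minimal sketch follows. The Class I and AroA strings are taken from the slide. The Class II string is left-padded with six dashes to the same length, which is an assumption, because the leading part of that pattern is lost in the transcript.

```python
def combine_patterns(*patterns):
    """Position-wise OR of equal-length phyletic patterns.
    A letter marks presence in that genome, '-' marks absence."""
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("patterns must have equal length")
    combined = []
    for column in zip(*patterns):
        letters = {c for c in column if c != "-"}
        if len(letters) > 1:
            raise ValueError(f"conflicting genome codes in column: {column}")
        combined.append(letters.pop() if letters else "-")
    return "".join(combined)

class_i = "aompkzyq---lb-e----n---i--"    # AroD, as on the slide
class_ii = "------y-vdr-bcefghs-uj----"   # AroQ, left-padded: an assumption
aro_a = "aompkzyqvdrlbcefghsnuj-i--"      # AroA, as on the slide

combined = combine_patterns(class_i, class_ii)
```

Under that padding assumption the combined pattern of the two dehydratase classes coincides with the AroA pattern, which is the point of the slide: two partial profiles add up to the profile of the rest of the pathway.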

Distribution of association scores: monotonic for subunits, bimodal for isozymes

Comparative analysis of regulation
Phylogenetic footprinting: regulatory sites are more conserved than non-coding regions in general and are often seen as conserved islands in alignments of gene upstream regions
Consistency filtering: regulons (sets of co-regulated genes) are conserved =>
–true sites occur upstream of orthologous genes
–false sites are scattered at random
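Consistency filtering can be sketched as a grouping step: keep a candidate site only if sites are also found upstream of orthologs of the same gene in other genomes. The input format, the threshold `min_genomes`, and all names below are assumptions made for illustration, not the procedure used in the talk.

```python
from collections import defaultdict

def consistency_filter(candidate_sites, min_genomes=2):
    """Keep ortholog groups with a candidate site upstream in at least
    `min_genomes` genomes.

    candidate_sites: iterable of (genome, ortholog_group) pairs, one per
    candidate site found upstream of a gene.
    """
    genomes_by_group = defaultdict(set)
    for genome, group in candidate_sites:
        genomes_by_group[group].add(genome)
    return {
        group: sorted(genomes)
        for group, genomes in genomes_by_group.items()
        if len(genomes) >= min_genomes
    }

# Hypothetical candidate hits; genome and group names are placeholders.
hits = [
    ("genome_A", "ribD"), ("genome_B", "ribD"), ("genome_C", "ribD"),
    ("genome_A", "ypaA"), ("genome_C", "ypaA"),
    ("genome_B", "unrelated1"),   # isolated hit: likely a false positive
    ("genome_C", "unrelated2"),
]
conserved = consistency_filter(hits)
```

Isolated hits drop out because a random false positive is unlikely to recur upstream of the same ortholog group in several genomes.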

Riboflavin (vitamin B2) biosynthesis pathway

5’ UTR regions of riboflavin genes from bacteria

Conserved secondary structure of the RFN-element
Capitals: invariant (absolutely conserved) positions. Lower case letters: strongly conserved positions. Dashes and stars: obligatory and facultative base pairs.
Degenerate positions: R = A or G; Y = C or U; K = G or U; B = not A; V = not U. N: any nucleotide. X: any nucleotide or deletion
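The degenerate codes in this legend map directly onto regular-expression character classes. The sketch below covers only the sequence part of such a pattern; it ignores base pairing and folds the capital/lower-case distinction together. The example consensus is hypothetical and is not the RFN consensus.

```python
import re

# Degenerate RNA codes as defined on the slide (B = not A, V = not U).
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "K": "[GU]",
    "B": "[CGU]", "V": "[ACG]",
    "N": "[ACGU]",
    "X": "[ACGU]?",   # any nucleotide or deletion
}

def consensus_to_regex(consensus):
    """Translate a degenerate consensus string into a compiled regex."""
    return re.compile("".join(CODES[ch] for ch in consensus.upper()))

pattern = consensus_to_regex("GRYNXA")   # hypothetical consensus, not the RFN element

with_deletion = pattern.fullmatch("GACUA")     # X matches nothing
with_insertion = pattern.fullmatch("GGUAGA")   # X matches G
mismatch = pattern.fullmatch("GCCUA")          # C is not allowed at the R position
```

A real search for structured RNA elements would also have to score the base-paired stems, which a plain regular expression cannot do.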

RFN: the mechanism of regulation Transcription attenuation Translation attenuation

Early observation: an uncharacterized gene (ypaA) with an upstream RFN element

Phylogenetic tree of RFN-elements (regulation of riboflavin biosynthesis). Figure labels: duplications; no riboflavin biosynthesis

YpaA a.k.a. RibU: riboflavin transporter in Gram-positive bacteria
5 predicted transmembrane segments => a transporter
Upstream RFN element (likely co-regulation with riboflavin genes) => transport of riboflavin or a precursor
S. pyogenes, E. faecalis, Listeria sp.: ypaA, no riboflavin pathway => transport of riboflavin
Prediction: YpaA is a riboflavin transporter (Gelfand et al., 1999)
Validation:
YpaA transports flavins (riboflavin, FMN, FAD):
–by genetic analysis (Kreneva et al., 2000)
–by direct measurement (Burgess et al., 2006; Vogl et al., 2007)
ypaA is regulated by riboflavin:
–by microarray expression study (Lee et al., 2001)
–via attenuation of transcription (and to some extent inhibition of translation) (Winkler et al., 2003)

Conserved structures of riboswitches (circled: X-ray)

Mechanisms
glmS: ribozyme, cleaves its mRNA (the Breaker group)
THI-box in plants: inhibition of splicing (the Breaker and Hanamoto groups)

Characterized riboswitches (more are predicted)
Columns: element; regulated function; ligand; distribution
RFN; riboflavin biosynthesis and transport; FMN (flavin mononucleotide); Bacillus/Clostridium group, proteobacteria, actinobacteria, other bacteria
THI; biosynthesis and transport of thiamin and related compounds; TPP (thiamin pyrophosphate); Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, other bacteria, archaea (thermoplasmas), plants, fungi
B12; biosynthesis of cobalamin, transport of cobalt, cobalamin-dependent enzymes; coenzyme B12 (adenosylcobalamin); Bacillus/Clostridium group, proteobacteria, actinobacteria, cyanobacteria, spirochaetes, other bacteria
S-box, SAM-II, SAM-III; metabolism of methionine and cysteine; SAM (S-adenosylmethionine); Bacillus/Clostridium group and some other bacteria, SAM-II (alpha), SAM-III (Streptococci)
LYS; lysine metabolism; lysine; Bacillus/Clostridium group, enterobacteria, other bacteria
G-box; metabolism of purines; purines; Bacillus/Clostridium group and some other bacteria
glmS (ribozyme); synthesis of glucosamine-6-phosphate; glucosamine-6-phosphate; Bacillus/Clostridium group
gcvT (tandem); catabolism of glycine; glycine; Bacillus/Clostridium group

Properties of riboswitches
Direct binding of ligands
High conservation
–including “unpaired” regions: tertiary interactions, ligand binding
Same structure, different mechanisms: transcription, translation, splicing, (RNA cleavage)
Distribution in all taxonomic groups
–diverse bacteria
–archaea: thermoplasmas
–eukaryotes: plants and fungi
Correlation of the mechanism and taxonomy:
–attenuation of transcription (anti-anti-terminator): Bacillus/Clostridium group
–attenuation of translation (anti-anti-sequestor of translation initiation): proteobacteria
–attenuation of translation (direct sequestor of translation initiation): actinobacteria
Evolution: horizontal transfer, duplications, lineage-specific loss
Sometimes very narrow distribution: evolution from scratch?

Conserved signal upstream of nrd genes

Identification of the candidate regulator by the analysis of phyletic patterns
COG1327: the only COG with exactly the same phyletic pattern as the signal
–“large scale”: on the level of major taxa
–“small scale”: within major taxa:
absent in small parasites among alpha- and gamma-proteobacteria
absent in Desulfovibrio spp. among delta-proteobacteria
absent in Nostoc sp. among cyanobacteria
absent in Oenococcus and Leuconostoc among Firmicutes
present only in Treponema denticola among four spirochetes

COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway (RibX)?

Additional evidence: co-localization nrdR is sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA

Additional evidence: co-regulated genes
In some genomes, candidate NrdR-binding sites are found upstream of other replication-related genes
–dNTP salvage
–topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II

Multiple sites (nrd genes): FNR, DnaA, NrdR

Mode of regulation
Repressor (overlaps with promoters)
Co-operative binding:
–most sites occur in tandem (> 90% cases)
–the distance between the copies (centers of palindromes) equals an integer number of DNA turns:
mainly (94%) bp, in 84% bp – 3 turns
21 bp (2 turns) in Vibrio spp.
bp (4 turns) in some Firmicutes
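The "integer number of DNA turns" criterion can be checked with a one-line conversion. The sketch assumes a helical repeat of about 10.5 bp per turn for B-DNA, a textbook value that is not stated on the slide; the tolerance is likewise an arbitrary illustrative choice.

```python
BP_PER_TURN = 10.5   # approximate B-DNA helical repeat; an assumption, not from the slide

def helical_turns(distance_bp):
    """Return the center-to-center distance expressed in DNA helical turns."""
    return distance_bp / BP_PER_TURN

def same_face(distance_bp, tolerance=0.15):
    """True if the distance is within `tolerance` turns of a whole number of turns."""
    turns = helical_turns(distance_bp)
    return abs(turns - round(turns)) <= tolerance

turns_21 = helical_turns(21)   # the 2-turn spacing mentioned for Vibrio spp.
```

Sites spaced by a whole number of turns lie on the same face of the helix, which is what one would expect for co-operatively binding repressor molecules.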

Experimental validations

Acknowledgements
Dmitry Rodionov (comparative genomics)
Andrei Mironov (software)
Alexei Vitreschak (riboswitches)
Funding:
–Howard Hughes Medical Institute
–Russian Foundation for Basic Research
–RAS, program “Molecular and Cellular Biology”
–INTAS