Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park.

Genome Projects GOLD: Genomes Online Database (

Genome Browsers
- NCBI Map Viewer
- Ensembl
- UCSC Genome Browser
The three databases use the same genome assembly, which is generated by NCBI.

Ensembl

- genomic regions: alignments with syntenic sequences
- genes: homologs, SNPs
- transcripts: EMBL mRNAs, ESTs, expression
- proteins: Gene Ontology (function), protein domains, disease associations

Ensembl - Biomart - retrieval of information on gene datasets

Gene comparative sequence analysis
Genome and transcriptome projects have generated a vast amount of information on protein-coding and non-coding gene sequences. Identification of conserved sequences in different genes can help us understand gene evolution and identify functional regions.
[Figure: promoter and coding regions of N genes (orthologs) compared across species 1, species 2, ..., species m]

Non-coding sequences in vertebrate genomes
- only 1.2% of the human genome codes for proteins
- but 5% exhibits high sequence conservation levels, compatible with negative selection (MGSC, 2002)
- non-coding sequences include:
  - transcription regulatory regions
  - introns
  - non-protein-coding exons/genes (miRNAs, etc.)
  - repetitive elements (Alus, etc.)
  - ultra-conserved elements

Gene transcription regulatory sequences Maston et al., 2006 Annu. Rev. Genomics Hum. Genet. 7: 29-59

Frequently-found metazoan motifs in the core promoter Maston et al., 2006

Wray et al. (2003), Mol. Biol. Evol. 20(9): Eukaryotic promoter diversity

High evolvability of regulatory sequences
- most of the changes in regulatory networks are likely to occur in cis; changes in trans (transcription factors) may often have effects that are too strong
- a single mutation may lead to the acquisition of a new DNA-factor interaction (rapid turnover)
- the expression in one tissue may evolve independently of expression in another tissue (promoter modular organization)
Wray et al. (2003) The Evolution of Transcriptional Regulation in Eukaryotes. Mol. Biol. Evol. 20(9):

Transcription factor binding sites (TFBS) are short and imprecise
- short sequence motifs (6-12 bp)
- some positions of the motif are variable
- sometimes different transcription factors can recognize the same sequence motif
Example: TATA box (* marks positions that are identical in all six sites)
TATAAA
TATAGA
TATAAA
GATAAA
TATAAA
TATAAT
 ***

Transcription factor binding sites (TFBS): weight matrices
Aligned sites: TATAAA, TATAGA, TATAAA, GATAAA, TATAAA, TATAAT
[Weight matrix: one row per nucleotide (A, C, G, T), one column per motif position]
-> can be used to search for putative motifs in sequences
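As a minimal sketch of how a weight matrix can be derived from the six TATA-box sites above and used to scan a sequence: the pseudocount, the uniform background of 0.25, the score threshold, and the function names are illustrative assumptions, not the settings of any particular tool.

```python
import math

# The six aligned TATA-box sites shown on the slide.
SITES = ["TATAAA", "TATAGA", "TATAAA", "GATAAA", "TATAAA", "TATAAT"]
ALPHABET = "ACGT"


def count_matrix(sites):
    """Count how often each nucleotide occurs at each motif position."""
    length = len(sites[0])
    counts = {base: [0] * length for base in ALPHABET}
    for site in sites:
        for position, base in enumerate(site):
            counts[base][position] += 1
    return counts


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn counts into log2 odds scores (assumed uniform background)."""
    num_sites = sum(counts[base][0] for base in ALPHABET)
    length = len(counts["A"])
    denominator = num_sites + 4 * pseudocount
    return {
        base: [
            math.log2((counts[base][i] + pseudocount) / denominator / background)
            for i in range(length)
        ]
        for base in ALPHABET
    }


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in pwm for base in window):
            continue  # skip windows with ambiguous characters such as N
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


counts = count_matrix(SITES)
pwm = log_odds_matrix(counts)
hits = scan("GGCTATAAAGGC", pwm, threshold=6.0)  # hypothetical test sequence
```

Lowering the threshold returns more candidate sites, at the price of more matches that occur by chance.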

Transcription factor binding site databases
- TRANSFAC
- TRRD
- PLACE
- ooTFD / rTFD
- SCPD
- RegulonDB

TFBS prediction using weight matrices PROMO Farré, D., et al. (2003). Nucleic Acids Research 31:

High false positive rate in TFBS prediction
Test sequences: 200 vertebrate promoter sequences, 607 experimentally verified sites. Blanco, E., et al. (2006). Nucleic Acids Research 34: D63-D67.
Predictions: Transfac v.6.4
- Sensitivity: 46%
- Specificity: 2% (very low!)
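To make these two figures concrete, the sketch below assumes that "specificity" is used in the sense common in the gene-prediction literature, that is, the fraction of predictions that are correct, TP / (TP + FP). Under that assumption the numbers on the slide imply roughly how many predictions were made in total. The derived counts are back-of-the-envelope arithmetic, not values reported in the source.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of real sites that were predicted."""
    return true_positives / (true_positives + false_negatives)


def prediction_specificity(true_positives, false_positives):
    """Fraction of predictions that are real sites (assumed definition)."""
    return true_positives / (true_positives + false_positives)


# Figures taken from the slide.
real_sites = 607
reported_sensitivity = 0.46
reported_specificity = 0.02

# Derived values (arithmetic only, not reported in the source).
implied_true_positives = round(real_sites * reported_sensitivity)
implied_predictions = round(implied_true_positives / reported_specificity)
implied_false_positives = implied_predictions - implied_true_positives

print(implied_true_positives, implied_predictions, implied_false_positives)
```

Under this reading, about 49 out of every 50 predicted sites would be false positives.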

Comparative approaches are necessary
Select those motifs or regions that are shared by:
- orthologous sequences: phylogenetic footprinting
- co-expressed genes: shared regulatory motifs

Boffelli D, Nobrega MA, Rubin EM. (2004) Nat Rev Genet. 5: Phylogenetic footprinting

Highly conserved enhancer in gene DACH1 Phylogenetic footprinting

Proximal promoter pre-initiation complex

Motif positional bias Signal Search Analysis Server (SIB)

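Why should some motifs show positional bias?
- promoter structure
- positional constraints imposed by protein-protein interactions
[Figure: a proximal promoter with a TFBS at a fixed distance from the TSS and the pre-initiation complex (PIC), and a regulatory module in which TF1 and TF2 bind neighbouring sites. Legend: predicted element; reference element (known).]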

PEAKS: identification of motif positional bias
[Figure: TFBS over-representation relative to the TSS in functionally related sequences (e.g. co-expressed) compared with random sequences. Legend: predicted element; reference element (known).]

PEAKS Step 1. Construct motif frequency profile
[Figure: a sliding window moves along seq1, seq2, seq3 and seq4 to build the profile. Legend: predicted element; reference element (known).]

PEAKS Step 1. Construct motif frequency profile
Example: 308 housekeeping genes, Transfac v.6.4 matrix library
[Figure: motif frequency profile around the TSS]
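The profile construction in Step 1 can be sketched as follows. This is an illustrative reading of the slide rather than the PEAKS implementation: the input format (one list of predicted motif start positions per sequence, in a shared coordinate system such as distance from the start of the promoter), the window size, and the step are all assumptions.

```python
def frequency_profile(hits_per_sequence, seq_length, window=20, step=1):
    """Count predicted motif occurrences falling in each sliding window.

    hits_per_sequence: one list of motif start positions per sequence.
    Returns one count per window, pooled over all sequences.
    """
    profile = []
    for start in range(0, seq_length - window + 1, step):
        end = start + window
        count = sum(
            1
            for hits in hits_per_sequence
            for position in hits
            if start <= position < end
        )
        profile.append(count)
    return profile


# Hypothetical predictions for four sequences of length 100.
example_hits = [[10, 50], [12], [11, 90], []]
profile = frequency_profile(example_hits, seq_length=100, window=5, step=5)
```

In this toy example three of the four sequences carry the motif near position 10, so the corresponding window stands out from the rest of the profile.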

PEAKS Step 2. Measure significance of peaks
For each matrix:
Score (max peak) = Sa x Sb x Sc
Sa = max peak / num motif
Sb = max peak / num seq
Sc = max peak / average num motifs
[Figure: CAAT-box profile showing the maximum peak, the average signal, and the difference between them]
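A small worked example of the score Sa x Sb x Sc, under stated assumptions: "num motif" is read as the total number of predicted motifs in the dataset, "num seq" as the number of sequences, and "average num motifs" as the mean value of the profile across windows. The slide does not spell these out, so the function below is a sketch of one plausible reading.

```python
def peak_score(profile, num_motifs, num_sequences):
    """Score the highest peak of a motif frequency profile as Sa * Sb * Sc."""
    if not profile or num_motifs == 0 or num_sequences == 0:
        return 0.0
    max_peak = max(profile)
    average_signal = sum(profile) / len(profile)
    if average_signal == 0:
        return 0.0
    s_a = max_peak / num_motifs      # share of all motifs found in the peak
    s_b = max_peak / num_sequences   # peak height relative to dataset size
    s_c = max_peak / average_signal  # peak height relative to the background
    return s_a * s_b * s_c


# Hypothetical profile: 10 windows, 5 motifs in total, 4 sequences.
example_profile = [0, 0, 3, 0, 1, 0, 0, 0, 1, 0]
score = peak_score(example_profile, num_motifs=5, num_sequences=4)
```

Here Sa = 3/5, Sb = 3/4 and Sc = 3/0.5, giving a score of 2.7. A flat profile with the same number of motifs would score much lower.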

PEAKS Step 2. Measure significance of peaks
- determine random expectation score cut-off for different levels of significance using 1000 random datasets
- define significant signal range
[Figure: CAAT-box profile with the cut-off, the max peak, and the average signal]
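The random-expectation cut-off can be approximated empirically, as in the sketch below. It assumes that a random dataset is generated by placing the same number of motifs uniformly at random, that windows do not overlap, and that the cut-off is the empirical (1 - alpha) quantile of the random scores. How PEAKS actually builds its random datasets is not described on the slide, so every name and choice here is hypothetical.

```python
import random


def window_counts(positions, seq_length, window):
    """Count motif positions falling in each non-overlapping window."""
    counts = [0] * (seq_length // window)
    for position in positions:
        index = position // window
        if index < len(counts):
            counts[index] += 1
    return counts


def peak_score(counts, num_sequences):
    """Sa * Sb * Sc for the highest window (same reading as Step 2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    max_peak = max(counts)
    average = total / len(counts)
    return (max_peak / total) * (max_peak / num_sequences) * (max_peak / average)


def random_cutoff(num_motifs, num_sequences, seq_length, window,
                  alpha=0.05, n_random=1000, seed=0):
    """Empirical score cut-off from n_random uniformly random datasets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_random):
        positions = [rng.randrange(seq_length) for _ in range(num_motifs)]
        counts = window_counts(positions, seq_length, window)
        scores.append(peak_score(counts, num_sequences))
    scores.sort()
    index = min(n_random - 1, int((1 - alpha) * n_random))
    return scores[index]


cutoff_5 = random_cutoff(50, 20, 1000, 50, alpha=0.05)
cutoff_1 = random_cutoff(50, 20, 1000, 50, alpha=0.01)

# Hypothetical clustered dataset: 40 of 50 motifs sit in the same window.
clustered = [160] * 40 + [(i * 97) % 1000 for i in range(10)]
clustered_score = peak_score(window_counts(clustered, 1000, 50), 20)
is_significant = clustered_score > cutoff_1
```

A fixed seed keeps the cut-off reproducible. In practice the random datasets should match the real data in sequence number, sequence length, and motif density.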

PEAKS Step 3. Build “promoter type”
Example: 52 genes regulated by NFkB, p < 0.5%
Motifs: TATA, Sp1, NFkB, BACH1

PEAKS server Bellora, Farré and Albà (2007). Bioinformatics 23,

PEAKS results: human promoter sequences, TRANSFAC vertebrate matrices
- 308 housekeeping genes: TATA, CAAT, GC-box, YY
- 52 NFkB regulated genes: TATA, NFkB, GC-box, BACH1

PEAKS results: promoters from yeast genes, amino acid metabolism (86 genes)
- 54 yeast weight matrices tested
- significant regions detected by the method show significant enrichment in experimentally validated sites

Measuring promoter sequence divergence
Divergence (non-aligned promoter fraction, or dSM), Castillo-Davis et al.
1. highly divergent -> fewer constraints
2. highly conserved -> more constraints
[Figure: promoter of species 1 aligned against promoter of species 2]
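The non-aligned promoter fraction can be computed from the aligned blocks of a pairwise local alignment, as in this minimal sketch. The block format (half-open start/end coordinates on the promoter of species 1) and the function name are assumptions for illustration. Producing the alignment itself is outside the scope of the example.

```python
def non_aligned_fraction(aligned_blocks, promoter_length):
    """Fraction of the promoter not covered by any aligned block.

    aligned_blocks: iterable of (start, end) pairs, half-open, in promoter
    coordinates. Overlapping blocks are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_blocks):
        start = max(start, last_end, 0)   # skip positions already counted
        end = min(end, promoter_length)   # clip to the promoter
        if end > start:
            covered += end - start
            last_end = end
    return 1 - covered / promoter_length


# Hypothetical alignment of a 2 Kb promoter: 600 nt aligned in total.
blocks = [(0, 200), (150, 400), (1000, 1200)]
divergence = non_aligned_fraction(blocks, promoter_length=2000)
```

With 600 of 2000 positions aligned, the divergence is 0.7. A promoter with no aligned block has a divergence of 1.0.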

Variability in promoter sequence divergence
- 8385 human-mouse orthologues
- 2 Kb from transcription start site
- average divergence = 70%

Regulatory genes contain more conserved promoters than structural/metabolic genes
Functional classes enriched in high score promoter alignments
Lee et al. (2006). BMC Genomics 6:
Consistent with results by Iwama and Gojobori (2004)

Structural/metabolic genes contain fewer highly conserved promoters than regulatory genes
Functional classes enriched in low score promoter alignments
Lee et al. (2006). BMC Genomics 6: 188

Comparison: neurogenesis versus ribosomal
[Figure: panels for neurogenesis and ribosomal genes]
Lee et al. (2006). BMC Genomics 6: 188

Is expression breadth related to promoter sequence divergence?
- expression data from Zhang et al. (2004)
- human-mouse orthologues classified as tissue-specific, intermediate, or housekeeping

Measure sequence divergence
- gene classes: tissue-specific, intermediate, housekeeping
- divergence = non-aligned promoter fraction (2 Kb)
[Figure: promoter of species 1 aligned against promoter of species 2]

Relationship between promoter divergence and expression breadth
[Figure: coding sequence evolutionary rate and promoter divergence plotted against number of tissues, for tissue-specific, intermediate and housekeeping genes]

Relationship between promoter divergence and expression breadth
- divergence measured in 100 nt bins
[Figure: % conservation by distance from the TSS, housekeeping versus non-housekeeping genes]
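Measuring conservation in 100 nt bins can be sketched as follows, again assuming that aligned regions are available as half-open (start, end) blocks in promoter coordinates. The names and the input format are illustrative and are not taken from the original analysis.

```python
def conservation_by_bin(aligned_blocks, promoter_length, bin_size=100):
    """Percentage of aligned positions in each consecutive bin."""
    aligned = [False] * promoter_length
    for start, end in aligned_blocks:
        for position in range(max(start, 0), min(end, promoter_length)):
            aligned[position] = True
    percentages = []
    for bin_start in range(0, promoter_length, bin_size):
        bin_positions = aligned[bin_start:bin_start + bin_size]
        percentages.append(100.0 * sum(bin_positions) / len(bin_positions))
    return percentages


# Hypothetical 2 Kb promoter aligned at both ends only.
bins = conservation_by_bin([(0, 150), (1900, 2000)], promoter_length=2000)
```

Averaging these per-bin values over all genes of a class (for example housekeeping versus non-housekeeping) gives a conservation curve as a function of distance from the TSS.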

Promoter divergence and gene function
GO classes with > 50 genes, p-value < 0.01
- highly divergent promoter: RNA binding, ligase activity, hydrolase activity, catalytic activity
- highly conserved promoter: receptor binding, signal transducer activity, receptor activity, structural molecule activity, transcription regulator activity, transcription factor activity, DNA binding

Promoter divergence and gene function
[Figure: axis label: divergence]

Summary
- the prediction of transcription factor binding sites is very noisy, so we need to use comparative genomics
- some motifs show positional bias; this property can help us understand the structure of promoters and improve motif predictions
- promoter sequence conservation is related to gene function and to gene expression breadth; the fact that housekeeping genes contain less conserved promoters may reflect simpler regulation of gene expression

The team: Nicolas Bellora, Domènec Farré, Loris Mularoni, Macarena Toll, Medya Shikhagaie
Evolutionary Genomics Group, Universitat Pompeu Fabra, Barcelona