Identifying conserved promoter motifs and transcription factor binding sites in plant promoters Endre Sebestyén, ARI-HAS, Martonvásár, Hungary 26th, November,

Presentation transcript:

Identifying conserved promoter motifs and transcription factor binding sites in plant promoters
Endre Sebestyén, ARI-HAS, Martonvásár, Hungary
26 November 2009, RCPGD Annual Meeting

Transcription factor binding sites
TFs bind short, often degenerate DNA sequences
Promoters are variable-length 5’ sequences
  ▫ With TFBSs
TFBSs are usually conserved within a nonconserved surrounding sequence
Some well-known TFBSs
  ▫ TATA box
  ▫ GC box
  ▫ CpG island
Lots of other, less general TFBSs
Similarly expressed genes, or homologues, should contain similar TFBSs
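
A degenerate site can be written with IUPAC ambiguity codes and matched as a pattern. The sketch below is only an illustration: the consensus TATAWAW is a commonly quoted form of the TATA box rather than something taken from the slides, and the promoter string and the function name find_sites are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = "".join(IUPAC[code] for code in consensus.upper())
    # A zero-width lookahead lets re.finditer report overlapping matches
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical promoter fragment, forward strand only
promoter = "GGCTATAAATAGCCTATATAAGG"
print(find_sites("TATAWAW", promoter))  # [3, 14]
```

Only the forward strand is scanned here; a real search would usually also scan the reverse complement.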

Transcription

TFBS search and promoter analysis
Wet-lab methods
  ▫ DNase footprinting
  ▫ Electrophoretic mobility shift assay
  ▫ ChIP-chip, ChIP-seq
In silico methods
  ▫ Experimentally verified sites
    ▪ Consensus sequences
    ▪ Consensus matrices
  ▫ De novo motif discovery
    ▪ Oligo frequency
    ▪ Phylogenetic footprinting
    ▪ Other methods
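
To make "consensus matrices" concrete, here is a minimal sketch of how a log-odds position weight matrix could be built from a handful of verified sites and used to score every window of a sequence. The site list, the uniform background of 0.25, the pseudocount, and the names build_pwm and scan are all assumptions for illustration, not the method of any particular database.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (one dict per position) from equal-length sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(pwm, sequence):
    """Return (start, score) for every window; sequence must contain only A, C, G, T."""
    width = len(pwm)
    return [
        (start, sum(pwm[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical verified sites and a hypothetical promoter fragment
sites = ["TATAAA", "TATATA", "TATAAA", "TATAAT"]
hits = scan(build_pwm(sites), "GCGTATAAAGC")
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

Unlike a consensus string, the matrix scores every window, so a cutoff has to be chosen; a loose cutoff is one source of the false positives discussed later in the talk.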

Experimentally verified sites
TRANSFAC
JASPAR
PLACE
PlantCARE

De novo motif discovery
Orthologous gene groups
  ▫ Evolutionarily conserved functional sites
Co-regulated genes
  ▫ Same tissue, body part
  ▫ Same developmental stage
  ▫ Etc.

"Real" promoter structure
No general motifs
  ▫ No TATA box, GC box, etc.
Lots of false-positive TFBSs
  ▫ With both wet-lab and in silico methods
Sometimes no apparent common TFBSs between co-regulated genes

Database of Orthologous Promoters
Orthologous promoter sequence collections
  ▫ Based on a BLAST search with first exons of reference species
    ▪ Plants (Viridiplantae)
    ▪ Reference species: Arabidopsis thaliana
    ▪ Chordates
    ▪ Reference species: Homo sapiens
  ▫ 500/1000/3000 bp 5’ upstream regions
    ▪ Conserved sequence regions
    ▪ Annotations
    ▪ Xrefs to other databases
    ▪ Annotated transcription start sites

DoOP

DoOP cluster number

DoOP subsets
Cluster > Subset
  ▫ Subset: collection of evolutionarily monophyletic sequences in a cluster
  ▫ Plant subsets
    ▪ Brassicaceae: Arabidopsis thaliana, Brassicaceae species
    ▪ Eudicotyledons: grape, Solanum species, papaya, tobacco
    ▪ Magnoliophyta: maize, rice
    ▪ Viridiplantae

DoOP subsets

Gene types – Gene Ontology
Standardized annotation for genes
  ▫ Biological process
    ▪ What does it do?
    ▪ Transcription, translation, stress response, etc.
  ▫ Cellular component
    ▪ Where is it located?
    ▪ Membrane, ribosome, cytosol, etc.
  ▫ Molecular function
    ▪ How does it work?
    ▪ Dehydrogenase, ATP binding, etc.

Gene types – Gene Ontology
500 bp promoters
  ▫ Search for significantly enriched terms in annotation
    ▪ Brassicaceae
    ▪ Eudicotyledons
    ▪ Magnoliophyta
    ▪ Viridiplantae
    ▪ BP: transcription, translation, protein folding, stress response
    ▪ CC: plasma membrane, ribosome parts
    ▪ MF: ATP/GTP binding, DNA binding, ribosome parts
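
The slides do not say which statistical test was used to call a term "significantly enriched". A common choice is the one-sided hypergeometric test, sketched below under that assumption; the function name enrichment_pvalue and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_pvalue(hits, sample_size, annotated, population):
    """One-sided hypergeometric P(X >= hits).

    hits        -- genes in the sample annotated with the GO term
    sample_size -- genes in the sample (e.g. genes with a conserved promoter)
    annotated   -- genes in the whole population annotated with the term
    population  -- all annotated genes considered
    """
    upper = min(sample_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample_size)


# Hypothetical counts: 12 of 200 sampled genes carry a term that
# 300 of 20000 genes carry overall
print(enrichment_pvalue(12, 200, 300, 20000))
```

When many GO terms are tested at once, the resulting p-values would normally also need a multiple-testing correction.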

Motif generation
Phylogenetic footprinting
Functional TFBSs should be conserved
Local sequence alignment
Define conserved regions
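
As a rough illustration of the last step, the sketch below takes sequences that have already been aligned and reports runs of gap-free, fully identical columns. This is a simplifying assumption: the slides do not specify the alignment tool or the conservation threshold actually used, and the function name conserved_blocks, the min_len parameter, and the toy alignment are hypothetical.

```python
def conserved_blocks(aligned, min_len=5):
    """Return (start, end) column ranges, end exclusive, that are identical in all rows."""
    width = len(aligned[0])
    blocks, start = [], None
    # One extra iteration acts as a sentinel that closes a block at the alignment end
    for col in range(width + 1):
        identical = (
            col < width
            and aligned[0][col] != "-"
            and all(row[col] == aligned[0][col] for row in aligned)
        )
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


# Toy alignment of three orthologous promoter fragments ("-" marks a gap)
alignment = [
    "ACGTGCCGACTTA",
    "TCGAGCCGACATA",
    "AC-TGCCGACTAA",
]
for start, end in conserved_blocks(alignment):
    print(start, end, alignment[0][start:end])  # 4 10 GCCGAC
```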

Motif generation (figure; panels: Magnoliophyta, eudicotyledons, Brassicaceae)

Motif statistics (figure; motif number for Brassicaceae, eudicotyledons, Magnoliophyta, Viridiplantae)

Motif statistics
% conserved
  Brassicaceae
  eudicotyledons   532
  Magnoliophyta    652
  Viridiplantae    421
Avg length
  Brassicaceae     999
  eudicotyledons   777
  Magnoliophyta    898
  Viridiplantae    999

TFBS databases
Database     TFBSs
TRANSFAC     977
JASPAR       18
PLACE        416
PlantCARE    646
ABS          650
AGRIS        72
Lots of redundant data
Low quality, not updated
More than 100 different versions of the TATA box

Synthetic biology
  ▫ iGEM competition
  ▫ BioBricks
  ▫ MIT Registry of Standard Biological Parts
    ▪ UV-responsive promoter
    ▪ Promoter expressed in roots
    ▪ Etc.
Synthetic promoters
  ▫ Define basic promoter elements
  ▫ Build and use custom-made promoters
  ▫ Gene expression more or less when and where you want it

SNP conservation
Gene expression levels change because
  ▫ Regulatory elements change
  ▫ Usually NOT protein-coding regions
Conserved promoter regions might be functional regulatory elements
  ▫ Search for SNPs in these regions
  ▫ These SNPs might be interesting for breeders, as they are likely to be functional

A real example
Vilmos Soós, Endre Sebestyén, Angéla Juhász, János Pintér, Marnie E. Light, Johannes Van Staden, Ervin Balázs (2009) Stress-related genes define essential steps in the response of maize seedlings to smoke-water. Functional and Integrative Genomics, Volume 9, Number 2, Pages ; doi: /s
Microarray experiments
  ▫ Maize kernels (Mv 540)
  ▫ 24 and 48 h: control vs smoke-treated samples
  ▫ Up- and downregulated genes
    ▪ Promoter sequences up to 1500 bp were extracted if available

Analysis of promoters
TRANSFAC database version 12.1
  ▫ Collection of TFBSs
  ▫ More than 100 plant TFBSs
    ▪ DRE element: GCCGAC
Scan for the TFBSs in the maize promoters
  ▫ Up- and downregulated
Also count the frequencies of all 5- to 8-mer sequences
  ▫ In all available maize promoters, not only the up- or downregulated ones
Calculate the over- or underrepresentation of a TFBS as follows
  ▫ Observed frequency in up- or downregulated promoters divided by the expected frequency in all promoters
  ▫ If ratio > 1: overrepresented
  ▫ If ratio < 1: underrepresented
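
The ratio described above can be sketched as follows. This is a minimal illustration, not the code used in the study: frequencies are taken as the share of all k-mer windows, only the forward strand is counted, and the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Count every overlapping k-mer and return (counts, total number of windows)."""
    counts, total = Counter(), 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
            total += 1
    return counts, total


def representation_ratio(motif, regulated, background):
    """Observed frequency in regulated promoters / expected frequency in all promoters."""
    k = len(motif)
    obs_counts, obs_total = kmer_frequencies(regulated, k)
    exp_counts, exp_total = kmer_frequencies(background, k)
    if obs_total == 0 or exp_counts[motif] == 0:
        return None  # ratio undefined when the motif never occurs in the background
    observed = obs_counts[motif] / obs_total
    expected = exp_counts[motif] / exp_total
    return observed / expected


# Toy data: one upregulated promoter, two promoters in the full background set
upregulated = ["AAGCCGACAA"]
all_promoters = ["AAGCCGACAA", "TTTTTTTTTT"]
ratio = representation_ratio("GCCGAC", upregulated, all_promoters)
print(ratio)  # a value above 1 means the DRE element is overrepresented
```

A ratio alone does not indicate significance; with few promoters, a ratio well above 1 can still arise by chance.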

Analysis of promoters
Results
  ▫ Binding sites related to
    ▪ Organogenesis
    ▪ Meristem development
    ▪ Housekeeping functions
    ▪ Biotic stress
    ▪ Cold and dehydration stress
    ▪ ABA-related motifs

Thank you for your attention!