Cis-regulatory element study in transcriptome
Jin Chen, CSE891-001, Fall 2012

What is a cis-element? (Courey and Jia, 2001)
A cis-regulatory element, or cis-element, is a region of DNA or RNA that regulates the expression of genes located on that same molecule. The Latin word "cis" means "on the same side as".

Cis-element properties
– Typically found in the 5' upstream, untranscribed region of the gene (the promoter region)
– Can be specific binding sites for activators or repressors
– The position and orientation of a cis-element relative to the transcription start site are usually fixed

Cis-element properties
– Short sequences
– Recurring patterns
– Sequence-specific binding sites

Cis-element representations
Sequence 1: AGTATA
Sequence 2: AGATTA
Sequence 3: CGACTC
Sequence 4: AGTGTA
Sequence 5: AGTGTG
Consensus sequence: AGWNTA
Probability matrix (rows Prob(A), Prob(C), Prob(G), Prob(T)) and sequence logo: [figure]
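
As a minimal sketch, the probability matrix and the consensus for the five example sites (read row-wise from the slide) can be computed directly. The slide does not say which rule turns column frequencies into IUPAC letters; the thresholds below (single base if its frequency exceeds 0.5 and twice the runner-up, two-base code if the top two together exceed 0.75, otherwise N) are an assumed Cavener-style heuristic that happens to reproduce AGWNTA. The names `probability_matrix` and `consensus` are illustrative.

```python
from collections import Counter

# The five aligned example sites (Sequence 1-5), read row-wise from the slide.
SITES = ["AGTATA", "AGATTA", "CGACTC", "AGTGTA", "AGTGTG"]

# IUPAC codes for positions where exactly two bases dominate.
TWO_BASE_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("AT"): "W", frozenset("CG"): "S",
}


def probability_matrix(sites):
    """One dict per column mapping each base to its observed frequency."""
    n = len(sites)
    matrix = []
    for column in zip(*sites):
        counts = Counter(column)
        matrix.append({base: counts[base] / n for base in "ACGT"})
    return matrix


def consensus(matrix):
    """Assumed Cavener-style rule: single base, two-base IUPAC code, or N."""
    letters = []
    for col in matrix:
        # Two most frequent bases (ties keep A, C, G, T order).
        (b1, f1), (b2, f2) = sorted(col.items(), key=lambda kv: -kv[1])[:2]
        if f1 > 0.5 and f1 > 2 * f2:
            letters.append(b1)
        elif f1 + f2 > 0.75:
            letters.append(TWO_BASE_CODE[frozenset((b1, b2))])
        else:
            letters.append("N")
    return "".join(letters)


pm = probability_matrix(SITES)
print(consensus(pm))  # AGWNTA
print(pm[2])          # {'A': 0.4, 'C': 0.0, 'G': 0.0, 'T': 0.6}
```

Column 3 shows why W is chosen: T (0.6) is the majority but not twice as frequent as A (0.4), while T and A together cover every site.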

Cis-element representation 1: consensus-based method
– A consensus sequence matches all known examples of the binding site closely, but not necessarily exactly
– Degenerate codes trade specificity for sensitivity: a more ambiguous consensus captures more true sites, but also more false matches
IUPAC codes
Code  Description
A     Adenine
C     Cytosine
G     Guanine
T     Thymine
U     Uracil
R     Purine (A or G)
Y     Pyrimidine (C, T, or U)
M     C or A
K     T, U, or G
W     T, U, or A
S     C or G
B     C, T, U, or G (not A)
D     A, T, U, or G (not C)
H     A, T, U, or C (not G)
V     A, C, or G (not T, not U)
N     Any base (A, C, G, T, or U)
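
A degenerate consensus can be matched by expanding each IUPAC code into the set of bases it allows. The sketch below turns a consensus into a regular expression and reports overlapping matches on the forward strand only; a real scan would also check the reverse complement. The function names and the example sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes from the table (DNA alphabet, so U is left out).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """One character class per consensus position, e.g. 'RN' -> '[AG][ACGT]'."""
    return "".join(f"[{IUPAC[code]}]" for code in consensus.upper())


def find_matches(consensus, sequence):
    """0-based start of every match on the forward strand, overlaps included."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


seq = "CCAGTATAGGAGATTACC"
print(find_matches("AGWNTA", seq))  # [2, 10]
print(find_matches("AGTATA", seq))  # [2]
```

The degenerate consensus finds both sites while the exact string finds only one, which is the sensitivity side of the trade-off; on long random sequences the same degeneracy also produces more chance matches.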

Cis-element representation 2: sequence logos
– A visual representation of the probability matrix
– The total height of each column is proportional to its information content; within a column, each letter's height is proportional to its frequency
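
The column height in a logo is usually computed as log2(4) minus the column's Shannon entropy. A minimal sketch, assuming a uniform background of 0.25 per base and no small-sample correction (logo tools often apply one); the helper names are illustrative.

```python
import math


def column_information(freqs):
    """Bits of information in one column: log2(4) minus the Shannon entropy.

    Assumes a uniform background and applies no small-sample correction.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy


def letter_heights(freqs):
    """Logo letter heights: frequency times the column's information content."""
    ic = column_information(freqs)
    return {base: p * ic for base, p in freqs.items()}


conserved = {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0}
mixed = {"A": 0.4, "C": 0.0, "G": 0.0, "T": 0.6}
print(column_information(conserved))          # 2.0
print(round(column_information(mixed), 3))    # 1.029
print(round(letter_heights(mixed)["T"], 3))   # 0.617
```

A fully conserved column reaches the 2-bit maximum, and a uniform column (0.25 for every base) scores 0 bits and disappears from the logo.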

Cis-element matching/discovery
Pattern matching
– Search sequences from co-regulated genes for known patterns, such as JASPAR and TRANSFAC matrices
– Tool: Pscan
Pattern discovery
– Discover patterns in sequences from co-regulated genes without using known patterns
– Tools: MEME, hmmbuild
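
MEME fits a probabilistic motif model by expectation maximization, which is too long to show here. As a much simpler, swapped-in illustration of discovery without known patterns, the sketch below ranks exact k-mers by how much more often they occur in co-regulated promoters than in background promoters. The sequences are hypothetical; the target set was built to share the Ste12 consensus TGAAACA.

```python
from collections import Counter


def kmer_presence(sequences, k):
    """Number of sequences containing each k-mer (forward strand only)."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def overrepresented_kmers(target, background, k, top=3):
    """Rank k-mers by (fraction of target seqs with it) - (fraction of background seqs with it)."""
    t, b = kmer_presence(target, k), kmer_presence(background, k)
    score = {kmer: t[kmer] / len(target) - b[kmer] / len(background) for kmer in t}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(score, key=lambda kmer: (-score[kmer], kmer))[:top]


# Hypothetical promoter fragments; the target set shares the Ste12 consensus TGAAACA.
target = ["ATTGAAACAG", "CTGAAACATT", "GGTGAAACAC"]
background = ["ATGCATGCAT", "CCGGTTAACC", "GATCGATCGA"]
print(overrepresented_kmers(target, background, k=7, top=1))  # ['TGAAACA']
```

Unlike MEME, this only finds exact words and ignores degenerate positions, so it is a starting point rather than a substitute.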

Pattern Matching
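
Matrix-based pattern matching scores every window of a sequence against a position weight matrix. A minimal sketch, assuming log2 odds against a uniform background with a pseudocount of 0.5 per base, built from the five example sites on the representation slide; Pscan's own scoring and statistics are not described in these slides and are not reproduced here.

```python
import math

# The five aligned example sites from the representation slide.
SITES = ["AGTATA", "AGATTA", "CGACTC", "AGTGTA", "AGTGTG"]


def log_odds_matrix(sites, background=None, pseudocount=0.5):
    """PWM of log2(p / q) scores; p uses a pseudocount so unseen bases are not -inf."""
    background = background or {base: 0.25 for base in "ACGT"}
    n = len(sites)
    pwm = []
    for column in zip(*sites):
        pwm.append({
            base: math.log2((column.count(base) + pseudocount)
                            / (n + 4 * pseudocount) / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(pwm, sequence, threshold):
    """(start, score) for each forward-strand window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, sequence[i:i + width]))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits


pwm = log_odds_matrix(SITES)
print(scan(pwm, "CCCCAGTATACCCC", threshold=5.0))  # [(4, 6.44)]
```

The threshold (here an arbitrary 5 bits) plays the same role as the degeneracy of a consensus: lowering it increases sensitivity at the cost of more false hits.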

Cis-element evolution
– Composition
– Location
– Modules
[Figure: gene control regions for eye lens crystallins in chicken and mouse. Molecular Biology of the Cell, Alberts et al., 4th ed.]

Large-scale analysis (A. P. Gasch et al., PLoS Biol., 2004)
– Identify 264 co-regulated gene groups in S. cerevisiae
– Collect putative cis-regulatory elements: 80 known consensus binding sites and 597 elements found by motif discovery with MEME
– Score the enrichment of genes containing each putative element, yielding 42 cis-elements in 35 unique groups
– Identify orthologous modules in other species
– Test the orthologous modules for enrichment
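
The slide does not state which statistic was used to score enrichment. One common choice is the hypergeometric tail: the chance that a random group of the same size contains at least as many element-carrying genes as observed. The sketch below assumes that test; the gene counts in the example are hypothetical. With 677 putative elements (80 + 597) tested against 264 groups, some multiple-testing correction would also be needed.

```python
from math import comb


def enrichment_p_value(total_genes, genes_with_element, group_size, hits_in_group):
    """Hypergeometric tail P(X >= hits_in_group).

    X = number of genes carrying the element when group_size genes are drawn at random
    from total_genes, of which genes_with_element carry it.
    """
    N, K, n, k = total_genes, genes_with_element, group_size, hits_in_group
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 6000 genes, 300 carry the element, a 40-gene group has 12 of them.
p = enrichment_p_value(6000, 300, 40, 12)
print(f"{p:.2e}")
```

Exact integer binomial coefficients from `math.comb` avoid the underflow that a naive floating-point product would hit for genome-sized counts.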

Conservation of S. cerevisiae motifs
Process                    Motif       Factor / element
G1 phase cell cycle        ACGCG       MCB
Amino acid biosynthesis    TGACTM      Gcn4p
Nitrogen source            GATAA       GATA factors
Proteasome                 GGTGGCAAA   Rpn4p

Positions of binding sites
– Non-random distribution
– Distribution similar across species
– No correlation in the locations of individual sites across species

Spacing between binding sites in methionine biosynthesis genes
– Small distance between Cbf1p and Met31/32p sites
– Conserved across species
– Independent of the exact positions of the sites
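
The point that spacing is conserved even when absolute positions are not can be checked by measuring the distance between the two kinds of site in each promoter. A minimal sketch using exact forward-strand matches: CACGTG is the Cbf1p E-box core, while the Met31/32p core AAACTGTG is an assumption for illustration only, and both promoter sequences are hypothetical.

```python
def site_positions(motif, sequence):
    """0-based starts of exact forward-strand matches of motif."""
    return [i for i in range(len(sequence) - len(motif) + 1)
            if sequence.startswith(motif, i)]


def min_spacing(motif_a, motif_b, sequence):
    """Smallest start-to-start distance (bp) between any motif_a and motif_b site, or None."""
    a, b = site_positions(motif_a, sequence), site_positions(motif_b, sequence)
    if not a or not b:
        return None
    return min(abs(i - j) for i in a for j in b)


CBF1 = "CACGTG"          # Cbf1p E-box core
MET31_32 = "AAACTGTG"    # assumed Met31/32p core, for illustration only

# Hypothetical promoters: the pair sits at different absolute positions...
promoter_1 = "TTTT" + CBF1 + "AA" + MET31_32 + "TTTTTTTT"
promoter_2 = "G" * 20 + CBF1 + "AA" + MET31_32 + "TT"
print(site_positions(CBF1, promoter_1), site_positions(CBF1, promoter_2))  # [4] [20]
# ...but the spacing is the same.
print(min_spacing(CBF1, MET31_32, promoter_1), min_spacing(CBF1, MET31_32, promoter_2))  # 8 8
```

Comparing spacings rather than positions is what makes the measure independent of where the pair sits in each promoter.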

Control of iron metabolism in Mycobacterium tuberculosis. Rodriguez, Marcela. Trends in Microbiology, 2006.

Poisson method for module discovery (Wagner A (1999) Bioinformatics 15(10))
– Look for matches to consensus sequences:
  Mcm1: DCCYWWWNNRG
  Ste12: TGAAACA
– Random DNA sequence: "Pearson type III distribution"; exponential distribution
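
The idea behind a Poisson method can be sketched as follows. In random DNA, matches to a consensus occur at a low rate per position, so the number of matches in a window is approximately Poisson, the gap between consecutive matches is approximately exponential, and the span covering k matches follows a gamma (Pearson type III) distribution. That is our reading of the two distribution labels on the slide, not a reconstruction of its formulas. The sketch assumes a uniform i.i.d. background, counts both strands, and asks how surprising k matches in a window are; Wagner's exact statistic may differ.

```python
import math

# IUPAC codes from the consensus slide (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(consensus, background=None):
    """Chance that one window of i.i.d. random DNA matches the consensus."""
    bg = background or {base: 0.25 for base in "ACGT"}
    p = 1.0
    for code in consensus:
        p *= sum(bg[base] for base in IUPAC[code])
    return p


def poisson_tail(lam, k):
    """P(N >= k) for N ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


def cluster_p_value(consensus, window, k, both_strands=True):
    """P(at least k matches in a window of random DNA), Poisson approximation."""
    starts = window - len(consensus) + 1
    lam = match_probability(consensus) * starts * (2 if both_strands else 1)
    return poisson_tail(lam, k)


print(match_probability("DCCYWWWNNRG"))             # 0.0003662109375 (= 3/8192)
print(f"{cluster_p_value('TGAAACA', 500, 3):.2e}")  # Ste12 example
```

The degenerate Mcm1 consensus matches random DNA about six times as often as the exact Ste12 word (3/8192 versus 1/16384), so the same number of matches is much less surprising for Mcm1.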

Cister & Comet
– Cluster model of a DNA sequence segment: Poisson-distributed cis-elements embedded in random DNA
Frith MC, Hansen U, Weng Z (2001) Bioinformatics 17(10); Frith MC, Spouge JL, Hansen U, Weng Z (2002) Nucleic Acids Research
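
Cister and Comet use this cluster model to detect modules; the sketch below shows only the generative side of the model as described on the slide: a segment of random DNA with a Poisson-distributed number of cis-elements embedded in it. The uniform background, the mean number of sites, and the motif instances (the Ste12 consensus and TCCTAATTAGG, one hypothetical word matching the Mcm1 consensus) are assumptions for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def random_dna(length, rng):
    """Uniform i.i.d. background DNA (an assumption of this sketch)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_cluster(motifs, mean_sites, length, rng):
    """Segment of exactly `length` bp with a Poisson number of non-overlapping motif instances.

    Returns (segment, [(start, motif), ...]).
    """
    chosen = [rng.choice(motifs) for _ in range(sample_poisson(mean_sites, rng))]
    background_len = length - sum(map(len, chosen))
    if background_len < 0:
        raise ValueError("segment too short for the sampled sites")
    # Split the background into len(chosen) + 1 random gaps.
    cuts = sorted(rng.randint(0, background_len) for _ in chosen)
    gaps = [b - a for a, b in zip([0] + cuts, cuts + [background_len])]
    pieces, sites, pos = [], [], 0
    for gap, motif in zip(gaps, chosen):
        pieces.append(random_dna(gap, rng))
        pos += gap
        sites.append((pos, motif))
        pieces.append(motif)
        pos += len(motif)
    pieces.append(random_dna(gaps[-1], rng))
    return "".join(pieces), sites


rng = random.Random(0)
segment, sites = sample_cluster(["TGAAACA", "TCCTAATTAGG"], mean_sites=3.0, length=300, rng=rng)
print(len(segment), sites)
```

Simulated segments like these are a convenient way to check that a detector fires on planted clusters and stays quiet on pure background.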