Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription Factor DB

WebLogo - Input
- Aligned sequences (e.g. output of ClustalW)
- RUN!

WebLogo - Output
- Genes
- Proteins

MEME
- Motif discovery from unaligned sequences
  - Genomic or protein sequences
- Identifies profile motifs
  - Multiple motifs for any input
- Flexible model of motif presence
  - Motif can be absent in some sequences
  - Can appear several times in one sequence

MEME Input
- Address
- Multiple input sequences
- How many times in each sequence?
- How many motifs?
- How many sites?
- Range of motif lengths

MEME Output (1)
- Motif length
- Number of times
- Like BLAST
- "Position-Specific Probability Matrix" = motif profile
- Divergence of each motif position from background
- Most popular symbols
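The quantities named on this slide can be illustrated with a minimal sketch. It assumes a handful of made-up, already aligned motif sites and a uniform background; it is not MEME's own code, and MEME's reported values also involve corrections that are not reproduced here. The function names are hypothetical.

```python
import math

def probability_matrix(sites, alphabet="ACGT"):
    """Column-wise letter frequencies for equal-length aligned motif sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(width):
        column = [s[i] for s in sites]
        matrix.append({a: column.count(a) / len(sites) for a in alphabet})
    return matrix

def relative_entropy(column, background):
    """Bits by which one motif column diverges from the background."""
    return sum(p * math.log2(p / background[a])
               for a, p in column.items() if p > 0)

# Made-up example sites (an assumption, not taken from the slides).
sites = ["GAGTCA", "GAGTCA", "GACTCA", "GAGTCT"]
background = {a: 0.25 for a in "ACGT"}  # assumed uniform background

pspm = probability_matrix(sites)
bits = [relative_entropy(col, background) for col in pspm]
# "Most popular symbols": the most frequent letter in each column.
consensus = "".join(max(col, key=col.get) for col in pspm)

for i, (col, b) in enumerate(zip(pspm, bits), start=1):
    print(i, col, round(b, 3))
print(consensus)
```

A fully conserved column reaches 2 bits against a uniform DNA background; mixed columns score lower.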

MEME Output (2)
- Sequence names
- Reverse complement (genomic input only)
- Position in sequence
- Strength of match
- Motif within sequence

MEME Output (3)
- Overall strength of motif matches
- Original sequence lengths
- Motif instance

MAST
- Searches for motifs (one or more) in sequence databases:
  - Like BLAST, but with motifs as input
  - Similar to iterations of PSI-BLAST
- Profile defines strength of match
  - Multiple motif matches per sequence
  - Combined E value for all motifs
- MEME uses MAST to summarize results:
  - Each MEME result is accompanied by the MAST result of searching for the discovered motifs in the given sequences.
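How a profile "defines strength of match" can be sketched as a sliding-window scan with a log-odds matrix. This is only the raw scoring idea under stated assumptions (made-up sites, uniform background, a simple pseudocount); MAST's p-values and combined E-values are not computed here, and all names are hypothetical.

```python
import math

def log_odds_matrix(sites, background, pseudocount=1.0):
    """Per-column log2(p/background) scores built from aligned sites."""
    n = len(sites)
    matrix = []
    for i in range(len(sites[0])):
        column = {}
        for letter, bg in background.items():
            count = sum(1 for s in sites if s[i] == letter)
            # Pseudocount spread according to the background (an assumption).
            p = (count + pseudocount * bg) / (n + pseudocount)
            column[letter] = math.log2(p / bg)
        matrix.append(column)
    return matrix

def scan(sequence, matrix):
    """Score every window of the sequence; returns (score, start) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][ch] for i, ch in enumerate(window))
        hits.append((score, start))
    return hits

# Made-up sites and target sequence, for illustration only.
sites = ["GAGTCA", "GAGTCA", "GACTCA", "GAGTCT"]
background = {a: 0.25 for a in "ACGT"}
pssm = log_odds_matrix(sites, background)

hits = scan("TTTGAGTCATTT", pssm)
best_score, best_start = max(hits)
print(best_start, round(best_score, 2))
```

The sketch scans one strand only and assumes the sequence contains only A, C, G and T.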

MAST Input
- Address
- Database (like BLAST)
- Motif file (e.g. MEME output)
- Consider matched sequence length
- E value threshold

MAST Output (1)
- Matched accession
- Match E value
- Length of sequence
- Link to GenBank

MAST Output (2) Motif diagram

MAST Output (3)
- Position of each instance
- P value of instance
- Matched parts of sequence
- Motif ‘consensus’
- Motif and orientation

TRANSFAC
Database of eukaryotic DNA transcription regulation:
- Individual regulatory sites (SITES table)
  - Genes to which they belong
  - Proteins which bind them
- Proteins which bind sites (FACTORS table)
  - Cellular source of protein
  - Nucleotide motif profile for binding
  - Some grouping and classification
- Classification of factors (CLASS table)
- Position-specific matrices for select factors (MATRIX table)
- Cell localization (CELL table)

Searching TRANSFAC
- Search a single table
  - By identifier, factor name, gene name
  - By species, author
- Browse your way from table to table
- Search within a sequence
  - MatInspector, TFScan (EMBOSS package)

TRANSFAC Factor
DT  Date; author
FA  Factor name
GE  Encoding gene
SF  Structural features
CP  Cell specificity (positive)
CN  Cell specificity (negative)
EX  Expression pattern
FF  Functional features
IN  Interacting factors
MX  Matrix
BS  Binding site
DR  External databases
References:
RN  Reference no.
RX  MEDLINE ID
RA  Reference authors
RT  Reference title
RL  Reference data

TRANSFAC Matrix
- Accession
- Position-specific matrix
- Statistical basis
- Consensus (IUPAC subset symbols)
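As an illustration of how a consensus in IUPAC symbols can be read off a position-specific count matrix, here is a minimal sketch. The thresholds follow one commonly cited rule (a single base if it exceeds 50% and is more than twice the runner-up, a two-base code if the top two together exceed 75%, otherwise N). Whether TRANSFAC applies exactly this rule is an assumption, and the matrix below is made up.

```python
# Two-base IUPAC ambiguity codes (single bases map to themselves).
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def consensus_symbol(counts):
    """Consensus letter for one matrix row of A/C/G/T counts."""
    total = sum(counts.values())
    if total == 0:
        return "N"
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # Assumed rule 1: one clearly dominant base.
    if counts[first] > 0.5 * total and counts[first] > 2 * counts[second]:
        return first
    # Assumed rule 2: two bases that together dominate.
    if counts[first] + counts[second] > 0.75 * total:
        return IUPAC_PAIRS[frozenset((first, second))]
    return "N"

def consensus(matrix):
    """Consensus string for a list of per-position count dictionaries."""
    return "".join(consensus_symbol(row) for row in matrix)

# Made-up count matrix, one dictionary per motif position.
matrix = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 1, "C": 4, "G": 0, "T": 5},
    {"A": 3, "C": 3, "G": 2, "T": 2},
]
print(consensus(matrix))
```

Only the single-base codes, the six two-base codes and N are used, which matches the idea of a subset of the IUPAC alphabet.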

TRANSFAC Site (1)
- Accession number
- DNA or RNA
- Gene
- Gene region
- Sequence of regulatory element
- Position range of factor binding site

TRANSFAC Site (2)
- Binding factor accession
- Factor name
- Binding ‘quality’
  1  functionally confirmed
  2  binding of pure protein
  3  immunologically characterized extract
  4  via known binding sequence
  5  extract protein binding to bona fide element
  6  unassigned
- Organism
- Cellular source
- Methods of identifying site
- External links

TRANSFAC Factor (1)
AC: Accession number
FA: Factor name
SY: Other names
OS: Organism
OC: Taxonomy
HO: Homologs
CL: Classification
SZ: Size
SQ: Amino acid sequence

TRANSFAC Factor (2)
- Protein sequence reference
- Features and positions
- Structural features
- Cell specificity

Question
A biologist at your university has found 15 target genes that she thinks are co-regulated. She gives you 15 upstream regions, each 50 base pairs long, in FASTA format (file DNASample50.txt) and asks you to identify the motif and, if possible, the potential regulating protein. She tells you the sequences are from Homo sapiens, and her intuition is that the motif is of length 8. She wants you to suggest only the best possible candidate motif.

Question
After you have run all the programs, your biologist friend confesses that she is not sure her intuition about the motif length was correct. Re-run the tool without knowledge of the motif length. Do you get the same results? Determine a potential DNA-binding protein using TRANSFAC.
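For readers using a local command-line installation of MEME rather than the web form shown in the slides, the two runs asked for in these questions could be set up roughly as follows. This is a sketch under assumptions: it only builds the command lines and does not execute them, the output directory names are hypothetical, the width range of 6 to 12 is an arbitrary choice, and the flags should be checked against `meme -h` for the installed version.

```python
import shlex

def meme_command(fasta, out_dir, width=None, min_width=None, max_width=None,
                 model="zoops", n_motifs=1):
    """Build (not run) a MEME command line for DNA input."""
    cmd = ["meme", fasta, "-dna", "-oc", out_dir,
           "-mod", model, "-nmotifs", str(n_motifs)]
    if width is not None:
        # Fixed motif width, as in the first question.
        cmd += ["-w", str(width)]
    else:
        # Let MEME choose a width, optionally within a range.
        if min_width is not None:
            cmd += ["-minw", str(min_width)]
        if max_width is not None:
            cmd += ["-maxw", str(max_width)]
    return cmd

# First question: one best motif of length 8.
fixed = meme_command("DNASample50.txt", "meme_w8", width=8)
# Second question: no assumed length (range chosen here for illustration).
free = meme_command("DNASample50.txt", "meme_free", min_width=6, max_width=12)

print(shlex.join(fixed))
print(shlex.join(free))
```

The `zoops` model (zero or one occurrence per sequence) is chosen here because a motif may be absent from some of the 15 regions; `oops` or `anr` would express the other assumptions about motif presence.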