Bioinformatics. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. 2010-2011. Lecture 3: Finding Motifs. Aleppo University, Faculty of Technical Engineering.


Bioinformatics. Dr. Aladdin Hamwieh, Khalid Al-shamaa, Abdulqader Jighly. Lecture 3: Finding Motifs. Aleppo University, Faculty of Technical Engineering, Department of Biotechnology

Main Lines
– Definition
– Motif types
– Motifs problem
– Motifs: Profiles and Consensus
– Motif Logo
– Motif Search in Local Database

Definition
A motif is a short conserved sequence pattern associated with distinct functions of a protein or DNA.

Motif Types
1. Regulatory sequences

Combinatorial Gene Regulation
A microarray experiment showed that when gene X is knocked out, 20 other genes are not expressed.
– How can one gene have such drastic effects?

Regulatory Protein
– Gene X encodes a regulatory protein, a.k.a. a transcription factor (TF)
– The 20 unexpressed genes rely on gene X's TF to induce transcription
– A single TF may regulate multiple genes

Regulatory Regions
– Every gene contains a regulatory region (RR) typically stretching bp upstream of the transcriptional start site
– Located within the RR are the Transcription Factor Binding Sites (TFBS), also known as motifs, specific for a given transcription factor
– TFs influence gene expression by binding to a specific location in the respective gene's regulatory region: the TFBS

Transcription Factor Binding Sites
– A TFBS can be located anywhere within the Regulatory Region.
– TFBS may vary slightly across different regulatory regions since non-essential bases could mutate

Motifs and Transcriptional Start Sites
[Figure: five genes, each shown with a motif instance: ATCCCG, TTCCGG, ATCCCG, ATGCCG, ATGCCC]

Consensus considerations
[Figure: promoter layout: -35 hexamer (TTGACA), spacer, -10 hexamer (TATAAT), interval, transcription start site; spacer and interval lengths are given in bases; a weight matrix over A, T, G, C]
– A weight matrix contains more information
– Based on ~450 known promoters

Example: GAL4 in Yeast
– Activator of galactose-induced genes (convert galactose to glucose)
– Protein structure determines motif
– DNA-protein interactions require certain bases at specified locations
– Motif reflects homodimer structure

Motif Types
2. Motifs in protein structure

Importance
– Functional relationships between proteins cannot be distinguished through a simple BLAST or FASTA database search.
– Proteins often perform multiple functions that cannot be fully described using a single annotation.
– To resolve these issues, identification of the motifs and domains becomes very useful.

Random Sample
atgaccgggatactgataccgtatttggcctaggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatactgggcataaggtaca
tgagtatccctgggatgacttttgggaacactatagtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgaccttgtaagtgttttccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatggcccacttagtccacttatag
gtcaatcatgttcttgtgaatggatttttaactgagggcatagaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtactgatggaaactttcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttggtttcgaaaatgctctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatttcaacgtatgccgaaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttctgggtactgatagca

Implanting Motif AAAAAAAAGGGGGGG
atgaccgggatactgatAAAAAAAAGGGGGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaataAAAAAAAAGGGGGGGa
tgagtatccctgggatgacttAAAAAAAAGGGGGGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgAAAAAAAAGGGGGGGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAAAAAAAAGGGGGGGcttatag
gtcaatcatgttcttgtgaatggatttAAAAAAAAGGGGGGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAAAAAAAAGGGGGGGcaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAAAGGGGGGGctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatAAAAAAAAGGGGGGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttAAAAAAAAGGGGGGGa

The Challenge
Hard to identify
– Relatively short sequences (as small as 6 bases)
– Many positions not well conserved
Factors improving identification
– Usually localized in certain proximity of a gene (search within 3 kb upstream)
– Some positions highly conserved
– Use other data (Microarray?)

Challenge Problem
Find a motif in a sample of:
– 20 "random" sequences (e.g. 600 nt long)
– each sequence containing an implanted pattern of length 15
– each implanted pattern appearing with 4 mismatches, i.e. as a (15,4) motif
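The challenge problem above can be reproduced with a short script. This is a minimal sketch, not the lecture's own code: the function names (`random_dna`, `mutate`, `planted_sample`), the fixed seed, and the choice to overwrite background bases rather than insert the instance are all assumptions made for illustration.

```python
import random

ALPHABET = "acgt"

def random_dna(length, rng):
    """Return a uniformly random DNA string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def mutate(pattern, d, rng):
    """Return a copy of pattern with exactly d positions changed to another base."""
    chars = list(pattern.lower())
    for i in rng.sample(range(len(chars)), d):
        chars[i] = rng.choice([b for b in ALPHABET if b != chars[i]])
    return "".join(chars)

def planted_sample(pattern, d, t=20, n=600, seed=0):
    """Build t random sequences of length n, each hiding one mutated instance.

    Returns a list of (sequence, start) pairs, where start is the position
    of the implanted instance (assumed layout: overwrite, not insert).
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(t):
        background = random_dna(n, rng)
        instance = mutate(pattern, d, rng)
        start = rng.randrange(n - len(pattern) + 1)
        seq = background[:start] + instance + background[start + len(instance):]
        sample.append((seq, start))
    return sample

if __name__ == "__main__":
    # (15,4) motif: pattern of length 15, each instance with 4 mismatches
    for seq, start in planted_sample("aaaaaaaaggggggg", 4, t=3, n=60):
        print(start, seq)
```

Keeping the start positions alongside the sequences makes it possible to check later whether a motif finder recovers the implanted instances.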

Where is the Motif???
atgaccgggatactgatagaagaaaggttgggggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacaataaaacggcggga
tgagtatccctgggatgacttaaaataatggagtggtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcaaaaaaagggattgtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatataataaaggaagggcttatag
gtcaatcatgttcttgtgaatggatttaacaataagggctgggaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtataaacaaggagggccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttaaaaaatagggagccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatactaaaaaggagcggaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttactaaaaaggagcgga

Why Finding (15,4) Motif is Difficult?
AgAAgAAAGGttGGG
..|..|||.|..|||
cAAtAAAAcGGcGGG

atgaccgggatactgatAgAAgAAAGGttGGGggcgtacacattagataaacgtatgaagtacgttagactcggcgccgccg
acccctattttttgagcagatttagtgacctggaaaaaaaatttgagtacaaaacttttccgaatacAAtAAAAcGGcGGGa
tgagtatccctgggatgacttAAAAtAAtGGaGtGGtgctctcccgatttttgaatatgtaggatcattcgccagggtccga
gctgagaattggatgcAAAAAAAGGGattGtccacgcaatcgcgaaccaacgcggacccaaaggcaagaccgataaaggaga
tcccttttgcggtaatgtgccgggaggctggttacgtagggaagccctaacggacttaatAtAAtAAAGGaaGGGcttatag
gtcaatcatgttcttgtgaatggatttAAcAAtAAGGGctGGgaccgcttggcgcacccaaattcagtgtgggcgagcgcaa
cggttttggcccttgttagaggcccccgtAtAAAcAAGGaGGGccaattatgagagagctaatctatcgcgtgcgtgttcat
aacttgagttAAAAAAtAGGGaGccctggggcacatacaagaggagtcttccttatcagttaatgctgtatgacactatgta
ttggcccattggctaaaagcccaacttgacaaatggaagatagaatccttgcatActAAAAAGGaGcGGaccgaaagggaag
ctggtgagcaacgacagattcttacgtgcattagctcgcttccggggatctaatagcacgaagcttActAAAAAGGaGcGGa
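The difficulty on this slide can be checked by counting mismatches. The sketch below uses a hypothetical helper, `hamming`, and compares the implanted pattern with the two instances aligned at the top of the slide. Each instance is only 4 mismatches away from the pattern, yet the two instances are 7 mismatches away from each other, so pairwise comparison of instances gives a much weaker signal than comparison with the unknown pattern.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ (case-insensitive)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.lower(), b.lower()))

pattern = "AAAAAAAAGGGGGGG"   # implanted pattern
first = "AgAAgAAAGGttGGG"     # instance from the first sequence
second = "cAAtAAAAcGGcGGG"    # instance from the second sequence

print(hamming(pattern, first))   # 4
print(hamming(pattern, second))  # 4
print(hamming(first, second))    # 7
```

In general, two instances of a (15,4) motif can differ from each other in up to 8 positions; here one mutated position is shared, which gives 7.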

Motifs: Profiles and Consensus
– Line up the patterns by their start indexes s = (s1, s2, …, st)
– Construct matrix profile with frequencies of each nucleotide in columns
– Consensus nucleotide in each position has the highest score in column

Alignment
  a G g t a c T t
  C c A t a c g t
  a c g t T A g t
  a c g t C c A t
  C c g t a c g G
  _________________
Profile (column counts of the alignment)
A 3 0 1 0 3 1 1 0
C 2 4 0 0 1 4 0 0
G 0 1 4 0 0 0 3 1
T 0 0 0 5 1 0 1 4
  _________________
Consensus
  A C G T A C G T
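The three steps on this slide (align, count, take the column maximum) can be sketched as follows. The function names `profile` and `consensus` are illustrative assumptions, and ties are broken in A, C, G, T order, which the slide does not specify.

```python
from collections import Counter

def profile(alignment):
    """Count each nucleotide per column of equal-length aligned patterns."""
    columns = zip(*(row.upper() for row in alignment))
    counts = [Counter(col) for col in columns]
    # A Counter returns 0 for bases that never occur in a column
    return {base: [c[base] for c in counts] for base in "ACGT"}

def consensus(prof):
    """Pick the most frequent nucleotide in each column (ties: A, C, G, T order)."""
    length = len(prof["A"])
    return "".join(max("ACGT", key=lambda b: prof[b][j]) for j in range(length))

# The five aligned patterns from the slide
alignment = ["aGgtacTt", "CcAtacgt", "acgtTAgt", "acgtCcAt", "CcgtacgG"]

prof = profile(alignment)
for base in "ACGT":
    print(base, *prof[base])
print(consensus(prof))  # ACGTACGT
```

Storing the profile as one list per nucleotide mirrors the row layout of the profile matrix, so each printed row can be compared directly with the slide.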

Motif Search in Local Database