Tutorial 5 Motif discovery.


1 Tutorial 5 Motif discovery

2 Multiple sequence alignments and motif discovery
MEME MAST TOMTOM GOMO PROSITE

3 Can we find motifs using multiple sequence alignment?
A widespread pattern with a biological significance:
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
[Table: residue frequencies (A, D, E, G, H, N, Y) at alignment positions 1-10]
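The frequency table on this slide is a count of each residue per alignment column, divided by the number of sequences. A minimal Python sketch, assuming a gap-free alignment like the one above (the function name `position_frequencies` is illustrative), run on the five sequences shown:

```python
from collections import Counter


def position_frequencies(aligned):
    """Fraction of each residue at every column of a gap-free alignment."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for k in range(width):
        # Count the residues found in column k across all sequences
        counts = Counter(s[k] for s in aligned)
        columns.append({res: n / len(aligned) for res, n in counts.items()})
    return columns


aligned = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
for pos, freqs in enumerate(position_frequencies(aligned), start=1):
    print(pos, freqs)
```

Columns where one residue has a frequency near 1 (here Y at position 1 and G at position 5) are the conserved positions of the motif.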

4 Can we find motifs using multiple sequence alignment (MSA)?
YES! (in some cases)
NO (in most others)

5 Using MSA for motif discovery
It can only work if the sequences align well on their own. For most motifs this is not the case!

6 ClustalW - Input
Input sequences, scoring matrix, gap scoring, output format, email address
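ClustalW builds its multiple alignment progressively from pairwise alignments, and the scoring matrix and gap scoring options control how those alignments are scored. As a rough illustration only, the sketch below is a plain Needleman-Wunsch global alignment of two sequences with a toy scoring scheme (match +1, mismatch -1, linear gap -2). These values are assumptions: ClustalW itself uses substitution matrices such as BLOSUM and separate gap-opening and gap-extension penalties.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty (toy scores)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Traceback from the bottom-right corner, preferring the diagonal move
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("YDEEGGDAEE", "YDEEGADYEE"))
print(global_alignment("ACGT", "AGT"))
```

Raising the gap penalty makes the aligner prefer mismatches over gaps; lowering it does the opposite. This trade-off is what the "gap scoring" field on the form controls.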

7 MUSCLE - Input
Input sequences, output format, email address

8 Motif search: from de novo motifs to motif annotation
Gapped motifs, large DNA data

9 MEME – Multiple EM* for Motif Elicitation
Motif discovery from unaligned sequences
Genomic or protein sequences
Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
*EM = Expectation-maximization
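To make the expectation-maximization idea concrete, here is a toy EM for the simplest case: exactly one motif occurrence per sequence (MEME calls this the OOPS model). It is a sketch under stated assumptions, not MEME's implementation. It handles DNA only and assumes a uniform background, pseudocounts of 0.5, a fixed number of iterations, and seeding from every word of the first sequence. MEME's own starting-point search, motif-width selection and ZOOPS/ANR models are not reproduced. The E-step computes, for each sequence, the probability that the motif starts at each position. The M-step then rebuilds the position weight matrix from those weighted sites.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = {a: 0.25 for a in ALPHABET}  # assumption: uniform background


def site_ratio(pwm, seq, start):
    """Likelihood ratio P(word | motif) / P(word | background) at one offset."""
    r = 1.0
    for k, col in enumerate(pwm):
        base = seq[start + k]
        r *= col[base] / BACKGROUND[base]
    return r


def e_step(pwm, seqs):
    """OOPS E-step: posterior probability that the motif starts at each offset."""
    width, z_all, loglik = len(pwm), [], 0.0
    for s in seqs:
        ratios = [site_ratio(pwm, s, j) for j in range(len(s) - width + 1)]
        total = sum(ratios)
        z_all.append([r / total for r in ratios])
        loglik += math.log(total / len(ratios))  # log-likelihood up to a constant
    return z_all, loglik


def m_step(z_all, seqs, width, pseudo=0.5):
    """Re-estimate each PWM column from posterior-weighted letter counts."""
    counts = [{a: pseudo for a in ALPHABET} for _ in range(width)]
    for z, s in zip(z_all, seqs):
        for j, weight in enumerate(z):
            for k in range(width):
                counts[k][s[j + k]] += weight
    return [{a: c[a] / sum(c.values()) for a in ALPHABET} for c in counts]


def seed_pwm(word, p_match=0.7):
    """Starting PWM that favours the letters of a seed word."""
    other = (1.0 - p_match) / (len(ALPHABET) - 1)
    return [{a: (p_match if a == ch else other) for a in ALPHABET} for ch in word]


def em_oops(seqs, width, iterations=20):
    """Run EM from every word of the first sequence and keep the best model."""
    best = None
    for start in range(len(seqs[0]) - width + 1):
        pwm = seed_pwm(seqs[0][start:start + width])
        for _ in range(iterations):
            z_all, _ = e_step(pwm, seqs)
            pwm = m_step(z_all, seqs, width)
        z_all, loglik = e_step(pwm, seqs)
        if best is None or loglik > best[0]:
            best = (loglik, pwm, z_all)
    _, pwm, z_all = best
    # Report the most probable site in each sequence and the consensus word
    sites = []
    for z, s in zip(z_all, seqs):
        j = max(range(len(z)), key=z.__getitem__)
        sites.append(s[j:j + width])
    consensus = "".join(max(col, key=col.get) for col in pwm)
    return consensus, sites


seqs = ["GCGCTATAATGCGC", "CCTATAATGGCGCG", "GGCGCGCTATAATC", "TATAATCGCGGCCG"]
consensus, sites = em_oops(seqs, width=6)
print(consensus, sites)
```

In this made-up input every sequence contains TATAAT in a GC-rich background, which is the site the sketch is meant to recover.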

10 MEME - Input
Email address
Input file (FASTA file)
How many times in each sequence?
Range of motif lengths
How many motifs?
How many sites?
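MEME reads its sequences from a FASTA file. Each record starts with a `>` header line followed by one or more sequence lines. A small Python sketch for checking such a file before uploading it (the function name `read_fasta` is illustrative):

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


example = ">seq1 promoter\nACGT\nac\n\n>seq2\nTTGA\n"
print(read_fasta(example))
```

The question "How many times in each sequence?" corresponds to MEME's site-distribution choice. The options are zero or one occurrence per sequence (ZOOPS), exactly one occurrence (OOPS), or any number of repetitions (ANR).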

11 MEME - Output Motif score

12 MEME - Output
Motif score, motif length, number of times the motif occurs (sites)

13 MEME - Output
Low uncertainty = high information content
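The information content of a DNA motif column is the maximum entropy of the alphabet (2 bits) minus the column's Shannon entropy. A column dominated by one base has low uncertainty and therefore close to 2 bits. This quantity is roughly the height of each stack in a sequence logo. A minimal sketch, assuming a uniform background and no small-sample correction (function names are illustrative):

```python
import math


def information_content(column):
    """Information content (bits) of one PWM column, uniform background assumed.

    `column` maps every letter of the alphabet to its probability.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


def motif_information(pwm):
    """Total information content of a motif: the sum over its columns."""
    return sum(information_content(col) for col in pwm)


conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}   # no uncertainty
two_way = {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}     # A or G
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25} # maximal uncertainty
print(information_content(conserved), information_content(two_way), information_content(uniform))
```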

14 MEME - Output Multilevel Consensus

15 Patterns can be presented as regular expressions
[AG]-x-V-x(2)-{YW}
[] - Any one of the listed residues
x - Any residue
x(2) - Any residue in each of the next 2 positions
{} - Any residue except these
Examples: AYVACM, GGVGAA
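This pattern notation maps almost one-to-one onto ordinary regular expressions: `x` becomes `.`, `x(2)` becomes `.{2}`, `{YW}` becomes `[^YW]`, and the `-` separators are dropped. Below is a Python sketch of that translation. It covers only the elements shown on this slide, plus ranges such as `x(2,4)`. PROSITE's `<` and `>` anchors are not handled, and the function name is illustrative.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as [AG]-x-V-x(2)-{YW} into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its residue part and an optional (n) or (n,m) repeat
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            piece = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            piece = core                      # single residue or [..] set, same in regex
        if repeat:
            piece += "{" + repeat + "}"
        parts.append(piece)
    return "".join(parts)


regex = prosite_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)
for word in ["AYVACM", "GGVGAA", "AYVACW"]:
    print(word, bool(re.fullmatch(regex, word)))
```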

16 MEME - Output
Sequence names, position in sequence, strength of match, motif within sequence
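The position and strength of each match come from scanning the sequence with the motif's position weight matrix (PWM). The sketch below assumes a log-odds matrix built from a few made-up sites: log2 of the motif probability over a uniform 0.25 background, with a pseudocount of 0.5. MEME reports p-values for its matches. This sketch only computes the raw log-odds scores those p-values would be based on, and the function names are illustrative.

```python
import math

ALPHABET = "ACGT"


def log_odds_pwm(sites, pseudo=0.5, background=0.25):
    """Build a log2-odds PWM from equal-length aligned sites."""
    pwm = []
    for k in range(len(sites[0])):
        counts = {a: pseudo for a in ALPHABET}
        for s in sites:
            counts[s[k]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log2(counts[a] / total / background) for a in ALPHABET})
    return pwm


def scan(pwm, seq):
    """Score every window of the sequence; returns (0-based start, score) pairs."""
    width = len(pwm)
    return [(j, sum(pwm[k][seq[j + k]] for k in range(width)))
            for j in range(len(seq) - width + 1)]


pwm = log_odds_pwm(["TATAAT", "TATAAT", "TACAAT", "TATGAT"])
scores = scan(pwm, "GCGCTATAATGC")
best = max(scores, key=lambda t: t[1])
print(best)
```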

17 MEME - Output
Sequence names, motif location in the input sequence, overall strength of motif matches

18 What can we do with motifs?
MAST - Search for them in non-annotated sequence databases (protein and DNA).
TOMTOM - Find the protein that binds a DNA motif, by comparing the motif with databases of known motifs.
GOMO - Find putative target genes (DNA) of motifs and analyze their associated annotation terms.
PROSITE - Search for them in annotated protein sequence databases.

19 MAST
Searches for motifs (one or more) in sequence databases:
Like BLAST, but with motifs as input
Similar to iterations of PSI-BLAST
Profile defines strength of match
Multiple motif matches per sequence
Combined E-value for all motifs
MEME uses MAST to summarize results: each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
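One standard way to merge the evidence from several motifs into a single number is to multiply their independent p-values. Under the null hypothesis the distribution of that product is known: for n uniform p-values, P(product <= p) = p * sum over i < n of (-ln p)^i / i!. The E-value is then the combined p-value times the number of sequences searched. The sketch below shows this calculation. Treat it as an illustration of the idea, not MAST's exact procedure, which also accounts for sequence length and uses each motif's best match in the sequence. The function names are illustrative.

```python
import math


def combined_pvalue(pvalues):
    """P-value of the product of n independent per-motif p-values."""
    n = len(pvalues)
    product = math.prod(pvalues)
    if product <= 0.0:
        return 0.0
    t = -math.log(product)
    # P(product <= p) = p * sum_{i=0}^{n-1} t^i / i!, with t = -ln p
    return product * sum(t ** i / math.factorial(i) for i in range(n))


def evalue(pvalues, database_size):
    """Expected number of database sequences scoring at least this well by chance."""
    return combined_pvalue(pvalues) * database_size


print(combined_pvalue([0.01, 0.01]))
print(evalue([0.01, 0.01], database_size=1000))
```

Note that the combined p-value (about 0.001 here) is larger than the raw product of 0.0001. Multiplying p-values without this correction would overstate the significance.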

20 MAST - Input
Email address, database, input file (motifs)

21 MAST - Output
Input motifs, presence of the motifs in a given database

22 TOMTOM Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value. The output contains results for each query, in the order that the queries appear in the input file.
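Comparing a query motif with a target motif means sliding one along the other and scoring the aligned columns. The Pearson correlation between two columns is one commonly used column-similarity measure for this kind of comparison. The sketch below sums column correlations over every offset with at least three overlapping columns and keeps the best one. It does not convert scores to p-values or try reverse complements as TOMTOM does. The toy motifs, the 0.85 consensus probability and all function names are assumptions for illustration.

```python
import math

ALPHABET = "ACGT"


def motif_from_consensus(consensus, p=0.85):
    """Hypothetical helper: a toy PWM putting probability p on each consensus base."""
    other = (1.0 - p) / (len(ALPHABET) - 1)
    return [{b: (p if b == c else other) for b in ALPHABET} for c in consensus]


def pearson(col_a, col_b):
    """Pearson correlation between two PWM columns."""
    xa = [col_a[b] for b in ALPHABET]
    xb = [col_b[b] for b in ALPHABET]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    num = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    den = math.sqrt(sum((x - ma) ** 2 for x in xa) * sum((y - mb) ** 2 for y in xb))
    return num / den if den > 0 else 0.0


def best_offset(query, target, min_overlap=3):
    """Slide the query along the target; return (best summed score, offset)."""
    best_score, best_off = -math.inf, None
    for offset in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        pairs = [(col, target[i + offset]) for i, col in enumerate(query)
                 if 0 <= i + offset < len(target)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(a, b) for a, b in pairs)
        if score > best_score:
            best_score, best_off = score, offset
    return best_score, best_off


score, offset = best_offset(motif_from_consensus("CACG"), motif_from_consensus("GGCACGTT"))
print(round(score, 3), offset)
```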

23 TOMTOM - Input
Input motif, background frequencies, database

24 DNA IUPAC* code
A --> adenine
C --> cytosine
G --> guanine
T --> thymine
R --> G A (purine)
Y --> T C (pyrimidine)
K --> G T (keto)
M --> A C (amino)
S --> G C (strong)
W --> A T (weak)
B --> G T C
D --> G A T
H --> A C T
V --> G C A
N --> A G C T (any)
Example: YCAY = [TC]CA[TC]
*IUPAC = International Union of Pure and Applied Chemistry
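Because each IUPAC letter stands for a set of bases, an IUPAC motif can be turned directly into a regular expression, as in the YCAY example. A minimal Python sketch, using the code table above (the dictionary and function names are illustrative):

```python
import re

# IUPAC nucleotide codes and the bases each one stands for
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Turn an IUPAC DNA motif such as YCAY into a regex such as [CT]CA[CT]."""
    parts = []
    for code in motif.upper():
        bases = IUPAC_DNA[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


pattern = iupac_to_regex("YCAY")
print(pattern)
print([m.start() for m in re.finditer(pattern, "GTCACTTCAT")])
```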

25 TOMTOM - Output Input motif Matching motifs

26 TOMTOM – Output Wrong input, ok results

27 JASPAR
Profiles of transcription factor binding sites
Multicellular eukaryotes
Derived from published collections of experiments
Open data access

28 Logo, name of gene/protein, organism, score

29 GOMO
GOMO takes DNA-binding motifs, finds their putative target genes and analyzes the Gene Ontology (GO) terms associated with those genes. It returns a list of GO terms that are significantly associated with the target genes of the motif. Gene Ontology provides a controlled vocabulary to describe gene and gene product attributes in any organism.
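Deciding whether a GO term is "significantly associated" with a motif's target genes is an enrichment test. GOMO's own scoring is based on ranking genes by motif score. As a simpler illustration of the same idea, the sketch below swaps in the hypergeometric test. This test asks how likely it is to find at least `overlap` genes annotated with the term among `targets` genes drawn at random from `total` genes, of which `annotated` carry the term. The gene counts are invented and the function name is illustrative.

```python
from math import comb


def enrichment_pvalue(total, annotated, targets, overlap):
    """Hypergeometric P(X >= overlap): chance of seeing this many annotated targets."""
    denominator = comb(total, targets)
    upper = min(annotated, targets)
    numerator = sum(comb(annotated, i) * comb(total - annotated, targets - i)
                    for i in range(overlap, upper + 1))
    return numerator / denominator


# Invented example: 10 genes, 3 annotated with the GO term, 3 motif targets,
# and all 3 targets carry the annotation
print(enrichment_pvalue(total=10, annotated=3, targets=3, overlap=3))
```

With thousands of GO terms tested at once, the resulting p-values still need a multiple-testing correction before a term is called significant.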

30 GOMO - Input
Email address, database, input file (motifs)

31 GOMO - Output
Input motifs, GO annotation
MF - Molecular function
BP - Biological process
CC - Cellular component

32 Prosite ProSite is a database of protein domains and motifs that can be searched by either regular expression patterns or sequence profiles.


34 Prosite - Input
Input motif (a regular expression), database, filters

35 Prosite - Output
Input motif, protein, location in the protein sequence
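The output lists where in the protein the pattern matches. A minimal Python sketch of that step uses the slide 15 pattern, hand-translated into Python regex syntax as `[AG].V.{2}[^YW]`, and a made-up protein sequence. A lookahead lets overlapping matches be reported. Positions are 1-based and inclusive, and the function name is illustrative.

```python
import re


def find_sites(regex, protein):
    """Return (start, end, match) for every match, including overlapping ones (1-based)."""
    sites = []
    # A zero-width lookahead lets the scan restart at every position
    for m in re.finditer("(?=(" + regex + "))", protein):
        hit = m.group(1)
        sites.append((m.start() + 1, m.start() + len(hit), hit))
    return sites


print(find_sites("[AG].V.{2}[^YW]", "MAYVACMGGVGAA"))
```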
