
Multiple sequence alignments and motif discovery Tutorial 5.


1 Multiple sequence alignments and motif discovery Tutorial 5

2 Multiple sequence alignments and motif discovery
Multiple sequence alignment
  – ClustalW
  – Muscle
Motif discovery
  – MEME
  – JASPAR

3 Multiple Sequence Alignment
More than two sequences
  – DNA
  – Protein
Evolutionary relation
  – Homology → Phylogenetic tree
  – Detect motif

Input sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC

Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC

[Figure: phylogenetic tree with leaves A, D, B, C]
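As a small illustration of what the aligned block on this slide represents, the sketch below checks two properties of an alignment: all rows have the same length, and removing the gap characters gives back the input sequences. The function names are made up for this example; the sequences are the ones on the slide.

```python
# Aligned rows as shown on the slide; '-' marks a gap.
aligned = [
    "GTCGTAGTCG-GC-TCGAC",
    "GTC-TAG-CGAGCGT-GAT",
    "GC-GAAG-AG-GCG-AG-C",
    "GCCGTCG-CG-TCGTA-AC",
]
# The same four sequences before alignment.
unaligned = [
    "GTCGTAGTCGGCTCGAC",
    "GTCTAGCGAGCGTGAT",
    "GCGAAGAGGCGAGC",
    "GCCGTCGCGTCGTAAC",
]


def is_valid_alignment(rows, originals):
    """True if all rows share one length and degap to the original sequences."""
    same_length = len({len(r) for r in rows}) == 1
    degapped = [r.replace("-", "") for r in rows]
    return same_length and degapped == originals


def identical_columns(rows):
    """0-based indices of columns where every row holds the same residue."""
    return [i for i, col in enumerate(zip(*rows))
            if len(set(col)) == 1 and col[0] != "-"]


print(is_valid_alignment(aligned, unaligned))
print(identical_columns(aligned))
```

Fully identical columns are the simplest hint of a conserved position; the later slides use softer measures.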

4 Multiple Sequence Alignment
Dynamic programming
  – Optimal alignment
  – Running time exponential in the number of sequences
Progressive
  – Efficient
  – Heuristic
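To make the dynamic-programming option concrete, here is a minimal sketch of the two-sequence case (Needleman-Wunsch global alignment). The scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, not parameters from the tutorial. For two sequences the table has (n+1)(m+1) cells; aligning k sequences of length n the same way needs about (n+1)^k cells, which is why the exact method does not scale.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```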

5 ClustalW
“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J D Thompson et al.
Pairwise alignment – calculate distance matrix
Guide tree
Progressive alignment using the guide tree
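The first two steps (distance matrix, then guide tree) can be sketched as follows. This is a rough stand-in, not ClustalW's procedure: the distance is 1 minus the similarity ratio from Python's `difflib.SequenceMatcher` rather than a score from a real pairwise alignment, and the tree is built with average-linkage clustering (UPGMA) instead of the neighbour-joining method ClustalW uses. The sequence names and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(a, b):
    """Crude stand-in distance: 1 - similarity ratio (not an alignment score)."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def upgma_guide_tree(seqs):
    """seqs: dict name -> sequence. Returns the tree as nested tuples of names."""
    # Pairwise distance matrix, keyed by unordered pairs of names.
    d = {frozenset(p): distance(seqs[p[0]], seqs[p[1]])
         for p in combinations(seqs, 2)}
    # Each cluster is keyed by its subtree and lists its member names.
    clusters = {name: [name] for name in seqs}

    def cluster_dist(x, y):
        # Average distance over all member pairs (average linkage).
        pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
        return sum(d[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        x, y = min(combinations(clusters, 2), key=lambda p: cluster_dist(*p))
        members = clusters.pop(x) + clusters.pop(y)
        clusters[(x, y)] = members
    return next(iter(clusters))


example = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTTTGGGG", "D": "TTTTGGGC"}
print(upgma_guide_tree(example))
```

The progressive step then follows the tree from the leaves upward, aligning the closest sequences first.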

6 ClustalW
Progressive
  – At each step, align two existing alignments or sequences
  – Gaps present in older alignments remain fixed

-TGTTAAC
-TGT-AAC
-TGT--AC
ATGT---C
ATGT-GGC
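The "gaps remain fixed" rule can be illustrated with a toy helper: when an already-aligned group needs a new gap, the gap is opened as a whole column in every row of that group, and the gaps that were already there are left untouched. The helper name is made up for this sketch, and the rows are the first two from the slide with the leading gap removed.

```python
def insert_gap_column(alignment, col):
    """Open a gap at column index `col` in every row of an existing alignment."""
    return [row[:col] + "-" + row[col:] for row in alignment]


# Two sequences that were aligned to each other at an earlier step.
older = ["TGTTAAC",
         "TGT-AAC"]

# Aligning this group against a sequence starting with an extra 'A'
# opens a new leading column; the older internal gap stays where it was.
merged = insert_gap_column(older, 0)
print(merged)
```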

7 ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
  – Input sequences
  – Gap scoring
  – Scoring matrix
  – Email address
  – Output format

8 ClustalW - Output
Match strength in decreasing order: * (identical residues), : (strongly similar residues), . (weakly similar residues)
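The consensus line under a Clustal protein alignment can be approximated as below. The strong and weak residue groups are written from memory of the Clustal documentation and should be treated as an assumption; the function name is hypothetical. The test sequences are the five protein fragments shown on slide 23.

```python
# Residue groups as recalled from the Clustal documentation (an assumption).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(rows):
    """Return one symbol per column: '*', ':', '.' or a blank."""
    line = []
    for col in zip(*rows):
        residues = set(col)
        if "-" in residues:
            line.append(" ")                       # gapped column: no symbol
        elif len(residues) == 1:
            line.append("*")                       # fully identical
        elif any(residues <= set(g) for g in STRONG):
            line.append(":")                       # all within one strong group
        elif any(residues <= set(g) for g in WEAK):
            line.append(".")                       # all within one weak group
        else:
            line.append(" ")
    return "".join(line)


rows = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]
for r in rows:
    print(r)
print(conservation_line(rows))
```

For nucleotide alignments only the `*` symbol is meaningful, since the groups above are defined for amino acids.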

9 ClustalW - Output

10

11

12 [Output screenshot labels]
  – Pairwise alignment scores
  – Building tree
  – Building alignment
  – Final score

13 ClustalW - Output

14 ClustalW - Output
  – Sequence names
  – Sequence positions
  – Match strength in decreasing order: * : .

15 ClustalW - Output

16 Branch length

17 ClustalW - Output

18

19 http://www.ebi.ac.uk/Tools/muscle/index.html Muscle

20 Muscle - output

21 What’s the difference between Muscle and ClustalW?
[Side-by-side panels: ClustalW, Muscle]

22 http://www.megasoftware.net/index.html

23 Can we find motifs using multiple sequence alignment?

  1 3 5 7 9
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
  * :**   *:

Residue frequency per alignment position:

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0

Motif: a widespread pattern with a biological significance
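A frequency matrix like the one on this slide can be derived from the alignment columns. The sketch below uses only the five sequences printed on the slide, so its values are fifths; the slide's matrix is in sixths, which suggests it was computed from six sequences. The function name is an assumption of this example.

```python
from collections import Counter
from fractions import Fraction

rows = ["YDEEGGDAEE", "YGEEGADYED", "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def frequency_matrix(rows):
    """One dict per alignment column, mapping residue -> fraction of rows."""
    n = len(rows)
    return [{res: Fraction(count, n) for res, count in Counter(col).items()}
            for col in zip(*rows)]


fm = frequency_matrix(rows)
for position, column in enumerate(fm, start=1):
    # Print residues in alphabetical order for readability.
    print(position, {res: str(f) for res, f in sorted(column.items())})
```

Columns dominated by a single residue (positions 1, 4, 5 and 9 here) are the ones that define the motif.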

24 Can we find motifs using multiple sequence alignment? YES! NO

25 MEME – Multiple EM* for Motif Elicitation
http://meme.sdsc.edu/
Motif discovery from unaligned sequences
  – Genomic or protein sequences
Flexible model of motif presence (a motif can be absent from some sequences or appear several times in one sequence)
*EM: expectation-maximization
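The expectation-maximization idea can be sketched on a toy scale. The code below is a heavily simplified illustration, not MEME's implementation: it assumes exactly one motif occurrence per sequence (MEME's flexible models allow zero or several), a uniform background, a fixed motif width, and a single user-supplied seed word, whereas MEME tries many starting points. All function names and the example sequences are made up.

```python
ALPHABET = "ACGT"


def seed_pwm(word, hit=0.7):
    """Starting matrix favouring `word`: `hit` for its letter, the rest shared."""
    miss = (1.0 - hit) / (len(ALPHABET) - 1)
    return [{c: (hit if c == w else miss) for c in ALPHABET} for w in word]


def e_step(seqs, pwm, background=0.25):
    """Posterior probability of each start position, one site per sequence."""
    width = len(pwm)
    posteriors = []
    for s in seqs:
        weights = []
        for start in range(len(s) - width + 1):
            ratio = 1.0
            for j in range(width):
                ratio *= pwm[j][s[start + j]] / background  # motif vs background
            weights.append(ratio)
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


def m_step(seqs, posteriors, width, pseudocount=0.1):
    """Re-estimate the matrix from position-weighted letter counts."""
    counts = [{c: pseudocount for c in ALPHABET} for _ in range(width)]
    for s, post in zip(seqs, posteriors):
        for start, p in enumerate(post):
            for j in range(width):
                counts[j][s[start + j]] += p
    return [{c: col[c] / sum(col.values()) for c in ALPHABET} for col in counts]


def em_motif(seqs, seed, iterations=20):
    """Alternate E and M steps; return matrix, best sites and consensus."""
    pwm = seed_pwm(seed)
    for _ in range(iterations):
        post = e_step(seqs, pwm)
        pwm = m_step(seqs, post, len(seed))
    sites = [max(range(len(p)), key=p.__getitem__) for p in post]
    consensus = "".join(max(col, key=col.get) for col in pwm)
    return pwm, sites, consensus


seqs = ["CCGATGACTCAGGC", "TGACTCAAAGGCGC", "GGCGCGCTGACTCA"]
pwm, sites, consensus = em_motif(seqs, "TGACTCA")
print(sites, consensus)
```

Note that the input sequences are unaligned and the motif sits at a different offset in each one, which is the situation a multiple alignment alone does not handle well.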

26 MEME - Input
  – Email address
  – Input file (fasta file)
  – How many times in each sequence?
  – How many motifs?
  – How many sites?
  – Range of motif lengths

27 MEME - Output Motif score

28 MEME - Output Motif length Number of times Motif score

29 MEME - Output Low uncertainty = High information content
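The relation between uncertainty and information content can be written out per motif column: information = log2(alphabet size) minus the Shannon entropy of the column. The sketch below assumes a DNA alphabet of four letters and leaves out the small-sample correction that logo tools usually apply; the function name is hypothetical.

```python
import math


def column_information(freqs, alphabet_size=4):
    """Information content in bits of one motif column.

    freqs: dict letter -> frequency, summing to 1.
    """
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(alphabet_size) - entropy


print(column_information({"A": 1.0}))                     # fully conserved column
print(column_information({"A": 0.5, "G": 0.5}))           # two equally likely bases
print(column_information({b: 0.25 for b in "ACGT"}))      # no preference at all
```

A fully conserved DNA position carries the maximum of 2 bits, and a position with no base preference carries none, which is what the letter heights in a sequence logo show.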

30 MEME - Output Multilevel Consensus

31 MEME - Output
  – Sequence names
  – Position in sequence
  – Strength of match
  – Motif within sequence

32 MEME - Output
  – Sequence names
  – Overall strength of motif matches
  – Motif location in the input sequence

33 MAST
Searches for motifs (one or more) in sequence databases:
  – Like BLAST, but with motifs as input
  – Similar to iterations of PSI-BLAST
Profile defines strength of match
  – Multiple motif matches per sequence
  – Combined E-value for all motifs
MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi

34 MAST - Input
  – Email address
  – Input file (motifs)
  – Database

35 JASPAR
Profiles
  – Transcription factor binding sites
  – Multicellular eukaryotes
  – Derived from published collections of experiments
Open data access

36 JASPAR
Profiles
  – Modeled as matrices.
  – Can be converted into a PSSM for scanning genomic sequences.

     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    0.5  1/6  1/3  0    0
D    0    0.5  1/3  0    0    1/6  5/6  1/6  0    1/6
E    0    0    2/3  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    1/3  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    0.5  0    0
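Converting a count matrix into a PSSM and scanning a sequence with it can be sketched as follows. The counts below are invented for a TATA-like toy motif and are not taken from JASPAR; the uniform background of 0.25, the pseudocount of 0.5, and the function names are likewise assumptions. Only the forward strand is scanned.

```python
import math


def counts_to_pssm(counts, background=0.25, pseudocount=0.5):
    """counts: dict base -> list of counts per position. Returns log-odds scores."""
    bases = sorted(counts)
    width = len(next(iter(counts.values())))
    pssm = []
    for i in range(width):
        total = sum(counts[b][i] for b in bases) + pseudocount * len(bases)
        # log2 of (smoothed frequency / background frequency) for each base
        pssm.append({b: math.log2(((counts[b][i] + pseudocount) / total) / background)
                     for b in bases})
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pssm)
    return [(start, sum(pssm[j][sequence[start + j]] for j in range(width)))
            for start in range(len(sequence) - width + 1)]


# Hypothetical counts for a four-position motif (10 observed sites).
toy_counts = {
    "A": [0, 10, 0, 10],
    "C": [0, 0, 0, 0],
    "G": [0, 0, 0, 0],
    "T": [10, 0, 10, 0],
}
pssm = counts_to_pssm(toy_counts)
hits = scan("GGTATACC", pssm)
best = max(hits, key=lambda hit: hit[1])
print(best)
```

A positive score means the window looks more like the motif than like background; in practice a score threshold decides which windows are reported as hits.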

37 Search profile http://jaspar.genereg.net/

38 score organism logo Name of gene/protein

