Vorlesung Grundlagen der Bioinformatik

Presentation on theme: "Vorlesung Grundlagen der Bioinformatik"— Presentation transcript:

1 Vorlesung Grundlagen der Bioinformatik (Lecture: Foundations of Bioinformatics)

2 Information from a Single Sequence Alone Sequence alignment in molecular data analysis:

3 Information from a Single Sequence Alone Multi-Organism High Quality Sequences Sequence alignment in molecular data analysis: (M. Brudno)

4 Tools for multiple sequence alignment
seq1 T Y I M R E A Q Y E
seq2 T C I V M R E A Y E
seq3 Y I M Q E V Q Q E
seq4 Y I A M R E Q Y E

5 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V M R E A - Y E
seq3 Y - I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E

9 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V M R E A - Y E
seq3 Y - I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E
Functionally important regions are more conserved than non-functional regions.

10 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V M R E A - Y E
seq3 Y - I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E
Functionally important regions are more conserved than non-functional regions.
Local sequence conservation indicates functionality!
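The conservation idea on this slide can be made concrete with a small sketch. It scores each column of the example alignment by the fraction of rows that share the most common residue. The function name `column_conservation` and this particular measure are illustrative assumptions, not something the lecture prescribes.

```python
from collections import Counter

# The four aligned sequences from the slide, one string per row.
alignment = [
    "TYI-MREAQYE",
    "TCIVMREA-YE",
    "Y-I-MQEVQQE",
    "Y-IAMRE-QYE",
]

def column_conservation(rows):
    """Fraction of rows sharing the most common residue in each column.

    Gaps are ignored when picking the most common residue, but they still
    count in the denominator, so gappy columns score low.
    """
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(rows))
    return scores

scores = column_conservation(alignment)
```

Columns with a score of 1.0 (here the I, M and E columns) are fully conserved; in a real analysis such columns would be the first candidates for functional importance.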

11 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V M R E A - Y E
seq3 - Y I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E
Astronomical number of possible alignments!

12 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V - M R E A Y E
seq3 - Y I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E
Astronomical number of possible alignments!

13 Tools for multiple sequence alignment
seq1 T Y I - M R E A Q Y E
seq2 T C I V M R E A - Y E
seq3 - Y I - M Q E V Q Q E
seq4 Y - I A M R E - Q Y E
Which one is the best?
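To get a feeling for "astronomical", the sketch below counts the global alignments of just two sequences. It assumes every column is a residue pair, a gap in the first sequence, or a gap in the second, and that alignments differing only in the order of adjacent gaps count as different. Under that assumption the count follows the Delannoy recurrence; the function name is hypothetical.

```python
def count_pairwise_alignments(m, n):
    """Number of global alignments of two sequences of lengths m and n.

    Recurrence: D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1),
    with D(i, 0) = D(0, j) = 1 (only gaps are possible).
    """
    row = [1] * (n + 1)          # D(0, j) for all j
    for _ in range(m):
        new = [1]                # D(i, 0)
        for j in range(1, n + 1):
            new.append(row[j] + new[j - 1] + row[j - 1])
        row = new
    return row[n]

# Two sequences of ten residues each, like seq1 and seq2 on the slides.
n_alignments = count_pairwise_alignments(10, 10)
```

Already for two sequences of length 10 this gives several million alignments, and the number grows much faster again when more sequences are added.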

14 Tools for multiple sequence alignment
Questions in the development of alignment programs:
(1) What is a good alignment? Objective function ("score")
(2) How to find a good alignment? Optimization algorithm
The first question is far more important!

15 Tools for multiple sequence alignment Most important scoring scheme for multiple alignment: Sum-of-pairs score for global alignment.
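A minimal sketch of a sum-of-pairs score follows. The scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) are placeholder assumptions chosen for illustration; real programs use substitution matrices and more elaborate gap costs.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same alignment column."""
    if x == "-" and y == "-":
        return 0                      # two gaps: no evidence either way
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sum_of_pairs(rows, **params):
    """Sum pair_score over all pairs of rows in every column."""
    total = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            total += pair_score(x, y, **params)
    return total

# Score the example alignment from the earlier slides.
example_score = sum_of_pairs([
    "TYI-MREAQYE",
    "TCIVMREA-YE",
    "Y-I-MQEVQQE",
    "Y-IAMRE-QYE",
])
```

Because the score is a sum over all sequence pairs, it is equivalent to adding up the scores of the pairwise alignments induced by the multiple alignment.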

16 Divide-and-Conquer Alignment (DCA)
J. Stoye, A. Dress (Bielefeld)
Approximate optimal global multiple alignment
Divide sequences into small sub-sequences
Use MSA to calculate optimal alignment for sub-sequences
Concatenate sub-alignments

17 Divide-and-Conquer Alignment (DCA)

18

19 Tools for multiple sequence alignment
Problems with the traditional approach:
Results depend on the gap penalty
A heuristic guide tree determines the alignment; the alignment is then used for phylogeny reconstruction
The algorithm produces global alignments only

20 First step in sequence comparison: alignment global alignment (Needleman and Wunsch, 1970; Clustal W) atctaatagttaatactcgtccaagtat atctgtattactaaacaactggtgctacta

21 First step in sequence comparison: alignment
global alignment (Needleman and Wunsch, 1970; Clustal W)
atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-
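A global alignment of this kind can be computed by dynamic programming. The sketch below follows the Needleman-Wunsch scheme with a simple linear gap cost; the scoring values (match +1, mismatch -1, gap -2) are assumptions for illustration and will not necessarily reproduce the exact alignment shown on the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap            # prefix of a against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap            # prefix of b against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

# The two sequences from the slide.
result = needleman_wunsch("atctaatagttaatactcgtccaagtat",
                          "atctgtattactaaacaactggtgctacta")
```

Every residue of both sequences ends up in some column, which is exactly what makes the alignment global.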

22 First step in sequence comparison: alignment
global alignment (Needleman and Wunsch, 1970; Clustal W)
atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-
local alignment (Smith and Waterman, 1981)
atctaatagttaatactcgtccaagtat
gcgtgtattactaaacggttcaatctaacat

24 First step in sequence comparison: alignment
global alignment (Needleman and Wunsch, 1970; Clustal W)
atc--taatagttaat--actcgtccaagtat
|||  || || | ||   ||| || | | ||
atctgtattact-aaacaactggtgctacta-
local alignment (Smith and Waterman, 1981)
atc--taatagttaatactcgtccaagtat
     || || | ||
gcgtgtattact-aaacggttcaatctaacat
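For comparison, here is a sketch of local alignment in the Smith-Waterman style. The only changes against the global recurrence are that cell scores are never allowed to drop below zero and that the traceback starts at the best cell and stops at the first zero. The scoring values are again illustrative assumptions.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, segment_a, segment_b) for one best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                       # start a new local alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

# The two sequences from the local-alignment part of the slide.
local = smith_waterman("atctaatagttaatactcgtccaagtat",
                       "gcgtgtattactaaacggttcaatctaacat")
```

Only the best-scoring pair of segments is returned; the flanking regions of both sequences stay unaligned.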

25 New question: sequence families with multiple local similarities. Neither local nor global methods are applicable.

26 New question: sequence families with multiple local similarities Alignment possible if order conserved

27 The DIALIGN approach
Morgenstern, Dress, Werner (1996), PNAS 93,
Combination of global and local methods
Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

28 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

34 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

35 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

36 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc cctgaattgaataa

37 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac gg-ttcaatcgcg caaa--gagtatcacc cctgaattgaataa

38 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac gg-ttcaatcgcg caaa--gagtatcacc cctgaattgaataa Consistency!

39 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag cagtgcGTGTATTACTAAc GG-TTCAATcgcg caaa--GAGTATCAcc CCTGaaTTGAATaa

40 The DIALIGN approach
Score of an alignment: define the score of a fragment f:
l(f) = length of f
s(f) = sum of matches (similarity values)
P(f) = probability of finding a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences
Score w(f) = -ln P(f)
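The definition w(f) = -ln P(f) can be illustrated with a deliberately crude model. The sketch assumes DNA with independent positions that match with probability 0.25, takes the binomial tail for "at least s(f) matches", and multiplies it by the number of possible fragment placements as a rough upper bound capped at 1. This is an assumption made for illustration, not the exact probability used by DIALIGN; both function names are hypothetical.

```python
import math

def binomial_tail(length, matches, p_match=0.25):
    """P(at least `matches` matching positions in a fragment of `length`)."""
    return sum(math.comb(length, k) * p_match**k * (1 - p_match)**(length - k)
               for k in range(matches, length + 1))

def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """Approximate w(f) = -ln P(f) for sequences of lengths len1 and len2."""
    # Number of start-position pairs where a fragment of this length fits.
    placements = max(len1 - length + 1, 0) * max(len2 - length + 1, 0)
    p = min(1.0, placements * binomial_tail(length, matches, p_match))
    return -math.log(p) if p > 0 else float("inf")

# A fragment of length 10 with 9 matches in two sequences of length 100.
w = fragment_weight(10, 9, 100, 100)
```

Under this model, long fragments with many matches are very unlikely by chance and get a high weight, while short or poorly matching fragments get a weight of zero.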

41 The DIALIGN approach Score of an alignment: Define score of fragment f: Define score of alignment as sum of scores of involved fragments No gap penalty!

42 The DIALIGN approach Score of an alignment: Goal in the fragment-based alignment approach: find a consistent collection of fragments with maximum sum of weight scores

43 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment:

44 The DIALIGN approach atctaatagttaaaccccctcgtgcttagagatccaaac cagtgcgtgtattactaacggttcaatcgcgcacatccgc Pair-wise alignment: recursive algorithm finds optimal chain of fragments.

45 The DIALIGN approach atctaatagttaaaccccctcgtgcttag agatccaaac cagtgcgtgtattactaac ggttcaatcgcgcacatccgc-- Pair-wise alignment: recursive algorithm finds optimal chain of fragments.

46 The DIALIGN approach atctaatagttaaaccccctcgtgcttag agatccaaac cagtgcgtgtattactaac ggttcaatcgcgcacatccgc-- Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming: Standard fragment-chaining algorithm Space-efficient algorithm
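The chaining step can be sketched as a simple quadratic-time dynamic program. Fragments are represented here as hypothetical tuples `(start1, start2, length, weight)`; one fragment may precede another only if it ends before the other starts in both sequences. This is the standard textbook formulation, not the space-efficient variant mentioned on the slide.

```python
def chain_fragments(fragments):
    """Return (total_weight, chain) for a maximum-weight chain of fragments."""
    if not fragments:
        return 0.0, []
    frags = sorted(fragments)            # by start in sequence 1
    best = [f[3] for f in frags]         # best chain weight ending in each fragment
    prev = [None] * len(frags)           # predecessor in that best chain
    for j, (s1, s2, _length, w) in enumerate(frags):
        for i in range(j):
            p1, p2, plen, _ = frags[i]
            # Fragment i must end before fragment j starts in both sequences.
            if p1 + plen <= s1 and p2 + plen <= s2 and best[i] + w > best[j]:
                best[j] = best[i] + w
                prev[j] = i
    end = max(range(len(frags)), key=best.__getitem__)
    chain, k = [], end
    while k is not None:
        chain.append(frags[k])
        k = prev[k]
    return best[end], chain[::-1]

total, chain = chain_fragments([(0, 0, 3, 2.0), (5, 4, 4, 3.0), (2, 6, 3, 2.5)])
```

In this toy input the third fragment crosses the other two, so the best chain consists of the first and second fragments.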

47 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

48 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaccctgaattgaagagtatcacataa (1) Calculate all optimal pair-wise alignments

49 The DIALIGN approach Multiple alignment: atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa (1) Calculate all optimal pair-wise alignments

51 The DIALIGN approach Fragments from optimal pair-wise alignments might be inconsistent

52 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

55 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

56 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

57 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

58 The DIALIGN approach
Fragments from optimal pair-wise alignments might be inconsistent
1. Sort fragments according to scores
2. Include them one-by-one into the growing multiple alignment, as long as they are consistent (greedy algorithm, comparable to the knapsack problem)
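The two greedy steps can be sketched as follows. Fragments are hypothetical tuples `(seq1, pos1, seq2, pos2, length, weight)`. The consistency test is a naive one written for clarity: it merges aligned sites into columns, rejects columns containing two positions of the same sequence, and checks that the columns can be put in a left-to-right order. It rechecks everything for each candidate, so it is far slower than the bound-based method the lecture refers to.

```python
from collections import defaultdict

def is_consistent(fragments):
    """True if all fragments fit into one multiple alignment."""
    parent = {}

    def find(x):                         # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sites aligned by a fragment must share a column.
    for s1, p1, s2, p2, length, _w in fragments:
        for k in range(length):
            a, b = find((s1, p1 + k)), find((s2, p2 + k))
            parent[a] = b

    # A column may hold at most one position per sequence.
    members = defaultdict(dict)
    for seq, pos in list(parent):
        if members[find((seq, pos))].setdefault(seq, pos) != pos:
            return False

    # Sequence order induces "left of" edges between columns; no cycles allowed.
    by_seq = defaultdict(list)
    for seq, pos in list(parent):
        by_seq[seq].append(pos)
    edges, indegree = defaultdict(set), defaultdict(int)
    for seq, positions in by_seq.items():
        positions.sort()
        for left, right in zip(positions, positions[1:]):
            u, v = find((seq, left)), find((seq, right))
            if v not in edges[u]:
                edges[u].add(v)
                indegree[v] += 1
    columns = {find(site) for site in list(parent)}
    queue = [c for c in columns if indegree[c] == 0]
    seen = 0
    while queue:                         # Kahn's topological sort
        u = queue.pop()
        seen += 1
        for v in edges[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return seen == len(columns)

def greedy_alignment(fragments):
    """Accept fragments by decreasing weight while they stay consistent."""
    accepted = []
    for frag in sorted(fragments, key=lambda f: f[5], reverse=True):
        if is_consistent(accepted + [frag]):
            accepted.append(frag)
    return accepted
```

A greedy selection like this is not guaranteed to find the consistent set with the maximum total weight; it trades optimality for speed.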

59 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

63 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Consistency problem

65 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

66 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

67 The DIALIGN approach atc------taatagt taaactcccccgtgcttag cagtgcgtgtattact aacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

68 The DIALIGN approach atc------taata-----gttaaactcccccgtgcttag cagtgcgtgtatta-----ctaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa Upper and lower bounds for alignable positions

69 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions site x = [i,p] (sequence i, position p)

70 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions: calculate lower bound b_l(x,i) and upper bound b_u(x,i) for each site x and sequence i

71 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa Upper and lower bounds for alignable positions: b_l(x,i) and b_u(x,i) are updated for each new fragment in the alignment
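The idea behind the bounds can be shown for the simplest case of two sequences. Given the residue pairs already aligned, the sketch returns the range of positions in the second sequence that a position p of the first sequence could still be aligned with. This pairwise simplification is an assumption made for illustration; the multiple-sequence case also has to propagate bounds through third sequences, which is not shown here. The function name is hypothetical.

```python
from bisect import bisect_left

def alignable_range(aligned_pairs, p, len_b):
    """Inclusive (lower, upper) range of positions in sequence B for site p of A.

    aligned_pairs: list of (pos_a, pos_b), strictly increasing in both
    components (0-based). An empty range (lower > upper) means that p can
    no longer be aligned with any position of B.
    """
    a_positions = [a for a, _ in aligned_pairs]
    k = bisect_left(a_positions, p)
    if k < len(aligned_pairs) and a_positions[k] == p:
        q = aligned_pairs[k][1]          # p is already aligned: range is fixed
        return q, q
    # Partner of the nearest aligned site to the left gives the lower bound,
    # partner of the nearest aligned site to the right gives the upper bound.
    lower = aligned_pairs[k - 1][1] + 1 if k > 0 else 0
    upper = aligned_pairs[k][1] - 1 if k < len(aligned_pairs) else len_b - 1
    return lower, upper

pairs = [(2, 5), (6, 9)]
bounds = alignable_range(pairs, 4, 12)
```

A new fragment is consistent with the existing alignment exactly when each of its residue pairs falls inside the current range, which is why the bounds have to be updated whenever a fragment is accepted.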

72 The DIALIGN approach Consistency bounds have to be updated for each new fragment that is included in the growing alignment. Efficient algorithm (Abdeddaim and Morgenstern, 2002)

73 The DIALIGN approach Advantages of segment-based approach: Program can produce global and local alignments! Sequence families alignable that cannot be aligned with standard methods

74 Program input Program usage: > dialign2-2 [options] = multi-sequence file in FASTA-format
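A multi-sequence FASTA file of the kind the program expects can be written with a few lines of code. The helper name `to_fasta`, the sequence names and the output file name below are made up for illustration; the sequences are the three toy examples used throughout the slides.

```python
def to_fasta(records, width=60):
    """Format a {name: sequence} mapping as multi-sequence FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")                 # header line starts with '>'
        for i in range(0, len(seq), width):      # wrap long sequences
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"

fasta_text = to_fasta({
    "seq1": "atctaatagttaaactcccccgtgcttag",
    "seq2": "cagtgcgtgtattactaacggttcaatcgcg",
    "seq3": "caaagagtatcacccctgaattgaataa",
})

# To save the text, e.g. under a hypothetical file name:
# with open("example.fa", "w") as handle:
#     handle.write(fasta_text)
```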

75 Program output
DIALIGN
*************
Program code written by Burkhard Morgenstern and Said Abdeddaim
contact:
Published research assisted by DIALIGN 2 should cite: Burkhard Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15,
For more information, please visit the DIALIGN home page at
program call: ./dialign2-2 -nt -anc s
Aligned sequences: length:
================== =======
1) dog_il ) bla 200
3) blu 200
Average seq. length:
Please note that only upper-case letters are considered to be aligned.

76 Program output
Alignment (DIALIGN format):
===========================
dog_il4 1 cagg GTTTGA atctgataca ttgc
bla 1 ctga GC CAAGTGGGAA
blu 1 ttttgatatg agaaGTGTGA aacaagctat cctatattGC TAAGTGGCAG

dog_il ATGGCACT GGGGTGAATG AGGCAGGCAG CAGAATGATC
bla 17 ggtgtgaata catgggtttc cagtaccttc tgaggtccag agtacc----
blu 51 ccctggcttt ctATGTGCAC AGAATGGGAG GAAAGTGCCT GCTAGTGAGC

dog_il4 63 GTACTGCAGC CCTGAGCTTC CACTGGCCCA TGTTGGTATC CTTGTATTTT
bla TTTCCCA TGTGCTCCAT GGTGGAATGG
blu 101 CAGGGACTCA GAGAGAATGG AGTATAGGGG TCAGGGCat

dog_il4 113 TCCGCCCCTT CCCAGCACca gcattatcct ---GGGATTG GAGAAGGGGG
bla 90 ACCACTCCTT CTCAGCACaa caaagcccaa gaaGGTGTTG CGTTCTAGAC
blu GGGGTGG CCTTAGGCTC

77 The DIALIGN approach atctaatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

83 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaagagtatcacccctgaattgaataa

84 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacccctgaattgaataa

85 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaacggttcaatcgcg caaa--gagtatcacc cctgaattgaataa

86 The DIALIGN approach atc------taatagttaaactcccccgtgcttag cagtgcgtgtattactaac ggttcaatcgcg caaa--gagtatcacc cctgaattgaataa

87 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac gg-ttcaatcgcg caaa--gagtatcacc cctgaattgaataa

88 The DIALIGN approach atc------TAATAGTTAaactccccCGTGC-TTag cagtgcGTGTATTACTAAc GG-TTCAATcgcg caaa--GAGTATCAcc CCTGaaTTGAATaa--

89 The DIALIGN approach atc------taatagttaaactcccccgtgc-ttag cagtgcgtgtattactaac gg-ttcaatcgcg caaa--gagtatcacc cctgaattgaataa

90 Alignment of large genomic sequences Fragment-based alignment approach useful for alignment of genomic sequences. Possible applications: Detection of regulatory elements Identification of pathogenic microorganisms Gene prediction

91 DIALIGN alignment of human and murine genomic sequences

92 DIALIGN alignment of tomato and Arabidopsis thaliana genomic sequences

93 Alignment of large genomic sequences: gene-regulatory sites identified by multiple sequence alignment (phylogenetic footprinting)

94 Alignment of large genomic sequences

95 Performance of long-range alignment programs for exon discovery (human - mouse comparison)

96 Performance of long-range alignment programs for exon discovery (A. thaliana - tomato comparison)

