Pairwise Sequence Alignment

1 Pairwise Sequence Alignment

2 WHAT?

3 WHAT? Given any two sequences (DNA or protein)
Seq 1: CATATTGCAGTGGTCCCGCGTCAGGCT
Seq 2: TAAATTGCGTGGTCGCACTGCACGCT
we want to know to what extent they are similar.
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

4 WHY?

5 Discover function
Study evolution
Find crucial features within a sequence
Identify cause of diseases

6 Discover function Sequences that are similar probably have the same function

7 Study evolution If two sequences from different organisms are similar, they may have a common ancestor

8 Find crucial features
Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.
Regions that are strongly conserved between different sequences can indicate functional importance.
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

9 Identify cause of disease
Comparison of sequences between individuals can detect changes that are related to diseases

10 Sickle Cell Anemia
Caused by a single-base substitution of A by T, which changes one amino acid in hemoglobin from glutamic acid to valine.

11 Healthy Individual
>gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi| |ref|NP_ | beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

12 Diseased Individual
>gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi| |ref|NP_ | beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
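
The single-residue change between the two beta globin protein records can be located by comparing them position by position. The following is a minimal sketch, assuming both proteins have the same length (true for a pure substitution). The function name `find_substitutions` is illustrative, and only the first 30 residues of each protein record are used.

```python
def find_substitutions(reference, variant):
    """Return (position, reference_residue, variant_residue) for each difference.

    Positions are 1-based and count the initial methionine.
    Assumes equal-length sequences, i.e. substitutions only (no indels).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# First 30 residues of the healthy and diseased beta globin records.
healthy = "MVHLTPEEKSAVTALWGKVNVDEVGGEALG"
diseased = "MVHLTPVEKSAVTALWGKVNVDEVGGEALG"

for pos, ref, var in find_substitutions(healthy, diseased):
    print(f"position {pos}: {ref} -> {var}")  # position 7: E -> V
```

E is the one-letter code for glutamic acid and V for valine, so the only difference is the glutamic acid to valine change described for sickle cell anemia.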

13 How do sequences change?

14 Sequence Modifications
Three types of changes
Substitution (point mutation): TCAGT → TCCGT
Indel (replication slippage):
Insertion: TCAGT → TCGAGT
Deletion: TCAGT → TCGT
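
The three kinds of change can be written as small string operations. This is an illustrative sketch only: it assumes TCAGT is the starting sequence in the slide's example, and the helper names `substitute`, `insert` and `delete` are hypothetical.

```python
def substitute(seq, index, base):
    """Replace the base at a 0-based index (point mutation)."""
    return seq[:index] + base + seq[index + 1:]


def insert(seq, index, bases):
    """Insert one or more bases before the given 0-based index."""
    return seq[:index] + bases + seq[index:]


def delete(seq, index, length=1):
    """Remove `length` bases starting at the given 0-based index."""
    return seq[:index] + seq[index + length:]


original = "TCAGT"
print(substitute(original, 2, "C"))  # TCCGT
print(insert(original, 2, "G"))      # TCGAGT
print(delete(original, 2))           # TCGT
```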

15 In order to align two sequences we need a quantitative model to evaluate similarity between sequences.
How do we quantify sequence similarity?

16 Substitutions only (not including indels)
Sequences compared base-by-base
Count the number of matches and mismatches
For example: matches score +2, mismatches score -1
TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT
9 matches: +18
14 mismatches: -14
Total score: +4
A weak match
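
The base-by-base count can be reproduced in a few lines. This is a minimal sketch using the slide's example scores (+2 per match, -1 per mismatch); the function name `ungapped_score` is illustrative.

```python
def ungapped_score(seq_a, seq_b, match=2, mismatch=-1):
    """Compare two equal-length sequences position by position.

    Returns (matches, mismatches, total_score). No indels are considered.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    mismatches = len(seq_a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


print(ungapped_score("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
# (9, 14, 4)
```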

17 Including Indels
Create an 'alignment'
Count matches within alignment
Indels are scored as mismatches: -1
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
17 matches: +34
2 mismatches: -2
8 indels: -8
Total score: +24
A strong match
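
Scoring a given gapped alignment only requires classifying each column as a match, a mismatch or an indel. This is a minimal sketch under the slide's scoring (+2, -1, -1); the name `score_alignment` and the choice of `-` as the gap character are assumptions made for illustration.

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, indel=-1, gap="-"):
    """Score two aligned rows of equal length, column by column.

    Returns (matches, mismatches, indels, total_score).
    A column containing a gap character in either row counts as an indel.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + indels * indel
    return matches, mismatches, indels, total


print(score_alignment("TT-CGTCGTAGTCG-GC-TCGACC-TG",
                      "GTACGTC-TAG-CGAGCGT-GATCCT-"))
# (17, 2, 8, 24)
```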

18 Choosing an Alignment
Many different alignments are possible
Should consider all possible alignments
Take the best score found
There may be more than one best alignment
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
Score: +24
-TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T

19 Why is it hard? Alignment requires an algorithm that performs a number of comparisons roughly proportional to the square of the average sequence length, n².
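
To see why listing every alignment is impractical while a table of roughly n² cells is manageable, the sketch below counts both. It assumes that every distinct path through the alignment grid (align a pair, gap in one sequence, gap in the other) counts as a separate alignment. The names `count_alignments` and `table_cells` are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of gapped alignments of sequences with lengths n and m.

    Each alignment ends with an aligned pair, a gap in the first
    sequence, or a gap in the second, which gives the recurrence below.
    """
    if n == 0 or m == 0:
        return 1  # only one way: pair the remaining letters with gaps
    return (count_alignments(n - 1, m - 1)
            + count_alignments(n - 1, m)
            + count_alignments(n, m - 1))


def table_cells(n, m):
    """Cells in a scoring table that includes the empty-prefix row and column."""
    return (n + 1) * (m + 1)


# Lengths 6 and 5: thousands of alignments, but only a few dozen table cells.
print(count_alignments(6, 5), table_cells(6, 5))  # 3653 42
```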

20 Dynamic Programming
A method for reducing a complex problem to a set of identical sub-problems
The best solution to one sub-problem is independent of the best solution to the other sub-problem

22 What does it mean? If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z

23 Sequence Global Alignment Needleman-Wunsch
Sequences: A = ACGCTG, B = CATGT
[Grid: A (A C G C T G, columns 1 to 6) across the top; B (C A T G T, rows 1 to 5) down the left; a cell in the last row is labelled Z.]

24 Example
Sequences: A = ACGCTG, B = CATGT
Match: +2, Other: -1
Score of best alignment between AC and CATG: -1
…between ACG and CATG: 2
…between AC and CATGT: -2
Calculate score between ACG and CATGT: ?

25 Example
Align the next letter in the sequences
Insertion in the first sequence (del)
Insertion in the second sequence
[Figure: column pairs 3/5, -/5 and 3/- shown beside the grid.]

26 Example
Sequences: A = ACGCTG, B = CATGT
-1 from before plus -1 for mismatch of G against T → -2
2 from before plus -1 for mismatch of - against T → 1
-2 from before plus -1 for mismatch of G against - → -3
Cell gets highest score of -2, 1, -3 → 1
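
The rule for filling one cell can be sketched as a small function that takes the three neighbouring scores and the two letters being considered. The name `cell_score` and the labels for the three moves are illustrative; the scores (+2 for a match, -1 otherwise) are the ones used in this example.

```python
def cell_score(diagonal, above, left, a, b, match=2, other=-1):
    """Score one table cell from its three neighbours.

    diagonal: best score before aligning letter a with letter b
    above:    best score before pairing letter b with a gap
    left:     best score before pairing letter a with a gap
    Returns (best_score, candidates) so the individual options stay visible.
    """
    pair = match if a == b else other
    candidates = {
        "align a with b": diagonal + pair,
        "gap against b": above + other,
        "a against gap": left + other,
    }
    return max(candidates.values()), candidates


# Cell for ACG versus CATGT: the neighbours hold -1, 2 and -2.
best, options = cell_score(diagonal=-1, above=2, left=-2, a="G", b="T")
print(options)  # {'align a with b': -2, 'gap against b': 1, 'a against gap': -3}
print(best)     # 1
```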

27 Example
Sequences: A = ACGCTG, B = CATGT
[Grid: the new cell holds 1, next to the neighbouring scores -1, 2 and -2.]

28 [Grid: A = ACGCTG across the top (columns 1 to 6), B = CATGT down the left (rows 1 to 5).]

29 [Grid] Alignment shown:
A
-

30 [Grid, first row: 0 -1 -2 -3 -4 -5 -6] Alignment shown:
ACGCTG
------

31 [Grid] Alignment shown:
-----
CATGT

32 [Grid] Alignment shown:
A
C

33 [Grid] Alignment shown:
AC
-C

34 [Grid] Alignment shown:
ACG
-C-

35 [Grid] Alignments shown:
ACGC
---C
ACGC
-C--

36 [Grid] Alignment shown:
ACG
-CA

40 [Grid] Alignment shown:
ACGCTG-
-C-ATGT

41 [Grid] Alignment shown:
ACGCTG-
-CA-TGT

42 [Grid] Alignment shown:
-ACGCTG
CATG-T-
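
The three final alignments can be recovered by filling the table and then walking back from the bottom-right cell along every move that explains a cell's score. This is a sketch under the example's scoring (+2 for a match, -1 otherwise); `fill_table` and `all_best_alignments` are illustrative names, and listing every best alignment is only practical for short sequences.

```python
def fill_table(a, b, match=2, other=-1):
    """Scoring table with A across the top (columns) and B down the left (rows)."""
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        table[0][i] = i * other  # prefix of A against gaps only
    for j in range(1, rows):
        table[j][0] = j * other  # prefix of B against gaps only
    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else other
            table[j][i] = max(table[j - 1][i - 1] + pair,
                              table[j - 1][i] + other,
                              table[j][i - 1] + other)
    return table


def all_best_alignments(a, b, match=2, other=-1):
    """Return (score, sorted list of every alignment reaching that score)."""
    table = fill_table(a, b, match, other)
    found = []

    def walk(i, j, top, bottom):
        # top and bottom are built back to front and reversed at the end.
        if i == 0 and j == 0:
            found.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else other
            if table[j][i] == table[j - 1][i - 1] + pair:
                walk(i - 1, j - 1, top + a[i - 1], bottom + b[j - 1])
        if j > 0 and table[j][i] == table[j - 1][i] + other:
            walk(i, j - 1, top + "-", bottom + b[j - 1])
        if i > 0 and table[j][i] == table[j][i - 1] + other:
            walk(i - 1, j, top + a[i - 1], bottom + "-")

    walk(len(a), len(b), "", "")
    return table[len(b)][len(a)], sorted(found)


score, alignments = all_best_alignments("ACGCTG", "CATGT")
print("score:", score)  # score: 2
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```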

43 Summary: Needleman-Wunsch Alignment
Global alignment between sequences
Compare entire sequence against another
Create scoring table
Sequence A across top, B down left
Cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
Global alignment score is the bottom-right cell
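
When only the global score is needed, the table can be filled one row at a time, since each cell depends only on the current and previous rows. This is a minimal sketch assuming the example scoring used earlier (+2 for a match, -1 for a mismatch or indel); `global_score` is an illustrative name, and recovering the alignment itself would still need the full table.

```python
def global_score(a, b, match=2, other=-1):
    """Needleman-Wunsch global alignment score, keeping two rows in memory.

    Columns follow sequence A, rows follow sequence B; the value returned
    is the bottom-right cell of the full scoring table.
    """
    previous = [i * other for i in range(len(a) + 1)]  # row for the empty prefix of B
    for j in range(1, len(b) + 1):
        current = [j * other]  # first column: prefix of B against gaps only
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else other
            current.append(max(previous[i - 1] + pair,   # align a[i-1] with b[j-1]
                               previous[i] + other,      # gap against b[j-1]
                               current[i - 1] + other))  # a[i-1] against gap
        previous = current
    return previous[-1]


print(global_score("ACGCTG", "CATGT"))  # 2
```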

44 Global vs. Local alignment
Sequences: DorothyHodgkin, DorothyCrowfootHodgkin
Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN
Local alignment:
DOROTHY HODGKIN

45 Local Alignment Smith-Waterman
Best score for aligning part of sequences
Often beats global alignment score
Global Alignment
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT
Local Alignment
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT
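
A local alignment score can be computed with the same kind of table as the global case, with two changes: no cell is allowed to drop below zero, and the answer is the highest cell anywhere in the table. This is a minimal sketch assuming the earlier example scoring (+2 for a match, -1 otherwise); `local_score` is an illustrative name and the two test sequences are made up for the demonstration.

```python
def local_score(a, b, match=2, other=-1):
    """Smith-Waterman local alignment score, keeping two rows in memory."""
    best = 0
    previous = [0] * (len(a) + 1)  # a local alignment may start anywhere at no cost
    for j in range(1, len(b) + 1):
        current = [0]
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else other
            cell = max(0,                       # start a fresh alignment here
                       previous[i - 1] + pair,  # align a[i-1] with b[j-1]
                       previous[i] + other,     # gap against b[j-1]
                       current[i - 1] + other)  # a[i-1] against gap
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


# Only the shared ACG block contributes: 3 matches at +2 each.
print(local_score("TTACGTT", "GGACGGG"))  # 6
```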

46 Global vs. Local alignment
Alignment of two genomic sequences
>Human DNA
CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
>Mouse DNA
CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA

47 Global vs. Local alignment
Alignment of two genomic sequences
Global Alignment
Human:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA
      ****** *****         *     ***      *  ****** ***
Local Alignment
Human:CATGCGACTGAC
Mouse:CATGCGTCTGAC
Human:ATCGATCATA
Mouse:ATCGAT-ATA

48 Global vs. Local alignment
Alignment of genomic DNA and mRNA
>Human DNA
CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
>Human mRNA
CATGCGACTGACATCGATCATA

49 Global vs. Local alignment
Alignment of genomic DNA and mRNA
Global Alignment
DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
mRNA:CATGCGACTGAC---------------------------ATCGATCATA
     ************                           **********
Local Alignment
DNA: CATGCGACTGAC
mRNA:CATGCGACTGAC
DNA: ATCGATCATA
mRNA:ATCGATCATA

