Sequence comparison: Introduction and motivation

1 Sequence comparison: Introduction and motivation
Genome 559: Introduction to Statistical and Computational Genomics
Prof. William Stafford Noble

2 Logistics
Syllabus
Should I take this class?

3 BioPics
Please enroll in BioPics on your computer.

4 Motivation
Why align two protein or DNA sequences?

5 Motivation
Why align two protein or DNA sequences?
Determine whether they are descended from a common ancestor (homologous).
Infer a common function.
Locate functional elements (motifs or domains).
Infer protein structure, if the structure of one of the sequences is known.

6

7

8 Sequence comparison overview
Problem: Find the “best” alignment between a query sequence and a target sequence.
To solve this problem, we need a method for scoring alignments and an algorithm for finding the alignment with the best score.
The alignment score is calculated using a substitution matrix and gap penalties.
The algorithm for finding the best alignment is dynamic programming.
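As a rough illustration of the dynamic programming idea, here is a minimal Python sketch of a Needleman-Wunsch-style global alignment recurrence with a linear gap score. The function names and the scoring values (match 10, transversion -5, transition 0, gap -4) are illustrative assumptions, not something this overview slide specifies.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=10, transition=0, transversion=-5):
    """Score one aligned pair of DNA bases (all values are assumptions)."""
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def best_global_score(x, y, d=-4):
    """Return the best global alignment score of x and y.

    F[i][j] holds the best score for aligning x[:i] with y[:j];
    d is the (assumed) linear score charged for every gap position.
    """
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * d  # x[:i] aligned entirely against gaps
    for j in range(1, cols):
        F[0][j] = j * d  # y[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + substitution_score(x[i - 1], y[j - 1]),
                F[i - 1][j] + d,  # gap in y
                F[i][j - 1] + d,  # gap in x
            )
    return F[-1][-1]


print(best_global_score("GAATC", "CATAC"))
```

The table is filled row by row, so each cell only needs its three already-computed neighbors; recovering the alignment itself would additionally require a traceback, which this sketch omits.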

9
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP
G F+ G CP FD+ + G W+EI K+P
GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP

LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE
E +G C A Y S + NG E
ASFE-KGNCIQANY SLMENGNIE

YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ
+ D E++PD KQ K
VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV

RVVNLVP----WVLATDYKNYAINYNCD-----Y
+ L+P W+LATDY+NYA+ Y+C
QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF

HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT
H D WIL ++ L T L
HVD------FFWILGRNPYLPPETITYLKDILT-

10
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP
G F+ G CP FD+ + G W+EI K+P
GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP

LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE
E +G C A Y S + NG E
ASFE-KGNCIQANY SLMENGNIE

YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ
+ D E++PD KQ K
VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV

RVVNLVP----WVLATDYKNYAINYNCD-----Y
+ L+P W+LATDY+NYA+ Y+C
QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF

HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT
H D WIL ++ L T L
HVD------FFWILGRNPYLPPETITYLKDILT-

Y mutates to V: receives -1
M mutates to L: receives 2
E gets deleted: receives -10
G gets deleted: receives -10
D matches D: receives 6
Total score = -13
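The total on this slide is the sum of the five annotated column scores. A minimal Python sketch of that tally follows; the variable names are hypothetical, and the scores are the ones quoted on the slide.

```python
# Column events and their scores, as quoted on the slide.
column_scores = [
    ("Y mutates to V", -1),
    ("M mutates to L", 2),
    ("E gets deleted", -10),
    ("G gets deleted", -10),
    ("D matches D", 6),
]

# An alignment score is additive over columns.
total = sum(score for _, score in column_scores)
print(total)  # -1 + 2 - 10 - 10 + 6 = -13
```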

11 A simple alignment problem.
Problem: find the best pairwise alignment of GAATC and CATAC.

12 Scoring alignments
We need a way to measure the quality of a candidate alignment. Alignment scores consist of two parts: a substitution matrix and a gap penalty.
Candidate alignments:
GAATC
CATAC

GAAT-C
C-ATAC

-GAAT-C
C-A-TAC

GAATC-

13 Scoring aligned bases
Purines: A, G. Pyrimidines: C, T.
Transition (purine to purine, or pyrimidine to pyrimidine): high score.
Transversion (purine to pyrimidine, or the reverse): low score.

14 Scoring aligned bases
Purines: A, G. Pyrimidines: C, T.
A hypothetical substitution matrix over A, C, G, T. Of its entries, this example uses 10 (match) and -5 (transversion).
GAATC
CATAC
Score: -5 + 10 - 5 - 5 + 10 = 5
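The column-by-column scoring of an ungapped alignment can be sketched in Python as follows. This is a minimal sketch, not the course's code: the function names are hypothetical, a match is taken as 10 and a transversion as -5 to agree with the slide's total, and the transition score of 0 is an assumption that this example never exercises.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=10, transition=0, transversion=-5):
    """Score one aligned pair of DNA bases.

    transition=0 is an assumed placeholder; GAATC vs CATAC contains
    only matches and transversions.
    """
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def score_ungapped(x, y):
    """Sum the substitution scores over the columns of an ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    return sum(substitution_score(a, b) for a, b in zip(x, y))


print(score_ungapped("GAATC", "CATAC"))  # -5 + 10 - 5 - 5 + 10 = 5
```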

15 BLOSUM 62
[Table: the BLOSUM62 substitution matrix; rows and columns are A R N D C Q E G H I L K M F P S T W Y V B Z X.]

16 Scoring aligned bases
Purines: A, G. Pyrimidines: C, T.
Transversion (expensive), transition (cheap).
Using the hypothetical substitution matrix (match 10, transversion -5), the two gap columns have no score yet:
GAAT-C
CA-TAC
Score: -5 + 10 + ? + 10 + ? + 10 = ?

17 Scoring gaps
Linear gap penalty: every gap position receives a score of d.
Affine gap penalty: opening a gap receives a score of d; extending a gap receives a score of e.
Linear, d = -4:
GAAT-C
CA-TAC
Score: -5 + 10 - 4 + 10 - 4 + 10 = 17
Affine, d = -4, e = -1:
G--AATC
CATA--C
Score: -5 - 4 - 1 + 10 - 4 - 1 + 10 = 5
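Both gap schemes can be checked with a short Python sketch. It is an illustration under stated assumptions: the function names are hypothetical, the substitution scores (match 10, transversion -5) are chosen to agree with the slide's totals, and the transition score of 0 is an assumption that neither example uses.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=10, transition=0, transversion=-5):
    """Score one aligned pair of DNA bases (transition=0 is assumed)."""
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def score_alignment(x, y, d=-4, e=None):
    """Score a gapped pairwise alignment given as two equal-length strings.

    d: score for opening a gap (or for every gap position when e is None).
    e: score for extending a gap; None selects the linear scheme (e = d).
    """
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    if e is None:
        e = d
    total = 0
    prev_gap_row = None  # row ("x" or "y") gapped in the previous column
    for a, b in zip(x, y):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            row = "x" if a == "-" else "y"
            # Continuing a gap in the same row is an extension.
            total += e if prev_gap_row == row else d
            prev_gap_row = row
        else:
            total += substitution_score(a, b)
            prev_gap_row = None
    return total


print(score_alignment("GAAT-C", "CA-TAC", d=-4))         # linear: 17
print(score_alignment("G--AATC", "CATA--C", d=-4, e=-1))  # affine: 5
```

Under the linear scheme the second alignment would score -5 - 4 - 4 + 10 - 4 - 4 + 10 = -1, which shows how the affine scheme favors fewer, longer gaps.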

18
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP
G F+ G CP FD+ + G W+EI K+P
GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP

LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE
E +G C A Y S + NG E
ASFE-KGNCIQANY SLMENGNIE

YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ
+ D E++PD KQ K
VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV

RVVNLVP----WVLATDYKNYAINYNCD-----Y
+ L+P W+LATDY+NYA+ Y+C
QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF

HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT
H D WIL ++ L T L
HVD------FFWILGRNPYLPPETITYLKDILT-

Y mutates to V: receives -1
M mutates to L: receives 2
E gets deleted: receives -10
G gets deleted: receives -10
D matches D: receives 6
Total score = -13

19 You should be able to ...
Explain why sequence comparison is useful.
Define substitution matrix and different types of gap penalties.
Compute the score of an alignment, given the substitution matrix and gap penalties.

