Presentation is loading. Please wait.

Presentation is loading. Please wait.

GENOME 559: Introduction to statistical and computational genomics

Similar presentations


Presentation on theme: "GENOME 559: Introduction to statistical and computational genomics"— Presentation transcript:

1 GENOME 559: Introduction to statistical and computational genomics
Prof. William Stafford Noble Department of Genome Sciences University of Washington

2 Logistics Syllabus Should I take this class?

3 Course goals By the end of this course, you should be able to This course will not Describe some basic computational challenges in bioinformatics. Implement and use several basic algorithms in this field. Describe several advanced algorithms. Teach molecular biology techniques. Teach you how to use off-the- shelf bioinformatics software.

4 Sequence comparison: Introduction and motivation

5 Motivation Why align two protein or DNA sequences?

6 Motivation Why align two protein or DNA sequences?
Determine whether they are descended from a common ancestor (homologous). Infer a common function. Locate functional elements (motifs or domains). Infer protein structure, if the structure of one of the sequences is known.

7

8

9 Sequence comparison overview
Problem: Find the “best” alignment between a query sequence and a target sequence. To solve this problem, we need a method for scoring alignments, and an algorithm for finding the alignment with the best score. The alignment score is calculated using a substitution matrix, and gap penalties. The algorithm for finding the best alignment is dynamic programming.

10 GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP
G F+ G CP FD+ + G W+EI K+P GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE E +G C A Y S + NG E ASFE-KGNCIQANY SLMENGNIE YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ + D E++PD KQ K VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV RVVNLVP----WVLATDYKNYAINYNCD-----Y + L+P W+LATDY+NYA+ Y+C QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT H D WIL ++ L T L HVD------FFWILGRNPYLPPETITYLKDILT-

11 Y mutates to V receives -1 M mutates to L receives 2
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP G F+ G CP FD+ + G W+EI K+P GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE E +G C A Y S + NG E ASFE-KGNCIQANY SLMENGNIE YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ + D E++PD KQ K VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV RVVNLVP----WVLATDYKNYAINYNCD-----Y + L+P W+LATDY+NYA+ Y+C QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT H D WIL ++ L T L HVD------FFWILGRNPYLPPETITYLKDILT- Y mutates to V receives -1 M mutates to L receives 2 E gets deleted receives -10 G gets deleted receives -10 D matches D receives 6 Total score = -13

12 A simple alignment problem.
Problem: find the best pairwise alignment of GAATC and CATAC.

13 Scoring alignments GAATC CATAC GAAT-C C-ATAC -GAAT-C C-A-TAC GAATC-
We need a way to measure the quality of a candidate alignment. Alignment scores consist of two parts: a substitution matrix, and a gap penalty.

14 Scoring aligned bases Purine A G Pyrimidine C T Transversion
(low score) Transition (high score)

15 Scoring aligned bases GAATC CATAC Purine A G Pyrimidine C T A C G T 10
Transversion Transition A hypothetical substitution matrix: GAATC CATAC A C G T 10 -5 = 5

16 BLOSUM 62 A R N D C Q E G H I L K M F P S T W Y V B Z X

17 Scoring aligned bases GAAT-C CA-TAC Purine A G Pyrimidine C T A C G T
Transversion (expensive) Transition (cheap) A hypothetical substitution matrix: GAAT-C CA-TAC A C G T 10 -5 ? ? + 10 = ?

18 Scoring gaps GAAT-C d=-4 CA-TAC G--AATC d=-4 CATA--C e=-1
Linear gap penalty: every gap receives a score of d. Affine gap penalty: opening a gap receives a score of d; extending a gap receives a score of e. GAAT-C d=-4 CA-TAC = 17 G--AATC d=-4 CATA--C e=-1 = 5

19 Y mutates to V receives -1 M mutates to L receives 2
GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLP G F+ G CP FD+ + G W+EI K+P GQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIP LENENQGKCTIAEYKYDGKKASVYNSFVSNGVKE E +G C A Y S + NG E ASFE-KGNCIQANY SLMENGNIE YMEGDLEIAPDAKY------TKQGKYVMTFKFGQ + D E++PD KQ K VL--DKELSPDGTMNQVKGEAKQSNVSEPAKLEV RVVNLVP----WVLATDYKNYAINYNCD-----Y + L+P W+LATDY+NYA+ Y+C QFFPLMPPAPYWILATDYENYALVYSCTTFFWLF HPDKKAHSIHAWILSKSKVLEGNTKEVVDNVLKT H D WIL ++ L T L HVD------FFWILGRNPYLPPETITYLKDILT- Y mutates to V receives -1 M mutates to L receives 2 E gets deleted receives -10 G gets deleted receives -10 D matches D receives 6 Total score = -13

20 You should be able to ... Explain why sequence comparison is useful.
Define substitution matrix and different types of gap penalties. Compute the score of an alignment, given the substitution matrix and gap penalties.


Download ppt "GENOME 559: Introduction to statistical and computational genomics"

Similar presentations


Ads by Google