
BNFO 240 Usman Roshan. Last time Traceback for alignment How to select the gap penalties? Benchmark alignments –Structural superimposition –BAliBASE.


1 BNFO 240 Usman Roshan

2 Last time: traceback for alignment. How to select the gap penalties? Benchmark alignments: structural superimposition, BAliBASE.

3 Database searching: Suppose we have a set of 1,000,000 sequences. Given a query sequence q, we want to find the m closest sequences in the database. Done naively, that means 1,000,000 pairwise alignments! How can we speed up the pairwise alignments?
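To make the cost concrete, here is a minimal Python sketch of the naive search: score the query against every database sequence with a global (Needleman-Wunsch) alignment and keep the m best. The scoring values (match +1, mismatch -1, gap -2) and the names `nw_score` and `closest` are hypothetical choices for illustration.

```python
import heapq


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty).

    Keeps only the previous DP row, so memory is O(len(b)); time is O(len(a) * len(b)).
    The scoring values are hypothetical defaults.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def closest(query, database, m):
    """Naive search: one full alignment per database sequence, then keep the m best scores."""
    return heapq.nlargest(m, database, key=lambda s: nw_score(query, s))


db = ["ACGT", "TTTT", "ACGA", "GGGG"]
print(closest("ACGT", db, 2))  # ['ACGT', 'ACGA']
```

With 1,000,000 database sequences, `closest` performs 1,000,000 quadratic-time alignments, which is exactly the cost that k-mer based methods try to avoid.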

4 FASTA: FASTA was the first software for quickly searching a database. It introduced the idea of searching for k-mers (short exact matches of length k), which can be done quickly by preprocessing the database.
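A minimal sketch of the preprocessing idea, assuming the index is a Python dictionary from each k-mer to the (sequence id, position) pairs where it occurs; this is not FASTA's actual implementation, and `build_index` and `kmer_hits` are hypothetical names. The database is indexed once, after which each query k-mer costs one dictionary lookup instead of a scan of the whole database.

```python
from collections import defaultdict


def build_index(database, k):
    """Preprocess once: map every k-mer to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for sid, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((sid, pos))
    return index


def kmer_hits(query, index, k):
    """Look up each query k-mer; returns (sequence id, query pos, database pos) hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get does not insert missing keys into the defaultdict
        for sid, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((sid, qpos, dpos))
    return hits


index = build_index(["ACGTAC", "TTTT"], k=3)
print(kmer_hits("CGTA", index, k=3))  # [(0, 0, 1), (0, 1, 2)]
```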

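5 FASTA: combine high-scoring hits into diagonal runs. Hits with the same offset (database position minus query position) lie on the same diagonal of the query-versus-database comparison matrix.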
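A simplified sketch of grouping k-mer hits into diagonal runs, assuming hits are (sequence id, query position, database position) triples. It ranks diagonals only by their number of hits, which is a deliberate simplification of FASTA's scoring; `diagonal_runs` and `best_diagonal` are hypothetical names.

```python
from collections import defaultdict


def diagonal_runs(hits):
    """Group (sequence id, query pos, database pos) hits by diagonal = database pos - query pos."""
    runs = defaultdict(list)
    for sid, qpos, dpos in hits:
        runs[(sid, dpos - qpos)].append(qpos)
    return {diag: sorted(qs) for diag, qs in runs.items()}


def best_diagonal(hits):
    """Simplified scoring (an assumption): the diagonal containing the most hits wins."""
    runs = diagonal_runs(hits)
    return max(runs, key=lambda diag: len(runs[diag]))


hits = [(0, 0, 1), (0, 1, 2), (0, 3, 0), (1, 0, 5)]
print(best_diagonal(hits))  # (0, 1): sequence 0, offset 1, two hits
```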

6 BLAST: the key idea is to search for k-mers (short matching substrings) quickly by preprocessing the database.

7 BLAST: the same k-mer idea can also be used to speed up the many pairwise alignments needed when computing multiple sequence alignments.
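A minimal sketch of one way to apply the k-mer idea inside multiple alignment: estimate how similar two sequences are from the k-mers they share, instead of aligning them. The similarity formula (shared distinct k-mers divided by the size of the smaller k-mer set) and the function names are illustrative assumptions, not the formula of any particular aligner.

```python
def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    # Assumed formula: shared distinct k-mers / size of the smaller set (0.0 if either is empty).
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def kmer_distance_matrix(seqs, k=3):
    """All-pairs distances (1 - similarity) computed without a single alignment."""
    return [[1.0 - kmer_similarity(x, y, k) for y in seqs] for x in seqs]


print(kmer_similarity("ACGT", "ACGA"))  # 0.5: only ACG is shared, out of 2 k-mers each
```

Each comparison is roughly linear in sequence length, so all-pairs distances for n sequences avoid the n(n-1)/2 quadratic-time alignments.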

8 Biologically realistic scoring matrices: PAM and BLOSUM are the most popular. PAM was developed by Margaret Dayhoff and co-workers in 1978 by examining 1572 mutations between 71 families of closely related proteins. BLOSUM is more recent and is computed from blocks of aligned sequences with sufficient similarity.

9 PAM: We need to compute the probability transition matrix M, where $M_{ij}$ is the probability that amino acid i changes to amino acid j. Examine a set of closely related sequences that are easy to align (for PAM, 1572 mutations between 71 families). Compute the probabilities of change and the background probabilities by simple counting.
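A minimal sketch of the "simple counting" step, assuming the closely related sequences are given as gapless aligned pairs of equal length (a hypothetical input format). Each aligned position is counted in both directions because we do not know which residue is ancestral. It omits the extra steps of the published PAM construction, such as scaling M so that it corresponds to exactly one PAM unit; `count_pam` is a hypothetical name.

```python
from collections import Counter, defaultdict


def count_pam(pairs):
    """pairs: list of (x, y) gapless aligned sequences of equal length.

    Returns (M, p): M[a][b] estimates P(a changes to b), p[a] is the background frequency of a.
    """
    sub = defaultdict(Counter)  # sub[a][b]: how often a is aligned opposite b
    background = Counter()      # overall residue counts
    for x, y in pairs:
        for a, b in zip(x, y):
            sub[a][b] += 1
            sub[b][a] += 1      # count both directions: no ancestor is assumed
            background[a] += 1
            background[b] += 1
    M = {}
    for a, row in sub.items():
        row_total = sum(row.values())
        M[a] = {b: n / row_total for b, n in row.items()}  # each row sums to 1
    total = sum(background.values())
    p = {a: n / total for a, n in background.items()}
    return M, p


M, p = count_pam([("AAAC", "AAAG"), ("CCCC", "CCCC")])
print(M["C"], p)
```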

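10 PAM: In this model the unit of evolution is the amount of evolution that changes 1 in 100 amino acids on average. The scoring matrix is built from the ratio of $M_{ab}$ to the background probability $p_b$; in practice its logarithm, $S_{ab} = \log(M_{ab}/p_b)$, is used so that scores add along an alignment.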
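A minimal sketch of the two steps that turn M into scores, on a hypothetical two-letter alphabet: the log-odds score, scaled here as $10 \log_{10}(M_{ab}/p_b)$ (the factor 10 is a commonly used scale, treated as a parameter), and the transition matrix for k PAM units, $M^k$, obtained by applying one PAM of evolution k times. The function names are hypothetical.

```python
import math


def mat_mult(X, Y):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def pam_k(M, k):
    """Transition matrix after k PAM units of evolution: M multiplied by itself k times."""
    result = [[1.0 if i == j else 0.0 for j in range(len(M))] for i in range(len(M))]
    for _ in range(k):
        result = mat_mult(result, M)
    return result


def log_odds(M, p, scale=10.0):
    """S[a][b] = scale * log10(M[a][b] / p[b]); positive means more often than chance."""
    n = len(M)
    return [[scale * math.log10(M[a][b] / p[b]) for b in range(n)] for a in range(n)]


# Hypothetical 2-letter alphabet in which 1 residue in 100 changes per PAM unit.
M1 = [[0.99, 0.01], [0.01, 0.99]]
p = [0.5, 0.5]
S1 = log_odds(M1, p)
S250 = log_odds(pam_k(M1, 250), p)
print(S1)
print(S250)
```

In this toy example the rows of $M^{250}$ approach the background frequencies, so the PAM-250-style scores shrink toward 0, while the 1-PAM scores strongly reward identity.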

11 PAM: the $M_{ij}$ matrix (entries multiplied by 10000)

12 Multiple sequence alignment: “Two sequences whisper, multiple sequences shout out loud” (Arthur Lesk). Computing an optimal multiple alignment is very hard: under the sum-of-pairs score it is NP-hard.

13 Formally…

14 Multiple sequence alignment
Unaligned sequences:
GGCTT
TAGGCCTT
TAGCCCTTA
ACACTTC
ACTT
Aligned sequences (_ marks a gap):
_G__GCTT_
TAGGCCTT_
TAGCCCTTA
A__CACTTC
A__C_CTT_
Conserved regions help us identify function.

15 Sum of pairs score
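The sum-of-pairs score adds up, column by column, the score of every pair of sequences in the alignment. A minimal sketch, assuming "-" for gaps and a hypothetical scoring scheme (match +1, mismatch -1, residue against gap -2, gap against gap 0, a common convention for the last case); `pair_score` and `sp_score` are hypothetical names.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b):
    # Hypothetical scoring: match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def sp_score(alignment):
    """Sum over columns of pair_score for every pair of rows in that column."""
    return sum(pair_score(x, y) for column in zip(*alignment) for x, y in combinations(column, 2))


print(sp_score(["AC-", "ACG", "A-G"]))  # -3
```

For k sequences each column contributes k(k-1)/2 pair scores, which is why the score is called "sum of pairs".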

16 What is the sum of pairs score of this alignment?

17 Profiles: Before we see how to construct multiple alignments, how do we align two alignments? Idea: summarize each alignment by its profile (the frequency of each residue, and of gaps, in every column) and align the two profiles.
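A minimal sketch of computing a profile, assuming an alignment is a list of equal-length strings with "-" for gaps; `profile` is a hypothetical name. Each column becomes a dictionary from character to frequency.

```python
from collections import Counter


def profile(alignment):
    """Per-column frequencies of each character, gaps ('-') included."""
    n = len(alignment)
    return [{c: k / n for c, k in Counter(column).items()} for column in zip(*alignment)]


p = profile(["AC-", "ACG", "A-G"])
print(p[1])
```

The profile has one entry per column regardless of how many sequences the alignment holds, which is what lets two alignments be compared column against column.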

18 Profile alignment
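A minimal sketch of profile alignment as a Needleman-Wunsch-style dynamic program over columns instead of characters. Two columns are scored by their total sum-of-pairs contribution between the two groups, computed from per-column character counts (an unnormalized profile); inserting a gap means pairing a column with an all-gap column. Because pairs inside each group do not change (gap against gap scores 0), maximizing this score maximizes the sum-of-pairs score of the merged alignment. The scoring values and the name `align_profiles` are hypothetical.

```python
from collections import Counter

GAP = "-"


def pair_score(a, b):
    # Hypothetical scoring: match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def align_profiles(A, B):
    """Align alignment A (equal-length rows) with alignment B.

    Returns (rows, score): rows of A followed by rows of B, and the best cross score,
    i.e. pair_score summed over all (row of A, row of B) pairs in every column.
    """
    PA = [Counter(col) for col in zip(*A)]  # profile of A as per-column counts
    PB = [Counter(col) for col in zip(*B)]
    gap_a, gap_b = Counter({GAP: len(A)}), Counter({GAP: len(B)})  # all-gap columns

    def cross(p, q):
        return sum(x * y * pair_score(a, b) for a, x in p.items() for b, y in q.items())

    m, n = len(PA), len(PB)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if i and j:
                options.append(D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]))
            if i:
                options.append(D[i - 1][j] + cross(PA[i - 1], gap_b))
            if j:
                options.append(D[i][j - 1] + cross(gap_a, PB[j - 1]))
            D[i][j] = max(options) if options else 0
    # Traceback from (m, n), collecting merged columns in reverse order.
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    merged, i, j = [], m, n
    while i or j:
        if i and j and D[i][j] == D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i and D[i][j] == D[i - 1][j] + cross(PA[i - 1], gap_b):
            merged.append(cols_a[i - 1] + (GAP,) * len(B))
            i -= 1
        else:
            merged.append((GAP,) * len(A) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)], D[m][n]


print(align_profiles(["AC", "AC"], ["C"]))  # (['AC', 'AC', '-C'], -2)
```

A single sequence is just a one-row alignment, so the same function also performs ordinary pairwise alignment and sequence-to-profile alignment.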

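19 Iterative alignment (heuristic for sum-of-pairs):
Pick a random sequence from the input set S and remove it from S.
Do (n-1) pairwise alignments, where n is the number of input sequences, and align the chosen sequence to the closest sequence t in S.
Remove t from S and compute the profile of the alignment.
While sequences remain in S:
–Do |S| pairwise alignments of the profile against each remaining sequence and align the profile to the closest one, t.
–Remove t from S.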
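A minimal sketch of this greedy procedure. It includes its own copy of a column-based profile alignment so that it runs on its own, and it assumes "closest" means the highest profile-versus-sequence alignment score under a hypothetical scoring scheme (match +1, mismatch -1, residue vs gap -2, gap vs gap 0). The first pass of the loop, with a one-row alignment, performs the (n-1) pairwise alignments; `iterative_align` and `align_profiles` are hypothetical names.

```python
import random
from collections import Counter

GAP = "-"


def pair_score(a, b):
    # Hypothetical scoring: match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def align_profiles(A, B):
    """Best merge of alignments A and B under sum-of-pairs; returns (rows of A then B, score)."""
    PA, PB = [Counter(c) for c in zip(*A)], [Counter(c) for c in zip(*B)]
    gap_a, gap_b = Counter({GAP: len(A)}), Counter({GAP: len(B)})

    def cross(p, q):
        return sum(x * y * pair_score(a, b) for a, x in p.items() for b, y in q.items())

    m, n = len(PA), len(PB)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if i and j:
                options.append(D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]))
            if i:
                options.append(D[i - 1][j] + cross(PA[i - 1], gap_b))
            if j:
                options.append(D[i][j - 1] + cross(gap_a, PB[j - 1]))
            D[i][j] = max(options) if options else 0
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    merged, i, j = [], m, n
    while i or j:
        if i and j and D[i][j] == D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i and D[i][j] == D[i - 1][j] + cross(PA[i - 1], gap_b):
            merged.append(cols_a[i - 1] + (GAP,) * len(B))
            i -= 1
        else:
            merged.append((GAP,) * len(A) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)], D[m][n]


def iterative_align(seqs, seed=0):
    """Greedy heuristic: start from a random sequence, then repeatedly add the closest remaining one."""
    rng = random.Random(seed)
    remaining = list(seqs)
    alignment = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining:
        # |S| alignments of the current alignment (its profile) against each remaining sequence.
        scored = [(align_profiles(alignment, [s]), k) for k, s in enumerate(remaining)]
        (rows, _score), k = max(scored, key=lambda item: item[0][1])
        alignment = rows
        remaining.pop(k)
    return alignment


print("\n".join(iterative_align(["ACGT", "ACGGT", "AGT", "ACGT"], seed=1)))
```

The rows come out in the order in which sequences were added, not in input order.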

20 Iterative alignment (refinement): Once an alignment is computed, randomly divide its sequences into two groups. Compute the profile of each sub-alignment and realign the two profiles. If the sum-of-pairs score of the new alignment is better than that of the previous one, keep it; otherwise try a different division. Repeat until a specified iteration limit is reached.
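A minimal sketch of this refinement loop under the same hypothetical scoring (match +1, mismatch -1, residue vs gap -2, gap vs gap 0). It carries its own copy of a column-based profile alignment so it runs on its own. After a split, columns that are all gaps within one group are dropped before realigning; the rows are put back in their original order. `refine`, `drop_gap_columns`, and `align_profiles` are hypothetical names.

```python
import random
from collections import Counter
from itertools import combinations

GAP = "-"


def pair_score(a, b):
    # Hypothetical scoring: match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def sp_score(rows):
    """Sum-of-pairs score: pair_score summed over every pair of rows in every column."""
    return sum(pair_score(x, y) for col in zip(*rows) for x, y in combinations(col, 2))


def align_profiles(A, B):
    """Best merge of alignments A and B under sum-of-pairs; returns (rows of A then B, score)."""
    PA, PB = [Counter(c) for c in zip(*A)], [Counter(c) for c in zip(*B)]
    gap_a, gap_b = Counter({GAP: len(A)}), Counter({GAP: len(B)})

    def cross(p, q):
        return sum(x * y * pair_score(a, b) for a, x in p.items() for b, y in q.items())

    m, n = len(PA), len(PB)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if i and j:
                options.append(D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]))
            if i:
                options.append(D[i - 1][j] + cross(PA[i - 1], gap_b))
            if j:
                options.append(D[i][j - 1] + cross(gap_a, PB[j - 1]))
            D[i][j] = max(options) if options else 0
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    merged, i, j = [], m, n
    while i or j:
        if i and j and D[i][j] == D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i and D[i][j] == D[i - 1][j] + cross(PA[i - 1], gap_b):
            merged.append(cols_a[i - 1] + (GAP,) * len(B))
            i -= 1
        else:
            merged.append((GAP,) * len(A) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)], D[m][n]


def drop_gap_columns(rows):
    """Remove columns that contain only gaps (they can appear after splitting an alignment)."""
    cols = [col for col in zip(*rows) if any(c != GAP for c in col)]
    return ["".join(r) for r in zip(*cols)] if cols else [""] * len(rows)


def refine(rows, iterations=20, seed=0):
    """Random split, realign the two groups, keep the result only if sum-of-pairs improves."""
    rng = random.Random(seed)
    best = list(rows)
    for _ in range(iterations):
        order = list(range(len(best)))
        rng.shuffle(order)
        cut = rng.randint(1, len(best) - 1)  # both groups are non-empty
        g1, g2 = sorted(order[:cut]), sorted(order[cut:])
        merged, _ = align_profiles(drop_gap_columns([best[k] for k in g1]),
                                   drop_gap_columns([best[k] for k in g2]))
        candidate = [""] * len(best)
        for k, row in zip(g1 + g2, merged):
            candidate[k] = row  # restore the original row order
        if sp_score(candidate) > sp_score(best):
            best = candidate
    return best


print(refine(["A-CGT", "AC-GT", "ACG-T"]))
```

Since the old alignment is one of the candidates the dynamic program considers for any split, a realignment never scores lower than before; the loop keeps a candidate only when it is strictly better.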

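21 Progressive alignment: the idea is to perform profile alignments in the order dictated by a tree. Given a guide tree, do a post-order traversal and align sequences (or sub-alignments) in that order, so that each internal node is aligned only after both of its children. This is a widely used heuristic.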
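A minimal sketch of progressive alignment, assuming the guide tree is given as nested pairs of sequences (a hypothetical tree; in practice it is usually built from pairwise distances). The recursion is a post-order traversal: both subtrees are aligned before their two alignments are merged with a column-based profile alignment, included here so the block runs on its own, under the same hypothetical scoring as before. `progressive` and `align_profiles` are hypothetical names.

```python
from collections import Counter

GAP = "-"


def pair_score(a, b):
    # Hypothetical scoring: match +1, mismatch -1, residue vs gap -2, gap vs gap 0.
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def align_profiles(A, B):
    """Best merge of alignments A and B under sum-of-pairs; returns (rows of A then B, score)."""
    PA, PB = [Counter(c) for c in zip(*A)], [Counter(c) for c in zip(*B)]
    gap_a, gap_b = Counter({GAP: len(A)}), Counter({GAP: len(B)})

    def cross(p, q):
        return sum(x * y * pair_score(a, b) for a, x in p.items() for b, y in q.items())

    m, n = len(PA), len(PB)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            options = []
            if i and j:
                options.append(D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]))
            if i:
                options.append(D[i - 1][j] + cross(PA[i - 1], gap_b))
            if j:
                options.append(D[i][j - 1] + cross(gap_a, PB[j - 1]))
            D[i][j] = max(options) if options else 0
    cols_a, cols_b = list(zip(*A)), list(zip(*B))
    merged, i, j = [], m, n
    while i or j:
        if i and j and D[i][j] == D[i - 1][j - 1] + cross(PA[i - 1], PB[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i and D[i][j] == D[i - 1][j] + cross(PA[i - 1], gap_b):
            merged.append(cols_a[i - 1] + (GAP,) * len(B))
            i -= 1
        else:
            merged.append((GAP,) * len(A) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)], D[m][n]


def progressive(tree):
    """Post-order traversal: align both subtrees first, then align their two alignments."""
    if isinstance(tree, str):
        return [tree]  # leaf: a single sequence is a one-row alignment
    left, right = tree
    rows, _score = align_profiles(progressive(left), progressive(right))
    return rows


guide_tree = (("ACGT", "ACGGT"), ("AGT", "ACT"))  # hypothetical guide tree
print("\n".join(progressive(guide_tree)))
```

Gaps introduced at a lower node are never removed later ("once a gap, always a gap"), which is why the quality of the guide tree matters.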

