Local alignments


1 Local alignments
[Figure: Seq X and Seq Y]

2 Local alignment
• What’s local?
– Allow only parts of the sequence to match
– Results in High Scoring Segments
– Locally maximal: cannot be improved by trimming or extending the alignment
[Figure: Seq X and Seq Y]

3 Local alignment
• Why local?
– Parts of a sequence diverge faster: evolutionary pressure does not constrain the whole sequence
– Proteins have a modular construction: domains are shared between sequences
[Figure: Seq X and Seq Y]

4 Global → local alignment
• Take the good old equation
• Look at the result of the global alignment:
CAGCACTTGGATTCTCG-
CA-C-----GATTCGT-G
[Figure: a) global alignment matrix with axes labelled -seque and -seq; b) retrieving the result; c) cumulative score along the result, plotted against alignment position]

5 Local alignment – breaking the alignment
• A recipe
– Just don’t let the score go below 0
– Start a new alignment when that happens
– Where is the result in the matrix?
[Figure: cumulative score vs. alignment position, before and after resetting at 0, with the corresponding matrices for -seque and -seq]
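The recipe can be tried on a single, already computed global alignment. The sketch below is only an illustration: it scores each column of the example alignment from slide 4, keeps a running sum that is never allowed to drop below 0, and reports the best-scoring stretch. The +1/-1/-1 scoring and the names `column_scores` and `best_segment` are assumptions made for this example, not part of the slides.

```python
def column_scores(row_x, row_y, match=1, mismatch=-1, gap=-1):
    """Score every column of a finished alignment (assumed toy scoring)."""
    scores = []
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            scores.append(gap)
        elif a == b:
            scores.append(match)
        else:
            scores.append(mismatch)
    return scores


def best_segment(scores):
    """Running sum clamped at 0; returns (best score, half-open column span)."""
    best, best_span = 0, (0, 0)
    running, start = 0, 0
    for pos, s in enumerate(scores):
        running += s
        if running < 0:
            # The score went below 0: break here and start a new alignment.
            running, start = 0, pos + 1
        elif running > best:
            best, best_span = running, (start, pos + 1)
    return best, best_span


ROW_X = "CAGCACTTGGATTCTCG-"
ROW_Y = "CA-C-----GATTCGT-G"
score, (lo, hi) = best_segment(column_scores(ROW_X, ROW_Y))
print(score, ROW_X[lo:hi], ROW_Y[lo:hi])
```

Under these assumed scores the best stretch is the shared GATTC block. This only trims an existing global alignment; the local recurrence on the next slide applies the same reset inside the matrix, so it can find alignments the global path never visits.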

6 Local alignment – the equation
• Initialize the matrix with 0’s
• Read the maximal value from anywhere in the matrix
• Find the result with backtracking
$$M[i,j] = \max \begin{cases} M[i,j-1] - g \\ M[i-1,j] - g \\ M[i-1,j-1] + \mathrm{score}(X[i], Y[j]) \\ 0 \end{cases}$$
(Slide annotation on the 0 term: "Great contribution to science!")
[Figure: local alignment matrix for -seque and -seq]
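A minimal sketch of the three bullets and the recurrence, assuming a simple match/mismatch score in place of a real substitution matrix and a linear gap penalty g. The function name `local_align`, the default parameter values and the test strings are illustrative choices, not taken from the slides.

```python
def local_align(x, y, match=1, mismatch=-1, g=2):
    """Local alignment with linear gap penalty g; returns (score, aligned x, aligned y)."""
    n, m = len(x), len(y)
    # Init the matrix with 0's (row 0 and column 0 stay 0).
    M = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def score(a, b):
        return match if a == b else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i][j - 1] - g,
                M[i - 1][j] - g,
                M[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                0,
            )
            # Read the maximal value from anywhere in the matrix.
            if M[i][j] > best:
                best, best_pos = M[i][j], (i, j)

    # Backtrack from the maximum until a 0 cell is reached.
    i, j = best_pos
    out_x, out_y = [], []
    while i > 0 and j > 0 and M[i][j] > 0:
        if M[i][j] == M[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            out_x.append(x[i - 1]); out_y.append(y[j - 1]); i -= 1; j -= 1
        elif M[i][j] == M[i - 1][j] - g:
            out_x.append(x[i - 1]); out_y.append("-"); i -= 1
        else:
            out_x.append("-"); out_y.append(y[j - 1]); j -= 1
    return best, "".join(reversed(out_x)), "".join(reversed(out_y))


print(local_align("WWWGATTACAWWW", "PPGATTACAPP"))
```

The flanking W and P residues never match each other, so only the shared GATTACA core survives: the score is 7 and both aligned strings are GATTACA.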

7 Amino acid substitution matrices

8 Point Accepted Mutations: distance and matrices
• Accepted by natural selection
– not lethal
– not silent
• Def.: S1 and S2 are PAM 1 distant if, on average, there was one mutation per 100 amino acids
• Q.: If the sequences are PAM 8 distant, how many residues may be different?
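One way to approach the question is a back-of-the-envelope estimate. The sketch below assumes a deliberately simplified model that is not stated on the slide: each PAM unit changes any given site independently with probability 0.01, and a changed site counts as different no matter what happens to it later. The function name is made up for the example.

```python
def fraction_of_sites_hit(pam_distance, rate_per_pam=0.01):
    """Probability that a site was mutated at least once (simplified model)."""
    return 1.0 - (1.0 - rate_per_pam) ** pam_distance


print(round(100 * fraction_of_sites_hit(8), 2), "% of sites hit at least once")
```

Under this assumption slightly fewer than 8 residues per 100 differ, because some mutations fall on a site that has already changed. Mutations that revert a site to its original residue would lower the observed difference a little further.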

9 PAM matrix
• Created from “easy” alignments
– pairwise
– gapless
– 85% identity

10 PAM matrix
• How to calculate M for PAM 2 distance?
– Take more distant sequences
– or extrapolate...
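The extrapolation is usually done by multiplying the PAM 1 mutation probability matrix by itself; that this is what the slide intends is an assumption. The sketch below uses a made-up 3-letter alphabet with invented probabilities (each row sums to 1), purely to show the mechanics. Real PAM matrices are 20x20 and are afterwards converted to log-odds scores.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1, n):
    """Extrapolate to PAM n by multiplying PAM 1 with itself n times."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM 1 mutation probabilities for a toy 3-letter alphabet.
TOY_PAM1 = [
    [0.990, 0.007, 0.003],
    [0.006, 0.990, 0.004],
    [0.002, 0.008, 0.990],
]

toy_pam2 = pam_n(TOY_PAM1, 2)
for row in toy_pam2:
    print([round(p, 6) for p in row])
```

The diagonal of the PAM 2 matrix is a little above 0.99 squared, because a residue can mutate and then mutate back within two steps.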

11 Alignments BLAST - Basic Local Alignment Search Tool

12 Genome vs. gene

13 The amount of genetic information in organisms

14 Largest genome: amoeba Chaos chaos (200x human genome)
http://www.cbs.dtu.dk/staff/dave/roanoke/genetics980313.html
http://www.lawrence.edu/dept/biology/animal/

15 Sequence searching - challenges
• Exponential growth of databases

16 Sequence searching – definition
• Task:
– Query: short, new sequence (~1000 letters)
– Database (search space): very many sequences
– Goal: find sequences homologous to the query

17 Sequence searching – definition
• We want:
– a fast tool
– primarily a filter: most sequences will be unrelated to the query
– to fine-tune the alignment later

18 Database Search Algorithms: Sensitivity, Selectivity True Positive (TP) – a homology detected (positive) correctly (true)

19 Database Search Algorithms: Sensitivity, Selectivity
Sensitivity = TP/(TP+FN)
Selectivity = TN/(TN+FP)
[Figure: sensitivity vs. selectivity]
Courtesy of Gary Benson (ISSCB 2003)
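The two ratios can be written down directly. In the small sketch below the function names and the counts in the usage line are invented for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of true homologs that the search reports: TP / (TP + FN)."""
    return tp / (tp + fn)


def selectivity(tn, fp):
    """Fraction of unrelated sequences correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical search: 80 homologs found, 20 missed, 90 non-homologs rejected, 10 reported.
print(sensitivity(tp=80, fn=20), selectivity(tn=90, fp=10))
```

A filter that reports everything has sensitivity 1 and selectivity 0, which is why the two numbers are always quoted together.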

20 What is BLAST
• Basic Local Alignment Search Tool
• Bad news: it is only a heuristic
– Heuristic: A rule of thumb that often helps in solving a certain class of problems, but makes no guarantees. Perkins, DN (1981) The Mind's Best Work
• Basic idea:
– High scoring segments have a well-conserved (almost identical) part
– Once a well-conserved part is identified, extend it to the real alignment
[Figure: alignment matrix for -seque and -seq]

21 What does “well conserved” mean for BLAST?
• BLAST works with k-words (words of length k)
– k is a parameter
– different for DNA (>10) and proteins (2..4)
• Word w1 is T-similar to w2 if the sum of pair scores is at least T (e.g. T=12)
Similar 3-words:
W1:     R   K   P
W2:     R   R   P
Score:  9  -1   7    sum = 15
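The T-similarity test is just a sum of pair scores compared with T. The sketch below uses only the three pair scores shown in the slide example. A real search would look them up in a full substitution matrix, so the `PAIR_SCORES` table and the function names are assumptions made for illustration.

```python
# Pair scores copied from the slide example only; not a complete matrix.
PAIR_SCORES = {("R", "R"): 9, ("K", "R"): -1, ("P", "P"): 7}


def pair_score(a, b):
    """Symmetric lookup: the key is stored with its letters in sorted order."""
    return PAIR_SCORES[tuple(sorted((a, b)))]


def word_score(w1, w2):
    """Sum of pair scores over aligned positions of two equal-length words."""
    return sum(pair_score(a, b) for a, b in zip(w1, w2))


def is_t_similar(w1, w2, t):
    return word_score(w1, w2) >= t


print(word_score("RKP", "RRP"), is_t_similar("RKP", "RRP", t=12))
```

With the slide's numbers the sum is 9 - 1 + 7 = 15, which clears T=12.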

22 BLAST algorithm: 3 basic steps
• Preprocess the query: extract all the k-words
• Scan for T-similar matches in the database
• Extend them to alignments
1) Preprocess 2) Scan 3) Extend

23 BLAST, Step 1: Preprocess the query
• Take the query (e.g. LVNRKPVVP)
• Chop it into overlapping k-words (k=3 in this case)
Query: LVNRKPVVP
Word1: LVN
Word2:  VNR
Word3:   NRK
…
• For each word, find all similar words (scoring at least T)
E.g. for RKP the following 3-words are similar: QKP KKP RQP REP RRP RKP
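A sketch of the preprocessing step, under stated assumptions: `k_words` chops the query exactly as shown above, while `neighbourhood` enumerates every word over a small alphabet and keeps those scoring at least T. The scoring function used here (+5 for identity, -1 otherwise) and the 4-letter alphabet are toy stand-ins, so the resulting word list will not match the slide's list, which comes from a real substitution matrix.

```python
from itertools import product


def k_words(query, k):
    """All overlapping words of length k, in query order."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


def neighbourhood(word, alphabet, score, threshold):
    """All words over `alphabet` whose summed pair score with `word` is >= threshold."""
    return [
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, candidate)) >= threshold
    ]


def toy_score(a, b):
    # Hypothetical scoring, used only to keep the example self-contained.
    return 5 if a == b else -1


print(k_words("LVNRKPVVP", 3))
print(neighbourhood("RKP", "KPQR", toy_score, threshold=9))
```

Brute-force enumeration grows as the alphabet size to the power k, which is affordable for proteins with k around 3; this is one reason k stays small for protein searches.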

24 Finite state machine
• abstract machine
• constant amount of memory (states)
• used in computation and languages
• recognizes regular expressions, e.g. AC*T|GGC
– cp dmt*.pdf /home/john
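A hand-built machine for the slide's expression AC*T|GGC might look like the sketch below. The state names and the `accepts` helper are invented for the example; the point is that the only memory the machine carries is the current state.

```python
# (state, input symbol) -> next state; any missing entry means "reject".
TRANSITIONS = {
    ("start", "A"): "after_A",
    ("after_A", "C"): "after_A",   # C* : loop on C
    ("after_A", "T"): "accept",
    ("start", "G"): "after_G",
    ("after_G", "G"): "after_GG",
    ("after_GG", "C"): "accept",
}


def accepts(text):
    """True if the whole of `text` matches AC*T|GGC."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state == "accept"


for candidate in ["AT", "ACCCT", "GGC", "AC", "GGCT"]:
    print(candidate, accepts(candidate))
```

Each input symbol costs one table lookup, so recognition time is linear in the length of the text regardless of how complicated the expression is.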

25 BLAST, Step 2: Find “exact” matches with scanning
• Use all the T-similar k-words to build the Finite State Machine
• Scan for exact matches
...VLQKPLKKPPLVKRQPCCEVVRKPLVKVIRCLA...
Words: QKP KKP RQP REP RRP RKP ...
[Figure: the word list moves along the database sequence]
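The scan can be imitated in a few lines. Note the substitution: instead of the finite state machine named on the slide, this sketch slides a window over the database and looks each window up in a hash set, which finds the same exact matches but is not how BLAST is implemented. The function name `scan` is made up.

```python
def scan(database, words):
    """Return (position, word) for every exact occurrence of a word in the database."""
    k = len(next(iter(words)))  # all words are assumed to have the same length
    hits = []
    for i in range(len(database) - k + 1):
        window = database[i:i + k]
        if window in words:
            hits.append((i, window))
    return hits


DATABASE = "VLQKPLKKPPLVKRQPCCEVVRKPLVKVIRCLA"
WORDS = {"QKP", "KKP", "RQP", "REP", "RRP", "RKP"}
print(scan(DATABASE, WORDS))
```

On the slide's fragment this reports QKP, KKP, RQP and RKP at 0-based positions 2, 6, 13 and 21; REP and RRP do not occur.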

26 BLAST, Step 3: Extending “exact” matches
• Having the list of exact matches, we extend the alignment in both directions
Query:      L  V  N  R  K  P  V  V  P
T-similar:           R  R  P
Subject:    G  V  C  R  R  P  L  K  C
Score:     -3  4 -3  5  2  7  1 -2 -3
• …until the sum of scores drops below some level X (e.g. X=-100) from the best known
• What about gaps?
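The ungapped extension can be sketched on the per-column scores from the table above. Assumptions in this example: the drop-off is expressed as a positive number `x_drop` (extension stops once the running score falls more than `x_drop` below the best seen so far), the seed occupies columns 3 to 5, and the value 5 is chosen only so the small example behaves sensibly. The function name is invented.

```python
def extend_hit(col_scores, seed_start, seed_end, x_drop):
    """Extend a seed (inclusive columns) left and right without gaps.

    Returns (start, end, score) with inclusive column indices.
    """
    def walk(positions):
        running = best = 0
        best_steps = 0
        for steps, p in enumerate(positions, start=1):
            running += col_scores[p]
            if running > best:
                best, best_steps = running, steps
            elif best - running > x_drop:
                break  # fell too far below the best score seen so far
        return best, best_steps

    seed_score = sum(col_scores[seed_start:seed_end + 1])
    right_gain, right_steps = walk(range(seed_end + 1, len(col_scores)))
    left_gain, left_steps = walk(range(seed_start - 1, -1, -1))
    return (seed_start - left_steps,
            seed_end + right_steps,
            seed_score + left_gain + right_gain)


SCORES = [-3, 4, -3, 5, 2, 7, 1, -2, -3]  # from the slide's Score row
print(extend_hit(SCORES, seed_start=3, seed_end=5, x_drop=5))
```

For the slide's scores the seed RKP/RRP (14) grows by one column to the right and two to the left, giving columns 1 to 6 with a total of 16. Each direction is trimmed back to where its best score was reached, so the trailing negative columns are not included.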

27 Gapped BLAST (now standard)
• Gapped local alignments are computed: much, much, much slower
• Therefore: modified “hit criteria”

28 Hit criteria
• Extends the alignment only if there are two close hits on the same diagonal
– sensitivity would drop without lowering T
– reduces extensions (90% of the time is spent on extensions)
• Gapped local alignments are computed
– increased sensitivity allows us to raise T
– raising T speeds up the search
[Figure: database position vs. query position, with two close hits on the same diagonal]
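The two-hit rule can be sketched as follows, assuming hits are given as (query position, database position) pairs and that "close" means at most `window` positions apart on the same diagonal. The name `two_hit_seeds`, the window value and the sample hits are all hypothetical.

```python
def two_hit_seeds(hits, window):
    """Keep only hits preceded by another hit on the same diagonal within `window`."""
    last_db_pos = {}  # diagonal -> database position of the most recent hit
    seeds = []
    for q_pos, db_pos in sorted(hits, key=lambda h: h[1]):
        diagonal = db_pos - q_pos
        previous = last_db_pos.get(diagonal)
        if previous is not None and 0 < db_pos - previous <= window:
            seeds.append((q_pos, db_pos))
        last_db_pos[diagonal] = db_pos
    return seeds


# Hypothetical hits: the first two share diagonal 10, the others are isolated.
HITS = [(0, 10), (5, 15), (2, 40), (30, 41)]
print(two_hit_seeds(HITS, window=10))
```

Only the second hit of the pair on diagonal 10 triggers an extension; the two isolated hits are ignored, which is how the rule cuts down the number of extensions.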

29 Gapped BLAST vs. BLAST
• We end up with
– same speed (even a bit faster overall)
– gapped alignments!
– much higher sensitivity

30 BLAST flavours
• blastp: protein query, protein db
• blastn: DNA query, DNA db
• blastx: DNA query, protein db
– query translated in all reading frames; used to find potential translation products of an unknown nucleotide sequence
• tblastn: protein query, DNA db
– database dynamically translated in all reading frames
• tblastx: DNA query, DNA db
– all translations of query against all translations of db
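"All reading frames" means three frames on the given strand and three on the reverse complement. The sketch below shows one way to produce them, assuming the standard genetic code and an input made only of the letters A, C, G, T; the function names are illustrative and unrelated to the BLAST programs themselves.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame):
    """Translate complete codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def six_frames(dna):
    """Frames 0-2 of the sequence, then frames 0-2 of its reverse complement."""
    rc = reverse_complement(dna)
    return [translate(strand, frame) for strand in (dna, rc) for frame in range(3)]


print(six_frames("ATGGCC"))
```

For ATGGCC the first forward frame reads ATG GCC, giving MA, and the first reverse frame reads GGC CAT, giving GH. Trailing bases that do not fill a codon are dropped.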

