Local alignments Seq X: Seq Y:. Local alignment  What’s local? –Allow only parts of the sequence to match –Results in High Scoring Segments –Locally.

Slides:



Advertisements
Similar presentations
Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
Advertisements

Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.
Alignment methods Introduction to global and local sequence alignment methods Global : Needleman-Wunch Local : Smith-Waterman Database Search BLAST FASTA.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
BLAST Sequence alignment, E-value & Extreme value distribution.
Last lecture summary.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 2: “Homology” Searches and Sequence Alignments.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 3: BLAST Sequence Analysis.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
Heuristic alignment algorithms and cost matrices
BLAST Basic Local Alignment Search Tool. BLAST החכה BLAST (Basic Local Alignment Search Tool) allows rapid sequence comparison of a query sequence [[רצף.
We continue where we stopped last week: FASTA – BLAST
Overview of sequence database searching techniques and multiple alignment May 1, 2001 Quiz on May 3-Dynamic programming- Needleman-Wunsch method Learning.
. Class 4: Fast Sequence Alignment. Alignment in Real Life u One of the major uses of alignments is to find sequences in a “database” u Such collections.
Introduction to bioinformatics
CENTRFORINTEGRATIVE BIOINFORMATICSVU E [1] Sequence Analysis Sequence Analysis Lecture 3 C E N T R F O R I N T E G R A T I V E B I O I N F O.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
CENTRFORINTEGRATIVE BIOINFORMATICSVU E [1] Sequence Analysis Alignments 2: Local alignment Sequence Analysis
1-month Practical Course Genome Analysis Lecture 3: Residue exchange matrices Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Rationale for searching sequence databases June 22, 2005 Writing Topics due today Writing projects due July 8 Learning objectives- Review of Smith-Waterman.
Pairwise alignment Computational Genomics and Proteomics.
CENTRFORINTEGRATIVE BIOINFORMATICSVU E [1] Sequence Analysis C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master.
Sequence alignment, E-value & Extreme value distribution
BLAST Basic Local Alignment Search Tool. BLAST החכה BLAST (Basic Local Alignment Search Tool) allows rapid sequence comparison of a query sequence [[רצף.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Heuristic methods for sequence alignment in practice Sushmita Roy BMI/CS 576 Sushmita Roy Sep 27 th,
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
An Introduction to Bioinformatics
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
BLAST Workshop Maya Schushan June 2009.
Gapped BLAST and PSI- BLAST: a new generation of protein database search programs By Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
Last lecture summary. Window size? Stringency? Color mapping? Frame shifts?
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
Construction of Substitution Matrices
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Rationale for searching sequence databases June 25, 2003 Writing projects due July 11 Learning objectives- FASTA and BLAST programs. Psi-Blast Workshop-Use.
Part 2- OUTLINE Introduction and motivation How does BLAST work?
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2015.
Construction of Substitution matrices
David Wishart February 18th, 2004 Lecture 3 BLAST (c) 2004 CGDN.
Step 3: Tools Database Searching
Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2010.
MGM workshop. 19 Oct 2010 Some frequently-used Bioinformatics Tools Konstantinos Mavrommatis Prokaryotic Superprogram.
BLAST: Database Search Heuristic Algorithm Some slides courtesy of Dr. Pevsner and Dr. Dirk Husmeier.
Using BLAST To Teach ‘E-value-tionary’ Concepts Cheryl A. Kerfeld 1, 2 and Kathleen M. Scott 3 1.Department of Energy-Joint Genome Institute, Walnut Creek,
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Database Scanning/Searching FASTA/BLAST/PSIBLAST G P S Raghava.
What is BLAST? Basic BLAST search What is BLAST?
Blast Basic Local Alignment Search Tool
Basics of BLAST Basic BLAST Search - What is BLAST?
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Bioinformatics and BLAST
Sequence alignment, Part 2
Basic Local Alignment Search Tool (BLAST)
Basic Local Alignment Search Tool
BLAST Slides adapted & edited from a set by
Sequence alignment, E-value & Extreme value distribution
BLAST Slides adapted & edited from a set by
Presentation transcript:

Local alignments Seq X: Seq Y:

Local alignment  What’s local? –Allow only parts of the sequence to match –Results in High Scoring Segments –Locally maximal: can not make it better by trimming/extending the alignment Seq X: Seq Y:

Local alignment Seq X: Seq Y:  Why local? –Parts of sequence diverge faster evolutionary pressure does not put constraints on the whole sequence –Proteins have modular construction sharing domains between sequences

Global  local alignment  Take the old, good equation  Look at the result of the global alignment (on the right) -seque - s e q CAGCACTTGGATTCTCG- CA-C-----GATTCGT-G a) global align b) retrieve the result c)  score along the result  align. pos.

Local alignment – breaking the alignment  A recipe –Just don’t let the score go below 0 –Start the new alignment when it happens –Where is the result in the matrix? Before:After:  align. pos. q e s 0- euqes- q e 0s 0- euqes-  

Local alignment – the equation  Init the matrix with 0’s  Read the maximal value from anywhere in the matrix  Find the result with backtracking M[i, j] = M[i, j- 1 ] – g M[i- 1, j] – g M[i- 1, j- 1 ] + score(X[i],Y[j]) max 0 Great contribution to science! q e s - euqes-

Amino acid substitution matrices

Point Accepted Mutations distance and matrices  Accepted by natural selection –not lethal –not silent  Def.: S 1 and S 2 are PAM 1 distant if on avg. there was one mutation per 100 aa  Q.: If the seqs are PAM 8 distant, how many residues may be diffent?

PAM matrix  Created from “easy”alignments –pairwise –gapless –85% id

PAM matrix  How to calculate M for PAM 2 distance? –Take more distant sequences –or extrapolate...

Alignments BLAST - Basic Local Alignment Search Tool

Genome vs. gene

The amount of genetic information in organisms

Largest genome: amoeba Chaos chaos (200x human genome)‏

Sequence searching - challenges  Exponential growth of databases

Sequence searching – definition  Task: –Query: short, new sequence (~1000 letters)‏ –Database (searching space): very many sequences –Goal: find seqs homologous to the query

Sequence searching – definition  We want: –fast tool –primarily a filter: most sequences will be unrelated to the query –fine-tune the alignment later

Database Search Algorithms: Sensitivity, Selectivity True Positive (TP) – a homology detected (positive) correctly (true)

Database Search Algorithms: Sensitivity, Selectivity Sensitivity =TP/(TP+FN)‏ Selectivity =TN/(TN+FP)‏ Sensitivity Selectivity Courtesy of Gary Benson (ISSCB 2003)‏

What is BLAST  Basic Local Alignment Search Tool  Bad news: it is only a heuristics –Heuristics: A rule of thumb that often helps in solving a certain class of problems, but makes no guarantees. Perkins, DN (1981) The Mind's Best Work  Basic idea: –High scoring segments have well conserved (almost identical) part –As well conserved part are identified, extend it to the real alignment q e s - euqes-

What means well conserved for BLAST?  BLAST works with k-words (words of length k)‏ –k is a parameter –different for DNA (>10) and proteins (2..4)‏  word w 1 is T-similar to w 2 if the sum of pair scores is at least T (e.g. T=12)‏ Similar 3-words W 1 :R K P W 2 :R R P Score:9 –1 7 sum = 15

BLAST algorithm 3 basic steps  Preprocess the query: extract all the k-words  Scan for T-similar matches in database  Extend them to alignments 1) Preprocess 2) Scan 3) Extend

BLAST, Step 1: Preprocess the query  Take the query (e.g. LVNRKPVVP )‏  Chop it into overlapping k-words (k=3 in this case)‏ Query:LVNRKPVVP Word1:LVN Word2: VNR Word3: NRK … For each word find all similar words (scoring at least T) E.g. for RKP the following 3-words are similar: QKP KKP RQP REP RRP RKP 1) Preprocess 2) Scan 3) Extend

Finite state machine AC*T|GGC  abstract machine  constant amount of memory (states)‏  used in computation and languages  recognizes regular expressions –cp dmt*.pdf /home/john 1) Preprocess 2) Scan 3) Extend

BLAST, Step 2: Find ¨exact¨ matches with scanning  Use all the T-similar k-words to build the Finite State Machine  Scan for exact matches...VLQKPLKKPPLVKRQPCCEVVRKPLVKVIRCLA... QKP KKP RQP REP RRP RKP... movement 1) Preprocess 2) Scan 3) Extend

BLAST, Step 3: Extending ¨exact¨ matches  Having the list of exact matches we extend alignment in both directions Query: L V N R K P V V P T-similar: R R P Subject: G V C R R P L K C Score: …till the sum of scores drops below some level X (e.g. X=-100) from the best known - what with gaps? 1) Preprocess 2) Scan 3) Extend

Gapped BLAST (now standard)‏  gapped local alignments are computed: much, much, much slower  therefore: modified “Hit criteria” 1) Preprocess 2) Scan 3) Extend

Hit criteria  Extends the alignment only if there are close two hits on the same diagonal –sensitivity would drop without lowering T –reduces extensions (90% time is spend on extensions)‏  Gapped local alignments are computed –increased sensitivity allows us raise T –raising T speeds up the search 1) Preprocess 2) Scan 3) Extend dbpos query pos close hit, same diag

Gapped BLAST v BLAST  We end up with –same speed (even a bit faster overall) –gapped alignments! –much higher sensitivity

BLAST flavours  blastp : protein query, protein db  blastn : DNA query, DNA db  blastx : DNA query, protein db –in all reading frames. Used to find potential translation products of an unknown nucleotide sequence.  tblastn : protein query, DNA db –database dynamically translated in all reading frames.  tblastx : DNA query, DNA db –all translations of query against all translations of db