Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2010.

Slides:



Advertisements
Similar presentations
Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.
Advertisements

Bioinformatics Tutorial I BLAST and Sequence Alignment.
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
1 CAP5510 – Bioinformatics Database Searches for Biological Sequences or Imperfect Alignments Tamer Kahveci CISE Department University of Florida.
Sequence Alignment Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan
Local alignments Seq X: Seq Y:. Local alignment  What’s local? –Allow only parts of the sequence to match –Results in High Scoring Segments –Locally.
Space/Time Tradeoff and Heuristic Approaches in Pairwise Alignment.
Database Searching for Similar Sequences Search a sequence database for sequences that are similar to a query sequence Search a sequence database for sequences.
. Class 4: Fast Sequence Alignment. Alignment in Real Life u One of the major uses of alignments is to find sequences in a “database” u Such collections.
Heuristic alignment algorithms and cost matrices
We continue where we stopped last week: FASTA – BLAST
Bioinformatics For MNW 2 nd Year Lecture 20: Homology searching using heuristic methods Integrative Bioinformatics Institute VU (IBIVU)
. Class 4: Fast Sequence Alignment. Alignment in Real Life u One of the major uses of alignments is to find sequences in a “database” u Such collections.
1 1. BLAST (Basic Local Alignment Search Tool) Heuristic Only parts of protein are frequently subject to mutations. For example, active sites (that one.
1 BLAST – A heuristic algorithm Anjali Tiwari Pannaben Patel Pushkala Venkataraman.
Sequence Alignment III CIS 667 February 10, 2004.
Heuristic Approaches for Sequence Alignments
Practical algorithms in Sequence Alignment Sushmita Roy BMI/CS 576 Sep 16 th, 2014.
1-month Practical Course Genome Analysis Homology searching using heuristic methods Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
CENTRFORINTEGRATIVE BIOINFORMATICSVU E [1] Sequence Analysis C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Master.
Sequence alignment, E-value & Extreme value distribution
From Pairwise Alignment to Database Similarity Search.
Practical algorithms in Sequence Alignment Sushmita Roy BMI/CS 576 Sep 17 th, 2013.
Heuristic methods for sequence alignment in practice Sushmita Roy BMI/CS 576 Sushmita Roy Sep 27 th,
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
Gapped BLAST and PSI-BLAST : a new generation of protein database search programs Team2 邱冠儒 黃尹柔 田耕豪 蕭逸嫻 謝朝茂 莊閔傑 2014/05/12 1.
BLAST What it does and what it means Steven Slater Adapted from pt.
Gapped BLAST and PSI- BLAST: a new generation of protein database search programs By Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui.
Introduction to Bioinformatics Lecture 11: Homology searching using heuristic methods Centre for Integrative Bioinformatics VU (IBIVU)
Computational Biology, Part 9 Efficient database searching methods Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration.
Local alignment, BLAST and Psi-BLAST October 25, 2012 Local alignment Quiz 2 Learning objectives-Learn the basics of BLAST and Psi-BLAST Workshop-Use BLAST2.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
Last lecture summary. Window size? Stringency? Color mapping? Frame shifts?
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
CISC667, F05, Lec9, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Sequence Database search Heuristic algorithms –FASTA –BLAST –PSI-BLAST.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
FASTA and BLAST Chitta Baral. FASTA : Basic Steps Step 1: –Set a word size. (usually 6 for DNA and 2 for proteins) –Make a plot. –Find the long diagonals.
Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching.
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
Part 2- OUTLINE Introduction and motivation How does BLAST work?
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Heuristic Methods for Sequence Database Searching BMI/CS 576 Colin Dewey Fall 2015.
Construction of Substitution matrices
Sequence Search Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
Heuristic Alignment Algorithms Hongchao Li Jan
Protein Structure Prediction: Threading and Rosetta BMI/CS 576 Colin Dewey Fall 2008.
Heuristic Methods for Sequence Database Searching BMI/CS 776 Mark Craven February 2002.
Sequence database searching – Homology searching Dynamic Programming (DP) too slow for repeated database searches. Therefore fast heuristic methods: FASTA.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
BLAST BNFO 236 Usman Roshan. BLAST Local pairwise alignment heuristic Faster than standard pairwise alignment programs such as SSEARCH, but less sensitive.
Database Scanning/Searching FASTA/BLAST/PSIBLAST G P S Raghava.
Homology Search Tools Kun-Mao Chao (趙坤茂)
Homology searching using heuristic methods
Blast Basic Local Alignment Search Tool
Basics of BLAST Basic BLAST Search - What is BLAST?
Homology Search Tools Kun-Mao Chao (趙坤茂)
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Homology Search Tools Kun-Mao Chao (趙坤茂)
Fast Sequence Alignments
BIOINFORMATICS Fast Alignment
Basic Local Alignment Search Tool
Homology Search Tools Kun-Mao Chao (趙坤茂)
Sequence alignment, E-value & Extreme value distribution
Presentation transcript:

Heuristic Methods for Sequence Database Searching BMI/CS Colin Dewey Fall 2010

query database

BLAST Results

BLAST Programs ProgramQueryDatabase BLASTPProtein BLASTNDNA BLASTX Translated DNA Protein TBLASTNProteinTranslated DNA TBLASTX Translated DNA

Heuristic Alignment Motivation too slow for large databases with high query traffic heuristic methods do fast approximation to dynamic programming –FASTA [Pearson & Lipman, 1988] –BLAST [Altschul et al., 1990; Altschul et al., Nucleic Acids Research 1997]

Heuristic Alignment Motivation consider the task of searching SWISS-PROT against a query sequence: –say our query sequence is 362 amino-acids long –SWISS-PROT release 38 contained 29,085,265 amino acids –finding local alignments via dynamic programming would entail matrix operations many servers handle thousands of such queries a day (NCBI > 100,000)

BLAST Overview Basic Local Alignment Search Tool BLAST heuristically finds high scoring local alignments typically used to search a query sequence against a database of sequences key tradeoff made: sensitivity vs. speed

Overview of BLAST Algorithm given: query sequence q, word length w, word score threshold T, segment score threshold S –compile a list of “words” (of length w) that score at least T when compared to words from q –scan database for matches to words in list –extend all matches to seek high-scoring alignments return: alignments scoring at least S

Determining Query Words Given: query sequence: QLNFSAGW word length w = 2 (default for protein usually w = 3) word score threshold T = 9 Step 1: determine all words of length w in query sequence QL LN NF FS SA AG GW

Determining Query Words Step 2: determine all words that score at least T when compared to a word in the query sequence QLQL=9 LNLN=10 NFNF=12, NY=9 … SAnone... words from sequence query words w/ T=9

Scanning the Database search database for all occurrences of query words approach: –index database sequences into table of words (pre-compute this) –index query words into table (at query time) NP NS NT NW NY QLNFSAGW MFNYT… NPGAT… QECFT… query sequence database sequences

Extending Hits BLAST extends hits into local alignments The original version of BLAST extended each hit separately

Extending Hits in Original Blast extend hits in both directions (without allowing gaps) terminate extension in one direction when score falls certain distance below best score for shorter extensions return segment pairs scoring at least S initial hit best extension b current extension c

Sensitivity vs. Running Time the main parameter controlling the sensitivity vs. running- time trade-off is T (threshold for what becomes a query word) –small T: greater sensitivity, more hits to expand –large T: lower sensitivity, fewer hits to expand

The Two-Hit Method extension step typically accounts for 90% of BLAST’s execution time key idea: do extension only when there are two hits on the same diagonal within distance A of each other to maintain sensitivity, lower T parameter –more single hits found –but only small fraction have associated 2nd hit

The Two-Hit Method hits with T > 12 hits with T > 10 extend these cases Figure from: Altschul et al. Nucleic Acids Research 25, 1997

Gapped BLAST trigger gapped alignment if two-hit extension has a sufficiently high score find length-11 segment with highest score; use central pair in this segment as seed run DP process both forward & backward from seed prune cells when local alignment score falls a certain distance below best score yet

Gapped BLAST seed Figure from: Altschul et al. Nucleic Acids Research 25, 1997

PSI ( Position Specific Iterated ) BLAST basic idea –use results from BLAST query to construct a profile matrix –search database with profile instead of query sequence –iterate

A Profile Matrix amino acids sequence positions A R D N C

PSI BLAST: Searching with a Profile aligning profile matrix to a simple sequence –like aligning two sequences –except score for aligning a character with a matrix position is given by the matrix itself – not a substitution matrix A R D N C CNARsequence profile

PSI BLAST: Constructing the Profile Matrix Figure from: Altschul et al. Nucleic Acids Research 25, 1997 these sequences contribute to the matrix at position 108 query sequence

BLAST Notes it’s heuristic: may miss some good matches it’s fast: empirically, 10 to 50 times faster than Smith-Waterman large impact: –NCBI’s BLAST server handles more than 100,000 queries a day –most used bioinformatics program in the world