Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration.


Searching Molecular Databases with BLAST

Basic Local Alignment Search Tool: How BLAST works. Interpreting search results. The NCBI Web BLAST interface. Demonstration and exercises.

Why learn sequence database searching? What have I cloned? Is this really “my gene”? Has someone else already found it? What is this protein’s function? What is it related to? Can I get more sequence easily?

Search programs are sequence alignment programs. They try to find the best alignment between your probe sequence and every target sequence in the database. Finding optimal alignments is computationally very resource-intensive. It is usually not necessary to find optimal alignments, particularly for large databases. Alignments are ranked and only the top scores are reported.

Practical database search methods incorporate shortcuts. The fastest sequence database searching programs use heuristic algorithms. The basic concept is to break the search and alignment process down into several steps. At each step, only the best-scoring subset is retained for further analysis.
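
The staged idea can be sketched in a few lines of Python. This is a toy illustration only, not BLAST's actual algorithm: the function names, the shared-word filter, the +1/-1 scoring and the `keep` cutoff are all assumptions made for the example. Step 1 ranks every target with a cheap test and keeps the best few; step 2 spends the costlier scoring only on those survivors.

```python
def kmers(seq, k=3):
    """Return the set of all length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(query, target, k=3):
    """Cheap filter: how many distinct words the two sequences share."""
    return len(kmers(query, k) & kmers(target, k))


def best_ungapped_score(query, target, match=1, mismatch=-1):
    """Costlier step: best ungapped local score over every diagonal."""
    best = 0
    for offset in range(-(len(query) - 1), len(target)):
        running = 0
        for i in range(len(query)):
            j = i + offset
            if 0 <= j < len(target):
                running += match if query[i] == target[j] else mismatch
                running = max(running, 0)  # restart after a bad stretch
                best = max(best, running)
    return best


def staged_search(query, database, keep=2, k=3):
    """Toy two-step search over a {name: sequence} dictionary."""
    # Step 1: rank all targets by the cheap filter, retain the top subset.
    ranked = sorted(database,
                    key=lambda name: shared_kmer_count(query, database[name], k),
                    reverse=True)
    survivors = ranked[:keep]
    # Step 2: run the expensive scoring only on the survivors.
    scored = [(best_ungapped_score(query, database[name]), name)
              for name in survivors]
    return sorted(scored, reverse=True)


toy_db = {"a": "TTTTACGTACGTTTT", "b": "GGGGGGGG", "c": "ACGTTTTT"}
print(staged_search("ACGTACGT", toy_db))
```

The trade-off is visible in the sketch: target "b" is never scored in step 2, which saves time but would miss it if the cheap filter had been wrong about it.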

What does ‘HEURISTIC’ mean? “A commonsense rule (or set of rules) intended to increase the probability of solving some problem.” Why consider every possible alignment once a reasonably good alignment is found?

Heuristic programs find approximate alignments. They are less sensitive than “dynamic programming” algorithms such as Smith-Waterman for detecting weak similarity. In practice, they run much faster and are usually adequate. The BLAST program, developed by Stephen Altschul and coworkers at the NCBI, is the most widely used heuristic program.

BLAST is a collection of five programs for different combinations of query and database sequences

Program    Query            Database
BLASTN     DNA              DNA
BLASTP     protein          protein
BLASTX     translated DNA   protein
TBLASTN    protein          translated DNA
TBLASTX    translated DNA   translated DNA
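
The five combinations can be written as a small lookup. The function name `choose_program` and the `compare_as_protein` flag are hypothetical, introduced only to separate the two DNA-versus-DNA cases (BLASTN compares nucleotides directly, TBLASTX compares translations).

```python
def choose_program(query, database, compare_as_protein=False):
    """Pick a BLAST program from the query and database sequence types.

    query and database are either "dna" or "protein".
    """
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "dna" and database == "protein":
        return "blastx"   # query is translated
    if query == "protein" and database == "dna":
        return "tblastn"  # database is translated
    if query == "dna" and database == "dna":
        # Both translated (tblastx) or compared as nucleotides (blastn).
        return "tblastx" if compare_as_protein else "blastn"
    raise ValueError("sequence types must be 'dna' or 'protein'")


print(choose_program("dna", "protein"))
```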

Why BLAST is great: very fast and can be used to search extremely large databases; sufficiently sensitive and selective for most purposes; robust, since the default parameters can usually be used.

BLAST scores are reported in two columns: raw values, based on the specific scoring matrix employed, and bits, which are matrix-independent normalized values (bigger = better). Significance is represented by E values (smaller = better).
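
The relationship between the two columns and the E value can be sketched with the standard Karlin-Altschul formulas: bits = (lambda × raw − ln K) / ln 2, and E = m × n × 2^(−bits), where m and n are the query and database lengths. The lambda and K values below are illustrative assumptions (roughly what is reported for a gapped BLOSUM62 search). A real BLAST report prints the values it actually used and works with adjusted "effective" lengths, which this sketch ignores.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score to bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Number of chance alignments expected with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Assumed, illustrative statistical parameters (not read from a real report).
LAMBDA, K = 0.267, 0.041

for raw in (50, 100, 200):
    bits = bit_score(raw, LAMBDA, K)
    e = expect_value(bits, query_len=300, db_len=1_000_000_000)
    print(f"raw={raw:4d}  bits={bits:7.1f}  E={e:.3g}")
```

Because lambda and K are folded into the bit score, two searches run with different scoring matrices can be compared on bits but not on raw scores.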

Typical BLAST Output Sorted by E value

The EXPECT (E) threshold is used to control score reporting. A match will only be reported if its E value falls below the threshold set. The default value for E is 10, which means that 10 matches scoring at least this well are expected to be found by chance in a search of this size. Lower EXPECT thresholds are more stringent and report fewer matches.
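
The effect of the threshold can be sketched by filtering saved results. The example assumes the hits were exported in the 12-column tabular layout that command-line BLAST+ writes with `-outfmt 6`; the three result lines are invented for illustration, and `parse_tabular` and `report` are hypothetical helper names.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_tabular(lines):
    """Turn tab-separated result lines into dictionaries."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


def report(hits, expect=10.0):
    """Keep hits whose E value is below the threshold, best first."""
    kept = [h for h in hits if h["evalue"] < expect]
    return sorted(kept, key=lambda h: h["evalue"])


# Invented example lines, not real search output.
example = [
    "query1\tsubjB\t85.0\t120\t18\t0\t10\t129\t5\t124\t3e-20\t95.1",
    "query1\tsubjC\t100.0\t14\t0\t0\t40\t53\t7\t20\t4.2\t28.2",
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t51\t250\t2e-100\t360",
]
hits = parse_tabular(example)
print([h["sseqid"] for h in report(hits)])               # default threshold
print([h["sseqid"] for h in report(hits, expect=1e-5)])  # more stringent
```

With the default threshold all three invented hits are listed; tightening it to 1e-5 drops the short 14-base exact match.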

Interpreting BLAST scores: Score interpretation is based on context. –What is the question? –What else do you know about the sequences? –Scoring is highly dependent on probe length. Exact matches will usually have the highest scores (and lowest E values). –Short exact matches may score lower than longer partial matches.

Interpreting BLAST scores: Short exact matches are expected to occur at random. Partial matches over the entire length of a query are stronger evidence for homology than are short exact matches.
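
A rough calculation shows why short exact matches carry little weight. The sketch assumes a random database in which all four bases are equally likely and independent, and it counts one strand only; real sequence is not random, so treat the numbers as order-of-magnitude estimates. The function name is hypothetical.

```python
def expected_chance_matches(word_len, db_len, alphabet_size=4):
    """Expected exact occurrences of one fixed word in random sequence."""
    positions = max(db_len - word_len + 1, 0)
    return positions * (1.0 / alphabet_size) ** word_len


DB_LEN = 10**9  # an assumed database of one billion bases
for length in (10, 15, 20, 30):
    print(length, expected_chance_matches(length, DB_LEN))
```

Under these assumptions a 10-base word is expected roughly 950 times by chance in a billion bases, while a 30-base word is expected far less than once.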

Homology vs Identity Homologous sequences are descended from a common ancestral sequence. Homology is either true or false. It can never be partial! Saying two sequences are 45% homologous is a misuse of the term. Sequence identity and similarity can be described as a percentage and are used as evidence of homology.
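
Percent identity is a simple count over an alignment, which is why it can be quoted as a number while homology cannot. The sketch below assumes two already-aligned strings of equal length with "-" marking gaps, and divides by the full alignment length; other tools use other denominators, so the function is an illustration and not a standard definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)


print(percent_identity("MATWL", "MATPL"))
```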

BLAST Example Is this sequence known? What does it encode?

Search Strategy: Choose the BLAST program: –nucleotide query vs. nucleotide db –megablast: optimized to find identical or nearly identical sequences –blastn: will find identical and similar sequences. Choose the database: –nr (non-redundant): everything –genome specific

blastn Options Paste Query Sequence HERE Choose Database HERE Choose search program HERE

Each line is a hit in the database, sorted vertically by E value. Colored rectangles along the X axis show where in the query sequence a similarity in the database has been found. Color indicates degree of similarity.

Output sorted by E value

Link to GenBank file

Link to alignment

Link to Entrez Gene

blastn Alignment

BLASTP Example

blastp input

blastp Databases

nr - all non-redundant GenBank CDS translations + PDB + SwissProt + PIR
swissprot - the last major release of the SWISS-PROT protein sequence database
pat - patented sequences
pdb - sequences derived from 3-dimensional structures in the Protein Data Bank
env_nr - non-redundant environmental samples

BLASTP Output: Conserved Domain Search. Conserved domains are shown graphically. Link to explanation of the domain.

blastp Output

blastp Alignment

Protein Scoring Matrices: BLOSUM62 is the default BLASTP scoring matrix.

Different matrices produce slightly different alignments

Other BLAST Programs: PSI-BLAST. PSI-BLAST is designed for more sensitive protein-protein similarity searches. Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family. Use PSI-BLAST when your standard protein-protein BLAST search either failed to find significant hits, or returned hits with descriptions such as "hypothetical protein" or "similar to...".

Other BLAST Programs: PHI-BLAST. PHI-BLAST can do a restricted protein pattern search. Pattern-Hit Initiated (PHI)-BLAST is designed to search for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern. This dual requirement is intended to reduce the number of database hits that contain the pattern but are likely to have no true homology to the query.

Sequence filters: Since only a limited number of matches are reported, hits to simple repeats and other low-complexity sequences can obscure other, more biologically meaningful similarities. Filters are used to remove low-complexity sequences from the probe. Available filters: low complexity; human repeats (blastn).
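
The idea of masking low-complexity regions can be sketched with a sliding-window Shannon entropy test. This is a simplified stand-in chosen for illustration and is not the filter BLAST itself applies; the window size, entropy cutoff and function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits of the residue composition of a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=1.0, mask_char="N"):
    """Replace every window whose entropy is below the cutoff."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


probe = "ACGTTGCAGTCA" + "A" * 20 + "CGTAGCTAGCTG"
print(mask_low_complexity(probe))
```

The poly-A run and a few flanking bases are replaced by N, so they can no longer seed matches, while the mixed-composition ends of the probe are left untouched.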

Low Complexity Sequences are Filtered Out

BLASTN vs BLASTP: Protein sequences have much higher information content than nucleotide sequences. To find evidence for sequence homology, use BLASTP and search protein sequences. Is my sequence already in the database? To find identical sequences, search nucleotide databases.

Translated BLAST Searches: translations use all 6 frames; computationally intensive (tblastx searches can be very slow with some large databases); must specify the genetic code.
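
Six-frame translation can be sketched as follows. The example assumes the standard genetic code only; an alternate code would need a different amino-acid string. Incomplete trailing codons are dropped and codons containing ambiguous bases are shown as X. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate three frames on each strand."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for shift in range(3):
        frames[f"+{shift + 1}"] = translate(dna[shift:])
        frames[f"-{shift + 1}"] = translate(reverse[shift:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Every translated search has to do this for the query, the database, or both, which is why tblastx, which translates both, is the slowest of the five programs.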

Alternate Genetic Codes

Translated BLAST Searches

Taxonomy Reports

BLAST Genomes

Align 2 Sequences with BLAST

BLAST from ORF Finder

Primer BLAST

BLAST Tutorial: BLAST tutorial on the Biocomp Web page. Goal: demonstrate the utility of, and the difference between, BLASTN and BLASTP searches. BLASTN: is my DNA sequence in the database? BLASTP: are there related proteins (homologs) in the database?