SCHOOL OF COMPUTING ANDREW MAXWELL 9/11/2013 SEQUENCE ALIGNMENT AND COMPARISON BETWEEN BLAST AND BWA-MEM.

OUTLINE
- BLAST
- BWA-MEM
- Comparisons

BLAST
Basic Local Alignment Search Tool, developed by NCBI.
- NCBI: National Center for Biotechnology Information, a division of the NLM
- NLM: US National Library of Medicine, part of the NIH
- NIH: National Institutes of Health
Latest version (executables): ftp://ftp.ncbi.nlm.nih.gov/blast+/LATEST/

BLAST
A suite of command-line tools that work together to search databases for sequences similar to a protein or nucleotide (DNA) query.
Three categories of applications:
1. Search tools
2. BLAST database tools
3. Sequence filtering tools
Source: BLAST Command Line User Manual

SEARCH APPLICATIONS
Execute a BLAST search.
- blastn: nucleotide BLAST; searches a nucleotide database using a nucleotide query.
- blastp: protein BLAST; searches a protein database using a protein query.
- blastx: searches a protein database using a translated nucleotide query.
- tblastx: searches a translated nucleotide database using a translated nucleotide query.
- tblastn: searches a translated nucleotide database using a protein query.
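
The table above amounts to a lookup from (query type, database type) to a program name. A minimal sketch of that lookup follows; the function name `choose_blast_program` and its `translate_both` flag are hypothetical and only encode the five combinations listed on the slide.

```python
# Hypothetical helper: map query/database molecule types to a BLAST+ search program,
# following the five combinations listed on the slide.

def choose_blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """query, database: "nucleotide" or "protein".

    translate_both only applies to a nucleotide query against a nucleotide database:
    True selects tblastx (both sides translated), False selects blastn.
    """
    table = {
        ("nucleotide", "nucleotide", False): "blastn",
        ("nucleotide", "nucleotide", True): "tblastx",
        ("protein", "protein", False): "blastp",
        ("nucleotide", "protein", False): "blastx",
        ("protein", "nucleotide", False): "tblastn",
    }
    key = (query, database, translate_both)
    if key not in table:
        raise ValueError(f"unsupported combination: {key}")
    return table[key]


print(choose_blast_program("nucleotide", "protein"))  # blastx
```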

SEARCH APPLICATIONS CONT.
- psiblast: Position-Specific Iterated BLAST. Finds sequences significantly similar to the query in a database search, uses the resulting alignments to build a Position-Specific Score Matrix (PSSM), and searches the database again with that PSSM in further iterations.
- rpsblast: Reverse Position-Specific BLAST. Searches a query against a database of pre-calculated PSSMs and reports significant hits in a single pass.
- rpstblastn: like rpsblast, but searches the database of pre-calculated PSSMs using a translated nucleotide query.
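
To make "build a PSSM from the resulting alignments" concrete, here is a minimal sketch that turns equal-length aligned hits into per-column log-odds scores. It is deliberately simplified and assumes a uniform background and flat pseudocounts; PSI-BLAST itself weights sequences and derives pseudocounts from a substitution matrix. The sequences and the names `build_pssm` and `score_sequence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return one {residue: log-odds score in bits} dict per alignment column.

    Simplified sketch: flat pseudocounts and, by default, a uniform background.
    Gap characters ('-') are ignored when counting a column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])  # positive = enriched
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Score an ungapped sequence of the same length against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["MATWL", "MATPL", "MSTWL"]  # hypothetical aligned hits from a first round
pssm = build_pssm(hits)
print(round(score_sequence(pssm, "MATWL"), 2), round(score_sequence(pssm, "GGGGG"), 2))
```

In a real iteration the matrix would replace the substitution matrix for the next database search; here it only scores fixed-length candidates.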

BLAST DATABASE APPLICATIONS
Create or examine BLAST databases.
- makeblastdb: creates BLAST databases.
- blastdb_aliastool: manages BLAST databases through aliases, so that multiple databases can be searched together or a subset of the sequences within a database can be searched.
- makeprofiledb: builds an RPS-BLAST database.
- blastdbcmd: examines the contents of a BLAST database.

SEQUENCE FILTERING APPLICATIONS
- segmasker: identifies and masks low-complexity regions* of protein sequences.
- dustmasker: like segmasker, but for nucleotide sequences.
- windowmasker: uses the sequence of a whole genome to identify subsequences that occur too often to be of interest to most users, and masks them.
*Low-complexity regions are regions of a sequence composed of few distinct elements, such as simple repeats. Once masked, they are ignored by BLAST unless it is explicitly told to include them in searches. Left unmasked, they can achieve high scores and push more significant matches down the list of hits.
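
The idea of "composed of few elements" can be illustrated with a sliding-window entropy test. This is a stand-in, not SEG or DUST, which use their own scoring; the window size, entropy threshold, example sequence and function names are all assumptions. Masked positions are lowercased ("soft masking").

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def soft_mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 1.5) -> str:
    """Lowercase every position covered by a window whose entropy is below min_entropy.

    Entropy-based stand-in for SEG/DUST; window and threshold are arbitrary choices.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(ch.lower() if m else ch for ch, m in zip(seq, masked))


example = "GATTACAGCTTGCA" + "A" * 15 + "CGTGACTTAGCA"  # hypothetical sequence with a poly-A run
print(soft_mask_low_complexity(example))
```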

BLAST ALGORITHM
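
The diagram for this slide did not survive. As a rough illustration of BLAST's seed-and-extend idea, the sketch below finds exact word hits between a query and a subject and extends each hit without gaps, stopping when the running score drops more than `x_drop` below the best score seen. It is a simplification: real BLAST uses scored neighborhood words for proteins, a two-hit rule and gapped extension. The scores (+1/-3), word size, sequences and function names are assumptions.

```python
def find_word_hits(query: str, subject: str, word_size: int = 4):
    """Return (query_pos, subject_pos) pairs where an exact word of word_size matches."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


def extend_ungapped(query, subject, qpos, spos, word_size, match=1, mismatch=-3, x_drop=5):
    """Extend a word hit both ways without gaps (X-drop rule).

    Returns (score, q_start, q_end, s_start), with q_end exclusive.
    """
    # Extend to the right of the word.
    best_right, score, right, k = 0, 0, 0, 0
    i, j = qpos + word_size, spos + word_size
    while i + k < len(query) and j + k < len(subject):
        score += match if query[i + k] == subject[j + k] else mismatch
        k += 1
        if score > best_right:
            best_right, right = score, k
        elif best_right - score > x_drop:
            break
    # Extend to the left of the word.
    best_left, score, left, k = 0, 0, 0, 1
    while qpos - k >= 0 and spos - k >= 0:
        score += match if query[qpos - k] == subject[spos - k] else mismatch
        if score > best_left:
            best_left, left = score, k
        elif best_left - score > x_drop:
            break
        k += 1
    total = word_size * match + best_left + best_right
    return total, qpos - left, qpos + word_size + right, spos - left


def blast_like_search(query, subject, word_size=4):
    """Return the best-scoring ungapped segment pair, or None if no word hits."""
    best = None
    for qpos, spos in find_word_hits(query, subject, word_size):
        hsp = extend_ungapped(query, subject, qpos, spos, word_size)
        if best is None or hsp[0] > best[0]:
            best = hsp
    return best


print(blast_like_search("ACGTACGTTAGC", "TTTACGTACGTTAGCTTT"))  # (12, 0, 12, 3)
```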

E-VALUE
The expected number of hits, with a score at least as good as the observed one, that would be found by chance when searching a database of this size. The E-value decreases exponentially as the score increases, so the lower the E-value, the more significant the match. It also depends on the length of the query sequence: short queries can only reach low scores, so even their best hits tend to have higher E-values, because a short sequence is more likely to occur in the database by chance.
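
The exponential relationship is usually written with the Karlin-Altschul formula, where $S$ is the raw score, $m$ and $n$ are the lengths of the query and of the whole database (BLAST actually uses slightly shorter "effective" lengths), and $\lambda$ and $K$ are the statistical parameters of the scoring system. Raising the score by $\Delta$ multiplies $E$ by $e^{-\lambda\Delta}$, while for a fixed score $E$ grows in proportion to the search space $mn$.

```latex
E = K\,m\,n\,e^{-\lambda S},
\qquad
\frac{E(S + \Delta)}{E(S)} = e^{-\lambda \Delta}
```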

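BITSCORE
The bit score $S'$ is derived from the raw alignment score $S$ as $S' = (\lambda S - \ln K)/\ln 2$, where $\lambda$ (lambda) and $K$ are statistical parameters of the scoring system. Because this normalization removes the scoring system's parameters, bit scores from searches that used different scoring matrices can be compared.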
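
A small numeric sketch of the bit-score conversion and of the E-value computed two equivalent ways: from the raw score, $E = K m n e^{-\lambda S}$, and from the bit score, $E = m n 2^{-S'}$. The parameters $\lambda = 0.267$ and $K = 0.041$ are the values commonly reported for BLOSUM62 with gap costs 11/1, and the query and database lengths are hypothetical; treat all of them as assumptions.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


def e_value_from_raw(raw_score: float, lam: float, k: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


LAMBDA, K = 0.267, 0.041   # assumed: gapped BLOSUM62, gap costs 11/1
m, n = 250, 50_000_000     # hypothetical query length and database size (residues)

for s in (40, 60, 80):
    bits = bit_score(s, LAMBDA, K)
    print(s, round(bits, 1), f"{e_value_from_bits(bits, m, n):.2e}")
```

Both routes give the same number because $2^{-S'} = K e^{-\lambda S}$; the bit-score form simply hides $\lambda$ and $K$.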

EXAMPLE RUN
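
The screenshot for this slide did not survive. A typical run has two steps: build a database with makeblastdb, then search it with a program such as blastn. The sketch below only assembles the command lines (using the standard `-in`, `-dbtype`, `-out`, `-query`, `-db`, `-evalue` and `-outfmt` options) and runs them if BLAST+ is on the PATH; the file names, database name and E-value cutoff are hypothetical.

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta_path, db_name, dbtype="nucl"):
    """Command to build a BLAST database from a FASTA file (dbtype "nucl" or "prot")."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blastn_cmd(query_path, db_name, out_path, evalue=1e-5):
    """Command for a nucleotide search writing tabular output (-outfmt 6)."""
    return ["blastn", "-query", query_path, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_path]


def run_pipeline(reference, query, db_name="refdb", out_path="hits.tsv"):
    """Build the database, then search it; requires BLAST+ executables on PATH."""
    if shutil.which("makeblastdb") is None or shutil.which("blastn") is None:
        raise RuntimeError("BLAST+ executables not found on PATH")
    subprocess.run(makeblastdb_cmd(reference, db_name), check=True)
    subprocess.run(blastn_cmd(query, db_name, out_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names: print the commands instead of running them.
    print(" ".join(makeblastdb_cmd("reference.fa", "refdb")))
    print(" ".join(blastn_cmd("query.fa", "refdb", "hits.tsv")))
```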

FASTA FORMAT
Text-based format for representing nucleotide or peptide sequences. Each record starts with a header line: a ">", followed by the sequence identifier, then an optional description. The sequence follows on one or more lines.
>seq_1 Some description
GAGGGCTCATCCGGGAATCGAACCCGGGACCT
CTCGCACCCTAAGCGAGAATCATACGACTAGACC
AATGAGCCGTGTTCAAAGAGTGTCAAAATGTGTTTC
GAGCGTCTATGTCCAAAGTGAATTGCTTGTCTTTTGA
GTTTTGCGATTG
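
A minimal FASTA reader makes the header/sequence rules explicit: a ">" line starts a new record, the first word is the identifier, the rest is the description, and all following lines are concatenated into the sequence. The function name `parse_fasta` is hypothetical; the example record is the one from the slide.

```python
def parse_fasta(text: str):
    """Yield (identifier, description, sequence) for each record in FASTA text."""
    ident, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(chunks)
            header = line[1:].split(maxsplit=1)
            ident = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            if ident is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if ident is not None:
        yield ident, desc, "".join(chunks)


EXAMPLE = """>seq_1 Some description
GAGGGCTCATCCGGGAATCGAACCCGGGACCT
CTCGCACCCTAAGCGAGAATCATACGACTAGACC
AATGAGCCGTGTTCAAAGAGTGTCAAAATGTGTTTC
GAGCGTCTATGTCCAAAGTGAATTGCTTGTCTTTTGA
GTTTTGCGATTG
"""

for ident, desc, seq in parse_fasta(EXAMPLE):
    print(ident, desc, len(seq))
```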

SAMPLE OUTPUT
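
The sample output shown on this slide did not survive. Assuming the search was run with tabular output (`-outfmt 6`), each hit is one tab-separated line with twelve columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below parses such lines; the two example hits are made-up values, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str):
    """Parse BLAST tabular output (-outfmt 6, default 12 columns) into BlastHit objects."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError(f"expected 12 columns, got {len(f)}")
        hits.append(BlastHit(f[0], f[1], float(f[2]), *map(int, f[3:10]),
                             float(f[10]), float(f[11])))
    return hits


def minus_strand(hit: BlastHit) -> bool:
    """For nucleotide subjects, sstart > send means the hit is on the reverse strand."""
    return hit.sstart > hit.send


# Hypothetical hits, for illustration only.
TSV = ("seq_1\tchr1\t98.68\t151\t2\t0\t1\t151\t10001\t10151\t2e-70\t268\n"
       "seq_1\tchr7\t85.00\t60\t9\t0\t40\t99\t5300\t5241\t1e-08\t62.1\n")

for hit in sorted(parse_outfmt6(TSV), key=lambda h: h.evalue):
    print(hit.sseqid, hit.pident, hit.evalue, "minus" if minus_strand(hit) else "plus")
```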

BWA-MEM
BWA (Burrows-Wheeler Aligner) is a software package for aligning sequences against a large reference genome. The BWA package contains three different algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
Source: BWA manual page
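
BWA is named after the Burrows-Wheeler transform (BWT), which underlies the FM-index it builds for the reference. The sketch below computes the BWT naively from sorted rotations and counts exact occurrences of a pattern with "backward search". It is for illustration only: real indexers build the BWT from a suffix array and store sampled occurrence tables, and the function names here are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations; '$' marks the end and sorts first."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def fm_index(last: str):
    """C table (number of smaller characters) and occurrence counts occ[c][i] for last[:i]."""
    alphabet = sorted(set(last))
    c_table, total = {}, 0
    for c in alphabet:
        c_table[c] = total
        total += last.count(c)
    occ = {c: [0] * (len(last) + 1) for c in alphabet}
    for i, ch in enumerate(last):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return c_table, occ


def count_occurrences(pattern: str, last: str, c_table, occ) -> int:
    """Backward search: number of occurrences of pattern in the original text."""
    lo, hi = 0, len(last)  # half-open range of sorted rotations matching the suffix so far
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


last = bwt("banana")
print(last)                                          # annb$aa
print(count_occurrences("ana", last, *fm_index(last)))  # 2
```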

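BWA-MEM
- Can align query sequences from 70 bp to 1 Mbp.
- MEM: Maximal Exact Matches, exact matches between the query and the reference that cannot be extended further in either direction.
- Performs local alignment.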
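
A brute-force sketch of what a maximal exact match is: a match is reported only if it cannot be extended to the left (the preceding characters differ or one sequence starts there) and it is extended as far right as possible. This quadratic scan is only for illustration; BWA-MEM finds its seeds (described as super-maximal exact matches) with the FM-index. The `min_len` threshold, sequences and function name are assumptions.

```python
def find_mems(query: str, reference: str, min_len: int = 5):
    """Return (query_start, ref_start, length) for every maximal exact match of
    at least min_len characters. Brute force, for illustration only."""
    mems = []
    for i in range(len(query)):
        for j in range(len(reference)):
            if query[i] != reference[j]:
                continue
            if i > 0 and j > 0 and query[i - 1] == reference[j - 1]:
                continue  # not left-maximal: this match extends further to the left
            length = 0
            while (i + length < len(query) and j + length < len(reference)
                   and query[i + length] == reference[j + length]):
                length += 1  # extend right as far as the sequences agree
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(find_mems("ACGTTAGTTT", "GGGACGTTAGCCC"))  # [(0, 3, 7)]
```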

HOW TO RUN
1. Index the reference FASTA file.
2. Run BWA-MEM with a query file (in FASTQ format) against the indexed reference.
3. The output is in SAM format.

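FASTQ FORMAT
Similar to FASTA, but each base also carries a quality score. A record has four lines: "@" followed by the sequence identifier, the sequence, a "+" separator line, and a quality string with one character per base. The sequence, separator and quality lines of an example record:
TGGAGATGAGATTGTCGGCTTTATTACCCAGGGGCGGGGGGTTATTGTA
+
Y^]Lcda]YcffccffadafdWKd_V\``^\aa^BBBBBBBBBBBBBBB
Each quality character encodes an integer (Phred) score $Q$ that maps the probability $P$ that the base call is incorrect: $Q = -10 \log_{10} P$.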
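
A sketch of decoding a FASTQ quality string. Each character's ASCII code minus an offset gives $Q$, and $P = 10^{-Q/10}$. Two offsets are in common use: 33 (Sanger and newer Illumina) and 64 (older Illumina). The example quality string appears to use offset 64, since characters such as 'f' would otherwise mean an implausibly high Q of 69; that reading is an assumption. The record header `@read_1` is hypothetical because the slide does not show one.

```python
def phred_scores(quality: str, offset: int = 33):
    """Decode a FASTQ quality string into Phred scores Q = ord(char) - offset."""
    scores = [ord(ch) - offset for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("quality character below the offset; wrong encoding?")
    return scores


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines):
    """Split one 4-line FASTQ record into (identifier, sequence, quality)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return header[1:].split()[0], seq, qual


SEQ = "TGGAGATGAGATTGTCGGCTTTATTACCCAGGGGCGGGGGGTTATTGTA"
QUAL = r"Y^]Lcda]YcffccffadafdWKd_V\``^\aa^BBBBBBBBBBBBBBB"

ident, seq, qual = parse_fastq_record(["@read_1", SEQ, "+", QUAL])
scores = phred_scores(qual, offset=64)  # assumed Phred+64 encoding
print(ident, scores[:5], [round(error_probability(q), 4) for q in scores[:5]])
```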

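SAM FILE
An optional header section (lines starting with "@") is followed by alignment lines. Each alignment line has eleven mandatory tab-separated fields and a variable number of optional fields. Each optional field is written as TAG:TYPE:VALUE and stores extra information.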

SAM REQUIRED FIELDS

SAM OPTIONAL FIELDS
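
The field tables for these slides did not survive. The eleven mandatory SAM fields are QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, followed by optional TAG:TYPE:VALUE fields such as NM (edit distance). The sketch below parses one alignment line and splits its CIGAR string; the example line is made up, and the function names are hypothetical. Only the `i` and `f` types are converted to numbers here, while other types are kept as strings.

```python
import re

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_optional(field: str):
    """Convert one TAG:TYPE:VALUE field; 'i' -> int, 'f' -> float, other types kept as str."""
    tag, typ, value = field.split(":", 2)
    if typ == "i":
        return tag, int(value)
    if typ == "f":
        return tag, float(value)
    return tag, value


def parse_sam_line(line: str) -> dict:
    """Parse one alignment line (not an '@' header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need 11 mandatory fields")
    record = {name: (int(v) if name in INT_FIELDS else v)
              for name, v in zip(SAM_FIELDS, cols[:11])}
    record["TAGS"] = dict(parse_optional(f) for f in cols[11:])
    return record


def cigar_ops(cigar: str):
    """Split a CIGAR string such as '2S8M' into (length, operation) pairs ('*' -> [])."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


# Hypothetical alignment line, for illustration only.
LINE = "read_1\t16\tchr1\t100\t60\t2S8M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0\tAS:i:8\tMD:Z:8"
rec = parse_sam_line(LINE)
print(rec["RNAME"], rec["POS"], cigar_ops(rec["CIGAR"]), rec["TAGS"])
print("reverse strand" if rec["FLAG"] & 16 else "forward strand")
```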

BWA-MEM ALGORITHM
Seeds alignments with maximal exact matches (MEMs), then extends the seeds using the affine-gap Smith-Waterman algorithm.
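
A minimal sketch of local alignment scoring with affine gaps (Gotoh's three-matrix formulation of Smith-Waterman), where a gap of length L costs gap_open + L * gap_extend. The default scores mirror the bwa mem defaults as commonly documented (match 1, mismatch 4, gap open 6, gap extension 1), but treat them as assumptions. This computes only the best score, over the full matrices, whereas BWA-MEM extends from seeds within a band.

```python
def smith_waterman_affine(a: str, b: str, match=1, mismatch=-4, gap_open=6, gap_extend=1) -> int:
    """Best local alignment score with affine gap costs (gap_open + L * gap_extend)."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score of an alignment ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a (consumes b)
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b (consumes a)
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open - gap_extend)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open - gap_extend)
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 allows the alignment to restart (local)
            best = max(best, H[i][j])
    return best


ref = "ACGTACGTACGTACGTACGT"
read = "ACGTACGTACGGTACGTACGT"  # one inserted base
print(smith_waterman_affine(ref, read))  # 13 = 20 matches - (6 + 1) for one gap
```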

BWA-MEM OPTIONS
- -t INT: number of threads.
- -T INT: don't output alignments with a score lower than INT.
- -a: output all found alignments for single-end or unpaired paired-end reads. These extra alignments are flagged as secondary. (In output, '*' is considered zero.)
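
The steps on the "How to run" slide and the options above can be combined into a small command builder: index the reference once, then run bwa mem and redirect its standard output to a SAM file. The sketch below only uses the `index` subcommand and the `-t`, `-T` and `-a` options from the slides; the file names and helper names are hypothetical, and the actual run requires `bwa` on the PATH.

```python
import shutil
import subprocess


def bwa_index_cmd(reference_fasta):
    """Command to index the reference FASTA (creates index files next to it)."""
    return ["bwa", "index", reference_fasta]


def bwa_mem_cmd(reference_fasta, reads_fastq, threads=4, min_score=None, all_alignments=False):
    """Command for bwa mem with the -t, -T and -a options from the slide."""
    cmd = ["bwa", "mem", "-t", str(threads)]
    if min_score is not None:
        cmd += ["-T", str(min_score)]
    if all_alignments:
        cmd.append("-a")
    return cmd + [reference_fasta, reads_fastq]


def run_bwa(reference_fasta, reads_fastq, sam_path, **options):
    """Index, align and write SAM; requires the bwa executable on PATH."""
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa not found on PATH")
    subprocess.run(bwa_index_cmd(reference_fasta), check=True)
    with open(sam_path, "w") as sam:  # bwa mem writes SAM to standard output
        subprocess.run(bwa_mem_cmd(reference_fasta, reads_fastq, **options),
                       stdout=sam, check=True)


if __name__ == "__main__":
    print(" ".join(bwa_index_cmd("ref.fa")))
    print(" ".join(bwa_mem_cmd("ref.fa", "reads.fq", threads=8, min_score=30, all_alignments=True)))
```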

EXAMPLE RUN

SAMPLE OUTPUT

REFERENCES
- NCBI Help Manual
- BWA manual page
- FASTA format
- FASTQ format
- Li, H., et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16). Applications Note.