Using Local Tools: BLAST


Using Local Tools: BLAST BCHB524 2014 Lecture 20 11/10/2014 BCHB524 - 2014 - Edwards

Outline
- Running blast
  - Format sequence databases
  - Run by hand
- Running blast and interpreting results
  - Directly and using BioPython
- Exercises

Local Tools
Sometimes web-based services don't do it.
- For blast:
  - Too many query sequences
  - Need to search a novel sequence database
  - Need to change rarely used parameters
  - Web-service is too slow
- For other tools:
  - No web-service?
  - No interactive web-site?
  - Insufficient back-end computational resources?

Download / install standalone blast
- Google "NCBI Blast" …or go to http://www.ncbi.nlm.nih.gov/BLAST
- Click on the "Help" tab
- Under "Other BLAST Information", click on "Download BLAST Software and Databases"
- From the table, find the download link for your operating system and install.
- Blast is already installed in the BCHB524 Linux virtual box instance:
  - Type "blastn -help" in the terminal

Download BLAST databases
- Create a folder for Blast sequence databases
  - "Create folder" or "mkdir blastdb"
- Follow the link for the database FTP site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
- The FASTA directory contains compressed (.gz) FASTA format sequence databases.
- We'll download yeast.aa.gz and yeast.nt.gz from the FASTA folder to the blastdb folder

Uncompress FASTA databases
- Open up the blastdb folder
  - Select "Extract here" for each
- From the terminal:

    cd blastdb
    gunzip *.gz
    ls -l
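The gunzip step above can also be done from Python, which is handy when the download and formatting steps are scripted. This is a minimal sketch using only the standard library; the helper names gunzip_file and gunzip_folder are made up for illustration, and unlike the gunzip command this version keeps the original .gz files.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress one .gz file next to itself and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # e.g. yeast.aa.gz -> yeast.aa
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large databases are fine
    return out_path


def gunzip_folder(folder):
    """Decompress every *.gz file in a folder (the originals are kept)."""
    return [gunzip_file(p) for p in sorted(Path(folder).glob("*.gz"))]
```

Calling gunzip_folder("blastdb") would then play the role of "gunzip *.gz" inside the blastdb folder.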

Format FASTA databases

    cd blastdb
    ls -l
    makeblastdb -help
    makeblastdb -in yeast.aa -dbtype prot
    makeblastdb -in yeast.nt -dbtype nucl
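If several FASTA files need formatting, the two makeblastdb calls above can be generated rather than typed. The sketch below is one possible way to do that, assuming makeblastdb is on the PATH; the function names are hypothetical, and only the -in and -dbtype options shown on the slide are used.

```python
import subprocess


def makeblastdb_command(fasta_path, dbtype):
    """Build the argument list for one makeblastdb call."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype]


def run_makeblastdb(fasta_path, dbtype):
    """Run makeblastdb; raises CalledProcessError if the tool fails."""
    subprocess.run(makeblastdb_command(fasta_path, dbtype), check=True)


# The two databases from the slide, as (file, type) pairs.
YEAST_DATABASES = [("yeast.aa", "prot"), ("yeast.nt", "nucl")]
```

Looping over YEAST_DATABASES and calling run_makeblastdb(*pair) from inside the blastdb folder would reproduce the slide's commands.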

Running BLAST from the command-line
- We need a query sequence to search:
  - Copy and paste this FASTA file into IDLE and save as "query.fasta" in your home folder.

>gi|6319267|ref|NP_009350.1| Yal049cp
MASNQPGKCCFEGVCHDGTPKGRREEIFGLDTYAAGSTSPKEKVIVILTDVYGNKFNNVLLTADKFASAGYMVFVPDILF
GDAISSDKPIDRDAWFQRHSPEVTKKIVDGFMKLLKLEYDPKFIGVVGYCFGAKFAVQHISGDGGLANAAAIAHPSFVSI
EEIEAIDSKKPILISAAEEDHIFPANLRHLTEEKLKDNHATYQLDLFSGVAHGFAARGDISIPAVKYAKEKVLLDQIYWF
NHFSNV
>gi|6319268|ref|NP_009351.1| Yal048cp
MTKETIRVVICGDEGVGKSSLIVSLTKAEFIPTIQDVLPPISIPRDFSSSPTYSPKNTVLIDTSDSDLIALDHELKSADV
IWLVYCDHESYDHVSLFWLPHFRSLGLNIPVILCKNKCDSISNVNANAMVVSENSDDDIDTKVEDEEFIPILMEFKEIDT
CIKTSAKTQFDLNQAFYLCQRAITHPISPLFDAMVGELKPLAVMALKRIFLLSDLNQDSYLDDNEILGLQKKCFNKSIDV
NELNFIKDLLLDISKHDQEYINRKLYVPGKGITKDGFLVLNKIYAERGRHETTWAILRTFHYTDSLCINDKILHPRLVVP
DTSSVELSPKGYRFLVDIFLKFDIDNDGGLNNQELHRLFKCTPGLPKLWTSTNFPFSTVVNNKGCITLQGWLAQWSMTTF
LNYSTTTAYLVYFGFQEDARLALQVTKPRKMRRRSGKLYRSNINDRKVFNCFVIGKPCCGKSSLLEAFLGRSFSEEYSPT
IKPRIAVNSLELKGGKQYYLILQELGEQEYAILENKDKLKECDVICLTYDSSDPESFSYLVSLLDKFTHLQDLPLVFVAS
KADLDKQQQRCQIQPDELADELFVNHPLHISSRWLSSLNELFIKITEAALDPGKNTPGLPEETAAKDVDYRQTALIFGST
VGFVALCSFTLMKLFKSSKFSK

Running BLAST from the command-line
- Step out of the blastdb folder:

    cd ..

- Check the contents of the query.fasta file:

    more query.fasta

- Run blast from the command-line (one line):

    blastp -db blastdb/yeast.aa -query query.fasta -out results.txt

- …and check out the result in results.txt:

    more results.txt

Running BLAST from the command-line
- Parsing text-format BLAST results is hard:
  - Use XML format output where possible
- Run blast from the command-line (one line):

    blastp -db blastdb/yeast.aa -query query.fasta -outfmt 5 -out results.xml

- …and check out the result in results.xml:

    more results.xml
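To see why XML is easier to work with than the text report, it helps to look at how the results nest. The sketch below parses a hand-written, heavily trimmed stand-in for results.xml with the standard library's ElementTree. The element names follow the BLAST XML output used later in the lecture, but the query name, hit ids, and E-values are invented for illustration, and hit_rows is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for "blastp -outfmt 5" output.
# All identifiers and E-values here are made up.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1 hypothetical protein</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>hit_A</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>hit_B</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_evalue>3.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def hit_rows(xml_text):
    """Return one (query, hit id, E-value) tuple per HSP in the XML."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):      # one per query sequence
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):         # one per database sequence
            hit_id = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):           # one per aligned segment
                rows.append((query, hit_id, float(hsp.findtext("Hsp_evalue"))))
    return rows
```

Every value is reachable by element name, so nothing depends on column positions or spacing the way a text-report parser would.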

Interpreting blast results
- Use BioPython's BLAST parser

    from Bio.Blast import NCBIXML

    # Open the XML output written by blastp with -outfmt 5
    result_handle = open("results.xml")
    # One blast_result per query sequence
    for blast_result in NCBIXML.parse(result_handle):
        for alignment in blast_result.alignments:
            for hsp in alignment.hsps:
                # Report only significant HSPs
                if hsp.expect < 1e-5:
                    print('****Alignment****')
                    print('sequence:', alignment.title)
                    print('length:', alignment.length)
                    print('e value:', hsp.expect)

Running BLAST from Python: Generic Python Technique
- Python can run other programs, including blast, and capture the output

    # Special module for running other programs
    from subprocess import Popen, PIPE, STDOUT

    # Set the blast program and arguments as strings
    blast_prog = '/usr/bin/blastp'
    blast_args = '-query query.fasta -db blastdb/yeast.aa'

    # The Popen instance runs a program;
    # universal_newlines=True makes proc.stdout yield text lines, not bytes
    proc = Popen(blast_prog + " " + blast_args,
                 stdout=PIPE, stderr=STDOUT, shell=True,
                 universal_newlines=True)

    # proc.stdout behaves like an open file-handle...
    for l in proc.stdout:
        if l.startswith('Query='):
            print('\n' + l.rstrip() + '\n')
        if l.startswith('  gi|'):
            print(l.rstrip())
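The same run-and-filter technique can be written so that the filtering is testable without BLAST installed. This is a sketch, not the lecture's code: summary_lines and run_and_filter are hypothetical names, and the command is passed as a list of arguments, which avoids shell=True and any quoting problems with file names.

```python
import subprocess


def summary_lines(lines):
    """Yield the 'Query=' headers and '  gi|' hit-summary lines only."""
    for line in lines:
        if line.startswith("Query=") or line.startswith("  gi|"):
            yield line.rstrip()


def run_and_filter(command):
    """Run a program (given as an argument list) and filter its output."""
    proc = subprocess.run(command,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True,  # text, not bytes
                          check=True)               # raise if it fails
    return list(summary_lines(proc.stdout.splitlines()))


# The blastp call from the slide, expressed as an argument list.
BLASTP_COMMAND = ["/usr/bin/blastp",
                  "-query", "query.fasta",
                  "-db", "blastdb/yeast.aa"]
```

With BLAST installed and the files in place, run_and_filter(BLASTP_COMMAND) would return the same lines that the loop above prints.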

Running BLAST from BioPython with Text-Parsing
- Use BioPython to make the command and run it

    # Special modules for running blast
    from Bio.Blast.Applications import NcbiblastpCommandline

    # Set the blast program and arguments as strings
    blast_prog  = '/usr/bin/blastp'
    blast_query = 'query.fasta'
    blast_db    = 'blastdb/yeast.aa'

    # Build the command-line
    cmdline = NcbiblastpCommandline(cmd=blast_prog,
                                    query=blast_query,
                                    db=blast_db,
                                    out="results.txt")

    # ...and execute.
    stdout, stderr = cmdline()

    # Parse the results by opening the output file
    result = open("results.txt")
    for l in result:
        if l.startswith('Query='):
            print('\n' + l.rstrip() + '\n')
        if l.startswith('  gi|'):
            print(l.rstrip())

Running BLAST from BioPython with ElementTree XML-Parsing
- BioPython to make the command and run it
- ElementTree to parse the resulting XML

    # Special modules for running blast
    from Bio.Blast.Applications import NcbiblastpCommandline

    # Set the blast program and arguments as strings
    blast_prog  = '/usr/bin/blastp'
    blast_query = 'query.fasta'
    blast_db    = 'blastdb/yeast.aa'

    # Build the command-line
    cmdline = NcbiblastpCommandline(cmd=blast_prog,
                                    query=blast_query,
                                    db=blast_db,
                                    outfmt=5,
                                    out="results.xml")

    # ...and execute.
    stdout, stderr = cmdline()

    # Parse the results by opening the output file
    from xml.etree import ElementTree as ET
    result = open("results.xml")
    doc = ET.parse(result)
    root = doc.getroot()
    # iter() replaces getiterator(), which recent Python versions removed
    for ele in root.iter('Iteration'):
        queryid = ele.findtext('Iteration_query-def')
        for hit in ele.iter('Hit'):
            hitid = hit.findtext('Hit_id')
            for hsp in hit.iter('Hsp'):
                evalue = hsp.findtext('Hsp_evalue')
                print('\t'.join([queryid, hitid, evalue]))
                # Report only the first HSP of each hit
                break

NCBI Blast Parsing
- Results need to be parsed in order to be useful…

    from Bio.Blast import NCBIXML

    result_handle = open("results.xml")
    for blast_result in NCBIXML.parse(result_handle):
        for alignment in blast_result.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 1e-5:
                    print('****Alignment****')
                    print('sequence:', alignment.title)
                    print('length:', alignment.length)
                    print('e value:', hsp.expect)
                    # First 75 columns of the aligned query, match line, and subject
                    print(hsp.query[0:75] + '...')
                    print(hsp.match[0:75] + '...')
                    print(hsp.sbjct[0:75] + '...')

NCBI Blast Parsing
- Each blast result contains multiple alignments of a query sequence to a database sequence
- Each alignment consists of multiple high-scoring segment pairs (HSPs)
- Each HSP has stats like expect, score, gaps, and aligned sequence chunks
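The three levels described above (result, alignment, HSP) can be pictured as nested records. The classes below are hypothetical stand-ins written for illustration, not BioPython's own classes; they borrow the attribute names used on the slides (query, alignments, title, length, hsps, expect, score, gaps), and the 1e-5 cutoff is the one used in the earlier parsing example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hsp:
    """One high-scoring segment pair: a single aligned chunk."""
    expect: float
    score: float
    gaps: int


@dataclass
class Alignment:
    """All HSPs between the query and one database sequence."""
    title: str
    length: int
    hsps: List[Hsp] = field(default_factory=list)


@dataclass
class BlastResult:
    """All alignments found for one query sequence."""
    query: str
    alignments: List[Alignment] = field(default_factory=list)


def significant_titles(result, cutoff=1e-5):
    """Titles of database sequences with at least one HSP below the cutoff."""
    return [a.title for a in result.alignments
            if any(h.expect < cutoff for h in a.hsps)]
```

The triple loop in the parsing code walks exactly this shape: results, then alignments, then HSPs.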

NCBI Blast Parsing Skeleton

    from Bio.Blast import NCBIXML

    result_handle = ...  # an open handle on BLAST XML output goes here

    # Each blast_result corresponds to one query sequence
    for blast_result in NCBIXML.parse(result_handle):

        # blast_result.query is query description, etc.
        print(blast_result.query)

        # Each description contains a one-line summary of an alignment
        for desc in blast_result.descriptions:
            # title, score, e
            print(desc.title, desc.score, desc.e)

        # We can get the alignments one at a time, too
        # Each alignment corresponds to one database sequence
        for alignment in blast_result.alignments:

            # alignment.title is database description
            print(alignment.title)

            # Each query/database alignment consists of multiple
            # high-scoring pair alignment "chunks"
            for hsp in alignment.hsps:
                # HSP statistics are here:
                # hsp.expect, hsp.score, hsp.positives, hsp.gaps
                print(hsp.expect, hsp.score, hsp.positives, hsp.gaps)

Exercise
- Find potential fruit fly / yeast orthologs
  - Download FASTA files drosoph-ribosome.fasta.gz and yeast-ribosome.fasta.gz from the course data-directory.
  - Uncompress and format each FASTA file for BLAST
  - Search fruit fly ribosomal proteins against yeast ribosomal proteins
  - For each fruit fly query, output the best yeast protein if it has a significant E-value.
- What ribosomal protein is most highly conserved between fruit fly and yeast?
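The "best yeast protein per fruit fly query" step is a small selection problem once the BLAST output has been parsed. The sketch below covers only that step, assuming the (query, hit, E-value) triples have already been extracted by one of the parsers above. The function name is hypothetical, and the exercise does not say what counts as significant, so the 1e-5 cutoff is an assumption borrowed from the earlier slides.

```python
def best_significant_hits(rows, cutoff=1e-5):
    """Map each query to its lowest-E-value hit, if below the cutoff.

    rows is an iterable of (query, hit, evalue) tuples; queries with
    no hit under the cutoff are left out of the result.
    """
    best = {}
    for query, hit, evalue in rows:
        if evalue >= cutoff:
            continue  # not significant under the assumed cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (hit, evalue)
    return best
```

Sorting the returned dictionary's items by E-value is one way to approach the exercise's closing question about the most highly conserved protein.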