Using BLAST options to refine a search

Presentation transcript:

Using BLAST options to refine a search

1) Address the question: “How many of the Phytophthora/tomato interaction ESTs are tomato?”
   A: It will depend on the conditions. E-value 200 bp; identities > 95%; % match overlap > 50%: ~2100 (54%) show a match, with 1622 unique ESTs.
2) Can the question be addressed more easily by refining the BLAST search?
3) Other BLAST options.

$ ./blastall.exe
  -e  Expectation value [Real]
      default = 10.0

$ ./blastall.exe
  -m  alignment view options:
      0 = pairwise
      1 = query-anchored showing identities
      7 = XML Blast output
      8 = tabular
      9 = tabular with comment lines

Run nucleotide BLAST (blastn)

$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -e 0.01
$ grep -c "Strand =" OUTE2.txt
3
(with the default E-value this was 82)

$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./PhytophSeq1.txt -o PhytOUTE1.txt -e 1e-8
$ grep -c "Strand =" PhytOUTE1.txt
108,787
(with the default E-value this was 292,568)

NOTE: the second search, which compares 3,921 sequences against a database of 116,711 sequences, will take some time (15 minutes on my laptop).
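The grep count above can be wrapped in a small helper. This is a minimal sketch, not part of the original slides: the function name count_hits and the miniature report (with made-up identifiers) are assumptions used only to show what is being counted. In pairwise blastn output each reported alignment prints its own "Strand =" line, so the count is the number of alignments, which can be larger than the number of distinct database sequences hit.

```shell
#!/bin/sh
# Count alignments in a pairwise (-m 0) blastn report.
# Every reported alignment carries one "Strand = ..." line.
count_hits() {
    grep -c "Strand =" "$1"
}

# Made-up miniature report, for illustration only (not real BLAST output).
cat > sample_report.txt <<'EOF'
>gi|0000001|gb|XX000001| hypothetical EST one
 Score = 1237 bits (624), Expect = 0.0
 Identities = 630/632 (99%)
 Strand = Plus / Plus

 Score = 80 bits (40), Expect = 2e-15
 Identities = 40/40 (100%)
 Strand = Plus / Minus

>gi|0000002|gb|XX000002| hypothetical EST two
 Score = 60 bits (30), Expect = 1e-09
 Identities = 30/30 (100%)
 Strand = Plus / Plus
EOF

count_hits sample_report.txt      # alignments: prints 3
grep -c "^>" sample_report.txt    # distinct subject sequences: prints 2
```

Comparing the two numbers for a real report shows how many database sequences were matched more than once.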

Searching done

                                                     Score     E
Sequences producing significant alignments:          (bits)  Value

gi| |gb|BE |BE EST tomato flower buds,
gi| |gb|BI |BI EST tomato flower, anth
gi| |gb|AI |AI EST tomato ovary, TAMU S

>gi| |gb|BE |BE EST tomato flower buds, anthesis, Cornell University Solanum lycopersicum cDNA clone cTOD9L3, mRNA sequence
 Length = 632

 Score = 1237 bits (624), Expect = 0.0
 Identities = 630/632 (99%)
 Strand = Plus / Plus

Query: 1504 gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 1563
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1    gactggctagaatggctgcaatcatggcatctacttacaaggcttatcttggcgtcggac 60

Query: 1564 ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 1623
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61   ttggtccactatcatttttgacgcagtatagaataccacatcctggaagagttggtggaa 120

Run nucleotide BLAST (blastn)

$ /cygdrive/c/Blast/bin/blastall -p blastn -d ./TA496Seq1.txt -i ./tomatosequence.txt -o OUTE2.txt -m 8

-m = alignment view options
8  = tabular format

Slycopersicum.sequence    gi| |gb|BE |BE
Slycopersicum.sequence    gi| |gb|BI |BI
Slycopersicum.sequence    gi| |gb|AI |AI

Column labels: query start/end, bit score, e-value, subject start/end, length/mismatch, gap openings, identities
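Tabular output is easy to filter with standard tools. The sketch below is an illustration, not the procedure used for the slides. It relies on the standard -m 8 column order (query id, subject id, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The helper names filter_hits and count_unique_subjects, the sample rows, and the reading of "200 bp" as a minimum alignment length are all assumptions. The percent-overlap criterion is not implemented because it needs the EST lengths, which the tabular format does not report.

```shell
#!/bin/sh
# Made-up tab-separated rows in -m 8 column order, for illustration only.
printf 'q1\tgi|1|gb|XX1|\t99.68\t632\t2\t0\t1504\t2135\t1\t632\t0.0\t1237\n'     > sample_tab.txt
printf 'q1\tgi|2|gb|XX2|\t97.10\t310\t9\t0\t100\t409\t5\t314\t1e-150\t540\n'    >> sample_tab.txt
printf 'q1\tgi|2|gb|XX2|\t96.00\t250\t10\t0\t500\t749\t320\t569\t1e-110\t410\n' >> sample_tab.txt
printf 'q1\tgi|3|gb|XX3|\t88.00\t400\t48\t0\t10\t409\t1\t400\t1e-90\t350\n'     >> sample_tab.txt
printf 'q1\tgi|4|gb|XX4|\t100.00\t40\t0\t0\t10\t49\t1\t40\t2e-15\t80\n'         >> sample_tab.txt

# Keep rows whose % identity (column 3) and alignment length (column 4)
# both exceed the given thresholds.
# $1 = tabular file, $2 = minimum % identity, $3 = minimum alignment length
filter_hits() {
    awk -F'\t' -v min_id="$2" -v min_len="$3" \
        '($3 + 0) > (min_id + 0) && ($4 + 0) > (min_len + 0)' "$1"
}

# Read tabular rows on stdin and print the number of distinct subject IDs (column 2).
count_unique_subjects() {
    cut -f2 | sort -u | awk 'END { print NR }'
}

filter_hits sample_tab.txt 95 200                           # prints the 3 rows that pass
filter_hits sample_tab.txt 95 200 | count_unique_subjects   # prints 2
```

Counting distinct subject IDs rather than rows matters because one EST can produce several alignments to the same query.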

tblastn

Running BLAST with a protein or peptide query (translated BLAST vs. nucleotide data: the nucleotide database is translated in all six reading frames)

$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB txt -o PEPTIDEOUT.txt (-e #)

Try:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB Pep4A.txt -o PEPTIDEOUT.txt

Then try:
$ /cygdrive/c/Blast/bin/blastall -p tblastn -d ./TA496Seq1.txt -i ./SB Pep4A.txt -o PEPTIDEOUT.txt -e 50

From Xiaodong: Other useful BLAST options

(1) “-b integer”: number of database sequences to show alignments for. The default value is 250. A smaller number reduces the size of the output file and makes BLAST searches faster.
(2) “-v integer”: number of database sequences to show one-line descriptions for. The default value is 500. A smaller number for the “-v” option has a similar effect to “-b”.
(3) “-a integer”: number of processors to use. Most laptops have only one processor, but if you run BLAST on a Linux workstation with multiple processors, using all of them will drastically reduce the execution time.

From Xiaodong: Other useful BLAST options

(4) “-m 7”: gives results in XML format, which is useful if you will import the BLAST output into Blast2GO for GO assignment and metabolic pathway prediction.
(5) “-l string”: restricts the database search to a list of GIs (GenInfo Identifiers), the numeric identifiers assigned to each sequence in GenBank. The string is the name of the file containing the GIs of the sequences in the subset you want to search against. Use this option to search subsets of a large database without creating multiple databases. The advantage is that the E-values from all searches against the subsets are comparable. If the subsets were separate databases, their different sizes would make E-values not comparable between searches.
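One way to prepare a file for the “-l” option is to pull the GI numbers out of an earlier tabular result. The sketch below is an assumption-laden illustration: the helper name make_gi_list, the file names hits.txt and subset.gi, and the sample identifiers are all made up. It assumes subject IDs of the form gi|NUMBER|gb|ACCESSION| in column 2 and writes one GI per line.

```shell
#!/bin/sh
# Made-up tabular rows (only the first three columns), for illustration only.
printf 'q1\tgi|222|gb|XX2|\t97.1\n'  > hits.txt
printf 'q1\tgi|111|gb|XX1|\t99.6\n' >> hits.txt
printf 'q2\tgi|222|gb|XX2|\t96.0\n' >> hits.txt

# Print each distinct GI found in column 2 of a tabular file, one per line.
# The GI is the second '|'-separated field of an ID that starts with "gi".
make_gi_list() {
    cut -f2 "$1" | awk -F'|' '$1 == "gi" { print $2 }' | sort -n -u
}

make_gi_list hits.txt > subset.gi
cat subset.gi    # prints 111 then 222, one per line
```

The resulting file name would then be given as the string for “-l”, for example by adding -l subset.gi to one of the blastall commands shown earlier.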