
Tweaking BLAST

Although you normally see BLAST as a web page with boxes to paste data into, tick boxes and so on, it is actually a command-line program that can be run just by typing the appropriate command and options, e.g.

> blastall -p blastn -i my_sequence.fasta -d refseq

This is the simplest form: the basic program 'blastall' takes a number of different options, or parameters, each indicated by a flag of the form -x followed by its value.

-p  the BLAST program (flavour) to run
-i  the input file containing the query sequence
-d  the database to search

There are many other parameters; any that are not given explicitly take the default value most appropriate to the BLAST flavour requested. For example, blastn uses -W 11 by default, whereas blastx uses -W 3.

There are also some options on the web pages that are not really parameters but manage the job in a similar way. One of the most useful of these is on the NCBI BLAST pages, where you can use Entrez queries or pick from an organism list to restrict your search.
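As a minimal sketch of how such a command line is put together, the Python below builds the argument list for the example without running anything. The helper names blastall_command and word_size_in_effect and the DEFAULT_WORD_SIZE table are illustrative and not part of BLAST; the two default word sizes are simply the ones quoted on this slide.

```python
# Default word sizes quoted on the slide; other flavours are left out on purpose.
DEFAULT_WORD_SIZE = {"blastn": 11, "blastx": 3}


def blastall_command(program, query, database, **options):
    """Return the argument list for a legacy blastall run (nothing is executed)."""
    args = ["blastall", "-p", program, "-i", query, "-d", database]
    for flag, value in options.items():
        # Each extra option becomes "-x value", e.g. W=7 -> "-W 7".
        args += ["-" + flag, str(value)]
    return args


def word_size_in_effect(program, **options):
    """Word size that applies: an explicit -W if given, else the quoted default."""
    return options.get("W", DEFAULT_WORD_SIZE.get(program))


print(" ".join(blastall_command("blastn", "my_sequence.fasta", "refseq")))
print(word_size_in_effect("blastn"))       # default quoted for blastn
print(word_size_in_effect("blastn", W=7))  # an explicit -W overrides the default
```

Keeping the command as a list of separate arguments, rather than one long string, makes it easy to hand to a process launcher later without worrying about quoting.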

The Many Parameters of BLAST

There are almost literally hundreds of parameters, but most are far too obscure even for die-hard techies like me! Very few of them are regularly useful at anything but their default value, but just occasionally they are essential. Here are some of the ones that I have used:

-e  maximum expected value (E-value)
-m  output format (graphical or tabular/spreadsheet)
-F  filter query sequence for low complexity (default TRUE)
-U  use only upper-case regions of query (default FALSE)
-G  gap opening cost
-E  gap extension cost
-q  nucleotide mismatch penalty (BLASTx uses matrices)
-r  nucleotide match reward
-b  number of matching sequences to report
-g  allow gaps (default TRUE)
-W  word size
-z  effective database size (removes effect of actual database size!)
-S  query strands to search (default both directions)
-l  restrict database sequences to given list of 'gi' numbers
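A small lookup table can make a long command line easier to read. The sketch below pairs each option in a blastall command with the description given in the list on this slide; the function name explain and the example command are made up for illustration, and the meanings given for -p, -i and -d are the standard ones for legacy blastall.

```python
import shlex

# Descriptions taken from the slide; anything else is reported as unlisted.
FLAG_MEANINGS = {
    "-p": "BLAST program to run",
    "-i": "input query file",
    "-d": "database to search",
    "-e": "maximum expected value",
    "-m": "output format",
    "-F": "filter query sequence for low complexity",
    "-U": "use only upper-case regions of query",
    "-G": "gap opening cost",
    "-E": "gap extension cost",
    "-q": "nucleotide mismatch penalty",
    "-r": "nucleotide match reward",
    "-b": "number of matching sequences to report",
    "-g": "allow gaps",
    "-W": "word size",
    "-z": "effective database size",
    "-S": "query strands to search",
    "-l": "restrict database sequences to a list of gi numbers",
}


def explain(command_line):
    """Pair each '-x value' option in a blastall command line with its meaning."""
    tokens = shlex.split(command_line)[1:]  # drop the program name
    # Options are read strictly in flag/value pairs, so a negative value
    # such as "-q -1" is not mistaken for another flag.
    pairs = zip(tokens[0::2], tokens[1::2])
    return [(flag, value, FLAG_MEANINGS.get(flag, "not in the list"))
            for flag, value in pairs]


for flag, value, meaning in explain(
        "blastall -p blastn -i my_sequence.fasta -d refseq -e 0.001 -q -1"):
    print(f"{flag} {value:<18} {meaning}")
```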

BLAST Parameters Exercises

1. BLASTn vs. BLASTx

Open the file example-sequences.html and copy the sequence >blastn-vs-blastx. This is a Xenopus tropicalis cDNA sequence.

Go to the NCBI BLAST Home Page, Nucleotide-nucleotide BLAST (blastn) section, and paste your sequence into the box. Run BLASTn against the nr nucleotide database using all default options, then hit [format] and wait for the results to appear in a new page. (Hint: if you also paste the sequence definition line '>name' into the box, your results will be labelled accordingly, which can be useful.)

Now repeat, but go to the TRANSLATED BLAST section and BLAST against the nr protein database using BLASTx.

How might the different results help us view the presence of this gene in other vertebrates?
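One reason the two searches can give different pictures is that nucleotide sequences may differ at synonymous codon positions while still encoding the same protein. The sketch below illustrates this with two made-up 15-base sequences (they are not from the exercise file) and the standard genetic code; the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order line up with AMINO_ACIDS.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identical_positions(a, b):
    """Count the positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


# Hypothetical sequences: same peptide, different codon choices.
seq_a = "ATGGCTACTTGGCTT"
seq_b = "ATGGCAACGTGGTTA"

print(identical_positions(seq_a, seq_b), "of", len(seq_a), "bases identical")
print(translate(seq_a), translate(seq_b))
```

At the nucleotide level the two sequences disagree in several places, yet their translations are the same, which is the kind of relationship a translated search can pick up when a nucleotide search struggles.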

Results for Exercise 1

[Two result panels: BLASTn and BLASTx]

BLAST Parameters Exercises

2. Low complexity filtering

Open the file example-sequences.html and copy the sequence >low-complexity-filtering-A. This sequence contains a long AT tandem repeat.

Go to the NCBI BLAST Home Page, TRANSLATED BLAST section, BLASTx, and paste your sequence into the box. Carefully UNTICK the "Choose filter [ ] Low complexity" box in the second section, and then run BLASTx against the nr database.

What do you feel about these alignments?

Re-run, but leave the low-complexity filter ON this time. Does this change our view of the protein matches?

Now continue with >low-complexity-filtering-B and -C. C is an especially interesting case: what can we deduce about the cDNA sequence? Annotators beware!
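To get a feel for what a low-complexity filter does, here is a rough sketch that masks windows with a very uneven letter composition, measured by Shannon entropy. This is not the filter NCBI BLAST actually applies (it uses the DUST and SEG programs); the window length, threshold and test sequence are arbitrary choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits of the letter composition of a string."""
    total = len(window)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(window).values())


def mask_low_complexity(seq, window=12, threshold=1.2, mask_char="N"):
    """Mask every window whose composition entropy falls below the threshold."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = mask_char * window
    return "".join(masked)


# Hypothetical sequence: an AT tandem repeat between two ordinary flanks.
seq = "GCTAGCATCGGATCCAGTCA" + "AT" * 10 + "CGTAGCTAGGCATCGATGCA"
print(seq)
print(mask_low_complexity(seq))
```

A pure AT repeat uses only two letters in equal amounts, so every window inside it has an entropy of one bit and is masked, while the mixed flanks score close to two bits and are left alone.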

Results for Exercise 2A (OFF) BLASTn – low complexity filtering OFF

Results for Exercise 2A (ON) BLASTn – low complexity filtering ON

Results for Exercise 2B

[Two result panels: filter ON and filter OFF]

Results for Exercise 2C

There is a sequence error, an extra G at position 117 in the sequence:

cDNA (117)        AGAAAAGAAGAAACATGGCAATGGATCAGAA
                  |||||||||||||||| ||||||||||||||
Genomic sequence  AGAAAAGAAGAAACAT-GCAATGGATCAGAA

[Two result panels: filter ON and filter OFF]
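The alignment on this slide can be checked mechanically. The sketch below lists the columns where the two aligned strings differ and tests whether the length difference is a multiple of three; the function name is illustrative, and the column index is relative to the displayed segment only, since the sketch makes no assumption about how that segment maps onto the full cDNA coordinates.

```python
cdna = "AGAAAAGAAGAAACATGGCAATGGATCAGAA"
genomic = "AGAAAAGAAGAAACAT-GCAATGGATCAGAA"


def alignment_differences(top, bottom):
    """Return (column, top_char, bottom_char) for every differing column."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(top, bottom)) if a != b]


differences = alignment_differences(cdna, genomic)
print(differences)

# An insertion whose length is not a multiple of three shifts the reading frame
# of everything downstream of it.
extra_bases = len(cdna.replace("-", "")) - len(genomic.replace("-", ""))
print("extra bases in cDNA:", extra_bases)
print("shifts reading frame:", extra_bases % 3 != 0)
```

A single extra base is enough to change every codon after it, which is why a translated search against such a cDNA can behave so differently from what the genomic sequence would suggest.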

BLAST Parameters Exercises

3. Limit by Entrez query

Entrez queries can be used in the NCBI BLAST web page to restrict the search to more specific items. For instance, to find only matching sequences in fruit fly, enter 'Drosophila melanogaster[ORGN]' in the "Limit by entrez query" box in the second section (you can also select the organism from the adjacent drop-down list). To combine items, use the logical operators AND, OR or NOT.

Open the file example-sequences.html. Copy the sequence >cyclin-D1-Xt, go to the NCBI BLAST Home Page, TRANSLATED BLAST section, BLASTx, and paste the sequence. Use an Entrez query to find all rodent sequences (rat and mouse) with a good match to cyclin-D1.

At what E-value do we expect we are no longer looking at cyclins? Try running the search again with that E-value as a limit…
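If you build such queries often, it can help to generate the text rather than type it. The sketch below only assembles the string to paste into the "Limit by entrez query" box; the helper names organism and combine are made up, and the only Entrez syntax it relies on is the [ORGN] tag and the AND, OR and NOT operators mentioned on this slide.

```python
def organism(name):
    """Tag a species name as an organism term, e.g. 'Drosophila melanogaster[ORGN]'."""
    return f"{name}[ORGN]"


def combine(operator, *terms):
    """Join terms with AND, OR or NOT and wrap the result in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return "(" + f" {operator} ".join(terms) + ")"


fly_or_frog = combine("OR",
                      organism("Drosophila melanogaster"),
                      organism("Xenopus tropicalis"))
print(fly_or_frog)
```

The parentheses keep a combined term together when it is itself combined with something else.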

BLAST Parameters Exercises

4. BLASTn vs tBLASTx and nucleotide mismatch penalties

Open the file example-sequences.html. Also open the NCBI BLAST Home Page, SPECIAL section, Align two sequences. There are several Xenopus tropicalis cyclins in the examples file. Copy the sequence >cyclin-A1-Xt to the Sequence 1 BLAST window and the sequence >cyclin-A2-Xt to the Sequence 2 BLAST window.

(i) Run the default comparison, which should be BLASTn. Note the alignment. Now run again using tBLASTx. What does this do to our understanding of the relationship between these two sequences? Are they homologs, orthologs or paralogs, or none of these?

(ii) Revert to BLASTn, and try varying the values for mismatch penalties and gapping. Start by reducing the mismatch penalty to -1, then try reducing the gap open and gap extension penalties…. What do we learn from this?

(iii) Now repeat the first parts of the exercise with cyclin-D1 in place of cyclin-A2…
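The effect of the mismatch penalty is easy to see on a toy ungapped alignment. The sketch below scores two made-up 20-base sequences (not the cyclins from the exercise) with a match reward of 1 and different mismatch penalties, and works out the fraction of identical positions at which the score reaches zero. Gaps are ignored, and the function names are illustrative.

```python
def ungapped_score(a, b, reward=1, penalty=-3):
    """Score two equal-length sequences position by position, without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(reward if x == y else penalty for x, y in zip(a, b))


def break_even_identity(reward, penalty):
    """Fraction of identical positions at which an ungapped alignment scores zero."""
    return -penalty / (reward - penalty)


# Hypothetical pair: 15 identical positions and 5 mismatches out of 20.
seq_1 = "ACGT" * 5
seq_2 = "TCGT" * 5

for penalty in (-3, -2, -1):
    print(penalty,
          ungapped_score(seq_1, seq_2, penalty=penalty),
          round(break_even_identity(1, penalty), 3))
```

With a harsh penalty an alignment needs a high fraction of identical bases just to stay above zero; relaxing the penalty lets more divergent stretches keep a positive score and so extend further.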

Results for Exercise 4 (i)

[Two result panels: BLASTn and tBLASTx]

Results for Exercise 4 (ii)

[Two result panels: mismatch penalty = -2 (default) and mismatch penalty = -1]

BLAST Parameters Exercises

5. Word Size

Go to informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html and open example-sequences.html. Copy the sequence >morpholino and go to the NCBI BLAST Home Page. Go to the NUCLEOTIDE BLAST section, BLASTn, and paste the sequence. Switch OFF the low complexity filter, and then run the search.

Now re-run the search, setting the following parameters:

Low complexity  OFF
Expect          100
Word Size       7
Other advanced  -q -1 (mismatch penalty -1 instead of the default -3)

What difference does this make?
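Word size matters most for short queries, because a hit can only start from a stretch of identical bases at least one word long. The sketch below is a deliberate simplification: it compares two made-up 25-base sequences position by position, whereas real BLAST looks words up anywhere in the database. Both sequences and the function names are hypothetical.

```python
def longest_identical_run(a, b):
    """Length of the longest stretch of consecutive identical positions."""
    longest = current = 0
    for x, y in zip(a, b):
        current = current + 1 if x == y else 0
        longest = max(longest, current)
    return longest


def can_seed(a, b, word_size):
    """True if the pair shares an exact match at least word_size long."""
    return longest_identical_run(a, b) >= word_size


# Hypothetical 25-mer and a target differing from it at two positions.
query = "GCTTACCGATCAGGTCAATCGTGCA"
target = "GCTTACCGCTCAGGTCAGTCGTGCA"

print("longest exact run:", longest_identical_run(query, target))
for word_size in (11, 7):
    print("word size", word_size, "->", can_seed(query, target, word_size))
```

Two mismatches are enough to break a 25-base match into pieces shorter than 11, so the larger word size finds nothing to start from, while the smaller one still does.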