Tweaking BLAST: presentation transcript

1 Tweaking BLAST
Although you normally see BLAST as a web page with boxes to paste data into, tick boxes, etc., it is actually a command-line program that can be run just by typing the right command and options, e.g.
>blastall -p blastn -i my_sequence.fasta -d refseq
This is the simplest form: the basic program 'blastall' takes a number of options (parameters), each indicated by -x and followed by its value:
-p  the BLAST flavour (program) to run, here blastn
-i  the input (query) sequence file
-d  the database to search
There are many other parameters; any that are not given explicitly take the default value most appropriate to the BLAST flavour requested. For example, for -W (word size) blastn uses -W 11, whereas blastx uses -W 3.
Some options on the web pages are not really parameters but manage the job in some way. One of the most useful of these is on the NCBI BLAST pages, where you can use Entrez queries or pick from an organism list to restrict your search.
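As a rough illustration of how such a command line is put together, here is a minimal Python sketch that assembles a blastall argument list from keyword options. It only builds and prints the command. It assumes the legacy NCBI blastall program (the newer BLAST+ tools such as blastn use different option names), and the helper name blastall_command is hypothetical. Boolean values are written as T/F, which is how blastall takes true/false options.

```python
import shlex


def blastall_command(program, query_file, database, **options):
    """Build a legacy `blastall` argument list (hypothetical helper, not an NCBI API).

    Each keyword becomes a single-letter flag, e.g. W=7 -> "-W 7";
    booleans become blastall's T/F values, e.g. F=False -> "-F F".
    """
    cmd = ["blastall", "-p", program, "-i", query_file, "-d", database]
    for flag, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        cmd += [f"-{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # The simplest form from the slide: program, query file, database.
    print(shlex.join(blastall_command("blastn", "my_sequence.fasta", "refseq")))
    # -> blastall -p blastn -i my_sequence.fasta -d refseq

    # Overriding defaults: smaller word size, low-complexity filter off.
    cmd = blastall_command("blastn", "my_sequence.fasta", "refseq", W=7, F=False)
    print(shlex.join(cmd))
    # -> blastall -p blastn -i my_sequence.fasta -d refseq -W 7 -F F

    # To actually run it (requires blastall to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Any parameter not passed is simply left out, so blastall falls back to its own default for that BLAST flavour.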

2 The Many Parameters of BLAST
There are dozens of parameters, but most are way too obscure even for die-hard techies like me! Very few of them are regularly useful at anything but their default value, but just occasionally they are very necessary. Here are some of the ones that I have used:
-e  maximum expect (E) value to report
-m  output format (e.g. pairwise alignments, or tabular for a spreadsheet)
-F  filter the query sequence for low complexity (default TRUE)
-U  mask lower-case regions of the query, i.e. use only the upper-case regions (default FALSE)
-G  gap opening cost
-E  gap extension cost
-q  nucleotide mismatch penalty (BLASTx uses scoring matrices instead)
-r  nucleotide match reward
-b  number of matching database sequences to show alignments for
-g  allow gapped alignments (default TRUE)
-W  word size
-z  effective database size (removes the effect of the actual database size!)
-S  query strands to search (default both)
-l  restrict the search to a given list of database 'gi' numbers
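The -e and -z options both act on the E-value. Under the Karlin-Altschul statistics that BLAST uses, the expected number of chance hits with raw score at least S is E = K·m·n·e^(-λS), where m is the query length, n the total database length, and K and λ depend on the scoring system. The minimal Python sketch below ignores BLAST's edge-effect corrections to the lengths, and the K and λ values in the usage lines are illustrative assumptions rather than values for any particular search. It shows why E grows in proportion to database size, and so why fixing n with -z makes E-values comparable between searches of differently sized databases.

```python
import math


def expect_value(score, query_len, db_len, K, lam):
    """Karlin-Altschul expected number of chance hits scoring >= score.

    Simplified sketch: uses the raw lengths, without BLAST's
    edge-effect corrections to the effective lengths.
    """
    return K * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    K, lam = 0.041, 0.267  # illustrative parameters only (assumption)
    m = 300                # query length in residues
    for n in (1_000_000, 2_000_000):
        # Doubling the database doubles E for the same alignment score;
        # supplying a fixed size with -z removes this dependence.
        print(n, expect_value(60, m, n, K, lam))
```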


4 BLAST Parameters Exercises
1. BLASTn vs. BLASTp
Go to: informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >blastn-vs-blastp and go to the NCBI BLAST Home Page. This is a Xenopus tropicalis cDNA sequence.
Go to the NUCLEOTIDE BLAST section. Run BLASTn against the nr nucleotide database using all default options, then click [Format] to see the results in a new page.
Now repeat, but go to the TRANSLATED BLAST section and search the nr protein database using BLASTx.
How might the different results help us assess whether this gene is present in other vertebrates?
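BLASTx differs from BLASTn in that it translates the nucleotide query in all six reading frames (three on each strand) and compares those translations against proteins. Because the genetic code is redundant, sequences that have drifted apart at synonymous (often third-codon) positions can still encode near-identical proteins, which is one reason a protein-level search can pick up more distant vertebrate matches. Here is a minimal Python sketch of six-frame translation using the standard genetic code; it is not BLAST's own code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return the six conceptual translations, keyed +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    print(six_frames("ATGAAATAG"))
    # Only 5 of 9 bases are identical, yet both encode Leu-Leu-Leu:
    print(translate("CTGCTGCTG"), translate("TTATTGCTT"))  # -> LLL LLL
```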

5 BLAST Parameters Exercises
2. Low complexity filtering
Go to: informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >low-complexity-filtering-A and go to the NCBI BLAST Home Page.
Go to the TRANSLATED BLAST section, BLASTx. Carefully UNTICK the “Choose filter [ ] Low complexity” box in the second section, then run BLASTx against the nr database.
What do you make of these alignments?
Re-run, but leave the low-complexity filter ON this time. Does this change our view of the protein matches?
Now continue with >low-complexity-filtering-B and -C. C is an especially interesting case: what can we deduce about the cDNA sequence? Annotators beware!
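To get a feel for what the low-complexity filter does, here is a simplified Python sketch. NCBI's own filters (SEG for protein-level searches such as BLASTx, DUST for BLASTn) use their own algorithms; this sketch swaps in a plain Shannon-entropy measure over a sliding window, and the window size, threshold and function names are illustrative assumptions. Windows whose composition is too repetitive are replaced with X, so they can no longer seed or extend spurious alignments.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Mask every window whose entropy falls below threshold.

    Entropy-based stand-in for SEG/DUST (assumption, not NCBI's algorithm);
    window and threshold are illustrative values.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # A made-up protein fragment with a poly-glutamine run in the middle.
    protein = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 15 + "GNDLEPWCTVRA"
    print(protein)
    print(mask_low_complexity(protein))
```

Note that masking spreads a little into the flanks of the repeat, because windows that straddle the boundary are still dominated by a single residue.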

6 BLAST Parameters Exercises
3. BLASTn vs tBLASTx and nucleotide mismatch penalties
Go to: informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Also open the NCBI BLAST Home Page and go to the SPECIAL – Align two sequences section.
There are several Xenopus tropicalis cyclins.
Copy the sequence >cyclin-A1-Xt to the Sequence 1 window.
Copy the sequence >cyclin-A2-Xt to the Sequence 2 window.
Run the default comparison (it should be BLASTn) and note the alignment.
Now run it again using tBLASTx: what does this do to our understanding of the relationship between these two sequences? Are they homologs, orthologs or paralogs, or none of these?
Revert to BLASTn and try varying the mismatch penalty and gap costs, starting by reducing the mismatch penalty to -1. Can we learn anything from this?
Now repeat the first parts of the exercise with cyclin-D1 in place of cyclin-A2…
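The effect of the mismatch penalty can be seen with simple arithmetic. This minimal Python sketch scores an ungapped nucleotide alignment with a match reward (-r, default 1) and a mismatch penalty (-q, default -3); it ignores gaps and BLAST's statistics, and the example sequences are made up. With 16 matches and 4 mismatches, the default scoring gives 16 - 12 = 4, while -q -1 gives 16 - 4 = 12, so more divergent alignments, such as those between paralogs, may then score well enough to be reported.

```python
def ungapped_score(seq1, seq2, reward=1, penalty=-3):
    """Score two equal-length aligned nucleotide sequences without gaps.

    reward and penalty mirror blastall's -r and -q (defaults 1 and -3).
    """
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(reward if a == b else penalty for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    a = "ACGTACGTACGTACGTACGT"
    b = "ACGTTCGTACCTACGTAGGA"  # 4 mismatches out of 20 (made-up example)
    print(ungapped_score(a, b))              # default -q -3 -> 4
    print(ungapped_score(a, b, penalty=-1))  # -q -1 -> 12
```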

7 BLAST Parameters Exercises
4. Limit Entrez query
Entrez queries can be used in the NCBI BLAST web page to restrict the search to more specific items. For instance, to find only matches to fruit fly proteins, enter ‘Drosophila melanogaster[ORGN]’ in the Limit by Entrez query box in the second section (you can also select the organism from the adjacent drop-down list). To combine items, use the logical operators AND, OR or NOT.
Go to: informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >cyclin-D1-Xt and go to the NCBI BLAST Home Page.
Go to the TRANSLATED BLAST section, BLASTx, and paste the sequence.
Use an Entrez query to find all rodent sequences (rat and mouse) with a good match to cyclin-D1.
At what E-value do we expect that we are no longer looking at cyclins? Try running the search again with that E-value as a limit…
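Both halves of this exercise can be mimicked in a few lines of Python. The sketch below composes an organism-restricted Entrez query string using the [ORGN] field tag and the AND/OR/NOT operators described above, then applies an E-value limit to hits in blastall's tabular output (-m 8), where the E-value is the eleventh of twelve tab-separated columns. The function names and the example hit lines are made up for illustration.

```python
def organism_query(include, exclude=()):
    """Compose an Entrez query restricting a search to organisms (hypothetical helper)."""
    terms = [f"{name}[ORGN]" for name in include]
    query = " OR ".join(terms)
    if len(terms) > 1:
        query = f"({query})"
    for name in exclude:
        query += f" NOT {name}[ORGN]"
    return query


def hits_within(tabular_lines, max_evalue):
    """Yield (subject id, E-value) from blastall -m 8 lines with E <= max_evalue."""
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            yield subject, evalue


if __name__ == "__main__":
    print(organism_query(["Drosophila melanogaster", "Caenorhabditis elegans"]))
    # -> (Drosophila melanogaster[ORGN] OR Caenorhabditis elegans[ORGN])

    # Made-up tabular lines: query, subject, %id, length, mismatches, gap opens,
    # q.start, q.end, s.start, s.end, E-value, bit score
    lines = [
        "cyclinD1\thit_1\t85.0\t290\t40\t2\t1\t870\t1\t290\t1e-120\t400",
        "cyclinD1\thit_2\t30.0\t150\t95\t5\t100\t550\t20\t170\t0.5\t30.0",
    ]
    print(list(hits_within(lines, 1e-5)))  # -> [('hit_1', 1e-120)]
```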

8 BLAST Parameters Exercises
5. Word Size
Go to: informatics.gurdon.cam.ac.uk/online/workshops/useful-web-sites.html
Open blast-parameter-sequences.html
Copy the sequence >morpholino and go to the NCBI BLAST Home Page.
Go to the NUCLEOTIDE BLAST section, BLASTn, and paste the sequence. Turn OFF the low complexity filter, and then run the search.
Now re-run the search, setting the following parameters:
Low complexity: OFF
Expect: 100
Word Size: 7
Other advanced: -q -1 (mismatch penalty -1 instead of the default -3)
What difference does this make?
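Why does word size matter so much for a short query? BLAST only starts an alignment where the query and a database sequence share an identical word of length W. The Python sketch below is a simplification: it compares a made-up 25-nt query (a typical morpholino length) against a target of the same length on a single ungapped diagonal, rather than looking words up across a whole database, and the function names are hypothetical. With mismatches at positions 8 and 16, the longest identical run is 8 nt, so the default W 11 finds no seed at all, whereas W 7 finds several.

```python
def word_hits(query, target, w):
    """Start offsets where query and target share an identical word of length w.

    Simplified: only the ungapped alignment at the same offset (one diagonal)
    is checked, not every position in a database.
    """
    if len(query) != len(target):
        raise ValueError("this sketch compares equal-length sequences only")
    return [i for i in range(len(query) - w + 1) if query[i:i + w] == target[i:i + w]]


def with_mismatches(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)


if __name__ == "__main__":
    query = "GCTAGCATGCCATTGACGTAGCTAG"       # made-up 25-mer
    target = with_mismatches(query, [8, 16])  # two mismatches
    for w in (11, 7):
        print(f"W={w}: seeds at {word_hits(query, target, w)}")
    # -> W=11: seeds at []
    # -> W=7: seeds at [0, 1, 9, 17, 18]
```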

