Summer Bioinformatics Workshop 2008 BLAST Chi-Cheng Lin, Ph.D., Professor Department of Computer Science Winona State University – Rochester Center


1 Summer Bioinformatics Workshop 2008: BLAST
Chi-Cheng Lin, Ph.D., Professor
Department of Computer Science
Winona State University – Rochester Center
clin@winona.edu

2 BLAST
Introduction
–What is BLAST?
–Query Sequence in FASTA Format
–What does BLAST tell you?
Choices
–BLAST Programs: Which One to Use?
–Commonly Used BLAST Programs
–BLAST Databases: Which One to Search?
Understanding the Output
Database Search with BLAST
BLAST Steps – How It Works
Acknowledgement: The presentation includes adaptations from NCBI's Introduction to Molecular Biology Information Resources modules.

3 What is BLAST?
Basic Local Alignment Search Tool
The Google™ of bioinformatics
query is a DNA or protein sequence, not a text term
character string comparison against all the sequences in the target database
rigorous statistics used to identify statistically significant matches

4 Query Sequence in FASTA Format
FASTA definition line ("def line") that begins with a >, followed by some text that briefly describes the query sequence on a single line
up to 80 nucleotide bases or amino acids per line
example and additional information
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
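
The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `parse_fasta` is hypothetical, and the example uses only the first two sequence lines of the slide's record.

```python
def parse_fasta(text):
    """Return a list of (def_line, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new def line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are joined; the 80-character limit only affects layout.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""

records = parse_fasta(example)
def_line, sequence = records[0]
print(def_line)
print(len(sequence))
```

The def line is kept without its leading `>`, and the sequence is returned as one string regardless of how it was wrapped.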

5 What does BLAST tell you?
putative identity and function of your query sequence
helps to direct experimental design to prove the function
find similar sequences in model organisms (e.g., yeast, C. elegans, mouse), which can be used to further study the gene
compare complete genomes against each other to identify similarities and differences among organisms

6 BLAST Programs: Which One to Use?
Depends on:
–what type of query sequence you have (nucleotide or protein)
–what type of database you will search against (nucleotide or protein)
Most commonly used BLAST programs
–blastn
–blastp
–blastx
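
The choice above amounts to a lookup on the pair (query type, database type). The sketch below covers only the three programs named on the slide; the function name `choose_blast_program` and the string labels are illustrative assumptions, and other combinations simply return `None`.

```python
def choose_blast_program(query_type, database_type):
    """Map (query type, database type) to one of the three programs on the slide."""
    programs = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        # Nucleotide query translated and searched against a protein database.
        ("nucleotide", "protein"): "blastx",
    }
    return programs.get((query_type, database_type))


print(choose_blast_program("nucleotide", "protein"))
```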

7 Commonly Used BLAST Programs
BLASTN
–Nucleic acids against nucleic acids
BLASTP
–Protein query against protein database
–usually better to use than nucleotide-nucleotide BLAST
–...but... if we don't have a protein query sequence, what are our options?
BLASTX
–Translated nucleic acids against protein database
–one way to do a protein BLAST search if you have a nucleotide query sequence
–the BLAST program does the translating for you, in all 6 reading frames
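
To make "all 6 reading frames" concrete, here is a minimal sketch of six-frame translation: three offsets on the given strand and three on its reverse complement. It assumes the standard genetic code and an uppercase A/C/G/T alphabet; the function names are hypothetical, and this is not how the BLAST program itself is implemented.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict of frame label ('+1'..'+3', '-1'..'-3') to protein string."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Stop codons are shown as `*`; trailing bases that do not fill a codon are ignored.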

8 Request ID: RID
An RID is like a ticket number that allows you to retrieve your search results and format them in many different ways over the next 24 hours.
If you've saved RIDs from your recent searches, you can enter the RIDs directly using the Retrieve results with a Request ID page, which is accessible from the bottom of the BLAST home page.

9 Search Results: Understanding the Output
Reference to BLAST paper
Reminders about your specific query
–RID
–query sequence reminder (contains the information from your FASTA def line)
–what database you searched against
Graphical summary
–shows where the hits aligned to your query
–colors indicate score range
–mouse over a colored bar to see info about that hit
Text summary (GI numbers and def lines)
–GI links to complete record in Entrez
–Score links to pairwise alignment between your query sequence and the hit
Pairwise alignments
BLAST statistics for your search

10 Database Search w/ BLAST
Used most often!

11 Database Search w/ BLAST
Selecting a BLAST program
Insert sequence
Hit "BLAST" near the end of the web page
In general, if you select blastn, select "Others" as your Database to search.

12 Database Search w/ BLAST
RID and search status will appear

13 Database Search w/ BLAST
Wait for your result (patiently …)

14 Database Search w/ BLAST
Interpret the result
–Graphic result
–Black lines are the sequences that matched least well, and red lines are the sequences that matched best. In the example below, the purple sequences are the best matches available.
Source of the image: http://www.bio.davidson.edu/courses/genomics/2006/martens/favorite_gene.html
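
The color coding can be sketched as a mapping from alignment score to a color band. The thresholds below (40, 50, 80, 200) are an assumption based on the color key commonly shown in NCBI's graphical summary, not something stated on the slide, and the function name `score_color` is hypothetical.

```python
def score_color(score):
    """Return the color band for an alignment score (assumed NCBI-style key)."""
    if score < 40:
        return "black"   # weakest matches
    if score < 50:
        return "blue"
    if score < 80:
        return "green"
    if score < 200:
        return "purple"  # drawn as magenta/pink in the NCBI key
    return "red"         # strongest matches


for s in (35, 120, 450):
    print(s, score_color(s))
```

Under this key, a result whose best bars are purple has no hit scoring 200 or more, which matches the slide's remark that purple can be the best available.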

15 Database Search w/ BLAST
BLAST result
–Matching sequences w/ bit-score & E-value
–Hyperlinks to database entry for sequence
Example
Note that 3e-188 means 3 × 10^-188.
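
The "e" notation in the E-value column is ordinary scientific notation, which most programming languages read directly. A small sketch (the helper name `split_scientific` is hypothetical):

```python
def split_scientific(text):
    """Split a string such as '3e-188' into (mantissa, power of ten)."""
    mantissa, exponent = text.lower().split("e")
    return float(mantissa), int(exponent)


mantissa, exponent = split_scientific("3e-188")
print(f"{mantissa} x 10^{exponent}")

# Python's float() parses the same notation, so E values can be compared directly.
print(float("3e-188") < float("1e-5"))
```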

16 BLAST – Statistical Evaluation
E Value
–The number of different alignments with scores equivalent to or better than the alignment score that are expected to occur in a database search by chance.
–The lower the E value, the more significant the score.
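
The slides do not give a formula, but a commonly cited relationship between bit score and E value is E = m × n × 2^(−S'), where S' is the bit score and m and n are the query and database lengths. The sketch below assumes that relationship and ignores the effective-length corrections BLAST applies; the function name `expected_hits` is hypothetical.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value as m * n * 2**(-bit_score) (assumed relationship)."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Each additional bit of score halves the number of chance hits expected,
# and a larger database raises it proportionally.
for bits in (20, 40, 60):
    print(bits, expected_hits(bits, query_length=240, database_length=1_000_000))
```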

17 BLAST Steps – How It Works
1. Seeding - Prepare a list of short, fixed-length segments (words) from the query
2. Searching - Find highly similar or exact match for each word
3. Extension - Extend each match to (potentially) a longer match
4. Evaluation - Evaluate the results using E values
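
The four steps above can be sketched as a toy seed-and-extend search. This is a deliberately simplified illustration under stated assumptions, not the BLAST implementation: it uses exact word matches only, ungapped extension with a simple drop-off cutoff, hypothetical scoring values (+1 match, -1 mismatch), and it ranks by raw score instead of computing E values. All function names are hypothetical.

```python
def make_words(query, w):
    """Seeding: every overlapping length-w word of the query with its position."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def index_words(subject, w):
    """Index the positions of every length-w word in the subject sequence."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return index


def extend_hit(query, subject, qi, sj, w, match=1, mismatch=-1, x_drop=2):
    """Extension: grow a word hit in both directions without gaps.

    Stops in a direction once the score falls more than x_drop below the best
    seen. Returns (score, query_start, subject_start, length).
    """
    best = score = w * match
    right = best_right = 0
    while qi + w + right < len(query) and sj + w + right < len(subject):
        same = query[qi + w + right] == subject[sj + w + right]
        score += match if same else mismatch
        right += 1
        if score > best:
            best, best_right = score, right
        elif best - score > x_drop:
            break
    score = best
    left = best_left = 0
    while qi - left - 1 >= 0 and sj - left - 1 >= 0:
        same = query[qi - left - 1] == subject[sj - left - 1]
        score += match if same else mismatch
        left += 1
        if score > best:
            best, best_left = score, left
        elif best - score > x_drop:
            break
    return best, qi - best_left, sj - best_left, w + best_left + best_right


def toy_blast(query, subject, w=3):
    """Seed, search, extend, then rank distinct extended hits by score."""
    index = index_words(subject, w)
    hits = set()
    for qi, word in make_words(query, w):
        for sj in index.get(word, []):  # Searching: exact word matches only
            hits.add(extend_hit(query, subject, qi, sj, w))
    return sorted(hits, reverse=True)   # Evaluation stand-in: best score first


print(toy_blast("GATTACA", "CCGATTACACC"))
```

Every word hit along the same diagonal extends to the same segment here, so the set collapses them into one result.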

