1 BLAST: Basic Local Alignment Search Tool Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology.

Slides:



Advertisements
Similar presentations
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Advertisements

Gapped BLAST and PSI-BLAST Altschul et al Presenter: 張耿豪 莊凱翔.
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
Measuring the degree of similarity: PAM and blosum Matrix
Lecture 8 Alignment of pairs of sequence Local and global alignment
Introduction to Bioinformatics
Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al.
Heuristic alignment algorithms and cost matrices
Sequence analysis course
1 1. BLAST (Basic Local Alignment Search Tool) Heuristic Only parts of protein are frequently subject to mutations. For example, active sites (that one.
Scoring Matrices June 22, 2006 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
Sequence similarity.
Alignment methods June 26, 2007 Learning objectives- Understand how Global alignment program works. Understand how Local alignment program works.
Similar Sequence Similar Function Charles Yan Spring 2006.
Heuristic Approaches for Sequence Alignments
1-month Practical Course Genome Analysis Lecture 3: Residue exchange matrices Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
Scoring matrices Identity PAM BLOSUM.
BLOSUM Information Resources Algorithms in Computational Biology Spring 2006 Created by Itai Sharon.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Alignment IV BLOSUM Matrices. 2 BLOSUM matrices Blocks Substitution Matrix. Scores for each position are obtained frequencies of substitutions in blocks.
Alignment III PAM Matrices. 2 PAM250 scoring matrix.
Dayhoff’s Markov Model of Evolution. Brands of Soup Revisited Brand A Brand B P(B|A) = 2/7 P(A|B) = 2/7.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Information theoretic interpretation of PAM matrices Sorin Istrail and Derek Aguiar.
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
An Introduction to Bioinformatics
Evolution and Scoring Rules Example Score = 5 x (# matches) + (-4) x (# mismatches) + + (-7) x (total length of all gaps) Example Score = 5 x (# matches)
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Scoring Matrices April 23, 2009 Learning objectives- 1) Last word on Global Alignment 2) Understand how the Smith-Waterman algorithm can be applied to.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
Multiple sequence alignments Introduction to Bioinformatics Jacques van Helden Aix-Marseille Université (AMU), France Lab.
Comp. Genomics Recitation 3 The statistics of database searching.
Construction of Substitution Matrices
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Tutorial 4 Substitution matrices and PSI-BLAST 1.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
Sequence Alignment.
Construction of Substitution matrices
Bioinformatics Computing 1 CMP 807 – Day 2 Kevin Galens.
Step 3: Tools Database Searching
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
©CMBI 2005 Database Searching BLAST Database Searching Sequence Alignment Scoring Matrices Significance of an alignment BLAST, algorithm BLAST, parameters.
MGM workshop. 19 Oct 2010 Some frequently-used Bioinformatics Tools Konstantinos Mavrommatis Prokaryotic Superprogram.
Protein Sequence Alignment Multiple Sequence Alignment
BLAST: Database Search Heuristic Algorithm Some slides courtesy of Dr. Pevsner and Dr. Dirk Husmeier.
Heuristic Methods for Sequence Database Searching BMI/CS 776 Mark Craven February 2002.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Using BLAST To Teach ‘E-value-tionary’ Concepts Cheryl A. Kerfeld 1, 2 and Kathleen M. Scott 3 1.Department of Energy-Joint Genome Institute, Walnut Creek,
BIOINFORMATICS Ayesha M. Khan Spring Lec-6.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Pairwise Sequence Alignment and Database Searching
Sequence similarity, BLAST alignments & multiple sequence alignments
Sequence Based Analysis Tutorial
Alignment IV BLOSUM Matrices
BLAST Slides adapted & edited from a set by
BLAST Slides adapted & edited from a set by
Presentation transcript:

1 BLAST: Basic Local Alignment Search Tool Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology

2 Topics to be covered: " BLAST as a Sequence Alignment Tool " Uses of BLAST " Types of BLAST " How BLAST works – Scanning for 'hits' – Scoring with Substitution Matrices " Common Databases for Use with BLAST available at NCBI " Interpretation of Blast Results " Blast options: on the net or on your computer " Learning More About BLAST, " A BLAST demo

3 gi| |gb|AAG | (AF232004) HrpL [Pseudomonas syringae pv. tomato] Length = 184 Score = 347 bits (889), Expect = 4e-95 Identities = 182/184 (98%), Positives = 183/184 (98%) Query: 1 MFQKIVILDSTQPRQPSSSAGIRQMTADQIQMLRAFIQKRVMNPDDVDDILQCVFLEALR 60 MFQKIVILDSTQPRQPSSSAGIRQMTADQIQMLRAFIQKRVMNPDDVDDILQCVFLEALR Sbjct: 1 MFQKIVILDSTQPRQPSSSAGIRQMTADQIQMLRAFIQKRVMNPDDVDDILQCVFLEALR 60 Query: 61 NEHKFQHASKPQTWLCGIALNLIRNHFRKMYRQPYQESWEDEVHSELEGHGDVSHQVDGH 120 NEHKFQHASKPQTWLCGIALNLIRNHFRKMYRQPYQESWEDEVHSELEGHGDVSHQV+GH Sbjct: 61 NEHKFQHASKPQTWLCGIALNLIRNHFRKMYRQPYQESWEDEVHSELEGHGDVSHQVEGH 120 Query: 121 RQLARVIQAIDCLPSNMQKVLEVSLEMDGNYQETANSLGVPIGTVRSRLSRARVQLKQQI 180 RQLARVIQAIDCLPSNMQKVLEVSLEMDGNYQETANSLGVPIGTVRSRLS ARVQLKQQI Sbjct: 121 RQLARVIQAIDCLPSNMQKVLEVSLEMDGNYQETANSLGVPIGTVRSRLSGARVQLKQQI 180 Query: 181 DPFA 184 DPFA Sbjct: 181 DPFA 184

4

5 Sequence Alignment Tools Database Searching: BLAST: NCBI, Web Interface: WuBLAST FASTA: Smith-Waterman Par-Align: Multiple Sequence Alignment: CLUSTALW: DiAlign, Web Interface: MSA: Web Interface:

6 Uses of BLAST: Query a database for sequences similar to an input sequence.

7 Uses of BLAST: " Identify previously characterized sequences. Query a database for sequences similar to an input sequence.

8 Uses of BLAST: " Identify previously characterized sequences. " Find phylogenetically related sequences. Query a database for sequences similar to an input sequence.

9 Uses of BLAST: " Identify previously characterized sequences. " Find phylogenetically related sequences. " Identify possible functions based on similarities to known sequences. Query a database for sequences similar to an input sequence.

10 Types of BLAST: Graphic courtesy of Joel Graber.

11 How BLAST Works (1) BLAST scans database for 'words' of a predetermined length (a 'hit') with some minimum threshold parameter, T. (2) BLAST then extends the hit until the score falls below the maximum score yet attained minus some value X. Altschul, S. F. et al., Nucleic Acids Research, 25, (1997)

12 MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query:

13 MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Use 2 or 3-letter words... Query:

14 MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA Query: Scan against subject sequence:

15 >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query: A hit!

16 >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query: YFP Y P Sbjct: YLP Query: Extension:

17 >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query: MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQL M H A Y L S V W E R L V Y P L E T I L Sbjct: MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAML Query: Extension:

18 >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query: MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQL M H A Y L S V W E R L V Y P L E T I L Sbjct: MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAML Query: Extension:

19 >gi|507311|gb|AAA | aminoglycoside 6'-N-acetyltransferase MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAMLNGEPIGYAQSYVALG SGDGWWEEETDPGVRGIDQSLANASQLGKGLGTKLVRALVELLFNDPEVTKIQTDPSPSNLRAIRCYEKA GFERQGTVTTPDGPAVYMVQTRQAFERTRSDA MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQLA Query: MVSHSAAQAYSMLTNSEFVSMWSAESCRTPLCSVNNSYFPGALGDEKTTKVITQL M H A Y L S V W E R L V Y P L E T I L Sbjct: MTEHDLAMLYEWLNRSHIVEWWGGEEARPTLADVQEQYLPSVLAQESVTPYIAML Query: Extension: HSP: A High-Scoring Segment Pair

20 Towards BLAST Scoring " Expected negative score for alignment of two random residues. " Maximal score for a perfect match. " Combinations of residues that can commonly substitute for one another in proteins may have positive score.

21 # Matrix made by matblas from blosum62.iij # * column uses minimum score # BLOSUM Clustered Scoring Matrix in 1/2 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 62 # Entropy = , Expected = A R N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X *

22 BLAST Scoring " Nominal HSP scores (S) are sums of scores from substitution matrices. " Nominal scores are normalized to give 'bit scores' (S'): 1 Allows comparison of alignments scored by different methods K and are statistical parameters that relate the calculated score to the probability finding a hit with at least that score. K and are statistical parameters that relate the calculated score to the probability finding a hit with at least that score. (I)

23

24 Substitution Matrices Scores in the substitution matrix are expressed in 'log-odds' format: q ij = target frequency pi, pj = frequency those residues appear by chance = normalization parameter q ij = target frequency pi, pj = frequency those residues appear by chance = normalization parameter 1 The more frequently the substitution occurs, the higher the score. 1 The less frequently the residue occurs in the sequence as a whole, the higher the score. (V)

25 Substitution Matrices " Derived from empirically observed substitution frequencies " Higher scores for substitution with similar residues. " Random substitutions give negative scores

26 Types of Substitution Matrices Each tailored to a specific degree of evolutionary divergence. PAM Matrices: – 'Percent Accepted Mutation' – start with closely related sequences, and extrapolate substitution probabilities for more distantly related sequences. – 1 PAM unit=1 mutation event per 100 bases. – e.g.: PAM 100 tailored for 100 mutation events per 100 bases. Barker, W.C. & Dayhoff, M.O. Atlas of Protein Sequence and Structure, pp , National Biomedical Research Foundation (1972).

27 Types of Substitution Matrices BLOSUM Matrices – 'BLOck SUbstitution Matrix' – Values inferred from sequences sharing a maximum of the given value. – e.g.: BLOSUM62 derived from sequences no more than 62% identical. Henikoff, S. & Henikoff, J.G., Proc. Natl. Acad. Sci., USA, 89, (1992).

28 Comparing Substitution Matrices Similar Evolutionary Distances PAM 120 BLOSUM80 PAM160 BLOSUM62 PAM250 BLOSUM45 BLOSUM more tolerant to hydrophobic than PAM but less tolerant to hydrophilic substitutions.

29 # Matrix made by matblas from blosum62.iij # * column uses minimum score # BLOSUM Clustered Scoring Matrix in 1/2 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 62 # Entropy = , Expected = A R N D C Q E G H I L K M F P S T W Y V B Z X * A R N D C Q E G H I L K M F P S T W Y V B Z X *

30

31 >gi| |gb|AAF |U22895_1 (U22895) alternative sigma factor AlgU [Azotobacter vinelandii] Length = 193 Score = 334 bits (857), Expect = 2e-91 Identities = 180/192 (93%), Positives = 189/192 (97%) Query: 1 MLTQEQDQQLVERVQRGDKRAFDLLVLKYQHKILGLIVRFVHDAQEAQDVAQEAFIKAYR 60 ML QEQDQQLVERVQRGD+RAFDLLVLKYQHKILGLIVRFVHDA EAQDVAQEAFIKAYR Sbjct: 1 MLNQEQDQQLVERVQRGDRRAFDLLVLKYQHKILGLIVRFVHDAHEAQDVAQEAFIKAYR 60 Query: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVTAEDAEFFEGDHALKDIESPE 120 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDV+A DAEF+EGDHALKDIESPE Sbjct: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVSAGDAEFYEGDHALKDIESPE 120 Query: 121 RAMLRDEIEATVHQTIQQLPEDLRTALTLREFEGLSYEDIATVMQCPVGTVRSRIFRARE 180 R++LRDEIEATVH+TIQQLPEDLRTALTLREF+GLSYEDIA+VMQCPVGTVRSRIFRARE Sbjct: 121 RSLLRDEIEATVHRTIQQLPEDLRTALTLREFDGLSYEDIASVMQCPVGTVRSRIFRARE 180 Query: 181 AIDKALQPLLRE 192 AIDKALQPLL+E Sbjct: 181 AIDKALQPLLQE 192 Interpreting Blast Results

32 >gi| |gb|AAF |U22895_1 (U22895) alternative sigma factor AlgU [Azotobacter vinelandii] Length = 193 Score = 334 bits (857), Expect = 2e-91 Identities = 180/192 (93%), Positives = 189/192 (97%) Query: 1 MLTQEQDQQLVERVQRGDKRAFDLLVLKYQHKILGLIVRFVHDAQEAQDVAQEAFIKAYR 60 ML QEQDQQLVERVQRGD+RAFDLLVLKYQHKILGLIVRFVHDA EAQDVAQEAFIKAYR Sbjct: 1 MLNQEQDQQLVERVQRGDRRAFDLLVLKYQHKILGLIVRFVHDAHEAQDVAQEAFIKAYR 60 Query: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVTAEDAEFFEGDHALKDIESPE 120 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDV+A DAEF+EGDHALKDIESPE Sbjct: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVSAGDAEFYEGDHALKDIESPE 120 Query: 121 RAMLRDEIEATVHQTIQQLPEDLRTALTLREFEGLSYEDIATVMQCPVGTVRSRIFRARE 180 R++LRDEIEATVH+TIQQLPEDLRTALTLREF+GLSYEDIA+VMQCPVGTVRSRIFRARE Sbjct: 121 RSLLRDEIEATVHRTIQQLPEDLRTALTLREFDGLSYEDIASVMQCPVGTVRSRIFRARE 180 Query: 181 AIDKALQPLLRE 192 AIDKALQPLL+E Sbjct: 181 AIDKALQPLLQE 192 Alignment with query sequence Interpreting Blast Results Hit name

33 >gi| |gb|AAF |U22895_1 (U22895) alternative sigma factor AlgU [Azotobacter vinelandii] Length = 193 Score = 334 bits (857), Expect = 2e-91 Identities = 180/192 (93%), Positives = 189/192 (97%) Query: 1 MLTQEQDQQLVERVQRGDKRAFDLLVLKYQHKILGLIVRFVHDAQEAQDVAQEAFIKAYR 60 ML QEQDQQLVERVQRGD+RAFDLLVLKYQHKILGLIVRFVHDA EAQDVAQEAFIKAYR Sbjct: 1 MLNQEQDQQLVERVQRGDRRAFDLLVLKYQHKILGLIVRFVHDAHEAQDVAQEAFIKAYR 60 Query: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVTAEDAEFFEGDHALKDIESPE 120 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDV+A DAEF+EGDHALKDIESPE Sbjct: 61 ALGNFRGDSAFYTWLYRIAINTAKNHLVARGRRPPDSDVSAGDAEFYEGDHALKDIESPE 120 Query: 121 RAMLRDEIEATVHQTIQQLPEDLRTALTLREFEGLSYEDIATVMQCPVGTVRSRIFRARE 180 R++LRDEIEATVH+TIQQLPEDLRTALTLREF+GLSYEDIA+VMQCPVGTVRSRIFRARE Sbjct: 121 RSLLRDEIEATVHRTIQQLPEDLRTALTLREFDGLSYEDIASVMQCPVGTVRSRIFRARE 180 Query: 181 AIDKALQPLLRE 192 AIDKALQPLL+E Sbjct: 181 AIDKALQPLLQE 192 Normalized bit scores Nominal HSP scores Expectation value Number of Identities Interpreting Blast Results

34 BLAST: On the Net, and On Your Computer Advantages/Disadvantages of Net Based Blast: (1) Use databases hosted remotely at NCBI. (2) Little/No setup required. (3) But, Cannot use a customized database. Advantages/Disadvantages of Local Microcomputer-Based Blast: (1) Can Use a Customized Database. (2) Better suited to scripting / automation or when a large number of queries will be performed (UNIX). (3) But, Requires some setup and computer expertise.

35 BLAST: On the Net, and On Your Computer On the Net: On Your Computer: UNIX/MacOS/Windows ftp://ncbi.nlm.nih.gov/blast/executables/ NCBI Tools for UNIX ftp://ncbi.nlm.nih.gov/toolbox/ WUBLAST

36 Learning More about BLAST How Blast Works: Altschul, S.F. et al., Nucleic Acids Research, 25, (1997). Scoring Schemes: Karlin, S., and Altschul, S.F., Proc. Natl. Acad. Sci., 87, (1990). Henikoff, S., and Henikoff, J.G., Proc. Natl. Acad. Sci., 89, (1992). Online Tutorial