Copyright OpenHelix. No use or reproduction without express written consent.
EBI-FASTA: FASTA Protein and Nucleotide Sequence Comparison. Materials prepared by: Mary E. Mangan, Ph.D. Updated: Q Version 2.0
FASTA Agenda: Introduction and Credits; Principles of Sequence Comparison; FASTA Protein Similarity Search; Additional FASTA Searches; Summary; Exercises. EBI-FASTA:
EMBL-EBI Introduction: EMBL-EBI stands for European Molecular Biology Laboratory - European Bioinformatics Institute, which hosts the FASTA web interface.
EBI Menus & Feedback
EBI Tools - FASTA Access: Similarity & Homology; FASTA.
Using FASTA in Several Easy Steps: 1. Select Databases. 2. Input Sequence. 3. Set Parameters (More options); Click. 4. Get Results (sequences with similarity).
Why Compare Sequences? FASTA finds the sequence (PROTEIN 2) most similar to PROTEIN 1, which can reveal: 1. Functional Relationships; 2. Structural Relationships; 3. Evolutionary Relationships.
Find Functional Relationships: PROTEIN 1 (protein function unknown) is found by FASTA to be most similar to PROTEIN 2 (transcription factor; nuclear protein; acts in limb development). Similar function and role?
Find Structural Relationships: PROTEIN 1 is found by FASTA to be most similar to PROTEIN 2 (figure labels: TAASYECN, SKAFSC).
Find Evolutionary Relationships: PROTEIN 1 is found by FASTA to be most similar to PROTEIN 2; computational phylogenetic trees.
Principles of Sequence Comparison: PROTEIN 1 and PROTEIN 2 (sequences shown: KKMQMKLKVKSNVLDRAEQAEAEQK and KKMQMKIKAKAAALDRKEQAEK) are compared through a Sequence Alignment Matrix. Each aligned position is an exact match (:), a similar match (.), a mismatch, or a gap.
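The four position types named on this slide can be sketched in a few lines of Python. This is only an illustration: the `SIMILAR_GROUPS` list and the `marker_line` function are hypothetical names, and the grouping of "similar" residues is a hand-picked assumption, whereas FASTA derives similarity from the chosen scoring matrix.

```python
# Hypothetical groups of "similar" amino acids, chosen only for illustration.
# A real search judges similarity from the scoring matrix, not from a fixed list.
SIMILAR_GROUPS = ["ILVM", "KRH", "DE", "ST", "FYW", "NQ", "AG"]


def marker_line(a: str, b: str) -> str:
    """Return one marker per column for two already-aligned strings.

    ':' = exact match, '.' = similar match, ' ' = mismatch or gap ('-').
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    markers = []
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            markers.append(" ")   # gap in either sequence
        elif x == y:
            markers.append(":")   # exact match
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            markers.append(".")   # similar match under the assumed groups
        else:
            markers.append(" ")   # mismatch
    return "".join(markers)


if __name__ == "__main__":
    top, bottom = "KKMQMKLK", "KKMQMKIK"  # first eight residues of the slide's sequences
    print(top)
    print(marker_line(top, bottom))
    print(bottom)
```

The function expects sequences that are already aligned (gaps written as `-`); it only annotates an alignment and does not compute one.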
Importance of Molecular Evolution: PROTEIN 1 and PROTEIN 2 are compared because both may have evolved from a COMMON ANCESTOR.
PAM250 Scoring Matrix (Sequence 1 vs. Sequence 2). Based upon: W. Pearson “Rapid and Sensitive Sequence Comparison with FASTP and FASTA” Methods in Enzymology (1990) 183:
PAM and BLOSUM matrices. PAM: Point Accepted Mutation; best for global alignments; user can choose this parameter. BLOSUM: Blocks Substitution Matrix; best for local alignments; default parameter for FASTA. Matrices shown on a scale from more divergent to less divergent sequences: BLOSUM 62, BLOSUM 50, BLOSUM 80, PAM 250, PAM 120. Lower-numbered BLOSUM matrices and higher-numbered PAM matrices suit more divergent sequences.
Matrix Help: use Help to read more about matrices.
Global vs. Local Alignment. Global: aligned from first to last, but results in many gaps. Local: smaller, more local blocks of maximized alignment.
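The difference between the two alignment styles can be shown with a small dynamic-programming sketch. The function name `align_score` and the match/mismatch/gap values are assumptions made for illustration. A real search scores residue pairs with a substitution matrix such as PAM or BLOSUM, and FASTA's own heuristics are not reproduced here.

```python
def align_score(a: str, b: str, local: bool,
                match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a and b under simple, assumed scoring values.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of sub-regions).
    """
    # First row: a global alignment pays for leading gaps, a local one does not.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


if __name__ == "__main__":
    a, b = "TTTTACGT", "ACGTCCCC"   # made-up sequences sharing only the block ACGT
    print("global:", align_score(a, b, local=False))
    print("local: ", align_score(a, b, local=True))
```

With these made-up sequences the local score reflects only the shared block, while the global score is pulled down by the unrelated flanking residues.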
Sensitivity vs. Selectivity Trade-off. SENSITIVITY: find all distantly related sequences. SELECTIVITY: avoid unrelated sequences with high similarity scores. Trade-off: SENSITIVITY vs. SELECTIVITY; gaining one typically costs some of the other.
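A toy calculation can make the trade-off concrete. The sketch below assumes sensitivity means the fraction of truly related sequences that are reported, and selectivity means the fraction of reported sequences that are truly related; other sources define these terms slightly differently. The function name and the hit list are invented for illustration and are not real search results.

```python
def sensitivity_selectivity(hits, threshold):
    """hits: list of (score, is_related) pairs; report hits scoring >= threshold.

    Returns (sensitivity, selectivity) under the definitions assumed above.
    """
    reported = [related for score, related in hits if score >= threshold]
    total_related = sum(1 for _, related in hits if related)
    true_positives = sum(reported)
    sensitivity = true_positives / total_related if total_related else 0.0
    selectivity = true_positives / len(reported) if reported else 1.0
    return sensitivity, selectivity


if __name__ == "__main__":
    # Invented scores: (similarity score, truly related?)
    toy_hits = [(90, True), (70, True), (60, False),
                (40, True), (30, False), (20, False)]
    for threshold in (65, 35):
        sens, sel = sensitivity_selectivity(toy_hits, threshold)
        print(f"threshold {threshold}: sensitivity {sens:.2f}, selectivity {sel:.2f}")
```

Lowering the threshold in this toy data recovers the distant relative scoring 40 but also admits the unrelated sequence scoring 60.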
FASTA Introduction: Stands for FAST-ALL. Aligns all biological alphabets (protein & nucleotide). Searches for local alignments using a substitution matrix. Improvement upon FASTP: increase in sensitivity, minor decrease in selectivity. Very specific at finding long regions of low similarity, esp. for highly diverged sequences.
FASTA Publication: Pearson & Lipman,
FASTA Sequence File Format. HEADER: must begin with >, at the START of the line; holds the sequence identifier, comments, and species; ends with a line break (¶). SEQUENCE: follows on the next line in 1-letter code.
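The format rules on this slide are simple enough to check with a small parser. The sketch below is a minimal illustration, not the parser used by the EBI service; `parse_fasta` is a hypothetical name, and the header text in the example is made up (only the sequence comes from the search-method slide).

```python
def parse_fasta(text: str):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    A header line begins with '>'; every following line up to the next
    header is sequence data in 1-letter code.
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = """>example_id made-up description [made-up species]
SAASMYLPGCAY
YVAPSDFASKPS
"""
    for header, sequence in parse_fasta(example):
        print(header, len(sequence), sequence)
```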
FASTA Help: Help; Literature.
FASTA Search Method. INPUT: QUERY: SAASMYLPGCAYYVAPSDFASKPS, searched against the Reference Sequence Database. FASTA: local alignment, with ktup = 2 word hits; quickly finds regions of high similarity; immediately weeds out all non-matches; further processes to find best matches; calculates 3 scores of similarity. OUTPUT: MATCH_1:...SAASMYLPGCAYYVAPSDFASKPS... MATCH_2:....ASNMYLPGCAYYVSPSDFSTKPS... MATCH_3:...SASNMYLPGCAYYVSPSDFSSKTS...
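The word-hit step named on this slide (ktup = 2) can be sketched as a lookup table of query words plus a tally of hits per diagonal. This is a simplified illustration of the idea only: the function names are hypothetical, and the later stages (rescoring with a substitution matrix, joining regions, optimized alignment) are not shown.

```python
from collections import Counter, defaultdict


def word_index(seq: str, ktup: int = 2):
    """Map every ktup-length word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def diagonal_hits(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count shared words per diagonal (subject position minus query position).

    Many hits on one diagonal suggest a region of high similarity;
    a subject with no hits can be discarded immediately.
    """
    index = word_index(query, ktup)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


if __name__ == "__main__":
    query = "SAASMYLPGCAYYVAPSDFASKPS"      # QUERY from the slide
    subject = "ASNMYLPGCAYYVSPSDFSTKPS"     # MATCH_2 from the slide, dots removed
    hits = diagonal_hits(query, subject)
    print("best diagonals:", hits.most_common(3))
    print("unrelated subject:", diagonal_hits(query, "WWWWWWWW"))
```

Using exact two-letter words is what makes the first pass fast: the database sequence is scanned once, and each word is a single dictionary lookup.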
EBI-FASTA Interface: Select your databases (protein databases by default; switch to other search types). Enter sequence. Set parameters. Click here. Protein databases include: UniProt, Swiss-Prot, IntAct, patent and structure databases & more.
Setting your Parameters - Step 3: Toggle open the full parameters section using “More options”. Default settings are appropriate for most searches.
Setting your Parameters - More Options - 1
Setting your Parameters - More Options - 2
Setting your Parameters - More Options
Job Submission - Step 4: Submit your job; notification.
Protein Sequence Search Example: CLICK; SUPPORTED FORMATS.
FASTA Results - Summary Table: RESULT TABS; RESULTS OPTIONS; UniProt database, seq ID, species; linked to report.
FASTA Results - Summary Table - Source: Name, cross-references & related info, and more!
FASTA Results - Summary Table - More Data: exact a.a.; Score; similar a.a.; # amino acids; expectation value.
FASTA Results - Summary Table Options. OPTIONS: Table sorting; Check individual boxes; Select all or none.
FASTA Results - Summary Table - Annotations: UniProt record; scroll; Result 1, still 9 more to view.
FASTA Results - Summary Table - Alignments: Alignment of query to hit. Download: download one or more sequences. Details: next slide.
Tool Output - Best Scores. Tabs: Search Details; Alignments; Best Scores. SCORE 1: optimized score; SCORE 2: bits score; SCORE 3: E-value. Also labeled: # amino acids; Download; scroll; Click.
Tool Output - Alignment: INPUT; HIT; exact match; similar match; mismatch; gap.
Tool Output - Integrated Biological Data: Best Scores; Perfect match; UniProt annotation; InterPro domains & motifs.
Visual Output: sequence depiction with amino acid length; INPUT; OUTPUT; To alignment; Fixed scale; Color coded by E-value; Download.
Functional Predictions: Options; Color coded by E-value; Download, switch view.
Submission Details and Submit Another Job: Tabs; Input parameters.
Protein Similarity Search via DNA or RNA: Select DNA or RNA; PAGE RELOADS; FASTX; DNA STRAND menu active.
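Searching a protein database with a DNA or RNA query requires translating the query first, and the strand choice determines which translations are considered. The sketch below shows three-frame translation of one strand and its reverse complement using the standard genetic code. The function names are hypothetical, and the real FASTX program does more than this (for example, it tolerates frameshifts), so treat this only as an illustration of the translation step.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def normalize(seq: str) -> str:
    """Upper-case the sequence and treat RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return normalize(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = normalize(seq)
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def three_frames(seq: str):
    """Translate the given strand in reading frames 0, 1, and 2."""
    return [translate(normalize(seq)[frame:]) for frame in range(3)]


if __name__ == "__main__":
    dna = "ATGGCCTAA"   # made-up example sequence
    print("forward frames:", three_frames(dna))
    print("reverse frames:", three_frames(reverse_complement(dna)))
```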
Additional FASTA Searches: Click; Similar Applications; Click to access.
Nucleotide Similarity Search - Similar Interface: Select databases; Enter sequence; Set parameters; More options; Submit job; HELP; PARAMETERS YOU ARE FAMILIAR WITH.
Whole Genome Shotgun Search - Similar Interface: Eukaryota.
Whole Genome Shotgun: Select Genomes. Scroll to select cow, dog, horse.
Whole Genome Shotgun: Add Sequence. Add sequence; CLICK.
Whole Genome Shotgun Results: RESULTS - similar organization.
FASTA Sequence Comparison Resource: FASTA web interface.
Relationships You Can Find with FASTA: Sequence 1 is found by FASTA to be most similar to Sequence 2; Functional, Structural, and Evolutionary Relationships.
FASTA: Local Sequence Alignments. INPUT: QUERY: SAASMYLPGCAYYVAPSDFASKPS, searched against the Reference Sequence Database. FASTA: local alignment, with ktup = 2 word hits; quickly finds regions of high similarity; immediately weeds out all non-matches; further processes to find best matches; calculates 3 scores of similarity. OUTPUT: MATCH_1:...SAASMYLPGCAYYVAPSDFASKPS... MATCH_2:....ASNMYLPGCAYYVSPSDFSTKPS... MATCH_3:...SASNMYLPGCAYYVSPSDFSSKTS...
FASTA: Easy and Powerful. 1. Select Databases. 2. Input Sequence. 3. Set Parameters (More options); Click. 4. Get Results (sequences with similarity).