Sequence Based Analysis Tutorial

1 Sequence Based Analysis Tutorial
NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Information Resource at Georgetown University Medical Center

2 Retrieval, Sequence Search & Classification Methods
Retrieve protein info by text / UID
Sequence Similarity Search: BLAST, FASTA, Dynamic Programming
Family Classification: Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
Integrated Search and Classification System

3 Sequence Similarity Search (I)
Based on Pair-Wise Comparisons
Dynamic Programming Algorithms
Global Similarity: Needleman-Wunsch
Local Similarity: Smith-Waterman
Heuristic Algorithms
FASTA: Based on K-Tuples (2-Amino Acid)
BLAST: Triples of Conserved Amino Acids
Gapped-BLAST: Allow Gaps in Segment Pairs
PHI-BLAST: Pattern-Hit Initiated Search
PSI-BLAST: Position-Specific Iterated Search
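
As a rough illustration of the dynamic programming idea behind Smith-Waterman local similarity, the sketch below fills a score matrix and returns the best local alignment score. The function name and the match, mismatch, and gap values are illustrative assumptions, not parameters from this tutorial; a real protein search would score pairs with a substitution matrix and typically use affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions (simple match/mismatch,
    linear gap penalty), not the parameters of any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # The floor at 0 is what makes the alignment local:
            # a poor prefix is dropped instead of carried forward.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("WWWACDWWW", "KKACDKK"))
```

Removing the floor at 0 and initialising the first row and column with cumulative gap penalties would turn the same recurrence into a Needleman-Wunsch style global alignment.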

4 Sequence Similarity Search (II)
Similarity Search Parameters
Scoring Matrices: Based on Conserved Amino Acid Substitution
Dayhoff Mutation Matrix, e.g., PAM250 (~20% Identity)
Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM62
Gap Penalty
Search Time Comparisons
Smith-Waterman: 10 Min
FASTA: 2 Min
BLAST: 20 Sec

5 Feature Representation
Features of Amino Acids: Physicochemical Properties, Context (Local & Global) Features, Evolutionary Features
Alternative Amino Acids: Classification of Amino Acids To Capture Different Features of Amino Acid Residues

6 Substitution Matrix
Likelihood of One Amino Acid Mutated into Another Over Evolutionary Time
Negative Score: Unlikely to Happen (e.g., Gly/Trp, -7)
Positive Score: Conservative Substitution (e.g., Lys/Arg, +3)
High Score for Identical Matches: Rare Amino Acids (e.g., Trp, Cys)
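
To make the scoring idea concrete, here is a minimal sketch of looking up pair scores and summing them over an ungapped alignment. Only the Gly/Trp and Lys/Arg values come from the slide; the Trp/Trp and Cys/Cys values are assumed from the commonly published PAM250 table, and the table and function names are hypothetical.

```python
# Partial substitution table holding only the pairs used in this example.
PARTIAL_PAM250 = {
    ("G", "W"): -7,  # from the slide: unlikely substitution
    ("K", "R"): 3,   # from the slide: conservative substitution
    ("W", "W"): 17,  # assumption: value from the published PAM250 table
    ("C", "C"): 12,  # assumption: value from the published PAM250 table
}


def pair_score(x, y, table=PARTIAL_PAM250):
    """Look up a symmetric substitution score; KeyError if the pair is absent."""
    if (x, y) in table:
        return table[(x, y)]
    return table[(y, x)]


def ungapped_score(s1, s2, table=PARTIAL_PAM250):
    """Sum pair scores over two equal-length, already aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(x, y, table) for x, y in zip(s1, s2))


if __name__ == "__main__":
    print(ungapped_score("KWC", "RWC"))
```

A full search would load a complete 20 x 20 matrix rather than a hand-written subset.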

7 BLAST
BLAST (Basic Local Alignment Search Tool)
Extremely fast
Robust
Most frequently used
It finds very short segment pairs (“seeds”) between the query and the database sequence.
These seeds are then extended in both directions until the maximum possible score for extensions of this particular seed is reached.
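
The seed-and-extend idea can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it seeds only on identical words (BLAST also accepts similar words scored above a threshold with a substitution matrix), extends without gaps, and stops when the running score falls too far below the best seen. All function names and parameter values here are assumptions.

```python
def find_seeds(query, subject, w=3):
    """Return (query_pos, subject_pos) pairs where identical words of length w occur."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk in one direction; return (best score gain, residues kept)."""
    score = best = 0
    kept = steps = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        steps += 1
        if score > best:
            best, kept = score, steps
        elif best - score > x_drop:
            break  # score dropped too far below the best: stop extending
        qi += step
        sj += step
    return best, kept


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed in both directions; return (score, query_start, query_end)."""
    right_gain, right_len = _extend(query, subject, i + w, j + w, 1, match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, x_drop)
    return w * match + left_gain + right_gain, i - left_len, i + w + right_len


if __name__ == "__main__":
    q, s = "MKTAYIAKQR", "GGKTAYIAKQWW"  # made-up sequences
    i, j = find_seeds(q, s)[0]
    score, start, end = extend_seed(q, s, i, j)
    print(score, q[start:end])
```

Indexing the subject words once is what makes the seeding step fast compared with filling a full dynamic programming matrix.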

8 BLAST Search From BLAST Search Interface
Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
Click to see SSEARCH alignment
Click to see alignment

9 Blast Result & Pairwise Alignment
BLAST Alignment

10 Classification
What is classification?
Why do we need protein classification?
Different levels of classification
Basis for functional protein classification
How to classify a protein of unknown function?

11 Classification Databases
Protein motif: A set of conserved amino acid residues that are important for protein function and located within a certain distance from one another
Example: C - x(2,4) - C - x(3) - [LIVMFYWC] - x(8) - H - x(3,5) - H (the 2 C's and the 2 H's are zinc ligands)
Protein domain: Structurally compact, independently folded unit that forms a stable 3D structure and shows a certain level of evolutionary conservation
Group proteins according to the presence of a common domain
3-D structure: Group proteins according to common 3D structure
Whole-protein: Group proteins according to common domain architecture and length
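
The zinc-ligand pattern on this slide is written in PROSITE syntax, which maps closely onto regular expressions. The sketch below translates the common elements (`x`, repeat counts, `[...]` allowed sets, `{...}` excluded sets, `<` and `>` anchors); the function name is hypothetical, the test sequence is made up, and less common PROSITE constructs are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string.

    Handles x, x(n), x(n,m), [ABC], {ABC}, and < / > anchors only.
    """
    parts = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, repeat = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


if __name__ == "__main__":
    zinc = "C - x(2,4) - C - x(3) - [LIVMFYWC] - x(8) - H - x(3,5) - H"
    regex = prosite_to_regex(zinc)
    # Made-up sequence built to contain one match
    seq = "GG" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
    hit = re.search(regex, seq)
    print(regex, hit.span() if hit else None)
```

Once translated, the same regular expression can be run over one sequence or over every sequence in a database.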

12 Family Classification Methods
Based on Other Classification Information
Multiple Sequence Alignment (ClustalW)
PROSITE Pattern Search
Profile Search
Hidden Markov Models (HMMs): Domain (Pfam); Whole protein (PIRSF)
Neural Networks

13 How do you build a tree?
Pick sequences to align
Align them
Verify the alignment
Keep the parts that are aligned correctly
Build and evaluate a phylogenetic tree
Integrated Analysis

14 Multiple Sequence Alignment
ClustalW
Progressive Pairwise Approach
Based on Exhaustive Pairwise Alignments
Neighbor Joining
Joining Order Corresponding to a Tree
Alignment Varies Depending on Joining Order
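
The notion of a joining order derived from pairwise distances can be sketched as below. This is not ClustalW's procedure: it swaps in simple average-linkage clustering for neighbor joining, and it assumes equal-length sequences compared position by position instead of distances from pairwise alignments. The function names and example sequences are made up.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of differing positions between two equal-length sequences."""
    if len(s1) != len(s2):
        raise ValueError("equal-length sequences expected in this sketch")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def joining_order(seqs):
    """Return cluster merges, closest pair first (average linkage, not NJ)."""
    clusters = {name: [name] for name in seqs}

    def dist(c1, c2):
        # Average distance over all member pairs of the two clusters
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    order = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: dist(*p))
        order.append((c1, c2))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
    return order


if __name__ == "__main__":
    example = {
        "A": "ACDEFGHIKL",
        "B": "ACDEFGHIKM",
        "C": "ACDQFGHWKM",
        "D": "TCDQFGHWKV",
    }
    for step in joining_order(example):
        print(step)
```

In a progressive aligner each merge would trigger an alignment of the two clusters, which is why a different joining order can yield a different final alignment.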

15 Multiple Alignment and Tree
From Text/Sequence Search Result or ClustalW Alignment Interface

16 Here is an example of two different functions easily separated on a phylogenetic tree. Each functional group is used to build an HMM.

17 Motif Patterns (Regular Expressions)
Signature Patterns for Functional Motifs ProClass Motif Alignments

18 PIR Pattern Search
From Text/Sequence Search Result or Pattern Search Interface
One Query Sequence Against PROSITE Pattern Database
One Query Pattern (PROSITE or User-Defined) Against Sequence DB

19 Pattern Search Result (I)
One Query Sequence Against PROSITE Pattern Database

20 Pattern Search Result (II)
One Query Pattern Against Sequence Database
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
Sorting arrows
Display the query pattern

21 Profile Method
Profile: A Table of Scores to Express Family Consensus Derived from Multiple Sequence Alignments
Num of Rows = Num of Aligned Positions
Each row contains a score for the alignment with each possible residue.
Profile Searching
Summation of Scores for Each Amino Acid Residue along Query Sequence
Higher Match Values at Conserved Positions
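
A minimal sketch of the profile idea follows: one row of scores per aligned position, built from a multiple alignment, and a query scored by summing the row entries along its length. The log-odds formula, uniform background frequency, pseudocount, and function names are assumptions chosen for illustration; the slide does not specify how the scores are derived, and gapped columns are not handled.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one row per aligned column; each row maps residue -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(row)
    return profile


def score_sequence(profile, seq):
    """Sum row scores for an ungapped sequence as long as the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence length must equal the number of profile rows")
    return sum(row[aa] for row, aa in zip(profile, seq))


if __name__ == "__main__":
    alignment = ["CAH", "CGH", "CSH"]  # made-up three-column alignment
    profile = build_profile(alignment)
    print(score_sequence(profile, "CAH"), score_sequence(profile, "WAW"))
```

The fully conserved first and last columns give the largest scores for their consensus residues, which is the "higher match values at conserved positions" behaviour described on the slide.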

22 PIRSF scan
Shows the PIRSF that the query belongs to
Search one query protein against all the full-length and domain HMMs for the fully curated PIRSFs using HMMER.
The matched regions and statistics will be displayed.
Statistical data for all domains
Statistical data per domain
Alignment with consensus sequence

23 Secondary Structure Features
α Helix
Patterns of Hydrophobic Residue Conservation Showing I, I+3, I+4, I+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions I, I+2, I+4, I+6
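
The two periodicities on this slide can be checked with a small scan. The sketch below reports start positions where every residue at the given offsets is hydrophobic; the hydrophobic residue set is one common grouping chosen as an assumption, the sequences are made up, and a match is only a hint, not a secondary structure prediction.

```python
HYDROPHOBIC = set("AVLIMFWC")   # assumption: one common hydrophobic grouping

HELIX_OFFSETS = (0, 3, 4, 7)    # i, i+3, i+4, i+7
STRAND_OFFSETS = (0, 2, 4, 6)   # i, i+2, i+4, i+6


def pattern_starts(seq, offsets):
    """Return start indices i where every residue at i + offset is hydrophobic."""
    span = max(offsets)
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + k] in HYDROPHOBIC for k in offsets)
    ]


if __name__ == "__main__":
    print(pattern_starts("LKKLLKKL", HELIX_OFFSETS))
    print(pattern_starts("VTVTVTV", STRAND_OFFSETS))
```

On an alignment, the same test would be applied to columns where hydrophobic residues are conserved across the family, which is the situation the slide describes.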

24 3D Structure
Proteins share the same fold, suggesting homology
Gamma Crystallin C
Beta B1 Crystallin

25 Creation and Curation of PIRSFs

26 Integrated Bioinformatics System for Function and Pathway Discovery
Data Integration Associative Analysis

27 Analytical Pipeline Family Classification & Functional Analysis
Query Sequence
UniProt
Top-Matched Superfamilies/Domains
BLAST Search
HMM Domain Search
Predicted Superfamilies/Domains/Motifs/Sites/Signal Peptides/TMHs
SSEARCH
CLUSTALW
Superfamily/Domain/Motif Alignments
Family Relationships & Functional Features
HMM Motif Search
Pattern Search
SignalP/TMHMM

28 Integrated Bioinformatics System
Global Bioinformatics Analysis of 1000’s of Genes and Proteins Pathway Discovery, Target Identification

29 Lab Section

30 Text Search

31 Text Search Result (I)
Extend your search or start over
Choose columns to be displayed
Expand view
Pre-computed BLAST Results
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report

32 Text Search Result (III)
Number of Related Seq. at 3 different E-value cut-offs

33 Text Search Result (II)
Extend your search or start over
Choose columns to be displayed
Curated domain architecture with links to Pfam database
Link to PIRSF report
Extent of family curation

34 Peptide Search

35 Peptide Search & Results
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
Matching peptide highlighted in the sequence
Sorting arrows

36 Batch Retrieval Results (I)
Choose columns to be displayed
Links to iProClass and UniProtKB reports
Retrieve more sequences

37 Batch Retrieval Results (II)
Choose columns to be displayed
Retrieve more families
Links to PIRSF reports
Curated domain architecture (N- to C-termini) with links to Pfam database

38 Blast Similarity Search

39 Blast / Related Sequences Results

40 Blast Result & Pairwise Alignment
BLAST Alignment

41 Pairwise Alignment

42 Multiple Alignment Interactive Phylogenetic Tree and Alignment

43 Phylogenetic Tree and Alignment View

44 Pattern Search (I)

45 Pattern Search (II)
Links to iProClass and UniProtKB reports
Link to NCBI taxonomy
Link to PIRSF report
Sorting arrows
Display the query pattern

46 PIRSF scan

47 PIRSF Report

48 PIRSF Family Hierarchy

49 Taxonomic Distribution & Phylogenetic Pattern

50 Rabbit Alpha Crystallin A Chain
An iProClass View of the entry
Pre-computed BLAST results
See protein synonyms
See IDs from different databases

51 alpha-Crystallin and Related Proteins

