
2 Anastasia Nikolskaya, Lai-Su Yeh
Protein Information Resource, Georgetown University Medical Center, Washington, DC
PIR: a comprehensive resource for functional analysis of protein sequences and families

3 PIR Web Site
- New web site, soon to become public; http://pir.georgetown.edu currently serves an old version
- PIR and UniProt web sites are interlinked and cross-navigable
- PIR-specific features: Text Search, Sequence Search, Classification Database Search

4 iProClass Protein Knowledgebase (http://pir.georgetown.edu/iproclass)
- Integration of protein family, function, structure
- Rich links (executive summary + hypertext links) to > 90 databases
- Value-added reports for 1.96 million UniProtKB protein entries

5 Example
Goal: find information on chorismate mutases. Specifically:
- Start with Bacillus subtilis P19080 (= CHMU_BACSU)
- Relatedness to other chorismate mutases:
  - Homology
  - Domain architecture
  - Is it related to E. coli P07022, a well-studied bifunctional enzyme (P-protein; chorismate mutase/prephenate dehydratase)?

6 iProClass Sequence Report

7 Protein Analysis: I. Text Search (iProClass)
What can we find about “chorismate mutase”?

8 Text Search Results (I): UniProt ID

9 Text Search Results (II)
Display options: add or remove columns

10 Text Search Results (III)
Find chorismate mutase(s) from B. subtilis

11 Determining Protein Homology
- Is B. subtilis CM P19080 homologous to E. coli P-protein P07022? To B. subtilis AroA(G) P39912?
- Which domains, if any, in multidomain chorismate mutases does it correspond to?
- What kinds of domain architecture exist in chorismate mutases?

12 Batch Retrieval
- Retrieve proteins by UID in batch mode
- ID mapping option: can use various non-UniProt IDs
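Before pasting a mixed list into a batch-retrieval form, it can help to separate UniProt accessions from identifiers that would need ID mapping. The following is a minimal sketch, assuming the accession pattern given in the UniProt help pages; the function name `split_ids` is hypothetical and is not part of any PIR tool.

```python
import re

# Accession pattern as given in the UniProt help pages (assumed still current).
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def split_ids(raw):
    """Split a free-form ID list into (UniProt accessions, other identifiers).

    Tokens may be separated by whitespace, commas, or semicolons.
    Duplicates are dropped; first-seen order is kept.
    """
    accessions, others = [], []
    seen = set()
    for token in re.split(r"[\s,;]+", raw.strip()):
        if not token or token in seen:
            continue
        seen.add(token)
        if UNIPROT_AC.fullmatch(token):
            accessions.append(token)
        else:
            others.append(token)  # candidates for the ID mapping option
    return accessions, others


accs, rest = split_ids("P19080, P07022 P39912\nCHMU_BACSU P19080")
print(accs)  # accessions from the slides
print(rest)  # entry names and other non-accession IDs
```

Here `CHMU_BACSU` ends up in the second list because it is an entry name rather than an accession.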

13 Determining Protein Homology: Sequence Search
BLAST, FASTA, SSEARCH

14 BLAST Search Results
A BLAST query with UniProt sequence P19080 returns PIRSF005965 family members as best hits.
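The PIR site shows BLAST hits in its own results table. For readers who run the same kind of search locally, the sketch below parses NCBI BLAST+ tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and ranks hits by bit score. The E-value cutoff and the three hit lines are made-up placeholders, not real results for P19080.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(text, max_evalue=1e-5):
    """Parse BLAST+ '-outfmt 6' text; keep hits at or below max_evalue,
    sorted from highest to lowest bit score."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hit = Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)


# Placeholder rows: subject IDs and scores are invented for illustration.
example = "\n".join([
    "P19080\tSUBJ_A\t62.5\t120\t45\t0\t1\t120\t3\t122\t2e-40\t150.0",
    "P19080\tSUBJ_B\t30.1\t80\t50\t2\t5\t84\t10\t89\t0.5\t28.0",
    "P19080\tSUBJ_C\t71.0\t118\t34\t0\t2\t119\t1\t118\t1e-50\t180.0",
])

for hit in parse_tabular(example):
    print(hit.subject, hit.evalue, hit.bitscore)
```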

15 Pre-compiled Related Sequences: saves time

16 BLAST/SSEARCH Results
SSEARCH alignment; BLAST alignment

17 Determining Protein Homology: Peptide Search

18 Peptide Search Results
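The slides do not describe how peptide search works internally. As a rough illustration, assuming it means exact matching of a short query peptide against full sequences, a search over an in-memory set of sequences might look like this. The function `find_peptide` and the toy sequences are hypothetical.

```python
def find_peptide(peptide, sequences):
    """Return {sequence_id: [1-based start positions]} for exact matches.

    Matching is case-insensitive and overlapping occurrences are reported.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    matches = {}
    for seq_id, seq in sequences.items():
        seq = seq.upper()
        positions = []
        start = seq.find(peptide)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based coordinates
            start = seq.find(peptide, start + 1)  # step by 1 to allow overlaps
        if positions:
            matches[seq_id] = positions
    return matches


# Toy sequences, invented for illustration only.
toy_db = {
    "toy1": "MKTAYIAKQRAYIAK",
    "toy2": "MSSLLDKAA",
}
print(find_peptide("AYIAK", toy_db))
```

Sequences without a match are left out of the result, so an empty dictionary means the peptide was not found anywhere.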

19 Family Classification System: One-Stop Platform for Protein Analysis
- Protein families reflect evolutionary relationships
- Function often follows along the family lines
- Therefore, matching a protein sequence to a protein family provides information about the protein (this requires a highly curated and annotated family)
- Faster and often more accurate than searching against a protein database
- Protein classification facilitates sequence and functional analysis of proteins and is used for accurate automatic annotation (PIRSF is used for UniProt annotation)

20 PIRSF Classification System
PIRSF reflects evolutionary relationships of full-length proteins.
Definitions:
- Basic unit = homeomorphic family
- Homologous: inferred by sequence similarity
- Homeomorphic: full-length sequence similarity and common domain architecture
- Hierarchy: flexible number of levels with varying degrees of sequence conservation
- Network structure: multiple domain parents
Advantages:
- Annotation of both generic biochemical and specific biological functions
- Accurate propagation of annotation and development of standardized protein nomenclature and ontology

21 PIRSF Classification System
A protein may be assigned to only one homeomorphic family, which may have zero or more child nodes and zero or more parent nodes. Each homeomorphic family may have as many domain superfamily parents as its members have domains.
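These rules can be pictured as a small directed acyclic graph with a membership table. The sketch below is not PIR's data model: the class names, the level labels, and the nodes `DOMAIN_SF_X` and `FAMILY_Y` are hypothetical. It enforces only the one-family-per-protein rule and acyclicity; the limit on the number of domain superfamily parents is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    level: str  # "family" marks a homeomorphic family; other labels are free-form
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


class Classification:
    def __init__(self):
        self.nodes = {}      # node_id -> Node
        self.family_of = {}  # protein accession -> homeomorphic family id

    def add_node(self, node_id, level):
        self.nodes[node_id] = Node(node_id, level)

    def _ancestors(self, node_id):
        """Collect every node reachable by following parent links."""
        seen, stack = set(), list(self.nodes[node_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.nodes[current].parents)
        return seen

    def link(self, parent_id, child_id):
        """Add a parent -> child edge; a node may have several parents."""
        if parent_id == child_id or child_id in self._ancestors(parent_id):
            raise ValueError("link would create a cycle")
        self.nodes[parent_id].children.add(child_id)
        self.nodes[child_id].parents.add(parent_id)

    def assign(self, protein, family_id):
        """Assign a protein to exactly one homeomorphic family."""
        if self.nodes[family_id].level != "family":
            raise ValueError("proteins are assigned to homeomorphic families only")
        current = self.family_of.get(protein)
        if current not in (None, family_id):
            raise ValueError(f"{protein} is already assigned to {current}")
        self.family_of[protein] = family_id


tree = Classification()
tree.add_node("PIRSF005965", "family")
tree.add_node("DOMAIN_SF_X", "domain superfamily")  # hypothetical parent node
tree.add_node("FAMILY_Y", "family")                 # hypothetical second family
tree.link("DOMAIN_SF_X", "PIRSF005965")
tree.assign("Q8Y5X7", "PIRSF005965")
print(tree.family_of)
```

Assigning `Q8Y5X7` to `FAMILY_Y` afterwards raises `ValueError`, which is the single-membership rule in action.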

22 [Figure: workflow diagram with eight numbered steps. Labels: Unclassified UniProtKB proteins; Uncurated Homeomorphic Clusters; Orphans; Preliminary Homeomorphic Families; Final Families, Subfamilies, Superfamilies; Add/Remove Members; Name, Refs, Abstract, Domain Arch.; Automatic Clustering; Computer-assisted Manual Curation; Automatic Procedure; Unassigned Proteins; Automatic Placement; Hierarchies (Superfamilies/Subfamilies); Map Domains on Clusters; Merge/Split Clusters; New Proteins; Protein Name Rules/Site Rules; Build and Test HMMs]

24 Tool: Curator’s Decision Maker

25 Classification Tool: BlastClust
- Curator-guided clustering
- Single-linkage clustering using BLAST
- Retrieve all proteins sharing a common domain
- Iterative BlastClust (fixed length coverage)
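Single-linkage clustering puts two proteins in the same cluster whenever a chain of pairwise hits connects them. The sketch below illustrates the idea with a union-find structure over precomputed pairwise hits. It is not the BlastClust program itself: the input tuple layout, the 0.8 coverage threshold, and the toy data are assumptions, and real runs would typically also apply a score or identity cutoff.

```python
def single_linkage(ids, hits, min_coverage=0.8):
    """Cluster ids by single linkage.

    hits: iterable of (id_a, id_b, coverage_a, coverage_b), where each
    coverage is the fraction of that sequence spanned by the alignment.
    A hit links two sequences only if both coverages reach min_coverage.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the cluster representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, cov_a, cov_b in hits:
        if min(cov_a, cov_b) >= min_coverage:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Toy input, invented for illustration.
toy_ids = ["A", "B", "C", "D", "E"]
toy_hits = [
    ("A", "B", 0.95, 0.90),
    ("B", "C", 0.85, 0.88),
    ("C", "D", 0.40, 0.90),  # covers only a fragment of C, so it is ignored
    ("D", "E", 0.90, 0.92),
]
print(single_linkage(toy_ids, toy_hits))
```

A and C land in the same cluster without a direct hit between them because B links both, which is the defining property of single linkage. Requiring coverage on both sequences keeps a shared domain alone from merging proteins with different overall architectures.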

26 Family Analysis of Homologous Proteins
1. Fully curated protein family
   - Especially important when the protein of interest is underannotated or misannotated (happens often!)
   - Evidence types: Characterized (validated), Predicted (by computational methods), or Uncharacterized
2. Preliminary or uncurated family
   - Have to do some analysis OR contact PIR and ask to prioritize this family
3. No family classification (iProClass search: PIRSF is blank)
   - Have to do some analysis OR contact PIR and ask to prioritize this family

27 Underannotated Proteins
Searching iProClass with PIRSF005965 provides more information.

28 PIRSF SCAN (sequence search)
- UniProt sequence Q8Y5X7 is automatically classified as chorismate mutase of the AroH class (PIRSF005965)
- Returns only matches to fully curated PIRSFs

29 PIRSF Family Report: Curated Protein Family Information
- Taxonomic distribution of a PIRSF can be used to infer the evolutionary history of its proteins
- Phylogenetic tree and alignment view allows further sequence analysis

30 PIRSF Family Report (II)
- Integrated value-added information from other databases
- Mapping to other protein classification databases

31 Chorismate Mutase
Results from iProClass analysis:
- CM from B. subtilis (P19080) does not retrieve B. subtilis AroA(G) or E. coli P-protein (or related proteins) in a BLAST search
- It contains a different Pfam domain
- Identical conserved motifs are not found
- Conclusion: NOT homologous
Use the PIRSF family database for the same analysis:
- PIRSF reports: abstracts contain most of this info
- PIRSF domain architecture (curated or uncurated): Pfam and newly defined domains
- Structure information (PDB links)
- Hierarchy in DAG (under development)
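The three lines of evidence on this slide (BLAST hits, shared Pfam domains, shared conserved motifs) can be tabulated in a few lines of Python. This is an illustrative sketch only: the function `homology_evidence` is hypothetical, and the domain assignments (PF07736 for P19080, PF01817 for P07022) are read off the slides rather than queried from Pfam.

```python
def homology_evidence(blast_hit, domains_a, domains_b, shared_motifs):
    """Summarize three independent lines of evidence for homology.

    blast_hit: True if one protein retrieves the other in a BLAST search.
    domains_a, domains_b: collections of Pfam accessions for each protein.
    shared_motifs: collection of conserved motifs found in both proteins.
    """
    evidence = {
        "blast_hit": bool(blast_hit),
        "shared_domain": bool(set(domains_a) & set(domains_b)),
        "shared_motifs": bool(shared_motifs),
    }
    # Homology is considered supported if any line of evidence is present.
    supported = any(evidence.values())
    evidence["supported"] = supported
    return evidence


# P19080 (B. subtilis CM) versus P07022 (E. coli P-protein), as on the slides.
result = homology_evidence(
    blast_hit=False,
    domains_a={"PF07736"},
    domains_b={"PF01817"},
    shared_motifs=[],
)
print(result)
```

With all three checks negative, `supported` is `False`, mirroring the slide's conclusion for this pair.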

32 PIRSF Text Search
New domain; AroA(G)

33 Chorismate Mutase: Convergent Evolution, EC 5.4.99.5 (Non-Orthologous Gene Displacement)
Two distinct sequence/structure types:
- AroQ class: SCOP (all α), core: 6 helices, bundle
- AroH class: SCOP (α + β), core: beta-alpha-beta-alpha-beta(2)
Two Pfam domains: PF01817, PF07736 (new Pfam domain)
[Figure panels: AroQ, AroH]

34 Developing DAG Viewer
- The network structure (in DAG) of the PIRSF family classification system reflects the PIRSF family hierarchy, which is based on evolutionary relationships
- Before: all chorismate mutase proteins and families hit PF01817, including PIRSF005965 (not homologous to the rest)
[Figure label: Subfamily]

35 DAG Viewer (II)
- After: Pfam created a new domain, PF07736, which is found in PIRSF005965 members
- “Orphans”: no family classification

36 PIR Team
Dr. Cathy Wu, Director
Protein Classification team: Dr. Winona Barker; Dr. Lai-Su Yeh; Dr. Anastasia Nikolskaya; Dr. Darren Natale; Dr. Zhang-Zhi Hu; Dr. Raja Mazumder; Dr. CR Vinayaka; Dr. Sona Vasudevan; Dr. Cecilia Arighi
Informatics team: Dr. Hongzhan Huang; Dr. Peter McGarvey; Baris Suzek, M.S.; Sehee Chung, M.S.; Dr. Leslie Arminski; Dr. Hsing-Kuo Hua; Yongxing Chen, M.S.; Jian Zhang, M.S.; Dr. Xin Yuan
Students: Christina Fang; Vincent Hermoso; Natalia Petrova
UniProt is supported by the National Institutes of Health, grant # 1 U01 HG02712-01

