PIRSF Classification System: Evolutionary relationships of proteins from super- to sub-families

1 PIRSF Classification System
PIRSF: Evolutionary relationships of proteins from super- to sub-families
Homeomorphic Family: Homologous proteins sharing full-length similarity and common domain architecture
Significance:
  Improve sensitivity of protein identification and functional inference
  Detect and correct genome annotation errors systematically
  Provide basis for evolutionary and comparative genomics research
  Provide basis for automated annotation of protein features: annotate generic biochemical and specific biological functions
Protein Classification and Functional Annotation: Discovery of New Knowledge by Using Information Embedded within Families of Homologous Sequences and Their Structures

2 A protein may be assigned to only one homeomorphic family, which may have zero or more child nodes and zero or more parent nodes. Each homeomorphic family may have as many domain superfamily parents as its members have domains.
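The membership and parent/child rules on this slide can be sketched as a small data model. This is only an illustration: the class names, field names, level strings, and every identifier in the demo (`DOMAIN_SF_1`, `HFAM_A`, `protein_1`, and so on) are hypothetical and do not come from PIRSF itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Family:
    """One node in the classification; all names here are illustrative."""
    pirsf_id: str
    level: str  # e.g. "domain superfamily", "homeomorphic", "subfamily"
    parents: Set[str] = field(default_factory=set)
    children: Set[str] = field(default_factory=set)


class Classification:
    def __init__(self) -> None:
        self.families: Dict[str, Family] = {}
        self.protein_to_hfam: Dict[str, str] = {}

    def add_family(self, pirsf_id: str, level: str) -> None:
        self.families[pirsf_id] = Family(pirsf_id, level)

    def link(self, parent_id: str, child_id: str) -> None:
        # A node may have several parents, so the structure is a DAG, not a tree.
        self.families[parent_id].children.add(child_id)
        self.families[child_id].parents.add(parent_id)

    def assign(self, protein: str, hfam_id: str) -> None:
        # A protein may belong to only one homeomorphic family.
        if self.families[hfam_id].level != "homeomorphic":
            raise ValueError(f"{hfam_id} is not a homeomorphic family")
        current = self.protein_to_hfam.get(protein)
        if current not in (None, hfam_id):
            raise ValueError(f"{protein} is already assigned to {current}")
        self.protein_to_hfam[protein] = hfam_id


# Hypothetical example: a two-domain family with one subfamily.
demo = Classification()
demo.add_family("DOMAIN_SF_1", "domain superfamily")
demo.add_family("DOMAIN_SF_2", "domain superfamily")
demo.add_family("HFAM_A", "homeomorphic")
demo.add_family("SUBFAM_A1", "subfamily")
demo.link("DOMAIN_SF_1", "HFAM_A")
demo.link("DOMAIN_SF_2", "HFAM_A")
demo.link("HFAM_A", "SUBFAM_A1")
demo.assign("protein_1", "HFAM_A")
```

Storing parent and child identifiers as sets on each node keeps multiple parents cheap to represent, which is what allows one homeomorphic family to sit under several domain superfamilies.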

3 Creation and Curation of PIRSFs
[Flowchart; labels recovered from the figure]
Curation stages: Computer-Generated (Uncurated) Clusters; Preliminary Curation (Membership, Signature Domains); Full Curation (Family Name, Description, Bibliography); PIRSF Name Rules
Data flow: UniProtKB proteins; Automatic clustering; Preliminary Homeomorphic Families; Orphans; Curated Homeomorphic Families; Final Homeomorphic Families; New proteins; Unassigned proteins; Automatic placement
Procedures: Computer-assisted Manual Curation; Automatic Procedure
Curation actions: Add/remove members; Merge/split clusters; Map domains on Families; Create hierarchies (superfamilies/subfamilies); Name, refs, abstract, domain arch.; Build and test HMMs; Protein name rule/site rule

4 PIRSF family classification system http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml

5 PIRSF Text Search
Screenshot callouts: Ways to get to PIRSF text search; Select field; Add extra input boxes for advanced search

6 PIRSF Text Search Result (I)
Things you can do from the result table:
1. Add search terms or start the search over
2. Customize the table columns
3. Save your results as a table or in FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Follow links to PIRSF records, the PIRSF hierarchy, and protein domains (Pfam)

7 PIRSF Text Search Result (II)
2. How to customize the table columns. Example: display the KEGG pathway ID column.
a- Select KEGGPathway ID in the “Fields not in display” box.
b- Use > to add the item to the “Fields in display” box.
c- KEGG ID should now be in the “Fields in display” box. Press the Apply button for the changes to take effect.

8 PIRSF Text Search Result (III)
3. Save your results as a table or in FASTA format
a- Select entries using the check boxes in the PIRSF column. To select all, check the box in the column heading.
b- Click on “Save Result As: Table” to store the information in the result table. This file can be opened in Excel.
c- Click on FASTA to save the protein sequences.
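Once sequences have been saved with the FASTA option, they can be loaded for further analysis. The sketch below is a minimal reader that assumes the download is standard FASTA (a `>` header line followed by sequence lines); the slides do not specify the header layout, and the two records shown are made-up placeholders, not PIRSF data.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's first header token to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder input standing in for a saved result file.
example = """>example_1 placeholder record
MKTAYIAK
QRQISFVK
>example_2 placeholder record
MSLNE
"""
records = read_fasta(example.splitlines())
```

Because `read_fasta` accepts any iterable of lines, an open file handle for the saved download can be passed in directly instead of the in-memory example.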

9 PIRSF Text Search Result (IV)
4. Select entries using check boxes and perform analysis using tool bar options
a- Select families using the check boxes in the PIRSF ID column. To select all, check the box in the column heading. Then select a tool, e.g., Taxonomy Distribution.
The tool displays the taxonomic distribution for the selected families. In this case, PIRSF001501 and PIRSF017318 contain members of the AroQ class from prokaryotes and eukaryotes, respectively, which is also reflected in the family names.

10 PIRSF Text Search Result (V)
4. Note on selecting families for analysis with Multiple Alignment and Domain Display:
If more than one family is selected, the chosen tool performs the operation on representative members of the selected families. Example: multiple alignment of PIRSF001501, PIRSF500251, PIRSF026640 and PIRSF029775.
If one family is selected, the chosen tool performs the operation on the seed members. Example: multiple alignment of PIRSF001501.

11 PIRSF Text Search Result (VI)
5. The result table contains summarized information about family size, domain architecture, and level of curation. Additional data can be viewed by using the Display Option.
PIRSF Name: The names assigned to PIRSFs predominantly reflect the membership. The main source of PIRSF names is the literature. Fully curated families have a name accompanied, in most cases, by an evidence tag:
  [Validated]: at least one member of the family has an experimentally determined function.
  [Predicted]: the family's function is inferred computationally, based on sequence similarity and/or functional associative analysis.
  [Tentative]: the experimental evidence is not decisive.
Curation Status: Indicates the level of manual curation of the PIRSF.
  Uncurated: Computer-generated protein clusters, no manual curation. The clusters are computationally defined using both pairwise-based parameters (% sequence identity, sequence length ratio and overlap length ratio) and cluster-based parameters (% matched members, distance to neighboring clusters and overall domain arrangement).
  Preliminary: Computer-generated clusters are manually curated for membership (do proteins belong to the assigned cluster?) and domain architecture (Pfam domains listed from N- to C-termini).
  Full/Full (with description): A name is assigned to the protein family, and accompanying references are listed when available. In many cases, brief descriptions are also provided.
Hfam/Superfam/Subfam: Indicates the hierarchical level of the PIRSF: homeomorphic family, superfamily or subfamily, respectively. Selecting the button will show the PIRSF hierarchy in a DAG view with Pfam as the top node.
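When result tables are processed outside the website, the evidence tag described above can be split from the family name. The following is a minimal sketch that assumes the tag appears in square brackets at the end of the name; the function name and the example family names are hypothetical.

```python
import re
from typing import Optional, Tuple

# The three evidence tags listed on the slide.
EVIDENCE_TAGS = ("Validated", "Predicted", "Tentative")
_TAG_RE = re.compile(r"\s*\[(%s)\]\s*$" % "|".join(EVIDENCE_TAGS))


def split_evidence_tag(name: str) -> Tuple[str, Optional[str]]:
    """Return (name without tag, tag), or (name, None) when no tag is present."""
    match = _TAG_RE.search(name)
    if match is None:
        return name.strip(), None
    return name[: match.start()].strip(), match.group(1)


# Hypothetical family names, used only to exercise the function.
tagged = split_evidence_tag("Example family name [Validated]")
untagged = split_evidence_tag("Example family name")
```

Bracketed text that is not one of the three tags is left in the name, so an unrecognized annotation is never silently dropped.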

12 5. PIRSF hierarchy in DAG view (cont.)
Levels shown in the figure: Pfam level; Hfam level; Subfam level

13 PIRSF Family Report (I): Curated Protein Family Information
Level of manual curation
Hierarchy with Pfam domain at the highest node
Taxonomic distribution of the PIRSF can be used to infer the evolutionary history of its proteins
Phylogenetic tree and alignment view allow further sequence analysis
Graphical display of Pfam domains assigned with high confidence

14 PIRSF Family Report (II)
Integrated value-added information from other databases
Mapping to other protein classification databases

15 PIRSF: Batch Retrieval
Retrieve PIRSF families by selecting a specific identifier or a combination of identifiers.
Steps: List IDs; Define IDs; Display the list of query/PIRSF matches
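Before pasting a long list into the batch retrieval form, it can help to tidy the identifiers. The sketch below assumes PIRSF identifiers follow the pattern seen in these slides (`PIRSF` followed by six digits); it removes duplicates and sets aside tokens that do not match, which may be other identifier types. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed pattern, based on IDs such as PIRSF001501 shown in the slides.
PIRSF_ID_RE = re.compile(r"^PIRSF\d{6}$")


def parse_id_list(text: str) -> Tuple[List[str], List[str]]:
    """Split pasted text into (PIRSF IDs, other tokens), de-duplicated in order."""
    pirsf_ids: List[str] = []
    others: List[str] = []
    seen = set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        candidate = token.upper()
        if PIRSF_ID_RE.match(candidate):
            key, bucket = candidate, pirsf_ids
        else:
            key, bucket = token, others  # keep non-PIRSF tokens as typed
        if key in seen:
            continue
        seen.add(key)
        bucket.append(key)
    return pirsf_ids, others


ids, rest = parse_id_list("PIRSF001501, pirsf017318\nQ8Y5X7 PIRSF001501 PIRSF12")
```

Tokens that fail the pattern are returned rather than discarded, since the form accepts more than one identifier type.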

16 PIRSF SCAN (sequence search)

17 UniProtKB sequence Q8Y5X7 is automatically classified as chorismate mutase of the AroH class (PIRSF005965).
Returns only matches to fully curated PIRSFs.

