Integration of PRO and UniProtKB Amherst, NY May 16, 2013 Cathy H. Wu, Ph.D. PRO-PO-GO Meeting.


1 Integration of PRO and UniProtKB Amherst, NY May 16, 2013 Cathy H. Wu, Ph.D. PRO-PO-GO Meeting

2 PRO Framework: PRO terms are defined and annotated using other ontologies and resources, via definition of relations or mappings when appropriate

3 Relationships Between PRO-GO-UniProtKB
- ProComp-ProForm: has_part
- ProComp-GO: is_a
- ProForm-UniProtKB: xref
Accessioned, species-specific protein complexes in ProComp are described using protein entities in ProForm, and are cross-referenced to species-independent complex representations in GO.
A gene product (PR:000025358) and its isoforms and modified forms (PR:000025355; PR:000025356) are represented in PRO as separate, uniquely accessioned entities, but are described in the same UniProtKB record (UniProtKB:Q9D6R2).
Reference: Bult CJ, Drabkin HJ, Evsikov A, Natale D, Arighi C, Roberts N, Ruttenberg A, D'Eustachio P, Smith B, Blake JA, Wu C. (2011) The representation of protein complexes in the Protein Ontology (PRO). BMC Bioinformatics 12, 371 [PMID: 21929785]

4 PRO ID Mapping
Mappings to various external databases:
- promapping.txt: tab-delimited, each line indicating the PRO ID, the database ID, and the type of mapping (is_a or exact)
- promapping.obo: the same information as promapping.txt, but in OBO format
Mappings are of two types:
- exact: the database object is an exact match to the PRO object. E.g., PR:000026497 describes an isoform of 6-phosphofructokinase type C in human only, which corresponds to UniProtKB:Q01813-1.
- is_a: the database object is more specific than the PRO object. E.g., PR:000026465 describes an (organism-nonspecific) isoform of 6-phosphofructokinase type C, so UniProtKB:Q01813-1 (human) and UniProtKB:Q9WUA3-1 (mouse) are mapped to this term.
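
The tab-delimited layout described for promapping.txt can be read with a few lines of Python. This is a minimal sketch, assuming the three columns appear in the order the slide lists them (PRO ID, database ID, mapping type) and that there is no header row. The sample rows are built from the slide's own examples, and the function name load_mappings is a hypothetical helper, not part of any PRO tooling.

```python
import csv
import io
from collections import defaultdict

# Sample rows built from the examples on the slide (assumed column order:
# PRO ID, database ID, mapping type).
SAMPLE = (
    "PR:000026497\tUniProtKB:Q01813-1\texact\n"
    "PR:000026465\tUniProtKB:Q01813-1\tis_a\n"
    "PR:000026465\tUniProtKB:Q9WUA3-1\tis_a\n"
)

VALID_TYPES = {"exact", "is_a"}


def load_mappings(handle):
    """Group (database ID, mapping type) pairs by PRO ID."""
    mappings = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pro_id, db_id, mapping_type = (field.strip() for field in row)
        if mapping_type not in VALID_TYPES:
            raise ValueError(f"unexpected mapping type: {mapping_type!r}")
        mappings[pro_id].append((db_id, mapping_type))
    return dict(mappings)


mappings = load_mappings(io.StringIO(SAMPLE))
for pro_id, targets in mappings.items():
    print(pro_id, targets)
```

For the real file, the same function can be applied to a handle from open("promapping.txt", newline="").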

5 PRO Reasoning with ID Mapping
bri1/iso1/phos 5 (PR:000035786) has two parents:
- an explicit one in the formal definition (PR:000035785)
- an implicit one shown only in the reasoned version (PR:000028355)
Files:
- pro.obo: PRO version with no implied links
- pro_reasoned.obo: implied link automatically realized via is_a
[Term]
id: PR:000035786
name: protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5 (Arabidopsis thaliana)
def: "A protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5 in Arabidopsis thaliana. UniProtKB:O22476-1, Thr-872, MOD:00047|Ser-858, MOD:00046|Ser-891, MOD:00046." [PMID:22184234, PRO:LVM]
comment: Category=organism-modification. Flag=automatic.
synonym: "Athal-BRI1/iso:1/Phos:5" EXACT PRO-short-label [PRO:DNx]
synonym: "At protein brassinosteroid insensitive 1 isoform 1 phosphorylated 4" RELATED []
is_a: PR:000028355 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 (Arabidopsis thaliana)
is_a: PR:000035785 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: PR:000035785 ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: only_in_taxon NCBITaxon:3702 ! Arabidopsis thaliana
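
The distinction between the two parents can be illustrated by reading the reasoned stanza and checking each is_a target against the intersection_of lines. This is a minimal sketch under one assumption that the slide does not state: a parent counts as "explicit" when it also appears as the genus (the intersection_of line without a relation), and as "implicit" otherwise. The stanza below is abbreviated from the slide, and parse_stanza and classify_parents are hypothetical helpers, not a full OBO parser or a reasoner.

```python
# Abbreviated copy of the reasoned stanza shown on the slide.
STANZA = """\
[Term]
id: PR:000035786
is_a: PR:000028355 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 (Arabidopsis thaliana)
is_a: PR:000035785 ! implied link automatically realized ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: PR:000035785 ! protein brassinosteroid insensitive 1 isoform 1 phosphorylated 5
intersection_of: only_in_taxon NCBITaxon:3702 ! Arabidopsis thaliana
"""


def parse_stanza(text):
    """Collect tag values from one stanza, dropping trailing '!' comments."""
    tags = {}
    for line in text.splitlines():
        if line.startswith("[") or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ", 1)[0].strip()
        tags.setdefault(tag.strip(), []).append(value)
    return tags


def classify_parents(tags):
    """Split is_a parents into explicit (also the genus) and implicit."""
    # Assumption: a genus line holds a single ID with no relation name.
    genus = {v for v in tags.get("intersection_of", []) if len(v.split()) == 1}
    parents = tags.get("is_a", [])
    return {
        "explicit": [p for p in parents if p in genus],
        "implicit": [p for p in parents if p not in genus],
    }


parents = classify_parents(parse_stanza(STANZA))
print(parents)
```

Deriving the implicit parent from pro.obo alone requires an ontology reasoner; the sketch only labels links that pro_reasoned.obo has already materialized.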

6 Ontological Representation of UniProtKB in PRO
- PRO provides the ontological representation for UniProtKB
- Integration of UniProt records/subrecords into the PRO ontological framework
- Use UniProtKB protein records (labeled by accession numbers, isoform IDs, and potentially other stable identifiers within UniProtKB records) to represent organism-gene level and sequence level (and potentially modification-level) terms of PRO
  - Organism-Gene: canonical protein record
  - Organism-Sequence: isoform subrecord
  - Organism-Modification: chain/variant subrecord

7 Organism-Gene/Sequence

8 Ontologizing UniProtKB
- Full-scale implementation of 12 reference genomes (others as needed)
  - Organism-Gene: canonical protein record – UniProtKB:xxxxxx
  - Organism-Sequence: isoform subrecord – UniProtKB:xxxxxx-1
- Persistent URL: http://purl.obolibrary.org/obo/PR_xxxxxxxxx
- UniProtKB URL in the ontological space, proposed as:
  - PR:xxxxxx (UniProtKB at organism-gene level)
  - PR:xxxxxx-1 (UniProtKB at organism-sequence level)
- To consider
  - Organism-Modification: chain – UniProtKB:PRO_xxxxxxxxx
  - Organism-Modification: variant – UniProtKB:VAR_xxxxxx
- Integration/coordination between ProComp and IntAct for ontological representation of protein complexes
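
The identifier patterns on this slide can be turned into a small classifier that assigns a UniProtKB identifier to a PRO level and builds an OBO-style persistent URL. This is a sketch under stated assumptions: the accession pattern is the one UniProt publishes for its accession numbers, the PRO_ and VAR_ suffixes are assumed to be digits only, and the PURL is formed by replacing the colon of a PR: identifier with an underscore. The functions classify and to_purl are hypothetical names, and the PR:xxxxxx scheme itself is described on the slide only as a proposal.

```python
import re

# UniProt's published accession pattern (6- or 10-character accessions).
ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Level names follow the slide; the digit-only suffixes are an assumption.
PATTERNS = [
    ("organism-gene", re.compile(rf"^{ACCESSION}$")),
    ("organism-sequence", re.compile(rf"^{ACCESSION}-\d+$")),
    ("organism-modification (chain)", re.compile(r"^PRO_\d+$")),
    ("organism-modification (variant)", re.compile(r"^VAR_\d+$")),
]


def classify(uniprot_id):
    """Return the PRO level suggested by the shape of a UniProtKB identifier."""
    local_id = uniprot_id.split(":", 1)[-1]  # drop an optional 'UniProtKB:' prefix
    for level, pattern in PATTERNS:
        if pattern.match(local_id):
            return level
    return "unknown"


def to_purl(pro_id):
    """Build an OBO-style PURL from a 'PR:...' identifier (assumed convention)."""
    return "http://purl.obolibrary.org/obo/" + pro_id.replace(":", "_", 1)


for identifier in ("UniProtKB:Q96F24", "UniProtKB:Q96F24-2", "UniProtKB:B4DWS0"):
    print(identifier, "->", classify(identifier))
print(to_purl("PR:000035786"))
```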

9 UniProtKB in PRO Ontological Framework: Rich Relations
[Figure: diagram with nodes labeled Orthologous-Gene, Ortho-Isoform, Ortho-PTM, Organism-PTM, Ortho-Complex, Organism-Complex]

10 Issues
- Stable identifiers
  - UniProtKB would provide stable identifiers
  - ID mapping service
- Need for sequence merging and isoform curation: arises when a Swiss-Prot (SP) entry exists for a given gene alongside corresponding unmerged TrEMBL (Tr) entries that may represent a new isoform, a new variant, or a duplicate.
  - Unmerged Tr entries corresponding to additional isoforms with a sequence different from any mentioned in the SP entry
    organism-gene (SP): Q96F24
    organism-sequence (SP): Q96F24-1, Q96F24-2
    organism-sequence (Tr): B4DWS0
- Organism-gene only represented in the unreviewed (Tr) section: where one or multiple Tr entries exist for a given gene
  - One entry
    organism-gene accession (Tr) = Q8VGZ9
    organism-sequence accession (Tr; implied) = Q8VGZ9-1
  - Multiple entries
    organism-gene accession ***???***
    organism-sequence accession = B9E100, Q6W3E0

11 Integrating PRO curation into UniProtKB
Isoforms curated by PRO curators will continue to be integrated into UniProtKB as a priority
- PRO isoform curation (mostly done at MGI) is based on experimental information from the literature, and covers information such as UniProtKB AC, GO annotation, and comments on evidence on isoform and expression
- PIR curators integrate new isoforms and associated annotations into the SP entry
Submission of annotation for a new SP entry
- PIR curators create new reviewed SP entries when annotating protein isoforms and PTM forms with no reference SP entry
- Example: BUB3_XENLA
Other areas of PRO annotations, particularly on PTMs and complexes, could be integrated as appropriate
Reciprocal links from UniProtKB to PRO

12 New isoform curation in PRO & UniProt
PRO literature-based annotation of isoforms 4 and 5 of a mouse protein
UniProt curation:
- Merged 3 TrEMBL entries into the existing UniProtKB record (Q8BIF2)
- Added isoform-specific subcellular localization information
- Updated information about function and added new information
CC -!- SUBCELLULAR LOCATION: Nucleus. Cytoplasm.
CC -!- SUBCELLULAR LOCATION: Isoform 1: Nucleus.
CC -!- SUBCELLULAR LOCATION: Isoform 4: Cytoplasm.
CC -!- SUBCELLULAR LOCATION: Isoform 5: Nucleus.
CC -!- TISSUE SPECIFICITY: Widely expressed in brain, regions including …
CC -!- DEVELOPMENTAL STAGE: In the neural tube, expressed as early as
CC     embryonic day 9.5 (E9.5) and expression is confined to the nervous …
CC -!- INDUCTION: By retinoic acid. Expression is up-regulated in P19
CC     cells during neural differentiation upon retinoic acid treatment …
CC -!- PTM: Phosphorylated (Probable).
CC -!- SIMILARITY: Contains 1 RRM (RNA recognition motif) domain.
CC -!- CAUTION: Initial characterization was derived from usage of a
CC     monoclonal antibody (A60) directed to an unknown protein called...

13 Integrating PRO curation into UniProtKB
Reciprocal links from UniProtKB to PRO
- UniProtKB cross-reference (DR) lines [e.g., DR GO; GO:0006954; P:inflammatory response; IEA:Compara]
- DR line to include PRO identifier (PURL), PRO name, and short-label
- Link to the PRO page(s) at the exact (organism-gene) level and possibly also other PTM forms (organism-modification)
Other areas of PRO annotations, particularly on PTMs and complexes, could be integrated as appropriate
- Annotation of sequence features (such as PTMs not annotated in UniProtKB) and functional annotations that apply to those features
- Barrier to direct annotation integration: curation depth needed for all aspects of annotatable information beyond PTMs
- Possible solution: link to information in PRO as additionally annotated data, similar to the UniProt approach of including additional bibliography
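
The DR example quoted on this slide has a simple semicolon-separated shape, which the following sketch splits into a database name, a primary identifier, and any remaining fields. It assumes only what the example shows (a leading DR code and semicolon-separated fields, optionally ending in a period). The function name parse_dr_line is hypothetical, and the exact layout of a future PRO DR line is not specified on the slide, so none is assumed here.

```python
def parse_dr_line(line):
    """Split a UniProtKB-style DR line into its database, identifier, and extras."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line")
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";") if field.strip()]
    if len(fields) < 2:
        raise ValueError("a DR line needs a database and an identifier")
    return {
        "database": fields[0],
        "identifier": fields[1],
        "extra": fields[2:],
    }


# The example cross-reference quoted on the slide.
record = parse_dr_line("DR GO; GO:0006954; P:inflammatory response; IEA:Compara")
print(record)
```

A PRO cross-reference carrying the PURL, name, and short-label would fit the same structure, with the PRO identifier in the second field and the name and short-label as extras.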

