Protein Ontology: Addressing the need for precision in representing protein networks
Darren A. Natale, Ph.D.
Protein Science Team Lead, PIR
Research Assistant Professor, GUMC
Workshop on Ontologies of Cellular Networks, March 2008
Example from: INOH Event Ontology
TGF-β signaling pathway (IEV_), with five events related to it by part_of:
1. Phosphorylation of R-smad by TGF beta receptor I (IEV_)
2. Complex formation of R-smad and Smad4 (IEV_)
3. Nuclear import of R-smad:smad4 (IEV_)
4. Binding of R-smad:smad4 complex and responsive element (IEV_)
5. Transcription by R-smad:smad4 (IEV_)
Actions: the five events above
Roles: R-smad, smad4, R-smad:Smad4, TGF beta receptor I, responsive element
Locations: Cytoplasm, nuclear membrane, Nucleus
The Roles Played
Example from: INOH Molecule Role Ontology
is_a hierarchy:
- IMR_ Txn regulator
  - IMR_ SMAD
    - IMR_ Co-Smad
      - IMR_ Smad4
    - IMR_ R-Smad
      - IMR_ Smad1
      - IMR_ Smad2
      - IMR_ Smad3
      - IMR_ Smad5
      - IMR_ Smad8
    - IMR_ I-Smad
IMR_ SMAD2_HUMAN is linked to IMR_ Smad2 by sequence_of.
GO annotation of SMAD2_HUMAN (Mothers against decapentaplegic homolog 2, Smad 2):
Cellular Component:
- nucleus
Molecular Function:
- protein binding
Biological Process:
- signal transduction
- regulation of transcription, DNA-dependent
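TGF-β signaling through Smad 2 (pathway diagram)
Components: TGF-β, TGF-β receptor (types I and II), Smad 2, Smad 4, ERK1, CAMK2
Locations: Cytoplasm, Nucleus
Steps:
1. phosphorylation
2. complex formation
3. nuclear translocation
4. DNA binding
5. Transcription Regulation
Smad 2 is additionally phosphorylated by ERK1 (marked ++) or by CAMK2.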
Forms of Smad 2 (SMAD2_HUMAN)
- "normal": Cytoplasmic (PRO: ; REACT_)
- TGF-β receptor phosphorylated: Forms complex; Nuclear; Txn upregulation (PRO: ; REACT_)
- ERK1 phosphorylated: Forms complex; Nuclear; Txn upregulation ++ (PRO:)
- CAMK2 phosphorylated: Forms complex; Cytoplasmic; No Txn upregulation (PRO:)
- alternatively spliced short form: Cytoplasmic (PRO: ; REACT_)
- phosphorylated short form: Nuclear; Txn upregulation (PRO: ; REACT_)
- point mutation (causative agent: large intestine carcinoma): Doesn't form complex; Cytoplasmic; No Txn upregulation (PRO:)
PRO hierarchy for Smad2 (% marks is_a):
%PRO: Smad2
  %PRO: Smad2 isoform 1 (long form)
    %PRO: Smad2 isoform 1 phosphorylated form
      %PRO: Smad2 isoform 1, TGF-β receptor I-phosphorylated
        %PRO: Smad2 isoform 1, TGF-β receptor I and ERK1-phosphorylated
        %PRO: Smad2 isoform 1, TGF-β receptor I and CAMK2-phosphorylated
  %PRO: Smad2 isoform 2 (short form) - splice variant
    %PRO: Smad2 isoform 2 phosphorylated form
      %PRO: Smad2 isoform 2, TGF-β receptor I-phosphorylated
  %PRO: Smad2 isoform 3 - genetic variant related to colorectal carcinoma

Annotations (phosphorylated form):
- has_modification MOD:O-phosphorylated L-serine
- has_modification MOD:O-phosphorylated L-threonine
- has_function GO:TGF-β receptor, pathway-specific cytoplasmic mediator activity
- has_function GO:SMAD binding
- has_function GO:transcription coactivator activity
- participates_in GO:signal transduction
- participates_in GO:SMAD protein heteromerization
- participates_in GO:regulation of transcription, DNA-dependent
- located_in GO:nucleus
- part_of GO:transcription factor complex

Annotations (isoform 3, genetic variant):
- arises_from SO:amino_acid_substitution
- NOT has_modification MOD:phosphorylated residue
- NOT has_function GO:transcription coactivator activity
- gives_rise_to DO:carcinoma of the large intestine
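The positive and NOT-qualified annotations above can be sketched in a few lines of Python. This is only an illustration, not PRO's actual data model: the `Annotation` class, the `asserts` helper, and the two term labels are hypothetical, and attaching the positive annotations to the TGF-β receptor I-phosphorylated form of isoform 1 is an assumption about the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    relation: str
    target: str
    negated: bool = False  # True models the NOT qualifier


# Hypothetical labels; the slides do not show the real PRO identifiers.
PHOSPHO = "Smad2 isoform 1, TGF-beta receptor I-phosphorylated"
VARIANT = "Smad2 isoform 3 (genetic variant)"

# Assumption: the positive annotations belong to PHOSPHO (subset shown).
ANNOTATIONS = {
    PHOSPHO: [
        Annotation("has_modification", "MOD:O-phosphorylated L-serine"),
        Annotation("has_function", "GO:transcription coactivator activity"),
        Annotation("located_in", "GO:nucleus"),
    ],
    VARIANT: [
        Annotation("arises_from", "SO:amino_acid_substitution"),
        Annotation("has_modification", "MOD:phosphorylated residue", negated=True),
        Annotation("has_function", "GO:transcription coactivator activity", negated=True),
        Annotation("gives_rise_to", "DO:carcinoma of the large intestine"),
    ],
}


def asserts(term: str, relation: str, target: str) -> Optional[bool]:
    """True if asserted, False if explicitly negated (NOT), None if not annotated."""
    for ann in ANNOTATIONS.get(term, []):
        if ann.relation == relation and ann.target == target:
            return not ann.negated
    return None
```

Returning `None` for a missing annotation keeps "not known" distinct from an explicit NOT, which is the distinction the genetic-variant entries rely on.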
PRO framework
PRO components: ProEvo and ProForm

Levels (each related to the level above by is_a):
- Root Level: protein
- Family Level: translation product of an evolutionarily-related gene
  Family-Level Distinction. In common: specific ancestor. Source: PIRSF family
- Gene Level: translation product of a specific gene
  Gene-Level Distinction. In common: specific gene. Sources: PIRSF subfamily, Panther subfamily
- Sequence Level: translation product of a specific mRNA
  Sequence-Level Distinction. In common: specific allele or splice variant. Source: UniProtKB
- Modification Level: cleaved/modified translation product
  Modification-Level Distinction. In common: specific translation product. Source: UniProtKB

Relations to other ontologies:
- GO Gene Ontology (molecular function, cellular component, biological process): has_function, located_in (for compartments), part_of (for complexes), participates_in
- DO/UMLS Disease: agent_of (disease)
- PSI-MOD Modification: has_modification (protein modification)
- SO Sequence Ontology: arises_from (sequence change)
- gives_rise_to (effect on function)
- Pfam Domain: has_part (protein domain)

Example:
TGF-β receptor phosphorylated smad2 isoform1
  is a phosphorylated smad2 isoform1 (Modification Level)
  is a smad2 isoform 1 (Sequence Level)
  is a smad2 (Gene Level)
  is a TGF-β receptor-regulated smad
  is a smad (Family Level)
  is a protein (Root Level)
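The example chain can be walked programmatically. The sketch below is a minimal illustration rather than PRO's implementation: the `IS_A` and `LEVEL` dictionaries and the `lineage` function are hypothetical, the term names stand in for PRO identifiers that the slide does not show, and the level assigned to each term is one reading of the slide.

```python
# Child -> parent is_a links, taken from the example chain on the slide.
IS_A = {
    "TGF-beta receptor-phosphorylated smad2 isoform 1": "phosphorylated smad2 isoform 1",
    "phosphorylated smad2 isoform 1": "smad2 isoform 1",
    "smad2 isoform 1": "smad2",
    "smad2": "TGF-beta receptor-regulated smad",
    "TGF-beta receptor-regulated smad": "smad",
    "smad": "protein",
}

# Assumed level for each term (a reading of the slide, not stated per term).
LEVEL = {
    "TGF-beta receptor-phosphorylated smad2 isoform 1": "Modification",
    "phosphorylated smad2 isoform 1": "Modification",
    "smad2 isoform 1": "Sequence",
    "smad2": "Gene",
    "TGF-beta receptor-regulated smad": "Family",
    "smad": "Family",
    "protein": "Root",
}


def lineage(term):
    """Return the term followed by all of its is_a ancestors, up to the root."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


# Print the full chain with the level of each term.
for t in lineage("TGF-beta receptor-phosphorylated smad2 isoform 1"):
    print(f"{LEVEL[t]:<12} {t}")
```

A single-parent dictionary is enough for this chain; a real ontology may give a term several parents, in which case the walk would need to return a set of ancestors instead of a list.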
Example from: INOH Event Ontology, with Actors added
TGF-β signaling pathway (IEV_), with five events related to it by part_of:
1. Phosphorylation of R-smad by TGF beta receptor I (IEV_)
2. Complex formation of R-smad and Smad4 (IEV_)
3. Nuclear import of R-smad:smad4 (IEV_)
4. Binding of R-smad:smad4 complex and responsive element (IEV_)
5. Transcription by R-smad:smad4 (IEV_)
Actions: the five events above
Roles: R-smad, smad4, R-smad:Smad4, TGF beta receptor I, responsive element
Locations: Cytoplasm, nuclear membrane, Nucleus
Actors: smad2 and the smad2:smad4 complex in their phosphorylated forms, identified by PRO terms:
- A step before Transcription: has_participant PRO:smad4; has_participant PRO:TGF-β receptor-phosphorylated smad2
- Transcription: has_participant PRO:smad4; has_participant PRO:TGF-β receptor & ERK1-phosphorylated smad2
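Linking events to specific protein forms through has_participant can be sketched as follows. This is an illustrative toy, not INOH or PRO code: `IS_A`, `HAS_PARTICIPANT`, `is_a` and `events_with` are hypothetical names, placing the ERK1-phosphorylated form under the TGF-β receptor-phosphorylated form is an assumption, and so is attaching the first participant pair to the nuclear import event.

```python
# Child -> parent is_a links between protein forms (assumed nesting).
IS_A = {
    "TGF-beta receptor-phosphorylated smad2": "smad2",
    "TGF-beta receptor & ERK1-phosphorylated smad2": "TGF-beta receptor-phosphorylated smad2",
}

# Event -> participants. The event chosen for the first pair is an assumption.
HAS_PARTICIPANT = {
    "Nuclear import of R-smad:smad4": {
        "smad4",
        "TGF-beta receptor-phosphorylated smad2",
    },
    "Transcription by R-smad:smad4": {
        "smad4",
        "TGF-beta receptor & ERK1-phosphorylated smad2",
    },
}


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following is_a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def events_with(term):
    """Events that have the given form, or any more specific form, as a participant."""
    return sorted(
        event
        for event, participants in HAS_PARTICIPANT.items()
        if any(is_a(p, term) for p in participants)
    )
```

Querying for the general term `"smad2"` returns both events, while querying for the ERK1-phosphorylated form returns only the transcription event, which is the kind of precision the specific PRO terms are meant to provide.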
PRO Team (so far…)
Principal Investigators
- Cathy Wu (PIR at GUMC)
- Judith Blake (The Jackson Laboratory)
- Barry Smith (SUNY Buffalo)
Curators & Developers
- Cecilia Arighi (PIR at GUMC)
- Winona Barker (PIR at GUMC)
- Harold Drabkin (The Jackson Laboratory)
- Zhang-zhi Hu (PIR at GUMC)
- Hongfang Liu (GUMC)
- Darren Natale (PIR at GUMC)
Official Launch: March 31,
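Smad2 vs c-Myc in INOH and PRO
INOH (under IMR_ protein):
- IMR_ Smad2 is_a IMR_ R-Smad is_a IMR_ SMAD is_a IMR_ Txn regulator
- IMR_ c-Myc is_a IMR_ Myc is_a IMR_ Txn factor
PRO (under PRO: protein):
- PRO: smad2 is_a PRO: R-Smad is_a PRO: smad is_a PRO: protein
- PRO: c-myc is_a PRO: myc is_a PRO: protein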
Hypothetical example: TGM3 vs EPB42
[Figure: two is_a hierarchies placing TGM3 and EPB42. Role-based terms under IMR_ protein: structural protein, cytoskeleton component (EPB42), enzyme, protein modifier, protease. Term under PRO: protein: transglutaminase.]