
RLIMS-P: A Rule-Based Literature Mining System for Protein Phosphorylation

Hu ZZ 1, Yuan X 1, Torii M 2, Vijay-Shanker K 3, and Wu CH 1

1 Protein Information Resource, 2 Department of Biostatistics, Bioinformatics, and Biomathematics, 4 Department of Computational Linguistics, Georgetown University, Washington, DC 20007; 3 University of Delaware, DE 19716

http://pir.georgetown.edu/iprolink/rlimsp
Contact: pirmail@georgetown.edu

[Figure: Annotation-tagged literature sets for PTMs from the iProLINK literature mining resource. RLIMS-P: evidence attribution, 3 objects. Manual tagging assisted with computational extraction: training and testing sets of positive and negative samples for RLIMS-P development.]

Introduction: RLIMS-P is a rule-based text-mining program designed specifically to extract protein phosphorylation information (protein kinase, substrate, and phosphorylation sites) from abstracts (Hu et al., 2005). The program was originally developed by Narayanaswamy, Ravikumar, and Vijay-Shanker (2005), and was tested and benchmarked by PIR using iProLINK annotated datasets (Hu et al., 2004). RLIMS-P has now been adopted at PIR and is being developed into an online text-mining tool for extracting protein phosphorylation information from PubMed literature (Yuan et al., 2006).

The online RLIMS-P currently provides the following functions:
1) determines whether a MEDLINE abstract contains protein phosphorylation information and extracts the protein kinase, protein substrate, and phosphorylation site/residue when available;
2) tags the extracted phosphorylation objects in the abstract in different colors;
3) maps the protein substrate to UniProtKB protein entries based on PMID;
4) maps protein names to UniProtKB protein entries based on BioThesaurus.

Coupled with BioThesaurus, RLIMS-P can facilitate UniProtKB protein phosphorylation feature annotation.

RLIMS-P System Design:
Pattern 1: (in/at )?
Example: ATR/FRP-1 also phosphorylated p53 in Ser 15
Training/benchmarking data sets and pattern rules can be downloaded.
Bioinformatics. 21:2759-65, 2005

Benchmarking of RLIMS-P: High recall for paper retrieval and high precision for information extraction.

Web-based RLIMS-P: Information retrieval and extraction; protein entity mapping.

[Figure, panels A-D] The online RLIMS-P text-mining results: (A) The summary table lists PMIDs with top-ranking phosphorylation annotation. (B) The full report provides detailed annotation results with evidence tagging and automatic mapping to the UniProtKB entry containing the citation (e.g., KPB1_RABIT). Name mapping of the phosphorylated protein in the RLIMS-P report (C) to a UniProtKB entry using BioThesaurus (D). Name mapping includes options to use names appearing in the abstract or user-specified names to search the online BioThesaurus. Here, “PBPA” retrieves 10 entries sharing the same name, including PBPA of Mycobacterium tuberculosis (P71586_MYCTU), the phosphorylated protein discussed in the abstract.

A preliminary case study: Using RLIMS-P to facilitate UniProtKB feature annotation
Nuclear receptor (NR) phosphorylation was under-annotated in databases. Text-mining of 2170 PubMed abstracts (retrieved with a query for NR phosphorylation) with RLIMS-P found significantly more phosphorylation sites that could be added to UniProt feature annotation.

Future development of the RLIMS-P program:
Extend to mine full-length articles.
Mine in vivo protein phosphorylation and its cellular context, such as cell types and pathways.

References:
Hu ZZ, et al., Comp Biol Chem. 28:409-16, 2004.
Hu ZZ, et al., Bioinformatics. 21:2759-65, 2005.
Narayanaswamy M, et al., Bioinformatics. 21 Suppl 1:i319-i327, 2005.
Yuan X, et al., Bioinformatics. April 27, 2006.

Acknowledgements: NIH (UniProt), NSF (Entity Tagging). PIR team: Wu HT, Fang C, Huang H, Arminski L. Collaborators: Liu H, Narayanaswamy M, Ravikumar KE.
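
The Pattern 1 example from the System Design section ("ATR/FRP-1 also phosphorylated p53 in Ser 15") can be illustrated with a small regular-expression sketch. This is not the RLIMS-P implementation, whose pattern rules are distributed separately. It is a simplified stand-in that assumes single-token protein names, an active-voice form of "phosphorylate", and an optional "in/at" site phrase. The names `PATTERN_1`, `AMINO_ACIDS`, and `extract_phosphorylation` are hypothetical and chosen only for this illustration.

```python
import re
from typing import Dict, Optional

# Assumption: a small residue vocabulary, for illustration only.
AMINO_ACIDS = r"(?:Ser|Thr|Tyr|serine|threonine|tyrosine)"

# Hypothetical, simplified stand-in for Pattern 1:
#   kinase  phosphorylate(s|d)  substrate  (in/at site)?
PATTERN_1 = re.compile(
    r"(?P<kinase>\S+)"                 # kinase: assumed to be a single token
    r"(?:\s+also)?"                    # optional adverb, as in the poster example
    r"\s+phosphorylate[sd]?"           # active-voice verb form
    r"\s+(?P<substrate>\S+)"           # substrate: assumed to be a single token
    rf"(?:\s+(?:in|at)\s+(?P<site>{AMINO_ACIDS}[\s-]?\d+))?"  # optional site
)


def extract_phosphorylation(sentence: str) -> Optional[Dict[str, Optional[str]]]:
    """Return kinase, substrate, and site (or None) if the sentence matches."""
    match = PATTERN_1.search(sentence)
    if match is None:
        return None
    return {
        "kinase": match.group("kinase"),
        # Drop sentence-final punctuation that \S+ may have captured.
        "substrate": match.group("substrate").rstrip(".,;"),
        "site": match.group("site"),
    }


print(extract_phosphorylation("ATR/FRP-1 also phosphorylated p53 in Ser 15"))
# {'kinase': 'ATR/FRP-1', 'substrate': 'p53', 'site': 'Ser 15'}
```

A sentence with no site phrase still yields the kinase and substrate, with `site` set to `None`, which mirrors the "when available" wording of function 1) above. A production system would need noun-phrase detection and many more patterns than this single regular expression.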

