PIR Bio-defense Related Pathogen Data Mining
November 19, 2007
Literature Mining of Pathogenesis-Related Proteins
NIAID Biodefense Proteomics Resource Center at PIR
US Army Dengue Virus E Proteins Bioinformatics Analysis
Literature Mining of Pathogenesis-Related Proteins
Objective:
  To develop a text mining system for pathogenesis-related proteins in pathogens of military and biodefense relevance
  To integrate the pathogenesis-related proteins into integrated protein databases for functional analysis
Priority pathogenic organisms:
  Francisella tularensis – Gammaproteobacteria
  Dengue virus – (+)ssRNA virus
  Brucella – Alphaproteobacteria
  Trypanosoma cruzi – Kinetoplastida
Integrated information for pathogenic proteins:
  UniProtKB
  iProClass
  Other pathway databases
Literature Mining of Pathogenesis-Related Proteins
[Diagram: functional pathway analysis; data integration; iProClass]
Literature Mining of Pathogenesis-Related Proteins
[Workflow diagram: priority list of pathogens…; document retrieval (prioritizing); pathogenesis-related papers numbered 1, 2, 3, …, n-1, n; name recognition; passage highlighting; system adjustment]
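The workflow steps named on this slide (prioritized document retrieval, name recognition, passage highlighting) can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword weights, the dictionary-based matching, the `**` highlight markers, and the placeholder protein names `ProtA`/`ProtB` are all hypothetical and are not taken from the PIR system. The system adjustment step is not modeled.

```python
import re

# Hypothetical keyword weights; the slide does not say how papers are prioritized.
PATHOGENESIS_TERMS = {"virulence": 2.0, "pathogenesis": 2.0, "infection": 1.0}


def score_document(text):
    """Sum the weights of pathogenesis-related terms that occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(PATHOGENESIS_TERMS.get(word, 0.0) for word in words)


def prioritize(docs):
    """Return documents ordered from highest to lowest score (rank 1 to n)."""
    return sorted(docs, key=score_document, reverse=True)


def recognize_names(text, name_dictionary):
    """Dictionary-based name recognition: return the names found in the text."""
    found = []
    for name in name_dictionary:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found


def highlight_passages(text, names):
    """Return the sentences that mention a name, with each mention wrapped in **."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    passages = []
    for sentence in sentences:
        marked = sentence
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            marked = re.sub(pattern, lambda m: "**" + m.group(0) + "**",
                            marked, flags=re.IGNORECASE)
        if marked != sentence:
            passages.append(marked)
    return passages
```

A real system would likely replace the keyword score with a trained ranking model and the dictionary lookup with a thesaurus-backed name tagger; the sketch only shows how the three steps chain together.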
RLIMS-P: Rule-based Literature Mining System for Protein Phosphorylation http://pir.georgetown.edu/iprolink/rlimsp/
BioThesaurus: Gene/protein name searches - synonyms, ambiguous names… http://pir.georgetown.edu/iprolink/biothesaurus/
Exp. Data, Information, Knowledge
1. Exp. data: Gene ID, Protein ID, Peptide seq. mapped to UniProtKB AC/ID
2. Information: Function, Pathway, Family, ……
3. Knowledge: Categorize, Statistics, Cross-dataset, Association
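The three stages on this slide (map identifiers to an accession, attach annotations, then categorize) can be sketched as below. Everything concrete here is an assumption for illustration: the lookup tables are tiny in-memory dictionaries, and the identifiers, accessions, and annotation values are placeholders rather than real UniProtKB or iProClass content.

```python
from collections import Counter

# Hypothetical lookup tables standing in for an ID-mapping service and an
# annotation database. All identifiers and values are placeholders.
ID_TO_ACCESSION = {
    "gene:1001": "ACC0001",
    "prot:abc": "ACC0001",
    "gene:1002": "ACC0002",
}
ANNOTATIONS = {
    "ACC0001": {"family": "kinase", "pathway": "signaling"},
    "ACC0002": {"family": "protease", "pathway": "degradation"},
}


def map_ids(input_ids):
    """Stage 1: map heterogeneous IDs to accessions; also report unmapped IDs."""
    mapped, unmapped = set(), []
    for identifier in input_ids:
        accession = ID_TO_ACCESSION.get(identifier)
        if accession is None:
            unmapped.append(identifier)
        else:
            mapped.add(accession)
    return mapped, unmapped


def annotate(accessions):
    """Stage 2: build a protein information matrix (accession -> annotations)."""
    return {acc: ANNOTATIONS.get(acc, {}) for acc in accessions}


def categorize(matrix, field):
    """Stage 3: count proteins per category for one annotation field."""
    return Counter(row.get(field, "unannotated") for row in matrix.values())
```

Reporting the unmapped identifiers separately is a deliberate choice in this sketch: in practice they are the entries that need manual examination.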
iProXpress – Pathway Profiling
Organelle proteome data sets: ER, Mit
Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…
Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
[Figure: ER, KEGG pathway]
IP-MS Data from E2-treated breast cancer cells
Gene Ontology: Molecular Process
Transcriptional regulation; chromatin interaction; histone regulation
NIAID
Albert Einstein College of Medicine: T. gondii, C. parvum
Caprion Pharmaceuticals: B. abortus
Harvard Institute of Proteomics: V. cholerae, B. anthracis
Myriad Genetics: B. anthracis, Y. pestis, F. tularensis, Vaccinia, Variola
Pacific Northwest National Laboratory: S. typhimurium, S. typhi, Vaccinia, Monkeypox
Scripps: SARS CoV, Influenza
University of Michigan: B. anthracis
[Diagram: Albert Einstein, PNNL, U of Michigan, Harvard, Myriad, Scripps, and Caprion provide DATA to the Resource Center (PIR, VBI, SSS)]

As JoJo outlined to you previously, there are 7 research centers and one Resource Center, which, in addition to organizing this meeting, is required to publicize and make available to the scientific community the data and results from the program. Here are the 7 centers and the organisms they work on: 2 private companies (Caprion and Myriad), 4 academic labs, and 1 government lab (PNNL). All are required to provide their data, methods, technology, reagents, etc. to the public, much of it through the Resource Center, after validation and whatever time is needed to protect IP (??). Three organizations comprise the Resource Center: SSS, a private Washington contracting company; PIR at Georgetown, which I represent; and VBI.
We have built a website, a data repository, and some search and analysis tools that we will outline here. We cannot show you all the details, but posters and demos are available during the meeting. Here is our home page, hosted by SSS, where you can find information on the program and players. The key entry point is the project Catalog page shown here, where we try to summarize the program's progress and deliverables and link to resources. Major deliverables are …. Our resources consist of 3 main integrated resources: the MPD, the MRD, and the DataCenter. The DataCenter houses the data and protocols from the PRC; it also has some visualization and analysis tools that Stephen will show you. There is a reagent directory to track reagents from the program that are housed at other repositories; see the antibodies at BEI here. And here is the master protein directory, where we integrate the data, reagents, experiment results, and more around protein information. So let's look at some of the data in the MPD and tell you some of our challenges and solutions and what you can do with the directory.
www.proteomicsresource.org
Mouse proteins detected in B. anthracis and S. typhimurium infected macrophages

Here is a query where we pulled out 17 mouse proteins detected in macrophages infected with either B. anthracis or S. typhimurium and seen with both MS and microarray. There are 285 if I remove the microarray restriction and just compare MS. Anything interesting here? Possibly, but we have not analyzed it completely. You can find obvious housekeeping proteins like actin and a few cytokine and signaling pathway proteins that one might expect to see in activated macrophages; here is a protein in the MAP kinase signaling pathway. Here is one that is completely uncharacterized anywhere: though you find homologs in ameba, fish, and humans, there does not seem to be anything in the public databases or literature that we could find. Some of these examples are on our poster if you want to see more.
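A query of this kind reduces to set intersections over detection records. The sketch below assumes the query asks for proteins seen in both infections by every listed method, which is one reading of the slide. The record layout, function names, and the `prot_*` entries are hypothetical toy data, not results from the MPD.

```python
# Toy detection records: (protein, infecting pathogen, detection method).
# Protein names are placeholders, not results from the MPD.
RECORDS = [
    ("prot_1", "B. anthracis", "MS"),
    ("prot_1", "B. anthracis", "microarray"),
    ("prot_1", "S. typhimurium", "MS"),
    ("prot_1", "S. typhimurium", "microarray"),
    ("prot_2", "B. anthracis", "MS"),
    ("prot_2", "S. typhimurium", "MS"),
    ("prot_3", "B. anthracis", "MS"),
]


def detected_in(records, pathogen, methods):
    """Proteins detected in one infection by every method in `methods`."""
    per_method = [
        {prot for prot, path, meth in records if path == pathogen and meth == m}
        for m in methods
    ]
    return set.intersection(*per_method) if per_method else set()


def shared_proteins(records, pathogens, methods):
    """Proteins detected in every listed infection by every listed method."""
    per_pathogen = [detected_in(records, p, methods) for p in pathogens]
    return set.intersection(*per_pathogen) if per_pathogen else set()


pathogens = ["B. anthracis", "S. typhimurium"]
both_methods = shared_proteins(RECORDS, pathogens, ["MS", "microarray"])
ms_only = shared_proteins(RECORDS, pathogens, ["MS"])
```

Dropping a method from the list can only keep or enlarge the result set, which matches the pattern in the notes where removing the microarray restriction raises the count.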
Integrated Analysis: Selection Pressure, Entropy, Epitope
Dengue
[Figure panels: DENV1, DENV3, DENV2, DENV4]
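The slide does not say which entropy measure was used. A common choice for per-site variability in a protein alignment is Shannon entropy, and the sketch below assumes that measure. The function names, the gap handling, and the toy sequences are illustrative assumptions and are not dengue E protein data.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [residue for residue in column if residue != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = sum over residues of p * log2(1 / p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(sequences):
    """Per-site entropy for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(column) for column in zip(*sequences)]


# Toy alignment (placeholder sequences): site 2 varies, sites 1 and 3 are conserved.
toy_alignment = ["MKT", "MRT", "MKT", "MRT"]
site_entropy = alignment_entropy(toy_alignment)
```

Conserved sites score 0 bits and a site split evenly between two residues scores 1 bit, so higher values flag the more variable positions.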
Additional Structure Analysis
[Figure labels: Dengue aa site variant; interacting residues; exposed]
Result: identification of diagnostic and vaccine targets