PIR Bio-defense Related Pathogen Data Mining


1 PIR Bio-defense Related Pathogen Data Mining
November 19, 2007
Literature Mining of Pathogenesis-Related Proteins
NIAID Biodefense Proteomics Resource Center at PIR
US Army Dengue Virus E Proteins Bioinformatics Analysis

2 Literature Mining of Pathogenesis-Related Proteins
Objective:
- To develop a text mining system for pathogenesis-related proteins in pathogens of military and biodefense relevance
- To integrate the pathogenesis-related proteins into integrated protein databases for functional analysis
Priority pathogenic organisms:
- Francisella tularensis – Gammaproteobacteria
- Dengue virus – (+)ssRNA virus
- Brucella – Alphaproteobacteria
- Trypanosoma cruzi – Kinetoplastida
Integrated information for pathogenic proteins:
- UniProtKB
- iProClass
- Other pathway databases

3 Literature Mining of Pathogenesis-Related Proteins
Diagram labels: functional pathway analysis, data integration, iProClass

4 Literature Mining of Pathogenesis-Related Proteins
Workflow diagram labels: Priority list of pathogens…; Document retrieval (prioritizing); Name recognition; Passage highlighting; System adjustment
Pathogenesis-related papers, numbered in the diagram: 1, 2, 3, …, n-1, n
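The workflow on this slide (document retrieval with prioritization, name recognition, passage highlighting) can be illustrated with a small sketch. This is not the PIR system. The keyword list, the protein name list, and every function name below are hypothetical, and simple keyword counting stands in for whatever ranking the real system uses.

```python
import re

# Hypothetical keyword and name lists; the slides do not give the real ones.
PATHOGENESIS_TERMS = ["virulence", "pathogenesis", "invasion", "toxin"]
PROTEIN_NAMES = ["IglC", "envelope protein", "VirB"]


def score_document(text):
    """Count pathogenesis keyword hits (case-insensitive) as a crude relevance score."""
    lowered = text.lower()
    return sum(lowered.count(term) for term in PATHOGENESIS_TERMS)


def find_names(text):
    """Return the entries of PROTEIN_NAMES that occur in the text as whole words."""
    return [
        name
        for name in PROTEIN_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE)
    ]


def highlight_passages(text):
    """Return sentences that mention both a protein name and a pathogenesis term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if find_names(s) and score_document(s) > 0]


def prioritize(documents):
    """Rank document ids (dict of id -> text) from most to fewest keyword hits."""
    return sorted(documents, key=lambda doc_id: score_document(documents[doc_id]), reverse=True)
```

In this sketch, the "system adjustment" step would correspond to a curator revising the two lists after inspecting the highlighted passages and then rerunning the ranking.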

5 RLIMS-P: Rule-based Literature Mining System for Protein Phosphorylation

6 BioThesaurus: Gene/protein name searches - synonyms, ambiguous names…

7 Exp. Data, Information, Knowledge
1. Exp. data: Gene ID, Protein ID, Peptide seq.; UniProtKB AC/ID
2. Information: Function, Pathway, Family ……
3. Knowledge: Categorize, Statistics, Cross-dataset, Association

8 iProXpress – Pathway Profiling
Organelle proteome data sets: ER, Mit
Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…
Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
Figure labels: ER, Mit, KEGG pathway

9 IP-MS Data from E2-treated breast cancer cells
Gene Ontology: Molecular Process
Figure labels: transcriptional regulation, chromatin interaction, histone regulation

10 NIAID research centers and the organisms they work on
Albert Einstein College of Medicine: T. gondii, C. parvum
Caprion Pharmaceuticals: B. abortus
Harvard Institute of Proteomics: V. cholerae, B. anthracis
Myriad Genetics: B. anthracis, Y. pestis, F. tularensis, Vaccinia, Variola
Pacific Northwest National Laboratory: S. typhimurium, S. typhi, Vaccinia, Monkeypox
Scripps: SARS CoV, Influenza
University of Michigan: B. anthracis
Diagram labels: Albert Einstein, PNNL, U of Michigan, Harvard, Myriad, Scripps, Caprion; DATA; Resource Center: PIR, VBI, SSS
As JoJo outlined to you previously, there are 7 Research Centers and one Resource Center, which, in addition to organizing this meeting, is required to publicize and make available to the scientific community the data and results from the program.
Here are the 7 centers and the organisms they work on: 2 private companies (Caprion and Myriad), 4 academic labs, and 1 government lab (PNNL). All are required to provide their data, methods, technology, reagents, etc. to the public, much of it through the Resource Center, after validation and whatever time is needed to protect IP??
Three organizations comprise the Resource Center: SSS, a private Washington contracting company; PIR at Georgetown, which I represent; and VBI.

11 Well, we have built a website, a data repository, and some search and analysis tools that we will outline here. We cannot show you all the details, but there are some posters, and demos are available during the meeting.
Here is our home page, hosted by SSS, where you can find information on the program and players. The key entry point is the project Catalog page shown here, where we try to summarize the program's progress and deliverables and link to resources. Major deliverables are ….
Our resources consist of 3 main integrated resources: the MPD, the MRD, and the DataCenter. The DataCenter houses the data and protocols from the PRC; it also has some visualization and analysis tools that Stephen will show you. There is a reagent directory to track reagents from the program that are housed at other repositories; see the antibodies at BEI here. And here is the master protein directory, where we integrate the data, reagents, experiment results, and more around protein information.
So let's look at some of the data in the MPD, and I will tell you about some of our challenges and solutions and what you can do with the directory.

12 Mouse proteins detected in B. anthracis and S. typhimurium infected macrophages
Here is a query where we pulled out 17 mouse proteins detected in macrophages infected with either B. anthracis or Salmonella typhimurium and seen with both MS and microarray. There are 285 if I remove the microarray restriction and just compare MS.
Anything interesting here? Possibly, but we have not analyzed it completely. You can find obvious housekeeping proteins like actin, and a few cytokine and signaling pathway proteins that one might expect to see in activated macrophages; here is a protein in the MAP kinase signaling pathway. Here is one that is completely uncharacterized anywhere: though you find homologs in ameba, fish, and humans, there does not seem to be anything in the public databases or literature that we could find.
Some of these examples are on our poster if you want to see more.
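The query described in these notes amounts to intersecting detection records across infection conditions and detection methods. A minimal sketch follows, assuming each record is a (protein_id, organism, method) tuple and that a protein must appear under every listed organism and every listed method. The record layout and the function name are hypothetical and are not taken from the MPD.

```python
from collections import defaultdict


def proteins_detected_in_all(records, organisms, methods):
    """Return sorted protein ids seen under every organism/method combination.

    records: iterable of (protein_id, organism, method) tuples (assumed layout).
    """
    seen = defaultdict(set)
    for protein_id, organism, method in records:
        seen[protein_id].add((organism, method))
    # Every requested combination must be present for a protein to qualify.
    required = {(organism, method) for organism in organisms for method in methods}
    return sorted(pid for pid, combos in seen.items() if required <= combos)
```

Calling the function with only the MS method removes the microarray restriction, which is the relaxation the notes describe when the count rises from 17 to 285.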

13 Integrated Analysis: Selection Pressure, Entropy; Epitope
Figure panels: Dengue DENV1, DENV3, DENV2, DENV4
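The slide names entropy as one of the integrated measures but does not say how it was computed. A common choice for sequence variability is the Shannon entropy of each column in a multiple sequence alignment; the sketch below assumes that definition and equal-length, pre-aligned sequences. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of aligned, equal-length sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*sequences) walks the alignment column by column.
    return [column_entropy(column) for column in zip(*sequences)]
```

A fully conserved position scores 0 bits, and a position split evenly between two residues scores 1 bit, so higher values mark more variable sites.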

14 Additional Structure Analysis
Table columns: Dengue aa site, variant, interacting residues, exposed
Result: identification of diagnostic and vaccine targets

