PIR Bio-defense Related Pathogen Data Mining

Presentation transcript:

PIR Bio-defense Related Pathogen Data Mining
November 19, 2007
  Literature Mining of Pathogenesis-Related Proteins
  NIAID Biodefense Proteomics Resource Center at PIR
  US Army Dengue Virus E Proteins Bioinformatics Analysis

Literature Mining of Pathogenesis-Related Proteins
Objectives:
  To develop a text mining system for pathogenesis-related proteins in pathogens of military and biodefense relevance
  To integrate the pathogenesis-related proteins into integrated protein databases for functional analysis
Priority pathogenic organisms:
  Francisella tularensis – Gammaproteobacteria
  Dengue virus – (+)ssRNA virus
  Brucella – Alphaproteobacteria
  Trypanosoma cruzi – Kinetoplastida
Integrated information for pathogenic proteins:
  UniProtKB
  iProClass
  Other pathway databases

Literature Mining of Pathogenesis-Related Proteins
Figure labels: functional pathway analysis; data integration; iProClass

Literature Mining of Pathogenesis-Related Proteins
Figure: iterative workflow with steps numbered 1, 2, 3, … n-1, n
Figure labels: priority list of pathogens…; document retrieval (prioritizing); name recognition; pathogenesis-related papers; passage highlighting; system adjustment
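The workflow named on this slide (document retrieval with prioritizing, name recognition, passage highlighting) can be illustrated with a minimal Python sketch. This is not the PIR system: the keyword list, the name dictionary, the keyword-count scoring, and the toy documents are all assumptions made up for illustration, and "ProtX" is a hypothetical protein name. The "system adjustment" step is represented only by the fact that the two lookup tables are editable.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword list and name dictionary (assumptions, not PIR data).
PATHOGENESIS_TERMS = ("virulence", "pathogenesis", "invasion", "toxin")
PROTEIN_NAMES = ("E protein", "ProtX")

@dataclass
class Hit:
    doc_id: str
    score: int
    proteins: list = field(default_factory=list)
    passages: list = field(default_factory=list)

def score_text(text):
    """Prioritizing: count occurrences of pathogenesis keywords."""
    lowered = text.lower()
    return sum(lowered.count(term) for term in PATHOGENESIS_TERMS)

def recognize_names(text):
    """Name recognition: dictionary lookup on whole-word matches."""
    return sorted(
        name for name in PROTEIN_NAMES
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    )

def highlight_passages(text, names):
    """Passage highlighting: sentences with both a name and a keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        s for s in sentences
        if any(name in s for name in names) and score_text(s) > 0
    ]

def mine(docs):
    """Run the three steps and return hits ordered by descending score."""
    hits = []
    for doc_id, text in docs.items():
        score = score_text(text)
        if score == 0:
            continue  # not pathogenesis-related under this toy scoring
        names = recognize_names(text)
        hits.append(Hit(doc_id, score, names, highlight_passages(text, names)))
    return sorted(hits, key=lambda h: h.score, reverse=True)

# Toy documents written for this example only.
TOY_DOCS = {
    "doc1": "The E protein mediates viral entry. "
            "Mutations in E protein alter virulence in mice.",
    "doc2": "ProtX is a ribosomal component. No phenotype was observed.",
    "doc3": "ProtX contributes to pathogenesis. "
            "ProtX deletion reduces invasion and virulence.",
}

hits = mine(TOY_DOCS)
for hit in hits:
    print(hit.doc_id, hit.score, hit.proteins, hit.passages)
```

A production system would replace the dictionary lookup with a curated thesaurus and the keyword count with a trained or rule-based ranker; the sketch only shows how the steps feed one another.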

RLIMS-P: Rule-based Literature Mining System for Protein Phosphorylation http://pir.georgetown.edu/iprolink/rlimsp/

BioThesaurus: Gene/protein name searches - synonyms, ambiguous names… http://pir.georgetown.edu/iprolink/biothesaurus/

Exp. Data, Information, Knowledge
1. Exp. data: Gene ID, Protein ID, peptide seq. mapped to UniProtKB AC/ID
2. Information: Function, Pathway, Family ……
3. Knowledge: Categorize, Statistics, Cross-dataset, Association
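The three stages on this slide (map experimental identifiers to UniProtKB accessions, attach annotation, then categorize) can be sketched as follows. The lookup tables below are hypothetical stand-ins for real mapping and annotation services; the identifiers, accessions ("ACC0001" and so on), and annotation values are invented for illustration and are not real UniProtKB entries.

```python
from collections import Counter

# Hypothetical ID mapping table (stage 1); not real identifiers.
ID_TO_ACCESSION = {
    "gene:1001": "ACC0001",
    "gi:2002": "ACC0002",
    "gene:1003": "ACC0003",
}

# Hypothetical annotation table (stage 2); values are illustrative only.
ANNOTATIONS = {
    "ACC0001": {"family": "kinase", "pathway": "signaling"},
    "ACC0002": {"family": "kinase", "pathway": "metabolism"},
    "ACC0003": {"family": "chaperone", "pathway": "signaling"},
}

def map_ids(raw_ids):
    """Stage 1: map gene/protein IDs to accessions; report unmapped IDs."""
    mapped, unmapped = {}, []
    for raw_id in raw_ids:
        accession = ID_TO_ACCESSION.get(raw_id)
        if accession is None:
            unmapped.append(raw_id)
        else:
            mapped[raw_id] = accession
    return mapped, unmapped

def annotate(accessions):
    """Stage 2: build one annotation row per mapped accession."""
    return [
        {"accession": acc, **ANNOTATIONS.get(acc, {})}
        for acc in accessions
    ]

def categorize(rows, field_name):
    """Stage 3: count proteins per category for one annotation field."""
    return Counter(row.get(field_name, "unannotated") for row in rows)

mapped, unmapped = map_ids(["gene:1001", "gi:2002", "gene:1003", "gi:9999"])
rows = annotate(mapped.values())
by_family = categorize(rows, "family")
by_pathway = categorize(rows, "pathway")
print(unmapped, dict(by_family), dict(by_pathway))
```

Keeping unmapped identifiers in a separate list, rather than dropping them silently, makes it easy to see how much of a data set never reaches the annotation stage.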

iProXpress – Pathway Profiling
Organelle proteome data sets: ER, Mit
Protein information matrix: extensive annotations including protein name, family classification, function, protein-protein interaction, pathway…
Functional profiling: iterative categorization, sorting, cross-dataset comparison, coupled with manual examination.
Figure labels: ER, Mit, KEGG pathway

IP-MS Data from E2-treated breast cancer cells
Gene Ontology: Molecular Process
  Transcriptional regulation
  Chromatin interaction
  Histone regulation

NIAID
  Albert Einstein College of Medicine: T. gondii, C. parvum
  Caprion Pharmaceuticals: B. abortus
  Harvard Institute of Proteomics: V. cholerae, B. anthracis
  Myriad Genetics: B. anthracis, Y. pestis, F. tularensis, Vaccinia, Variola
  Pacific Northwest National Laboratory: S. typhimurium, S. typhi, Vaccinia, Monkeypox
  Scripps: SARS CoV, Influenza
  University of Michigan: B. anthracis
Diagram labels: Albert Einstein, PNNL, U of Michigan, Harvard, Myriad, Scripps, Caprion; DATA; Resource Center: PIR, VBI, SSS
As JoJo outlined to you previously, there are 7 research centers and one Resource Center, which in addition to organizing this meeting is required to publicize and make available to the scientific community the data and results from the program. Here are the 7 centers and the organisms they work on: 2 private companies (Caprion and Myriad), 4 academic labs, and 1 government lab (PNNL). All are required to provide their data, methods, technology, reagents, etc. to the public, much of it through the resource center, after validation and whatever time is needed to protect IP (??). Three organizations comprise the resource center: SSS, a private Washington contracting company; PIR at Georgetown, which I represent; and VBI.

Well, we have built a website, a data repository, and some search and analysis tools that we will outline here. We cannot show you all the details, but some posters and demos are available during the meeting. Here is our home page, hosted by SSS, where you can find information on the program and players. The key entry point is the project Catalog page shown here, where we try to summarize the program's progress and deliverables and link to resources. Major deliverables are …. Our resources consist of 3 main integrated resources: the MPD, the MRD, and the DataCenter. The DataCenter houses the data and protocols from the PRC; it also has some visualization and analysis tools that Stephen will show you. There is a reagent directory to track reagents from the program that are housed at other repositories; see the antibodies at BEI here. And here is the master protein directory, where we integrate the data, reagents, experiment results, and more around protein information. So let's look at some of the data in the MPD and tell you about some of our challenges and solutions and what you can do with the directory.
www.proteomicsresource.org

Mouse proteins detected in B. anthracis and S. typhimurium infected macrophages
Here is a query where we pulled out 17 mouse proteins detected in macrophages infected with either B. anthracis or Salmonella typhimurium and seen with both MS and microarray. There are 285 if I remove the microarray restriction and just compare MS. Anything interesting here? Possibly, but we have not analyzed it completely. You can find obvious housekeeping proteins like actin and a few cytokine and signaling pathway proteins that one might expect to see in activated macrophages; here is a protein in the MAP kinase signaling pathway. Here is one that is completely uncharacterized: although you find homologs in ameba, fish, and humans, there does not seem to be anything in the public databases or literature that we could find. Some of these examples are on our poster if you want to see more.
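The kind of cross-dataset query described on this slide can be sketched with plain set logic. The sketch assumes one reading of the query: a protein qualifies only if it was observed in both infections by every required method. The observation tuples and protein labels below are toy values invented for illustration; they are not data from the master protein directory and do not reproduce the 17 and 285 counts.

```python
from collections import defaultdict

def proteins_in_all(observations, pathogens, methods):
    """Return proteins observed for every (pathogen, method) combination.

    observations: iterable of (protein, pathogen, method) tuples.
    """
    seen = defaultdict(set)
    for protein, pathogen, method in observations:
        seen[protein].add((pathogen, method))
    required = {(p, m) for p in pathogens for m in methods}
    # A protein qualifies when its observed combinations cover all required ones.
    return sorted(prot for prot, combos in seen.items() if required <= combos)

# Toy observations (hypothetical protein labels).
OBSERVATIONS = [
    ("prot1", "B. anthracis", "MS"),
    ("prot1", "B. anthracis", "microarray"),
    ("prot1", "S. typhimurium", "MS"),
    ("prot1", "S. typhimurium", "microarray"),
    ("prot2", "B. anthracis", "MS"),
    ("prot2", "S. typhimurium", "MS"),
    ("prot3", "B. anthracis", "MS"),
]
PATHOGENS = ["B. anthracis", "S. typhimurium"]

ms_and_array = proteins_in_all(OBSERVATIONS, PATHOGENS, ["MS", "microarray"])
ms_only = proteins_in_all(OBSERVATIONS, PATHOGENS, ["MS"])
print(ms_and_array, ms_only)
```

Dropping the microarray requirement can only enlarge the result set, which is the same direction of change the speaker reports when relaxing the query.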

Integrated Analysis: Selection Pressure, Entropy, Epitope
Dengue figure panels: DENV1, DENV3, DENV2, DENV4
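Of the three analyses named on this slide, entropy is the simplest to illustrate. The sketch below computes per-column Shannon entropy over a multiple sequence alignment, a common measure of site variability. It is an assumption that the slide's entropy analysis is of this form; the gap handling (gaps are ignored) is also a choice made here, and the four short sequences are toy strings, not dengue E protein data. Selection pressure and epitope analysis are not covered.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # sum over residues of p * log2(1/p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def alignment_entropy(sequences):
    """Per-column entropy for aligned sequences of equal length."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]

# Toy alignment (invented for illustration).
TOY_ALIGNMENT = ["MRCV", "MRCI", "MKCL", "MKCV"]

entropies = alignment_entropy(TOY_ALIGNMENT)
variable_sites = [i + 1 for i, h in enumerate(entropies) if h > 0]
print(entropies, variable_sites)
```

Fully conserved columns score 0 bits, and higher values mark more variable sites; positions are reported 1-based to match the usual residue numbering convention.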

Additional Structure Analysis
Dengue aa site variants
Figure labels: interacting residues; exposed residues
Result: identification of diagnostic and vaccine targets