Building a Chemical Informatics Grid Marlon Pierce Community Grids Laboratory Indiana University.


CICC Project Information
- The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC) is an NIH- and MS-funded research project to combine the CIs.
- Project web site and more information
- Team members include:
  - Computer Science: Geoffrey Fox (PI), Dennis Gannon, Beth Plale, Marlon Pierce, Yuqing (Melanie) Wu, Malika Mahoui, Jake Kim
  - Chemical Informatics and Chemistry: Gary Wiggins, Mu-Hyun (Mookie) Baik, David Wild, Rajarshi Guha, Kevin Gilbert
- I have stolen slides and content from these fine people.
- We collaborate with several groups:
  - Peter Murray-Rust's group at the University of Cambridge
  - The University of Michigan's MACE group
  - Chemistry Development Kit (CDK) project
  - DTP NCI at NIH
  - Scripps High Throughput Screening Center

Chemical Informatics and Cyberinfrastructure Building Blocks
Chemical informatics resources:
- Deluge of experimental data
  - > 100,000 compounds screened by 10 publicly funded high-throughput screening centers using various assay techniques (molecular to cellular)
  - Molecular Libraries Screening Center Network
- Chemical databases maintained by various groups
  - NIH PubChem, NIH DTP
- Chemical informatics and computational chemistry
  - Data clustering, data mining, descriptor calculations, toxicity prediction, docking, molecular modeling, and quantum chemistry
- Visualization tools
- Web resources: journal articles, etc.
A Chemical Informatics Grid will need to integrate these into a common, loosely coupled, open, distributed computing environment.

Our Solution Stack
- Domain-specific Web Services
  - VOTables, CDK services
- Grid services and cyberinfrastructure for computationally intensive applications
  - Clustering, quantum chemistry
- Workflow and service management
  - We work with Taverna
  - Many solutions: Kepler, BPEL engines, etc.
- Portlets and other user interfaces
  - Rich desktop apps
  - Ubiquitous clients
Diagram layers: Portals and Other User Interfaces; Workflow and Service Management; Web and Grid Services
Each level is a subject for research and development, as is their integration.

A Library of Chemical Informatics Web Services

All Services Great and Small
Like most Grids, a Chemical Informatics Grid will have the classic styles:
- Data Grid Services: these provide access to data sources like PubChem, etc.
- Execution Grid Services: used for running cluster analysis programs, molecular modeling codes, etc., on the TeraGrid and similar places.
But we also need many additional services:
- Handling format conversions (InChI, SMILES)
- Shipping and manipulating tabular data
- Determining toxicity of compounds
- Generating batch 2D images
So one of our core activities is to build lots of services.

VOTables: Handling Tabular Data
- Developed by the Virtual Observatory community for encoding astronomy data.
- The VOTable format is an XML representation of tabular data (data coming from BCI, NIH DTP databases, and so on).
- VOTable-compatible tools have already been built; we just inherit them.
- SAVOT and JAVOT, Java parser APIs for VOTable, allow us to easily build VOTable-based applications:
  - Web Services
  - Spreadsheet and plotting applications. VOPlot and TopCat are two.
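To make the "XML representation of tabular data" concrete, here is a minimal sketch in Python that writes a small table as VOTable-style XML using only the standard library. It is not the SAVOT or JAVOT API; the function name `rows_to_votable` and the sample values are hypothetical, the column names (SMILES, Mass, PSA) follow the example plotted later in these slides, and the XML namespace declarations a full VOTable would carry are omitted.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows):
    """Build a minimal VOTable-style XML string (hypothetical helper).

    fields: list of (name, datatype) pairs, one per column.
    rows:   list of tuples, one value per column.
    """
    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")

    # One FIELD element describes each column.
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string column
        ET.SubElement(table, "FIELD", attrs)

    # Each row is a TR element holding one TD per column.
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)

    return ET.tostring(votable, encoding="unicode")


# Illustrative values only, not taken from mrtd1.txt.
fields = [("SMILES", "char"), ("Mass", "double"), ("PSA", "double")]
rows = [("CCO", 46.07, 20.23), ("c1ccccc1", 78.11, 0.0)]
print(rows_to_votable(fields, rows))
```

A file produced this way would still need the proper header and namespace before being handed to tools such as VOPlot or TopCat.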

mrtd1.txt: SMILES representation of chemical compounds along with their properties

votable.xml: XML representation of the mrtd1.txt file

VOPlot application displaying the generated votable.xml file: graph plotted with Mass on the X-axis and PSA on the Y-axis

More Services: WWMM Services

Service | Description | Input | Output
InChIGoogle | Search an InChI structure through Google | inchiBasic type | Search result in HTML format
InChIServer | Generate InChI | version format | An InChI structure
OpenBabelServer | Transform a chemical format to another using Open Babel | format inputData outputData options | Converted chemical structure string
CMLRSSServer | Generate CMLRSS feed from CML data | mol, title description link, source | Converted CMLRSS feed of CML data

CDK-Based Services
- Common Substructure: Calculates the common substructure between two molecules.
- CDKsim: Takes two SMILES and evaluates the Tanimoto coefficient (ratio of intersection to union of their fingerprints).
- CDKdesc: Calculates a variety of molecular and atomic descriptors for QSAR modeling.
- CDKws: Fingerprint generation.
- CDKsdg: Creates a JPEG of the compound's 2D structure.
- CDKStruct3D: Generates 3D coordinates of a molecule from its SMILES.
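The Tanimoto coefficient mentioned for CDKsim can be illustrated with a few lines of Python. This is a sketch of the formula only, not the CDK implementation: fingerprints are assumed to be given as sets of "on" bit positions, and the example bit sets are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: |A intersect B| / |A union B| for two bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints, listed as the positions of bits that are set.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9, 12}
print(tanimoto(fp1, fp2))  # 3 shared bits / 6 distinct bits = 0.5
```

Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0.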

ToxTree Service
- The Threshold of Toxicological Concern (TTC) establishes a level of exposure for all chemicals below which there would be no appreciable risk to human health.
- ToxTree implements the Cramer Decision Tree approach to estimate TTC.
- We have converted this into a service.
  - Uses SMILES as input.
  - Note that the GUI must be separated from the library for it to be a service.

OSCAR3 Service
- OSCAR3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (i.e. journal articles).
- It identifies (or attempts to identify):
  - Chemical names: singular nouns, plurals, verbs, etc., as well as formulae and acronyms.
  - Chemical data: spectra, melting/boiling point, yield, etc. in experimental sections.
  - Other entities: things like N(5)-C(3) and so on.
- Results are exported as an XML file.
- There is a larger effort, SciBorg, in this area.
- It also has potentially very interesting workflows.

Use Cases and Workflows Putting data and clustering together in a distributed environment.

A Workflow Scenario: HTS Data Organization and Flagging
This workflow demonstrates how screening data can be flagged and organized for human analysis. The compounds and data values for a particular screen are retrieved from the NIH DTP database and are then filtered to remove compounds with reactive groups, etc.
- A tumor cell line is selected. The activity results for all the compounds in the DTP database in the given range are extracted from the PostgreSQL database.
- OpenEye FILTER is used to calculate biological and chemical properties of the compounds that are related to their potential effectiveness as drugs.
- ToxTree is used to flag the potential toxicities of compounds.
- Divkmeans is used to add a column of cluster numbers.
- Finally, the results are visualized using VOPlot and the 2D viewer applet.

HTS data organization & flagging
- A tumor cell line is selected. The activity results for all the compounds in the DTP database in the given range are extracted from the PostgreSQL database.
- The compounds are clustered on chemical structure similarity, to group similar compounds together.
- The compounds, along with property and cluster information, are converted to VOTable format and displayed in VOPlot.
- OpenEye FILTER is used to calculate biological and chemical properties of the compounds that are related to their potential effectiveness as drugs.
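The overall shape of this workflow, a chain of stages that each filter the compound table or add a column to it, can be sketched in Python as follows. Everything here is a hypothetical stand-in: the real stages are calls to the PostgreSQL database, OpenEye FILTER, ToxTree, and Divkmeans, whereas this sketch replaces each service result with a hard-coded lookup table, and the field names (`reactive`, `tox_flag`, `cluster`) are invented for illustration.

```python
def run_pipeline(records, stages):
    """Apply each stage in order; a stage maps a list of records to a new list."""
    for stage in stages:
        records = stage(records)
    return records


def drop_reactive(records):
    # Stand-in for the filtering step that removes compounds with reactive groups.
    return [r for r in records if not r["reactive"]]


def add_column(name, lookup):
    # Returns a stage that adds one column, looked up by SMILES string.
    def stage(records):
        return [{**r, name: lookup[r["smiles"]]} for r in records]
    return stage


# Made-up inputs and made-up "service results", for illustration only.
compounds = [
    {"smiles": "CCO", "activity": 5.1, "reactive": False},
    {"smiles": "c1ccccc1", "activity": 6.3, "reactive": False},
    {"smiles": "CC(=O)Cl", "activity": 4.8, "reactive": True},
]
tox_results = {"CCO": "low", "c1ccccc1": "high"}   # stand-in for ToxTree output
cluster_results = {"CCO": 1, "c1ccccc1": 2}        # stand-in for Divkmeans output

table = run_pipeline(compounds, [
    drop_reactive,
    add_column("tox_flag", tox_results),
    add_column("cluster", cluster_results),
])
for row in table:
    print(row)
```

The resulting list of rows is the kind of table that would then be converted to VOTable format for display in VOPlot.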

Web Services

Example plots of our workflow output using VOPlot and VOTables

Chemical Informatics and the TeraGrid

A Workflow for IU's Big Red Demo
- PubMed abstracts
  - 555,007 PubMed abstracts from 2005-2006 (part)
  - 1,000 abstracts per node
  - 511 nodes x 1,000 input abstracts used for the demo
- OSCAR3
  - Extracts chemical information from text and produces an XML instance highlighting the chemical information
- SMILES extraction
  - Extracting SMILES elements from OSCAR's XML output files
  - Unique SMILES list within a batch
- Use this to drive docking and molecular modeling applications.
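The two bookkeeping steps in this workflow, splitting the abstracts into fixed-size batches (one batch per node) and reducing the extracted SMILES to a unique list within each batch, can be sketched in Python as below. The helper names `batches` and `unique_smiles` are hypothetical, and the sketch does not call OSCAR3 or parse its XML output; it assumes the SMILES strings have already been extracted.

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def unique_smiles(smiles_list):
    """Remove duplicate SMILES strings, keeping first-seen order."""
    return list(dict.fromkeys(smiles_list))


# 2,500 placeholder abstract IDs split at 1,000 per node.
sizes = [len(b) for b in batches(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]

# SMILES extracted from one batch (illustrative strings).
print(unique_smiles(["CCO", "c1ccccc1", "CCO", "CC(=O)O", "c1ccccc1"]))
# ['CCO', 'c1ccccc1', 'CC(=O)O']

# The demo used 511 nodes x 1,000 abstracts of the 555,007 available.
print(511 * 1000)  # 511000
```

Note that this deduplication compares strings exactly, so two different SMILES spellings of the same molecule would both be kept unless they are canonicalized first.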

Bigger Picture for the Workflow
Diagram components: NIH PubMed Database; OSCAR Text Analysis; POV-Ray Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU's Varuna Database; NIH PubChem Database
Diagram groupings: Big Red Demo; High Throughput Screening (HTS) Data Organization and Flagging

A Workflow for Big Red Demo Final HTML pages

VARUNA – Towards a Grid-based Molecular Modeling Environment Taking the Big Red demo from stunt to science.

Automatic Quantum Mechanical Curation of Structure Data (AutoGeFF)
- Chemical research logic is often driven by molecular structure.
- Large-scale, small-molecule DBs (such as PubChem and, through OSCAR, PubMed) have low-resolution structure data.
- Key properties are often not consistently available, e.g., rotation barriers, redox potentials, polarizabilities, IR frequencies, reactivity towards nucleophiles.
- QM web services will provide tools for generating high-resolution data.
  - Produces a new, curated database of QM results.
- These can then be combined with databases of proteins (PDB, MOAD, PDB-Bind) for docking and other detailed simulation studies.

Prototype-Project: Controlling the TGF pathway
Diagram labels: PDB 1IAS Inactive TGF VARUNA Experiments in the Zhang Lab Active TGF With inhibitor PubChem in-house Molecules in Varuna Conceptual Understanding of TGF Understanding of TGF Inhibition Simulations AutoGeFF
Questions:
- What molecular feature controls inhibitor binding?
- How do mutations impact binding?

More Information Contact me: Website and wiki: We have project plus collaborator mailing lists if you really are interested.