1 Gary Wiggins for Geoffrey Fox April 30, 2007 Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana.

Chemical Informatics and Cyberinfrastructure Collaboratory
Presentation transcript:

1 Gary Wiggins for Geoffrey Fox, April 30, 2007
Computer Science, Informatics, Physics, Pervasive Technology Laboratories, Indiana University Bloomington, IN

2 Indiana University Focus
- Creating a comprehensive, easily accessible infrastructure for cheminformatics tools and data sources
- Becoming a central hub of cheminformatics education

3 CICC Web Service Infrastructure
- Cheminformatics services
- Statistics services
- Database services
- Grid services
- Portal services

4 Web Services Vision
- Web services provide a neutral approach to exposing functionality.
- They can be located anywhere: on your desktop, on an intranet, or on the Internet.
- Literally anything can be made into a web service: libraries, standalone programs, commercial code, open-source code.
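
As a minimal sketch of the idea that literally anything can be made into a web service, the Python code below wraps an arbitrary function behind an HTTP GET endpoint using only the standard library. The names `dispatch`, `make_handler`, and `letter_count`, the `input` query parameter, and the JSON response shape are illustrative assumptions, not the CICC's actual service interface.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlparse


def dispatch(func, path):
    """Call func on the 'input' query parameter of a GET path; return (status, JSON body)."""
    params = parse_qs(urlparse(path).query)
    if "input" not in params:
        return 400, json.dumps({"error": "missing 'input' parameter"})
    return 200, json.dumps({"result": func(params["input"][0])})


def make_handler(func):
    """Build a request-handler class that exposes func over HTTP GET."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = dispatch(func, self.path)
            data = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def letter_count(smiles):
    """Hypothetical stand-in for a wrapped library call: counts alphabetic characters."""
    return sum(ch.isalpha() for ch in smiles)
```

To serve it locally, one could run `HTTPServer(("localhost", 8000), make_handler(letter_count)).serve_forever()` (with `HTTPServer` imported from `http.server`) and request `http://localhost:8000/?input=CCO`. SMILES strings containing `+` or `#` must be URL-encoded first, because `parse_qs` decodes `+` as a space.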

5 Modes of Access
- Web pages
- Workflow tools: Taverna, Pipeline Pilot, XBaya, etc.
- GUIs: Chimera
- RSS feeds
  - Feeds include 2D/3D structures in CML, viewable in Bioclipse and Jmol as well as Sage, etc.
  - Two feeds currently available:
    - SynSearch: get structures based on full or partial chemical names
    - DockSearch: get the best N structures for a target
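
The slide says the feeds carry 2D/3D structures in CML. As a hedged sketch of what such a feed could look like, the code below builds an RSS 2.0 document whose items each embed a CML `<molecule>` with 3D atom coordinates (`x3`, `y3`, `z3`). The exact layout of the SynSearch and DockSearch feeds is not given in the source, so the item structure and the function names here are assumptions.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("cml", CML_NS)


def cml_molecule(mol_id, atoms):
    """Build a CML <molecule> from (element, x, y, z) tuples with 3D coordinates."""
    mol = ET.Element(f"{{{CML_NS}}}molecule", id=mol_id)
    atom_array = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    for i, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", id=f"a{i}",
                      elementType=element, x3=str(x), y3=str(y), z3=str(z))
    return mol


def rss_feed(title, link, description, items):
    """Build an RSS 2.0 document; each (item_title, molecule) pair becomes one <item>."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, molecule in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        item.append(molecule)  # the CML element keeps its own namespace inside the item
    return ET.tostring(rss, encoding="unicode")
```

A feed reader that does not understand CML simply ignores the namespaced element, while a chemistry-aware client can extract the molecule from each item.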

6 Where Does Our Functionality Come From?
- Indiana University: VOTables, NCI DTP predictions, database services
- Cambridge University: InChI generation / search, OSCAR
- OpenEye: docking
- Digital Chemistry: BCI fingerprints, DivKMeans
- CDK: cheminformatics
- Univ. of Michigan: PkCell
- R Foundation: R package
- NIH: PubChem, PubMed
- gNova Consulting
- European Chemicals Bureau: ToxTree toxicity predictions

7 Methods Development at the CICC
- Tagging methods for web-based annotation exploiting del.icio.us and Connotea
- Development of QSAR model interpretability and applicability methods
- RNN-Profiles for exploration of chemical spaces
- VisualiSAR: SAR through visual analysis
- Visual Similarity Matrices for high-volume datasets
- Fast, accurate clustering using parallel Divisive K-means
- Mapping of natural-language queries to use cases and workflows
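
The slide names Divisive K-means without details. A common form of divisive (bisecting) k-means starts with all compounds in one cluster and repeatedly splits the cluster with the largest within-cluster sum of squared errors using 2-means. The serial sketch below assumes compounds are already encoded as numeric descriptor vectors. It is neither the CICC's parallel implementation nor Digital Chemistry's DivKMeans.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(cluster):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def two_means(points, iters=20, seed=0):
    """Split points into two groups with plain Lloyd iterations (k = 2)."""
    centers = random.Random(seed).sample(points, 2)
    groups = (list(points), [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            groups[0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. identical points
        centers = [centroid(groups[0]), centroid(groups[1])]
    return groups


def divisive_kmeans(points, k, seed=0):
    """Bisect the highest-SSE cluster until k clusters exist (or no split is possible)."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        target = clusters.pop(i)
        left, right = two_means(target, seed=seed) if len(target) > 1 else (target, [])
        if not left or not right:
            clusters.append(target)
            break
        clusters.extend([left, right])
    return clusters
```

Each bisection step is independent of the others, which is one reason divisive schemes lend themselves to parallelization.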

8 Algorithm Development
- Goals
  - Focus on interpretability and applicability
  - Devise novel approaches to clustering problems
  - Investigate the utility of low-dimensional representations for a variety of problems
- Examples
  - Ensemble feature selection (JCIM, in press)
  - Cluster counting with R-NN curves (in revision)
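
An R-NN curve records, for a given compound, how many neighbors fall within a distance R as R grows. The sketch below computes one such curve per point for a supplied list of radii, assuming Euclidean distance on numeric descriptors. The slide does not describe how the curves are turned into a cluster count, so the sketch stops at computing them.

```python
import math


def rnn_curves(points, radii):
    """For each point, count the other points within distance R, for every R in radii."""
    curves = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        curves.append([sum(d <= r for d in dists) for r in radii])
    return curves
```

A point in a dense region shows a curve that rises early, while an isolated point stays flat until R is large.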

9 Chemical Data Mining
- Working on screening data with Scripps, FL
- Random forests (modeling and feature selection)
- Naïve Bayes (modeling)
- Identifying features indicative of toxicity
- Domain applicability
- NCI DTP cell line activity predictions
  - Random forest models for 60 cell lines
  - All available as downloadable R models and as web services (supply SMILES, get a prediction) with web page clients
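
To illustrate the Naïve Bayes modeling listed above, here is a small Bernoulli naive Bayes classifier over binary fingerprints, written in Python rather than R. It assumes each compound is a set of "on" fingerprint bits with a 0/1 activity label and uses Laplace smoothing. It is an illustrative sketch, not the CICC's actual model.

```python
import math


def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Fit per-class bit frequencies with Laplace smoothing.

    fingerprints: list of sets of 'on' bit indices; labels: 1 = active, 0 = inactive.
    """
    model = {}
    for cls in (0, 1):
        members = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        counts = [0] * n_bits
        for fp in members:
            for bit in fp:
                counts[bit] += 1
        n = len(members)
        bit_probs = [(c + alpha) / (n + 2 * alpha) for c in counts]
        prior = (n + alpha) / (len(labels) + 2 * alpha)
        model[cls] = (math.log(prior), bit_probs)
    return model


def predict_proba_active(model, fingerprint):
    """Posterior probability of the 'active' class (label 1) for one fingerprint."""
    log_post = {}
    for cls, (log_prior, bit_probs) in model.items():
        score = log_prior
        for bit, p in enumerate(bit_probs):
            score += math.log(p if bit in fingerprint else 1.0 - p)
        log_post[cls] = score
    top = max(log_post.values())  # subtract the max before exponentiating, for stability
    total = sum(math.exp(v - top) for v in log_post.values())
    return math.exp(log_post[1] - top) / total
```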

10 Computational Infrastructure: R, CDK, and PubChem
- Goals
  - Access cheminformatics from within R
  - Access PubChem data from within R
- The rcdk package lets users do cheminformatics within R using CDK functionality
- rpubchem provides access to PubChem compound data and bioassay data, searchable via assay ID and keywords
- J. Stat. Soft., 2007, 18(6)
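
For comparison with rpubchem, the sketch below retrieves PubChem compound properties and assay descriptions from Python through PubChem's PUG REST interface. PUG REST is a separate NCBI service that is not mentioned on the slide, and the helper function names are hypothetical. Only `fetch_compound_properties` touches the network.

```python
import json
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_properties_url(cids, properties):
    """URL for selected properties of one or more PubChem CIDs, returned as JSON."""
    cid_part = ",".join(str(cid) for cid in cids)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{','.join(properties)}/JSON"


def assay_description_url(aid):
    """URL for the description of a PubChem BioAssay (AID), returned as JSON."""
    return f"{PUG_REST}/assay/aid/{aid}/description/JSON"


def parse_property_table(payload):
    """Index a PUG REST PropertyTable JSON response by CID."""
    rows = json.loads(payload)["PropertyTable"]["Properties"]
    return {row["CID"]: row for row in rows}


def fetch_compound_properties(cids, properties, timeout=30):
    """Network call: download and parse the property table (requires internet access)."""
    with urlopen(compound_properties_url(cids, properties), timeout=timeout) as resp:
        return parse_property_table(resp.read().decode("utf-8"))
```

For example, `fetch_compound_properties([2244], ["MolecularFormula", "MolecularWeight"])` requests two properties for CID 2244 (aspirin).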

11 Example: R Statistics Applied to PubChem Data
By exposing the R statistical package and the Chemistry Development Kit (CDK) toolkit as web services and integrating them with PubChem, we can quickly and easily perform statistical analysis and virtual screening of PubChem assay data. Predictive models for particular screens are exposed as web services and can be used either as simple web tools or integrated into other applications. The example on this slide uses the DTP tumor cell line screens: a predictive model built with Random Forests in R predicts the probability of activity across multiple cell lines (avail. at

12 Databases
Our databases aim to add value to PubChem or link into PubChem.
- 3D structures (MMFF94), searchable by CID, SMARTS, and 3D similarity
- Docked ligands (FRED): 960,000 drug-like compounds docked into 7 targets; will eventually cover ~2000 targets
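
The slide does not say how its 3D similarity search is scored. As a simpler stand-in, the sketch below shows the widely used 2D fingerprint Tanimoto coefficient, |A ∩ B| / |A ∪ B|, applied to a threshold search. It assumes fingerprints are stored as sets of on-bit indices keyed by CID. Defining the score of two empty fingerprints as 0.0 is a choice made here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query, database, threshold=0.7):
    """Return (cid, score) pairs with score >= threshold, best first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```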

13 Example: PubDock
- Database of 960K PubChem structures (the most drug-like) docked into proteins taken from the PDB
- Available as a web service, so structures can be accessed from your own programs or with workflow tools like Pipeline Pilot
- Several interfaces developed, including one based on Chimera that integrates the database with the PDB to allow browsing of compounds in different targets, or of different compounds in the same target
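
The two browsing modes above can be sketched as simple queries over docking records: the best N compounds for one target, and all targets for one compound. The record fields (`target`, `cid`, `score`) are hypothetical. The sketch also assumes that a lower docking score means a better pose, as with many docking scoring functions. The source does not say how PubDock ranks poses.

```python
import heapq


def best_n_for_target(poses, target, n):
    """The n best poses for one target, assuming lower docking score = better."""
    return heapq.nsmallest(n, (p for p in poses if p["target"] == target),
                           key=lambda p: p["score"])


def targets_for_compound(poses, cid):
    """All targets a given compound has been docked into, in sorted order."""
    return sorted({p["target"] for p in poses if p["cid"] == cid})
```

`heapq.nsmallest` avoids sorting the whole record set when N is small relative to the number of docked compounds.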

14 How Do We Use All of This?
- Percent inhibition or IC50 data is retrieved from HTS.
- Questions:
  - Was this screen successful?
  - What should the active/inactive cutoffs be?
  - What can we learn about the target protein or cell line from this screen?
- Compounds are submitted to PubChem.
- Workflows encoding plate and control-well statistics, distribution analysis of screening results, etc.
- Workflows encoding statistical comparison of results to similar screens, docking of compounds into proteins to correlate binding with activity, literature search of active compounds, etc.
- Grids can link data analysis (e.g. image processing developed in existing Grids), traditional cheminformatics tools, and annotation tools (Semantic Web, del.icio.us), and enhance lead identification and SAR analysis.
- A Grid of Grids links collections of services at PubChem, the ECCR centers, and the MLSCN centers.
(Diagram caption: Cheminformatics Process Grids)
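
The first two questions are often addressed with plate and control-well statistics. The sketch below shows three common calculations: percent inhibition relative to controls, the Z'-factor, 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|, as a screen-quality measure (values of 0.5 or above are commonly read as an excellent assay), and a mean + k·SD activity cutoff. These are standard heuristics chosen for illustration. The slide does not say which statistics the CICC workflows actually encode.

```python
import statistics


def percent_inhibition(signal, neg_mean, pos_mean):
    """0% at the negative (uninhibited) control mean, 100% at the positive control mean.

    Assumes the measured signal decreases as inhibition increases.
    """
    return 100.0 * (neg_mean - signal) / (neg_mean - pos_mean)


def z_prime(pos, neg):
    """Z'-factor from positive- and negative-control well readings."""
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    window = abs(statistics.mean(pos) - statistics.mean(neg))
    return 1.0 - 3.0 * spread / window


def activity_cutoff(values, k=3.0):
    """Heuristic hit threshold: mean + k standard deviations of the sample values."""
    return statistics.mean(values) + k * statistics.stdev(values)
```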

15 Example HTS Workflow: Finding Cell-Protein Relationships
- A protein implicated in tumor growth with a known ligand is selected (in this case HSP90, taken from the PDB 1Y4 complex).
- The screening data from a cellular HTS assay is similarity searched for compounds with 2D structures similar to the ligand. The similar structures can be browsed using client portlets.
- The similar structures are filtered for druggability, converted to 3D, and automatically passed to the OpenEye FRED docking program for docking into the target protein.
- Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the Jmol applet.
- Docking results and activity patterns are fed into R services for building activity models and correlations: least squares regression, random forests, neural nets.
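
Two steps of this workflow can be sketched with plain Python: a druggability filter and a correlation between docking scores and measured activity. The slide does not name its druggability criteria, so the filter below assumes Lipinski's rule of five applied to precomputed descriptors (`mw`, `logp`, `hbd`, `hba` are hypothetical field names), allowing at most one violation. Pearson correlation stands in for the R correlation services.

```python
import statistics


def passes_rule_of_five(desc, max_violations=1):
    """Lipinski-style filter on precomputed descriptors (assumed field names)."""
    violations = sum([
        desc["mw"] > 500,    # molecular weight
        desc["logp"] > 5,    # octanol-water partition coefficient
        desc["hbd"] > 5,     # hydrogen-bond donors
        desc["hba"] > 10,    # hydrogen-bond acceptors
    ])
    return violations <= max_violations


def pearson(xs, ys):
    """Pearson correlation, e.g. between docking scores and assay activities."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```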

16 Varuna Environment for Molecular Modeling (Baik, IU)
Diagram components:
- Researcher
- Simulation Service: FORTRAN code, scripts
- QM Database; QM/MM Database
- DB Service: queries, clustering, curation, etc.
- ChemBioGrid: PubChem, PDB, NCI, etc.
- Reaction DB
- Chemical concepts, experiments, papers, etc.
- Compute resources: Condor “Flocks”, TeraGrid, supercomputers

17 Cheminformatics Education at IU
- School of Informatics degree programs: BS, MS, PhD
- Cheminformatics MS and a cheminformatics track in the PhD in Informatics
- Informatics undergraduates can choose a chemistry cognate (minor in chemistry)
- Also a Bioinformatics MS, and Bioinformatics and Complex Systems tracks in the PhD in Informatics
- Good employer interest, but modest student understanding of the value of a cheminformatics degree
- 3 core graduate courses in cheminformatics, plus seminars and independent study courses
- Significant interest in distance-education versions of the courses, which is promising for the Graduate Certificate in Chemical Informatics

18 Spreading Cheminformatics Education with Distance Education
- Partnered with the University of Michigan to offer our introductory graduate cheminformatics course at IU and Michigan as a CIC CourseShare
- UM pharmacy, chemistry, and engineering students can be trained in cheminformatics for course credit at UM
- Individual students in academia, government, and small and large life science companies have taken the class remotely from all over the country for credit towards the graduate certificate
- Uses a mixture of web conferencing (Breeze), videoconferencing, and online resources for maximum flexibility
- The most recent course wiki is available at wild/I571_2006_wiki
(Photo caption: Giving a class remotely to UM students with video and web conferencing)

19 CICC Infrastructure Vision
- Drug discovery and other academic chemistry and pharmacology research will be aided by powerful modern information technology.
- ChemBioGrid is set up as distributed cyberinfrastructure in an eScience model.
- ChemBioGrid will provide user interfaces (portals) to distributed databases, results of high-throughput screening instruments, results of computational chemical simulations, and other analyses.
- ChemBioGrid will provide services to manipulate this data and combine it in workflows, with convenient ways to submit and manage multiple jobs.
- ChemBioGrid will include access to PubChem, PubMed, PubMed Central, the Internet and its derivatives like Microsoft Academic Live and Google Scholar.
- The services include open-source software like CDK, commercial code from vendors such as Digital Chemistry, OpenEye, and Google, and any user-contributed programs.
- ChemBioGrid will define open interfaces for each type of service, allowing plug-and-play choices between different implementations.
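
The plug-and-play idea can be illustrated with a Python `Protocol`. A workflow step is written against an interface, and any implementation that provides the same method can be swapped in. The interface name `SimilaritySearch` and both implementations are hypothetical, since the slide does not specify ChemBioGrid's actual interfaces. `BitScreenSearch` is only a fingerprint screen, not a true substructure match.

```python
from typing import Dict, FrozenSet, List, Protocol


class SimilaritySearch(Protocol):
    """Hypothetical open interface for one type of service."""

    def search(self, query: FrozenSet[int], threshold: float) -> List[int]: ...


class TanimotoSearch:
    """Implementation 1: Tanimoto coefficient on bit-set fingerprints."""

    def __init__(self, db: Dict[int, FrozenSet[int]]):
        self.db = db

    def search(self, query: FrozenSet[int], threshold: float) -> List[int]:
        hits = []
        for cid, fp in self.db.items():
            union = len(query | fp)
            if union and len(query & fp) / union >= threshold:
                hits.append(cid)
        return sorted(hits)


class BitScreenSearch:
    """Implementation 2: keep compounds whose fingerprint contains every query bit.

    The threshold argument is accepted for interface compatibility but ignored.
    """

    def __init__(self, db: Dict[int, FrozenSet[int]]):
        self.db = db

    def search(self, query: FrozenSet[int], threshold: float) -> List[int]:
        return sorted(cid for cid, fp in self.db.items() if query <= fp)


def run_step(engine: SimilaritySearch, query: FrozenSet[int], threshold: float = 0.5) -> List[int]:
    """A workflow step that depends only on the interface, not on an implementation."""
    return engine.search(query, threshold)
```

Because `run_step` relies only on the `search` method, replacing one back end with another requires no change to the workflow code.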

20 CICC Senior Personnel
Geoffrey C. Fox, Mu-Hyun (Mookie) Baik, Dennis B. Gannon, Kevin E. Gilbert, Rajarshi Guha, Marlon Pierce, Beth A. Plale, Gary D. Wiggins, David J. Wild, Yuqing (Melanie) Wu, Peter T. Cherbas, Mehmet M. Dalkilic, Charles H. Davis, A. Keith Dunker, Kelsey M. Forsythe, John C. Huffman, Malika Mahoui, Daniel J. Mindiola, Santiago D. Schnell, William Scott, Craig A. Stewart, David R. Williams
From Biology, Chemistry, Computer Science, and Informatics at IU Bloomington and IUPUI (Indianapolis)