Web Services as Integrators of Public Chemistry Databases Gary Wiggins School of Informatics Indiana University

1 Web Services as Integrators of Public Chemistry Databases Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu

2 Interdisciplinary Science
• “The boundary between the known and the unknown is where science flourishes.” --Michael Shermer
• “There are known knowns… There are known unknowns… There are unknown unknowns…” --Donald Rumsfeld
• BUT…
• What about UNKNOWN knowns?

3 zzzzzCAS
“In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world’s chemical literature, including patents and obscure technical reports, in their sleep.” --Author unknown

4 Overview of the Talk
• Introduction to Web Services
• Public Databases and Data Repositories for Chemistry
• Projects Underway or Planned at Indiana University

5 Web Services Introduction
• What are “Web Services”?
  - A distributed invocation system built on Grid computing
    - Independent of platform and programming language
    - Built on existing Web standards
  - A service oriented architecture with
    - Interfaces based on Internet protocols
    - Messages in XML (except for binary data attachments)

6 Service Oriented Architecture (SOA)
• Goal is to achieve loose coupling among interacting software agents
• Define service: a unit of work done by a service provider to achieve desired end results for a service consumer
• Both provider and consumer are roles played by software agents on behalf of their owners.

7 Web Services Architectures
• Individual services are registered globally
  - Broken down into individual services with inputs and outputs specified
• Services are published
• Services are requested
• Open registry, publishing, and requesting
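The publish/request cycle on this slide can be sketched in a few lines of Python. This is only an in-memory illustration, not a real UDDI registry: the `Registry` and `ServiceEntry` classes, the `mw` service name, and the rounded atomic masses are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceEntry:
    """A published service with its declared inputs and outputs (hypothetical)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    invoke: Callable[..., dict]


class Registry:
    """Toy open registry: providers publish, consumers search by output."""

    def __init__(self) -> None:
        self._entries: Dict[str, ServiceEntry] = {}

    def publish(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def find(self, output: str) -> List[ServiceEntry]:
        # A consumer asks for any service that produces the desired output.
        return [e for e in self._entries.values() if output in e.outputs]


def molecular_weight(formula_counts: dict) -> dict:
    # Illustrative provider: sums rounded atomic masses for H, C, and O only.
    masses = {"H": 1.008, "C": 12.011, "O": 15.999}
    return {"weight": sum(masses[el] * n for el, n in formula_counts.items())}


registry = Registry()
registry.publish(ServiceEntry("mw", ["formula_counts"], ["weight"], molecular_weight))

# The consumer discovers the service by what it produces, then invokes it.
service = registry.find("weight")[0]
result = service.invoke({"H": 2, "O": 1})
```

The consumer never imports `molecular_weight` directly; it only knows the declared output it wants, which is the loose coupling the architecture aims for.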

8 Service-Oriented Architecture
[Figure from Curcin et al., Drug Discovery Today 2005, 10(12), 867]

9 Web Services for Science
• Invisible Services, Semantic Web, and Grid
• Easy-to-use tools for any scientist
• High throughput, resource intensive computing done for low cost/resources
• Shared community
  - Collaborations between labs and fields
  - Shared data
  - Shared tools

10 e-Science and the Grid 1
• e-Science: Major UK Program
  - global collaboration in key areas of science and the next generation of infrastructure that will enable it
    - reflects growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams
    - total investment of some £200M over the five-year period from 2001 to 2006
• CyberInfrastructure: the analogous US initiative
• Grid Technology: supports e-Science & Cyberinfrastructure

11 e-Science and the Grid 2

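12 Basic Architectures: Servlets/CGI and Web Services
[Diagram comparing two architectures. Servlets/CGI: Browser, Web Server (HTTP GET/POST), DB or MPI application (JDBC). Web Services: Browser or GUI client, Web Server, SOAP with a WSDL interface, Web Server, DB or MPI application (JDBC).]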

13 When To Use Web Services?
• Applications do not have severe restrictions on reliability and speed.
• Two or more organizations need to cooperate.
  - One needs to write an application that uses another’s service.
• Services can be upgraded independently of clients.
• Services can be easily expressed with simple request/response semantics and simple state.

14 Web Services Benefits
• Web services provide a clean separation between a capability and its user interface.
• Increase in productivity
• Increase in flexibility
• Rapid return on investment
• Integration across multiple applications

15 Web Services Advantages
• Output in human- and computer-readable formats
• I/O formats based on standard Internet protocols
• Resources accessible server to server allow automated I/O
• Integration based on specific services: you select services or data needed without downloading the entire data set

16 Web Services Advantages
• Description protocols provide details of service provided and interface components
• Semantic Web standards increase efficiency
• Use a central registry and standardized description of services
• Quality and status of the information is dynamically available

17 Web Services Drawbacks
• Based on new technologies
• Time and commitment required to learn
• Standards still in a state of flux
• Issues with quality of data (and, for chemistry, quantity of open data), security, and privacy

18 Components of Web Services
• Protocols
  - WSDL
  - SOAP
  - UDDI
• XML as a basis for the protocols
• Ontologies
  - OWL: Web Ontology Language
• Semantic Web

19 WSDL: Web Services Description Language
• Describes a service’s interface to clients
• Services register themselves with Web Services
• WSDL describes how to contact and interact with services
  - I/O, operations and messages to aid interaction with client
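As a rough illustration of what "operations and messages" look like to a client, the sketch below reads a pared-down WSDL 1.1 fragment with Python's standard library and lists each operation with its input and output messages. The `StructureLookup` service, the `http://example.org/structure` namespace, and the message names are invented for the example; a real WSDL file would also carry `binding` and `service` elements.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace; everything under example.org is hypothetical.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

WSDL_DOC = """<definitions name="StructureLookup"
    targetNamespace="http://example.org/structure"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="http://example.org/structure"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="LookupRequest">
    <part name="compoundName" type="xsd:string"/>
  </message>
  <message name="LookupResponse">
    <part name="molfile" type="xsd:string"/>
  </message>
  <portType name="StructurePort">
    <operation name="lookup">
      <input message="tns:LookupRequest"/>
      <output message="tns:LookupResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> dict:
    """Map each operation name to its (input message, output message) pair."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            operations[op.get("name")] = (inp.get("message"), out.get("message"))
    return operations


operations = list_operations(WSDL_DOC)
```

Because the description is plain XML, a client in any language can discover the same interface without access to the service's source code.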

20 SOAP: Simple Object Access Protocol
• Maps abstract WSDL service descriptions to concrete implementations
• Flexible protocol to communicate information between server and server or client and server using XML
• Supports Remote Procedure Calls
• Allows layers (security, authentication, transactions) over the basic SOAP elements
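To make the XML messaging concrete, here is a minimal sketch that builds an RPC-style SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the `lookup` operation, and the `compoundName` parameter are assumptions made up for illustration, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/structure"             # hypothetical service namespace


def build_request(operation: str, params: dict) -> str:
    """Serialize a remote procedure call as a SOAP envelope string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element names the remote procedure; its children are arguments.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_request("lookup", {"compoundName": "isatin"})

# Round-trip the message to show that any XML-aware receiver can read it.
parsed = ET.fromstring(request_xml)
argument = parsed.find(
    f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}lookup/{{{SERVICE_NS}}}compoundName"
)
```

Layers such as security or transactions would typically travel in an optional `Header` element alongside `Body`, which this sketch omits.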

21 UDDI: Universal Description, Discovery, and Integration
• Provides ways for clients and services to interact with other services
• Uses XML
• Defines the means of access, e.g.,
  - URL
  - E-Mail
• Defines services hosted by an entity
• Business-oriented tags
• Uses SOAP for communicating

22 XML and Web services
• XML lends itself to distributed computing:
  - It’s just a data description.
  - Platform, programming language independent
• Web Services Description Language (WSDL)
  - Describes how to invoke a service
  - Can bind to SOAP, other protocols for actual invocation
• Simple Object Access Protocol (SOAP)
  - Wire protocol extension for conveying RPC calls
  - Can be carried over HTTP, SMTP

23 RDF: Resource Description Framework
• A standard for statements describing resources
• RDF statements consist of a
  - Subject: the resource being described
  - Predicate: the property ascribed to the resource
  - Object: the entity to which the resource bears that property
• An alternative to UDDI: said to be more flexible and extensible than UDDI
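The subject/predicate/object model is easy to see with plain tuples. The sketch below is not an RDF library; the `ex:` prefixed names and the `match` helper are hypothetical, chosen only to show how statements about one compound can be stored and queried by pattern.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Illustrative statements; the "ex:" identifiers are made up for this example.
triples: List[Triple] = [
    ("ex:isatin", "ex:hasFormula", "C8H5NO2"),
    ("ex:isatin", "ex:listedIn", "ex:ChemExper"),
    ("ex:isatin", "ex:listedIn", "ex:HICUp"),
]


def match(
    store: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the given pattern; None is a wildcard."""
    return [
        t
        for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which databases list isatin?
listings = [obj for _, _, obj in match(triples, s="ex:isatin", p="ex:listedIn")]
```

Adding a new kind of statement requires no schema change, only another tuple, which is the flexibility the slide contrasts with UDDI.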

24 RDFS: Resource Description Framework Schema
• RDF statements ascribe properties to resources, but RDF provides no means for describing those properties.
• RDFS defines classes and properties that are used to describe classes, properties, and other resources.
• RDFS vocabulary descriptions are written in RDF.
• RDFS provides a means of specifying a vocabulary for a particular domain.

25 OWL: Web Ontology Language
• Builds on RDF and RDFS and adds a means for richer descriptions of properties and classes
  - Disjoint classes
  - Cardinality of classes
  - Characteristics of relations, like symmetry
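Two of these richer descriptions, symmetric relations and disjoint classes, can be mimicked in plain Python to show what a reasoner gains from them. This is a hand-rolled sketch under stated assumptions, not OWL tooling: the class names (`Element`, `Compound`), the `isomer_of` relation, and the sample records are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def symmetric_closure(pairs: Iterable[Pair]) -> Set[Pair]:
    """If a relation is declared symmetric, (a, b) implies (b, a)."""
    pairs = set(pairs)
    return pairs | {(b, a) for a, b in pairs}


def disjointness_violations(
    memberships: Dict[str, Set[str]], disjoint: Iterable[Pair]
) -> List[str]:
    """List individuals asserted to belong to two classes declared disjoint."""
    disjoint = list(disjoint)
    return sorted(
        individual
        for individual, classes in memberships.items()
        if any(a in classes and b in classes for a, b in disjoint)
    )


# Only one direction is stated; symmetry supplies the other.
isomer_of = symmetric_closure({("butane", "isobutane")})

# Declaring Element and Compound disjoint lets a checker flag inconsistent records.
memberships = {
    "isatin": {"Compound"},
    "bad_record": {"Element", "Compound"},
}
violations = disjointness_violations(memberships, [("Element", "Compound")])
```

An actual OWL reasoner derives such consequences from the ontology itself; the functions here only hard-code two of those inference rules.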

26 Standards for Web Services
• Business Process Execution Language for Web Services (BPEL4WS)
• Semantic Markup for Web Services (OWL-S)
• Web Service Modeling Ontology (WSMO)

27 Standards Setting Bodies: OASIS
• OASIS: Organization for Advancement of Structured Information Standards
  - ebXML: e-business XML
  - UDDI: Universal Description, Discovery and Integration
• Global Grid Forum
  - community of users, developers, and vendors leading the global standardization effort for grid computing

28 Standards Setting Bodies: W3C
• W3C: World Wide Web Consortium
  - OWL: Web Ontology Language
  - RDF/RDFS: Resource Description Framework/Schema
  - SOAP: Simple Object Access Protocol
  - URI/URL/URN: Uniform Resource Identifier/Locator/Name
  - WSDL: Web Services Description Language
  - XML: eXtensible Markup Language

29 Web Services Integration Projects: Biosciences
• myGrid
  - http://www.mygrid.org.uk/
• BIOPIPE
  - http://biopipe.org/
• BioMOBY
  - http://biomoby.org/

30 Web Services for Chemistry: Problems
• Performance and scalability
• Proprietary data
• Competition from high-performance desktop applications
  -- Geoff Hutchison, it’s a puzzle blog, 2005-01-05
• ALSO:
  - Lack of a substantial body of trustworthy Open Access databases
  - Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)

31 Necessary Ingredients in Chemistry
• Chemical communities to assemble Open Access databases
  - Well-defined quality assurance procedures performed by distributed peer-review systems
  - Software underlying the databases needs to be open source.

32 BlueObelisk.org
• A group of chemists, programmers, and informaticians working collaboratively on projects such as:
  - Chemistry Development Kit (CDK)
  - JChemPaint
  - Jmol
  - JUMBO
  - NMRShiftDB
  - Octet
  - Open Babel
  - QSAR
  - World Wide Molecular Matrix (WWMM)

33 Components of the Semantic Web for Chemistry
• XML – eXtensible Markup Language
• RDF – Resource Description Framework
• RSS – Rich Site Summary
• Dublin Core – allows metadata-based newsfeeds
• OWL – for ontologies
• BPEL4WS – for workflow and web services
Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192-3203.

34 Chemistry Databases on the Web
• Marc Nicklaus lists 37 databases as of October 2001
  - Must have structure searching and at least 100 molecules
  - http://cactus.nci.nih.gov/ncidb2/chem_www.html
• SoaringBear’s List has 15 databases
  - http://geocities.com/soaringbear/biomed/chem.html

35 Institutional Repositories
• NARSTO Quality Systems Science Center
  - http://cdiac.esd.ornl.gov/programs/NARSTO/
  - Pollutant species in the troposphere over North America
  - Part of the Carbon Dioxide Information Analysis Center at ORNL
  - NARSTO Data and Information Sharing Tool
    - http://mercury.ornl.gov/narsto/

36 Public Databases
• Developmental Therapeutics Program/NCI
  - Some assay data for download
  - Structures for over 200,000 compounds
  - http://dtp.nci.nih.gov/docs/dtp_search.html
• Zinc and other screening databases
• NIST computational chemistry database
• Environmental fate and exposure databases

37 Other Public Databases 1
• ChemExper Chemical Directory
  - > 200,000 substances; > 10,000 IR spectra
  - http://chemexper.com/
• HIC-Up; Hetero-Compound Identification Centre – Uppsala
  - 5384 substances as of 1/15/05
  - http://xray.bmc.uu.se/hicup/
• Chemicals with Pharmaceutical Activity; a 3D Structural Database
  - 400 3D structures
  - http://www.chem.ox.ac.uk/mom/chemical-database/

38 Isatin in ChemExper

39 Isatin ChemExper IR Spectrum

40 Isatin from HIC-Up

41 Other Public Databases 2
• Cheminformatics.org
  - 41 data sets in 9 categories as of 8/18/05
  - http://www.cheminformatics.org/
• WebReactions
  - http://webreactions.net/

42 Other Public Databases 3
• MolTable
  - http://www.moltable.org/
• MatWeb Materials Property Data
  - http://www.matweb.com/index.asp?ckck=1
• Spectral Database for Organic Compounds (SDBS)
  - Over 32,000 compounds
  - Has EI-MS, FT-IR, 1H NMR, 13C NMR, Raman, ESR
  - http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi
• NMRShiftDB (Christoph Steinbeck)
  - 14,753 structures as of 8/19/05
  - Features peer-reviewed submission of data sets
  - http://www.nmrshiftdb.org/

43 Comment on Link to PubChem
• “It is great to know that - with PubChem - there will be something like a single point of entry for people interested in structures and their properties. We are looking forward to link our datasets to PubChem structures.” --Christoph Steinbeck, commenting on the NMRShiftDB, CHMINF-L, 19 April 2005

44 Other Public Databases: Commercial Teasers
• FTIRsearch.com (Thermo Electron)
  - Demo file of 575 spectra from 87,000 in the full database
  - https://ftirsearch.com/default3.htm
• ChemACX
  - 30 of >350 suppliers catalog data
  - http://chemacx.cambridgesoft.com/chemacx/index.asp
• Sunset Molecular Discovery, LLC
  - Wombat (World of Molecular BioAcTivity)
    - 117,007 entries with over 230,000 biological activities
  - Wombat PK
    - Database for Clinical Pharmacokinetics: 643 substances with 4668 measurements
  - Three sample files from Wombat containing 341 Histamine-1 receptor antagonists
  - http://www.sunsetmolecular.com/

45 Indiana University Existing Projects
• System for the Integration of Bioinformatics Services (SIBIOS)
  - http://sibios.engr.iupui.edu
• PlatCom: A Platform for Computational Comparative Genomics
  - http://bio.informatics.indiana.edu/sunkim/Platcom/
• Reciprocal Net
  - http://www.reciprocalnet.org/index.html

46 Indiana University Planned Projects
• Design of a Grid-based distributed data architecture
• Development of tools for HTS data analysis and virtual screening
• Database for quantum mechanical simulation data
• Chemical prototype projects
  - Novel routes to enzymatic reaction mechanisms
  - Mechanism-based drug design
  - Data-inquiry-based development of new methods in natural product synthesis

47 Community Grids Lab

48 Open Systems Lab

49 Web Services Future
• Depends on
  - Adoption of standards
  - Incorporation of WS in current and newly developed applications
  - Security, privacy, quality of data issues
  - Development of WS tools and resources for e-Science

50 “Tripos Embraces Web Services”
• Goal: to make it easier to incorporate Tripos’ high-throughput workflows
  - Tripos’ Service-Oriented Informatics (SOI)
• In partnership with SciTegic: Web services ensure software compatibility between the companies’ products.

51 Provocative Thought
“While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read.”
Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.

52 Closing quote
“The future of chemistry depends on the automated analysis of chemical knowledge, combining disparate data sources in a single resource, such as the World-Wide Molecular Matrix, which can be analysed using computational techniques to assess and build on these data.”
Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

53 Acknowledgements
• Christoph Steinbeck
• Geoffrey Fox, Marlon Pierce
• Peter Murray-Rust
• Michael D. Jensen, Timothy B. Patrick, Joyce A. Mitchell
• Elsevier

54 Bibliography
• Coles, Simon J.; Day, Nick E.; Murray-Rust, Peter; Rzepa, Henry S.; Zhang, Yong. “Enhancement of the chemical semantic web through InChIfication.” Organic & Biomolecular Chemistry 2005, 3, 1832-1834.
• Curcin, Vera; Ghanem, Moustafa; Guo, Yike. “Web services in the life sciences.” Drug Discovery Today 2005, 10(12), 865-871.
• Day, Michael. Metadata: Mapping between metadata formats. http://www.ukoln.ac.uk/metadata/interoperability/ (accessed 8/6/2005)
• De Roure, D.; Jennings, N. R.; Shadbolt, N. R. “The Semantic Grid: Past, present and future.” Proceedings of the IEEE 2005, 93(3), 669-681.
• Jensen, Michael; Patrick, Timothy B.; Mitchell, Joyce A. “Web services for bioinformatics.” PSB 2004 Tutorial.
• Murray-Rust, Peter S.; Mitchell, John B.O.; Rzepa, Henry S. “Communication and re-use of chemical information in bioscience.” BMC Bioinformatics 2005, 6, 180.
• Murray-Rust, Peter; Mitchell, John B.O.; Rzepa, Henry S. “Chemistry in bioinformatics.” BMC Bioinformatics 2005, 6, 141-144.

55 Bibliography
• Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Yong. “Representation and use of chemistry in the global electronic age.” Organic & Biomolecular Chemistry 2004, 2, 3192-3203.
• Salamone, Salvatore. “Tripos embraces Web Services.” Bio-IT World 2005, 4(8), 8.
• Shermer, Michael. “Rumsfeld’s Wisdom.” Scientific American 2005, 293(3), 38.
• Townsend, Joe A.; Adams, Sam E.; Waudby, Christopher A.; de Souza, Vanessa K.; Goodman, Jonathan M.; Murray-Rust, Peter. “Chemical documents: machine understanding and automated information extraction.” Organic & Biomolecular Chemistry 2004, 2, 3294-3300.
• World Wide Molecular Matrix. http://wwmm.ch.cam.ac.uk/wwmm.html
• Summary Report – W3C Workshop on Semantic Web for Life Sciences, October 27-28, 2004. http://www.w3.org/2004/10/swls-workshop-report.html

