Download presentation
Presentation is loading. Please wait.
Published bySybil Berry Modified over 8 years ago
1
Web Services as Integrators of Public Chemistry Databases Gary Wiggins School of Informatics Indiana University wiggins@indiana.edu
2
Interdisciplinary Science “The boundary between the known and the unknown is where science flourishes.” --Michael Shermer “There are known knowns… There are known unknowns… There are unknown unknowns…” --Donald Rumsfeld BUT…. What about UNKNOWN knowns?
3
zzzzzCAS ““““In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world’s chemical literature— including patents and obscure technical reports—in their sleep.” --Author unknown
4
Overview of the Talk Introduction to Web Services Public Databases and Data Repositories for Chemistry Projects Underway or Planned at Indiana University
5
Web Services Introduction What are “Web Services”? A distributed invocation system built on Grid computing A distributed invocation system built on Grid computing Independent of platform and programming languageIndependent of platform and programming language Built on existing Web standardsBuilt on existing Web standards A service oriented architecture with A service oriented architecture with Interfaces based on Internet protocolsInterfaces based on Internet protocols Messages in XML (except for binary data attachments)Messages in XML (except for binary data attachments)
6
Service Oriented Architecture (SOA) Goal is to achieve loose coupling among interacting software agents Define service: a unit of work done by a service provider to achieve desired end results for a service consumer Both provider and consumer are roles played by software agents on behalf of their owners.
7
Web Services Architectures Individual services are registered globally Broken down into individual services with inputs and outputs specified Broken down into individual services with inputs and outputs specified Services are published Services are requested Open registry, publishing, and requesting
8
Service-Oriented Architecture From Curcin et al. DDT, 2005, 10(12),867
9
Web Services for Science Invisible Services, Semantic Web, and Grid Easy-to-use tools for any scientist High throughput, resource intensive computing done for low cost/resources Shared community Collaborations between labs and fields Collaborations between labs and fields Shared data Shared data Shared tools Shared tools
10
e-Science and the Grid 1 e-Science: Major UK Program global collaboration in key areas of science and the next generation of infrastructure that will enable it global collaboration in key areas of science and the next generation of infrastructure that will enable it reflects growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teamsreflects growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams total investment of some £200M over the five-year period from 2001 to 2006total investment of some £200M over the five-year period from 2001 to 2006 CyberInfrastructure: the analogous US initiative Grid Technology: supports e-Science & Cyberinfrastructure
11
e-Science and the Grid 2
12
Basic Architectures: Servlets/CGI and Web Services Browser Web Server HTTP GET/POST DB or MPI Appl. JDBC Web Server DB or MPI Appl. JDBC Browser Web Server SOAP GUI Client SOAP WSDL
13
When To Use Web Services? Applications do not have severe restrictions on reliability and speed. Two or more organizations need to cooperate. One needs to write an application that uses another’s service. One needs to write an application that uses another’s service. Services can be upgraded independently of clients. Services can be easily expressed with simple request/response semantics and simple state.
14
Web Services Benefits Web services provide a clean separation between a capability and its user interface. Increase in productivity Increase in flexibility Rapid return on investment Integration across multiple applications
15
Web Services Advantages Output in human- and computer-readable formats I/O formats based on standard Internet protocols Resources accessible server to server allow automated I/O Integration based on specific services: you select services or data needed without downloading the entire data set
16
Web Services Advantages Description protocols provide details of service provided and interface components Semantic Web standards increase efficiency Use a central registry and standardized description of services Quality and status of the information is dynamically available
17
Web Services Drawbacks Based on new technologies Time and commitment required to learn Standards still in a state of flux Issues with quality of data, (and for chemistry, quantity of open data), security, and privacy
18
Components of Web Services Protocols WSDL WSDL SOAP SOAP UDDI UDDI XML as a basis for the protocols Ontologies OWL: Ontology Web Language OWL: Ontology Web Language Semantic Web
19
WSDL: Web Service Definition Language Describes a service’s interface to clients Services register themselves with Web Services WSDL describes how to contact and interact with services I/O, operations and messages to aid interaction with client I/O, operations and messages to aid interaction with client
20
SOAP: Simple Object Access Protocol Maps abstract WSDL service descriptions to concrete implementations Flexible protocol to communicate information between server and server or client and server using XML Supports Remote Procedure Calls Allows layers (security, authentication, transactions) over the basic SOAP elements
21
UDDI: Universal Description, Discovery, and Integration Provides ways for clients and services to interact with other services Uses XML Defines the means of access, e.g., URL URL E-Mail E-Mail Defines services hosted by an entity Business-oriented tags Uses SOAP for communicating
22
XML and Web services XML lends itself to distributed computing: It’s just a data description. It’s just a data description. Platform, programming language independent Platform, programming language independent Web Services Description Language (WSDL) Describes how to invoke a service Describes how to invoke a service Can bind to SOAP, other protocols for actual invocation Can bind to SOAP, other protocols for actual invocation Simple Object Access Protocol (SOAP) Wire protocol extension for conveying RPC calls Wire protocol extension for conveying RPC calls Can be carried over HTTP, SMTP Can be carried over HTTP, SMTP
23
RDF: Resource Description Framework A standard for statements describing resources RDF statements consist of a Subject: the resource being described Subject: the resource being described Predicate: the property ascribed to the resource Predicate: the property ascribed to the resource Object: the entity to which the resource bears that property Object: the entity to which the resource bears that property An alternative to UDDI: said to more flexible and extensible than UDDI
24
RDFS: Resource Description Framework Schema RDF statements ascribe properties to resources, but RDF provides no means for describing those properties. RDFS defines classes and properties that are used to describe classes, properties, and other resources. RDFS vocabulary descriptions are written in RDF. RDFS provides a means of specifying a vocabulary for a particular domain.
25
OWL: Web Ontology Language Builds on RDF and RDFS and adds a means for richer descriptions of properties and classes Disjoint classes Disjoint classes Cardinality of classes Cardinality of classes Characteristics of relations, like symmetry Characteristics of relations, like symmetry
26
Standards for Web Services Business Process Execution Language for Web Services (BPEL4WS) Ontology Web Language Semantics (OWL-S) Web Service Modeling Ontology (WSMO)
27
Standards Setting Bodies: OASIS OASIS: Organization for Advancement of Structured Information Standards ebXML: e-business XML ebXML: e-business XML UDDI: Universal Description, Discovery and Integration UDDI: Universal Description, Discovery and Integration Global Grid Forum community of users, developers, and vendors leading the global standardization effort for grid computing community of users, developers, and vendors leading the global standardization effort for grid computing
28
Standards Setting Bodies: W3C W3C: World Wide Web Consortium OWL: Ontology Web Language OWL: Ontology Web Language RDF/RDFS: Resource Description Framework/Schema RDF/RDFS: Resource Description Framework/Schema SOAP: Simple Object Access Protocol SOAP: Simple Object Access Protocol URI/URL/URN: Universal Resource Identifier/Locator/Name URI/URL/URN: Universal Resource Identifier/Locator/Name WSDL: Web Service Definition Language WSDL: Web Service Definition Language XML: eXtensible Markup Language XML: eXtensible Markup Language
29
Web Services Integration Projects: Biosciences myGrid http://www.mygrid.org.uk/ http://www.mygrid.org.uk/ http://www.mygrid.org.uk/ BIOPIPE http://biopipe.org/ http://biopipe.org/ http://biopipe.org/ BioMOBY http://biomoby.org/ http://biomoby.org/ http://biomoby.org/
30
Web Services for Chemistry: Problems Performance and scalability Proprietary data Competition from high-performance desktop applications -- Geoff Hutchison, it’s a puzzle blog, 2005-01-05 ALSO: Lack of a substantial body of trustworthy Open Access databases Lack of a substantial body of trustworthy Open Access databases Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another) Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)
31
Necessary Ingredients in Chemistry Chemical communities to assemble Open Access databases Well-defined quality assurance procedures performed by distributed peer-review systems Well-defined quality assurance procedures performed by distributed peer-review systems Software underlying the databases needs to be open source. Software underlying the databases needs to be open source.
32
BlueObelisk.org A group of chemists, programmers, and informaticians working collaboratively on projects such as: Chemistry Development Kit (CDK) Chemistry Development Kit (CDK) JChemPaint JChemPaint Jmol Jmol JUMBO JUMBO NMRShiftDB NMRShiftDB Octet Octet Open Babel Open Babel QSAR QSAR World Wide Molecular Matrix (WWMM) World Wide Molecular Matrix (WWMM)
33
Components of the Semantic Web for Chemistry XML – eXtensible Markup Language RDF – Resource Description Framework RSS – Rich Site Summary Dublin Core – allows metadata-based newsfeeds OWL – for ontologies BPEL4WS – for workflow and web services Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192- 3203. Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3192- 3203.
34
Chemistry Databases on the Web Marc Nicklaus lists 37 databases as of October 2001 Must have structure searching and at least 100 molecules Must have structure searching and at least 100 molecules http://cactus.nci.nih.gov/ncidb2/chem_www.html http://cactus.nci.nih.gov/ncidb2/chem_www.html http://cactus.nci.nih.gov/ncidb2/chem_www.html SoaringBear’s List has 15 databases http://geocities.com/soaringbear/biomed/chem.html http://geocities.com/soaringbear/biomed/chem.html http://geocities.com/soaringbear/biomed/chem.html
35
Institutional Repositories NARSTO Quality Systems Science Center http://cdiac.esd.ornl.gov/programs/NARSTO/ http://cdiac.esd.ornl.gov/programs/NARSTO/ http://cdiac.esd.ornl.gov/programs/NARSTO/ Pollutant species in the troposphere over North America Pollutant species in the troposphere over North America Part of the Carbon Dioxide Information Analysis Center at ORNL Part of the Carbon Dioxide Information Analysis Center at ORNL NARSTO Data and Information Sharing Tool NARSTO Data and Information Sharing Tool http://mercury.ornl.gov/narsto/http://mercury.ornl.gov/narsto/http://mercury.ornl.gov/narsto/
36
Public Databases Developmental Therapeutics Program/NCI Some assay data for download Some assay data for download Structures for over 200,000 compounds Structures for over 200,000 compounds http://dtp.nci.nih.gov/docs/dtp_search.htmlhttp://dtp.nci.nih.gov/docs/dtp_search.htmlhttp://dtp.nci.nih.gov/docs/dtp_search.html Zinc and other screening databases NIST computational chemistry database Environmental fate and exposure databases
37
Other Public Databases 1 ChemExper Chemical Directory > 200,000 substances; > 10,000 IR spectra > 200,000 substances; > 10,000 IR spectra http://chemexper.com/ http://chemexper.com/ http://chemexper.com/ HIC-Up; Hetero-Compound Identification Centre – Uppsala 5384 substances as of 1/15/05 5384 substances as of 1/15/05 http://xray.bmc.uu.se/hicup/ http://xray.bmc.uu.se/hicup/ http://xray.bmc.uu.se/hicup/ Chemicals with Pharmaceutical Activity; a 3D Structural Database 400 3D structures 400 3D structures http://www.chem.ox.ac.uk/mom/chemical-database/ http://www.chem.ox.ac.uk/mom/chemical-database/ http://www.chem.ox.ac.uk/mom/chemical-database/
38
Isatin in ChemExper
39
Isatin ChemExper IR Spectrum
40
Isatin from HIC-Up
41
Other Public Databases 2 Cheminformatics.org 41 data sets in 9 categories as of 8/18/05 41 data sets in 9 categories as of 8/18/05 http://www.cheminformatics.org/ http://www.cheminformatics.org/ http://www.cheminformatics.org/ WebReactions http://webreactions.net/ http://webreactions.net/ http://webreactions.net/
42
Other Public Databases 3 MolTable http://www.moltable.org/ http://www.moltable.org/ http://www.moltable.org/ MatWeb Materials Property Data http://www.matweb.com/index.asp?ckck=1 http://www.matweb.com/index.asp?ckck=1 http://www.matweb.com/index.asp?ckck=1 Spectral Database for Organic Compounds (SDBS) Over 32,000 compounds Over 32,000 compounds Has EI-MS, FT-IR, 1 H NMR, 13 C NMR, Raman, ESR Has EI-MS, FT-IR, 1 H NMR, 13 C NMR, Raman, ESR http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi http://www.aist.go.jp/RIODB/SDBS/cgi-bin/cre_index.cgi NMRShiftDB (Christoph Steinbeck) 14,753 structures as of 8/19/05 14,753 structures as of 8/19/05 Features peer-reviewed submission of data sets Features peer-reviewed submission of data sets http://www.nmrshiftdb.org/ http://www.nmrshiftdb.org/ http://www.nmrshiftdb.org/
43
Comment on Link to PubChem “It is great to know that - with PubChem - there will be something like a single point of entry for people interested in structures and their properties. We are looking forward to link our datasets to PubChem structures.” --Christoph Steinbeck, commenting on the NMRShiftDB, CHMINF-L, 19 April 2005
44
Other Public Databases: Commercial Teasers FTIRsearch.com (Thermo Electron) Demo file of 575 spectra from 87,000 in the full database Demo file of 575 spectra from 87,000 in the full database https://ftirsearch.com/default3.htm https://ftirsearch.com/default3.htm https://ftirsearch.com/default3.htm ChemACX 30 of >350 suppliers catalog data 30 of >350 suppliers catalog data http://chemacx.cambridgesoft.com/chemacx/index.asp http://chemacx.cambridgesoft.com/chemacx/index.asp http://chemacx.cambridgesoft.com/chemacx/index.asp Sunset Molecular Discovery, LLC Wombat (World of Molecular BioAcTivity) Wombat (World of Molecular BioAcTivity) 117,007 entries with over 230,000 biological activities117,007 entries with over 230,000 biological activities Wombat PK Wombat PK Database for Clinical Pharmacokinetics: 643 substances with 4668 measurementsDatabase for Clinical Pharmacokinetics: 643 substances with 4668 measurements Three sample files from Wombat containing 341 Histamine-1 receptor antagonists Three sample files from Wombat containing 341 Histamine-1 receptor antagonists http://www.sunsetmolecular.com/ http://www.sunsetmolecular.com/ http://www.sunsetmolecular.com/
45
Indiana University Existing Projects System for the Integration of Bioinformatics Services (SIBIOS) http://sibios.engr.iupui.edu http://sibios.engr.iupui.edu http://sibios.engr.iupui.edu PlatCom: A Platform for Computational Comparative Genomics http://bio.informatics.indiana.edu/sunkim/Platcom/ http://bio.informatics.indiana.edu/sunkim/Platcom/ http://bio.informatics.indiana.edu/sunkim/Platcom/ Reciprocal Net http://www.reciprocalnet.org/index.html http://www.reciprocalnet.org/index.html http://www.reciprocalnet.org/index.html
46
Indiana University Planned Projects Design of a Grid-based distributed data architecture Development of tools for HTS data analysis and virtual screening Database for quantum mechanical simulation data Chemical prototype projects Novel routes to enzymatic reaction mechanisms Novel routes to enzymatic reaction mechanisms Mechanism-based drug design Mechanism-based drug design Data-inquiry-based development of new methods in natural product synthesis Data-inquiry-based development of new methods in natural product synthesis
47
Community Grids Lab
48
Open Systems Lab
49
Web Services Future Depends on Adoption of standards Adoption of standards Incorporation of WS in current and newly developed applications Incorporation of WS in current and newly developed applications Security, privacy, quality of data issues Security, privacy, quality of data issues Development of WS tools and resources for e-Science Development of WS tools and resources for e-Science
50
“Tripos Embraces Web Services” Goal: to make it easier to incorporate Tripos’ high-throughput workflows Tripos’ Service-Oriented Informatics (SOI) Tripos’ Service-Oriented Informatics (SOI) In partnership with SciTegic: Web services ensure software compatibility between the companies’ products.
51
Provocative Thought “While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read.” Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201. Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.
52
Closing quote “The future of chemistry depends on the automated analysis of chemical knowledge, combining disparate data sources in a single resource, such as the World-Wide Molecular Matrix, which can be analysed using computational techniques to assess and build on these data.” Townsend et al. Org. Biomol. Chem. 2004, 2, 3299. Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.
53
Acknowledgements Christoph Steinbeck Geoffrey Fox, Marlon Pierce Peter Murray-Rust Michael D. Jensen, Timothy B. Patrick, Joyce A. Mitchell Elsevier
54
Bibliography Coles, Simon J.; Day, Nick E.; Murray-Rust, Peter; Rzepa, Henry S.; Zhang, Yong. “Enhancement of the chemical semantic web through InChIfication.” Organic & Biomolecular Chemistry 2005, 3, 1832-1834. Curcin, Vera; Ghanem, Moustafa; Guo, Yike. "Web services in the life sciences." Drug Discovery Today 2005, 10(12), 865-871. Day, Michael. Metadata: Mapping between metadata formats. http://www.ukoln.ac.uk/metadata/interoperability/ (accessed 8/6/2005) http://www.ukoln.ac.uk/metadata/interoperability/ De Roure, D.; Jennings, N. R.; Shadbolt, N. R. “The Semantic Grid: Past, present and future.” Proceedings of the IEEE 2005, 93(3), 669-681. Jensen, Michael; Patrick, Timothy B.; Mitchell, Joyce A. “Web services for bioinformatics.” PSB 2004 Tutorial Murray-Rust, Peter S.; Mitchell, John B.O.; Rzepa, Henry S. “Communication and re-use of chemical information in bioscience.” BMC Bioinformatics 2005, 6, 180. Murray-Rust, Peter; Mitchell, John B.O.; Rzepa, Henry S. “Chemistry in bioinformatics.” BMC Bioinformatics 2005, 6, 141-144.
55
Bibliography Murray-Rust, Peter; Rzepa, Henry S.; Tyrrell, Simon M.; Zhang, Yong. “Representation and use of chemistry in the globabl electronic age.” Organic & Biomolecular Chemistry 2004, 2, 3192-3203. Salamone, Salvatore. “Tripos embraces Web Services. Bio-IT World, 2005, 4(8), 8. Shermer, Michael. “Rumsfeld’s Wisdom.” Scientific American 2005, 293(3), 38. Townsend, Joe A.; Adams, Sam E.; Waudby, Christopher A.; de Souza, Vanessa K.; Goodman, Jonathan M; Murray-Rust, Peter. “Chemical documents: machine understanding and automated information extraction.” Organic & Biomolecular Chemistry 2004, 2, 3294-3300. World Wide Molecular Matrix. http://wwmm.ch.cam.ac.uk/wwmm.html http://wwmm.ch.cam.ac.uk/wwmm.html Summary Report – W3C Workshop on Semantic Web for Life Sciences, October 27-28, 2004. http://www.w3.org/2004/10/swls-workshop-report.html http://www.w3.org/2004/10/swls-workshop-report.html
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.