How chemists use data Dr William G Town President, Kilmorie Consulting Fourth Bloomsbury Conference on E-publishing and E-publications.


1 How chemists use data
Dr William G Town, President, Kilmorie Consulting
Fourth Bloomsbury Conference on E-publishing and E-publications
24th and 25th June 2010

2 Overview
• Chemistry documentation perspective
• Case study – CCDC
• Study of chemists' behaviour – JISC
• RSC Project Prospect
• RSC ChemSpider
• OreChem project

3 Chemists have a long tradition of documenting chemistry
• Gmelin Handbook of Chemistry (1817- )
• Beilstein Handbook of Organic Chemistry ( )
• Chemical Abstracts ( )
  – CAS Online ( )
  – STN Express ( )
  – SciFinder ( )
• ChemWeb ( )
• Reaxys ( )

4 Chemists have a long tradition of documenting chemistry
• Data centres (e.g. CCDC) started in the 1960s
• Extensive chemical database activities
  – Bibliographic databases (1960s – ) (e.g. CAS)
  – Factual databases (1980s – ) (e.g. Beilstein)
  – Open access databases (2000s – ) (e.g. Crystal Eye)

5 What’s the status of chemistry online?
• Encyclopaedic articles (Wikipedia)
• Chemical vendor databases
• Metabolic pathway databases
• Virtual Screening databases
• Property databases
• Screening assay results
• Patents with chemical structures (IBM & SureChem)
• ADME/Tox data
• Scientific publications
• Compound aggregators
• Blogs/Wikis and Open Notebook Science
• Commercial databases

6 Chemists like structures digitonin

7 Cambridge Crystallographic Data Centre (CCDC)
• Founded in 1965 with grant funding in the Department of Chemistry, University of Cambridge
• Self-financing, self-administering institution since 1987
  – Not-for-profit, charitable, research institute
  – Recognized institute for postgraduate degrees of the University of Cambridge
• Objectives
  – “advancement and promotion of the science of chemistry and crystallography for the public benefit”

8 Cambridge Structural Database
CSD Growth
• Worldwide repository of validated small-molecule crystal structures
• Dec 09 – 500,000th structure milestone reached
  – Lamotrigine, Acta Cryst., Sect. C: Cryst. Struct. Commun. (2009), 65, o460
  – Refcode: EFEMUX01

9 Knowledge mining using the CSD
“Crystals are windows on the world of atoms” (Chet Raymo, Boston Globe, Science Musings)
CSD System search and analysis software permits structural knowledge in the CSD to be mined from the raw data, to generate:
• Crystallographic knowledge
• Intra-molecular structural knowledge
• Inter-molecular structural knowledge

10 Knowledge mining using the CSD
Scientific Applications
• Structural chemistry and crystal engineering
• Rational drug discovery and design
• Protein – ligand interactions & ligand docking
• Drug development, formulation and delivery
• Materials research and development
• Crystal structure prediction
• Crystal structure determination

11 A study of scholarly communication between chemists and of their use of Web 2.0 technologies
• Study commissioned by JISC (UK Joint Information Systems Committee)
• Principal contractor was Publishing Directions (Deborah Kahn – project leader)
• Project team composed of Nicki Dennis, Lara Burns and me
• Started November ’08, reported in April ’09
t.pdf

12 Background to the study Methods of scholarly communication have changed rapidly in the past decade. Improvements in computing and social networking technologies, digital data capture techniques, powerful data and text mining techniques and other technological changes enable practices that are collaborative, network based and highly intensive.

13 Background to the study
• We researched the needs of academics in two specific areas, economics and chemistry.
• Recommendations were made on the advocacy programmes that would be most effective, in each discipline, for encouraging optimum take-up of useful technologies and other developments which improve scholarly communication.

14 Use of information resources

16 High use of Wikipedia and Google Scholar but chemists use alerting services and more specialised subject-based services
  – This is likely to reflect the fact that chemists are taught information skills as part of their degree course

17 Data sharing

18 Data storage

19 Data storage and sharing
• Chemists share datasets since they work collaboratively across institutes
• Despite considerable work around repositories and storage, data are still being stored locally rather than in institutional or subject-based repositories.
• Concerns around ownership of results and of “competitors” obtaining the results need to be addressed before this will change significantly.

20 Three years of semantic publishing – RSC Project Prospect
What were they trying to improve?
  – Discoverability
  – Use
  – Understanding
  – Linking
And why...
• What chemistry on the web may become...
• Prolonged exposure to Peter Murray-Rust

21 Quick, what can we mark up?
What standards did we have in 2007?
• InChI – for some compounds
• ChEBI for some compounds and groups of compounds
• Gene/Sequence/Cell Ontologies
• IUPAC Gold Book (dictionary, really, but online)
And RDF/OWL as distribution format
30-40% of RSC publishing

23 What did RSC learn with Prospect?
• This is probably the way to go – 4000 articles so far
• How do they cover all subjects?
  – Standards not well defined in all areas
• Scale up in manual QA
• Scale up during huge growth and scope of RSC publishing activities
• How to use all that real chemistry data?
• Pump prime to change what is asked from authors
• Is the vision the day-glo article? (“Free headache for every user”)

24 Ontology Add-in for Word 2007
• Intent: Term recognition & disambiguation based on OBO or OWL formats
• Relationships: Ontology browser
• Services: Ontology download web service
• Source code and binary:
Phil Bourne, Lynn Fink, John Wilbanks

25 Chem4Word – Chemistry Drawing in Word
• Authoring: Author and edit 1D and 2D chemistry.
• Intelligence: Verifies validity of authored chemistry
• Intent: Recognizes chemical dictionary and ontology terms
• Data: Semantics stored in Chemistry Markup Language
• Relationships: Navigate and link referenced chemistry
• Available soon: /

26 Standards = longevity
• Help implement and develop standards
  – Open ontologies for chemistry
  – InChI Trust
  – How to publish this - pre-competition

27 Addressing a real need in standards
Pistoia Alliance
“An initiative to provide an open foundation of data standards, ontologies and web-services to streamline the Pharmaceutical Drug Discovery workflow”
Semantic Enrichment of the Scientific Literature (SESL), Oct09-Oct10
• Pistoia Alliance-funded
• EBI
• Elsevier, NPG, OUP, RSC

28 How to use this information better to benefit existing researchers – computers and humans
• Real behaviour (for humans)
• Clear requirements (for computer discovery)

29 What do humans want?
• As few interfaces as possible
Image: media.obsessable.com

30 What do computers want?
• Web services
Image: flickr.com/photos/microcosmos

31 A free to access online database for chemists
• Website and web services
• Links over 25 million compounds integrated to <300 data sources
• A curation platform for the public to improve the quality of data online
• A deposition platform for the public to annotate and extend the data

32 ChemSpider – A Pragmatic Vision
“Build a Structure Centric Community”
  – Integrate chemical structure data on the web
  – Create a “structure-based hub” to information and data
  – Provide access to structure-based “algorithms”
  – Let chemists contribute their own data
  – Allow the community to curate/correct data
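
The "structure-based hub" idea above can be illustrated with a minimal sketch, assuming that every record carries a structure identifier (here an InChI string) and that records from different sources are merged under that identifier. The function name `build_hub`, the record layout and the source names are hypothetical and are not taken from ChemSpider itself.

```python
from collections import defaultdict


def build_hub(sources):
    """Group records from several sources under one entry per structure.

    `sources` maps a source name to a list of records; each record is a
    dict with an "inchi" key and optional "names" and "properties" keys.
    (This layout is an assumption made for the illustration.)
    """
    hub = defaultdict(lambda: {"names": set(), "sources": set(), "properties": {}})
    for source_name, records in sources.items():
        for record in records:
            entry = hub[record["inchi"]]  # the structure identifier is the hub key
            entry["sources"].add(source_name)
            entry["names"].update(record.get("names", []))
            entry["properties"].update(record.get("properties", {}))
    return dict(hub)


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
WATER = "InChI=1S/H2O/h1H2"

# Two hypothetical data sources that describe overlapping compounds.
sources = {
    "vendor_catalogue": [
        {"inchi": ETHANOL, "names": ["ethanol"]},
        {"inchi": WATER, "names": ["water"]},
    ],
    "property_db": [
        {"inchi": ETHANOL, "names": ["ethyl alcohol"],
         "properties": {"formula": "C2H6O"}},
    ],
}

hub = build_hub(sources)
print(sorted(hub[ETHANOL]["names"]))
```

Keying on the structure rather than on a name is what lets synonyms from different sources ("ethanol", "ethyl alcohol") land in the same entry.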

33 Why did the RSC acquire ChemSpider?
• Data versus documents
• Enhancing discoverability
• Build on cheminformatics expertise
• RSC presence in the open data space
• Critical mass of data for structure searching
• Networking chemical scientists

38 Crowd-sourcing chemistry curation
• Identify/tag errors, edit names, synonyms, identify records to deprecate

39 CAS SciFinder

40 Reaxys

41 Differences between ChemSpider, Reaxys and SciFinder
• Everything on Reaxys and SciFinder is curated
• The data resources can be over 100 years old
• The platforms are commercial and “read-only”
• ChemSpider is free, to everyone
• Data are in a state of ongoing curation & annotation
• Data resources are from the “electronic era”
• Data are expanded daily and enhanced on an ongoing basis
• The platform delivers integrated algorithm access

42 Future of chemistry online?
• Make the internet searchable by chemical structure and substructure by a free online service
• Aggregate and help improve disparate public sources
• Highlight high quality publications
• Test sharing and discussion of research data in the open
• Provide structural home to preserve researchers’ collections, experimental and property data

43 OreChem Project
• Participants
  – Cambridge University
  – Cornell University
  – Indiana University
  – Penn State University
• Funding
  – Microsoft Research
  – NSF

44 OreChem Project
• Data integration
  – Representation/reuse through common data models and ontologies
• Data capture and recovery
  – At source capture of experimental data and research process (ELNs)
  – Compound object authoring
  – Retrospective harvesting of chemistry data
• Data storage and manipulation
  – Cloud-based triple store
  – Chemical structure search
  – Linked data integration
  – Computation of properties
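
As a rough illustration of what a triple store holds, the sketch below keeps (subject, predicate, object) statements in memory and answers simple pattern queries. It is a toy under stated assumptions, not the OreChem implementation: the `TripleStore` class and the `ex:` identifiers are hypothetical, and a real deployment would use an RDF store with a query language such as SPARQL.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
# Hypothetical identifiers, for illustration only.
store.add("ex:compound1", "ex:hasInChI", "InChI=1S/CH4/h1H4")
store.add("ex:compound1", "ex:reportedIn", "ex:article42")
store.add("ex:compound2", "ex:reportedIn", "ex:article42")

# Linked-data style question: which compounds are reported in article 42?
compounds = [s for s, _, _ in store.match(predicate="ex:reportedIn", obj="ex:article42")]
print(compounds)
```

Because every statement has the same three-part shape, data from articles, notebooks and databases can be merged into one store and queried together, which is the integration point the slide describes.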

45 Chemistry is particularly challenging
• Commercial value of chemical information (e.g. Pharma industry)
• Nature of chemistry research culture
  – Predominance of synthesis (creation) overshadows discovery mode typical of physics or biology
  – Autonomy, successful research with limited reliance on others
• Dominance of scholarly societies as publishers
  – ACS (CAS)
  – RSC

46 Chemistry on the Internet – a future vision
• The “semantic web” for chemistry is in place
• Crowdsourcing is commonplace
• Chemists will search the web by “structure”
• Chemistry articles indexed and searchable
• Reduced number of searches to find data because data are integrated – compounds, vendors, syntheses, data, publications and patents
• A world of Open Access and Open Data

47 Linked Data on the Web

48 Acknowledgements
• Colin Groom, Gary Battle, CCDC
• Richard Kidd, RSC
• Tony Williams, RSC ChemSpider
• Carl Lagoze, OreChem

49 Any questions?

