CombeDay Making Data Openly Available Simon Coles
Data Overload!
CombeChem: eScience testbed
[Diagram labels: X-Ray e-Lab; Properties e-Lab; Analysis; Simulation; Properties; Video; Diffractometer; Structures Database; Grid Middleware]
Chemistry Publications
– Ideas and interpretations
– Hooks into the literature
– Results & derived data
– Raw data!
[Diagram]
Components:
– Research & e-Science workflows
– Learning & Teaching workflows
– Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
– Data analysis, transformation, mining, modelling
– Data curation: databases & databanks
– Repositories: institutional, e-prints, subject, data, learning objects
– Aggregator services: eBank UK
– Peer-reviewed publications: journals, conference proceedings
– Learning object creation, re-use
– Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
– Presentation services: subject, media-specific, data, commercial portals
– Quality assurance bodies
Flow labels: Deposit / self-archiving; Harvesting metadata; Searching, harvesting, embedding; Resource discovery, linking, embedding; Linking; Publication; Validation
Establishing common ground…
– Understand the data creation process
– Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
– Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
– Aim for a common ontology
– Modelling the domain
– Creating a metadata schema
Crystallography workflow
– Initialisation: mount new sample on diffractometer & set up data collection
– Collection: collect data
– Processing: process and correct images
– Solution: solve structures
– Refinement: refine structure
– CIF: produce CIF (Crystallographic Information File format)
– Report: generate Crystal Structure Report
Data categories: RAW DATA, DERIVED DATA, RESULTS DATA
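The workflow stages above can be written down as an ordered sequence, which is one way a repository could record how far a dataset has progressed. This is only a minimal sketch: the `Stage` type, the `WORKFLOW` list and the `next_stage` helper are hypothetical names introduced for illustration and are not part of the eBank software.

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    """One step of the crystallography workflow (illustrative type)."""
    name: str
    action: str


# Stage names and actions are taken from the slide; the ordering is the slide's.
WORKFLOW = [
    Stage("Initialisation", "mount new sample on diffractometer and set up data collection"),
    Stage("Collection", "collect data"),
    Stage("Processing", "process and correct images"),
    Stage("Solution", "solve structures"),
    Stage("Refinement", "refine structure"),
    Stage("CIF", "produce CIF (Crystallographic Information File format)"),
    Stage("Report", "generate Crystal Structure Report"),
]


def next_stage(current: str) -> Optional[str]:
    """Return the name of the stage after `current`, or None at the end."""
    names = [stage.name for stage in WORKFLOW]
    position = names.index(current)  # raises ValueError for an unknown stage
    if position + 1 < len(names):
        return names[position + 1]
    return None
```

For example, `next_stage("Collection")` gives `"Processing"`, and `next_stage("Report")` gives `None` because the report is the last stage.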
Deposition into the archive
An Archive entry: ecrystals.chem.soton.ac.uk
Access to the underlying data
Some metadata issues
– Using simple and qualified Dublin Core
– Additional chemical information in schema for harvesting, e.g. empirical formula
– Schema contains International Chemical Identifier (InChI)
– Specifies which ‘parts’ of a dataset are present
– Links to eprints (and other published literature) derived from the data
– Using vocabularies specific to crystallography
– Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
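To make the points above concrete, here is a minimal sketch of a dataset record that combines Dublin Core elements with chemical fields. It is not the actual ebank_dc schema: the root element `record` and the un-namespaced `empiricalFormula` and `inchi` elements are hypothetical names chosen for illustration, and placing `dcterms:isReferencedBy` on the dataset record is an assumption based on the DCMI definition of that term. The two namespace URIs are the standard Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespaces.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def build_record(identifier: str, formula: str, inchi: str, eprint_url: str) -> ET.Element:
    """Build an illustrative dataset record (NOT the real ebank_dc schema)."""
    record = ET.Element("record")  # hypothetical root element
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # Hypothetical element names for the additional chemical information.
    ET.SubElement(record, "empiricalFormula").text = formula
    ET.SubElement(record, "inchi").text = inchi
    # Assumption: the dataset record points at an eprint that cites it.
    ET.SubElement(record, f"{{{DCTERMS}}}isReferencedBy").text = eprint_url
    return record


if __name__ == "__main__":
    example = build_record(
        "http://repository.example.org/dataset/1",  # placeholder URL
        "H2O",
        "InChI=1S/H2O/h1H2",                        # InChI for water
        "http://repository.example.org/eprint/1",   # placeholder URL
    )
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the chemical fields alongside the Dublin Core ones means a harvester that only understands Dublin Core can still read the identifier and type, while a chemistry-aware service can also index the formula and InChI.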
Data flow in eBank
[Diagram; model input: Andy Powell, UKOLN]
Components:
– Institutional repository
– Crystal structure (data holding)
– Dataset
– Crystal structure report (HTML)
– ebank_dc record (XML)
– eBank UK aggregator service
– ePrint UK aggregator service
– Subject service
– Eprint oai_dc record (XML)
– Eprint “jump-off” page (HTML)
– Eprint manifestation (e.g. PDF)
Flow labels: Deposit; Harvesting OAI-PMH ebank_dc; Harvesting OAI-PMH oai_dc; Linking
Metadata labels: dc:identifier; dcterms:references; dcterms:isReferencedBy; dc:type=“CrystalStructure” and/or “Collection”; dc:type=“Eprint” and/or “Text”
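The two harvesting flows in this diagram differ only in the metadata format requested over OAI-PMH. As a rough sketch, an aggregator could build its requests as below. `ListRecords` and `metadataPrefix` are standard OAI-PMH request parameters, and the two prefixes come from the slide; the base URL and the function name `list_records_url` are placeholders, not the real repository endpoint or eBank code.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    base = "http://repository.example.org/oai"  # placeholder endpoint
    # Chemistry-aware aggregator: asks for the richer ebank_dc records.
    print(list_records_url(base, "ebank_dc"))
    # Generic eprint aggregator: asks for plain oai_dc records.
    print(list_records_url(base, "oai_dc"))
```

Because the repository exposes the same items under both prefixes, generic services can harvest simple Dublin Core while subject services harvest the chemistry-specific records from the same endpoint.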
Harvesting: OAIster
Linking and aggregating
Embedded in a science portal
Current situation
– Version 2.0 eBank metadata schema
– Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software
– Exports records as ebank_dc and oai_dc
– Validation of schema & discussion with International Union of Crystallography for final developments and wider deployment
– Pilot eBank UK aggregator service
– Developing search interface Version 1.0
– Testing with PSIgate physical sciences portal: embedding eBank UK
What’s next?
– Progress towards generic metadata schemas
– Validation against other schema (CCLRC Model)
– Eprints.org software: allow for more generic scientific data and schemas?
– Metadata enhancement: keywords based on knowledge of keywords in related publications?
– Investigate identifiers: International Chemical Identifier
– Explore context sensitive linking
– Full embedding into chemical and crystallographic research and publishing
– e-Learning embedding and pedagogic evaluation
– Feasibility study in related domains