EGU, 23 April 2012, Najla Rettberg, OpenAIRE, University of Göttingen, Linking Data to Open Access Publications
In 12 Minutes… OpenAIRE – Publications and Data; Demonstrators for Enhanced Publications; Use Case Scenarios; Services for Users
OpenAIRE – Second Phase: Open Access, participatory infrastructure for scientific information, linking publications, datasets and funding. Disseminates OA/RDM information in Europe. Opens its content (search, browse, stats), including to third-party service providers. Capitalizes on the OpenAIRE infrastructure, built for the Open Access pilot, FP7-funded articles (measuring the impact of EC SC39).
Portal: Search, Access, Deposit
Past, present and OpenAIREplus (diagram). Publication repositories network (institutional & thematic): FP7 publications, EC project metadata, national project metadata, national funding publications. Guidelines: DRIVER Guidelines, OpenAIRE Guidelines v1.0, OpenAIRE Guidelines v2.0, OpenAIRE+ Guidelines for Data Providers. Dataset repositories: metadata on data sets. Figures: ,600,000 OA publications; 311 validated repositories.
OA Publication Infrastructure. Open Data Infrastructures. ESFRI, EU-wide infrastructures. Covering ‘European Knowledge’.
A ‘Static’ publication (slide from Jens Klump)
Enhanced Publications (EPs): Compound information objects that represent the aggregation of distinct information objects through meaningful relationships. Example of SURF EPs: textual publications enhanced with links to datasets. OpenAIREplus provides EP services: Management (creation and curation); Visualization, browsing, querying; Import (OAI-PMH/ORE harvesting of EPs from external providers); Export (OAI-PMH/ORE publishing of EPs, Linked Data representation).
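The OAI-PMH import mentioned on this slide can be illustrated with a small sketch. It builds a standard `ListRecords` request and reads identifiers, titles and `dc:relation` links from a response. The endpoint URL, the sample record and the function names are assumptions made for illustration only; they are not OpenAIREplus code, and ORE aggregation handling is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) OAI-PMH endpoint."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
            # dc:relation is one place where a link to a dataset may appear.
            "relations": [r.text for r in rec.iterfind(".//dc:relation", NS)],
        })
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=NS)
    return records, (token or None)


# Invented sample response, used instead of a live network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example publication</dc:title>
          <dc:relation>doi:10.1234/example-dataset</dc:relation>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned resumption token until the response no longer carries one.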
‘Information in Context’
Attempt at a generic workflow: No one-size-fits-all for data – use different data types, PIs, policies, access levels, standards. Look at research-driven disciplines, different communities. Incremental, based on prototypes. Cross-discipline approach. “…any roadmap for OA infrastructure must address this natural tension between diversity and infrastructure” (C. Meier zu Verl & W. Horstmann (Eds.), Studies on Subject-Specific Requirements for Open Access Infrastructure).
Subject-specific pilots: Learning lessons from the interoperation of data infrastructures – Interoperability pilots between OpenAIREplus and subject-specific infrastructures, in the Life Sciences and in the Social Sciences – Exploitation in modelling and implementation of the OpenAIRE data model. Relationship entities: projects, publications, datasets.
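The relationship entities named on this slide (projects, publications, datasets) can be pictured as a small linked model. The following is a minimal sketch under stated assumptions: the class names, fields and relation labels are hypothetical and do not reproduce the actual OpenAIRE data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str   # "project", "publication" or "dataset"
    pid: str    # persistent identifier (hypothetical values in the demo)
    title: str


@dataclass(frozen=True)
class Relation:
    source: str  # pid of the source entity
    target: str  # pid of the target entity
    label: str   # e.g. "fundedBy", "cites" (illustrative labels)


@dataclass
class InformationSpace:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, entity):
        self.entities[entity.pid] = entity

    def link(self, source, target, label):
        # Only allow links between entities that are already registered.
        if source not in self.entities or target not in self.entities:
            raise KeyError("both ends of a relation must be registered first")
        self.relations.append(Relation(source, target, label))

    def related(self, pid, kind=None):
        """Entities linked to `pid` in either direction, optionally filtered by kind."""
        found = []
        for rel in self.relations:
            if rel.source == pid:
                other = self.entities[rel.target]
            elif rel.target == pid:
                other = self.entities[rel.source]
            else:
                continue
            if kind is None or other.kind == kind:
                found.append(other)
        return found


space = InformationSpace()
space.add(Entity("project", "proj:demo", "Demo FP7 project"))
space.add(Entity("publication", "pub:demo", "Demo article"))
space.add(Entity("dataset", "data:demo", "Demo dataset"))
space.link("pub:demo", "proj:demo", "fundedBy")
space.link("pub:demo", "data:demo", "cites")
datasets_for_pub = space.related("pub:demo", kind="dataset")
```

Keeping relations as separate records, rather than as fields on each entity, makes it straightforward to traverse a link from either end.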
The Challenges: Aggregation and discovery of resources. Representation of diverse disciplines in a ‘generic’ infrastructure. Access restrictions/reuse policies. User-friendly way for researchers to link research results with project information. Machine-readable (Linked Open Data).
Two disciplines: SSH (DANS/EASY) – Produce handmade EPs at file level – Experienced in data modelling and research work (Veteran tapes). Life Sciences (EMBL-EBI) – Text-mine abstracts/full texts – Link bio-entities to databases – Enriched information could be transferred to the generic infrastructure.
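The Life Sciences step (text-mine a text, then link bio-entities to a database) can be sketched with a pattern match. This is only an illustration, not the EMBL-EBI pipeline: it uses the accession-number pattern published by UniProt, the URL template is an assumed address pattern, and a match only means a token looks like an accession, not that the entry exists.

```python
import re

# UniProt's documented accession-number pattern, wrapped in word boundaries.
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)

# Assumed URL template for illustration; the real resolver address may differ.
UNIPROT_URL = "https://www.uniprot.org/uniprot/{accession}"


def link_bio_entities(text):
    """Return (accession, url) pairs for accession-like tokens, in order, without repeats."""
    seen = []
    for match in UNIPROT_ACCESSION.finditer(text):
        accession = match.group(0)
        if accession not in seen:
            seen.append(accession)
    return [(acc, UNIPROT_URL.format(accession=acc)) for acc in seen]


abstract = "Mutations in P04637 and Q9Y6K9 were studied; P04637 is also known as p53."
links = link_bio_entities(abstract)
```

A production text-mining service would add dictionary lookups and context checks to filter false positives; the regular expression alone is a coarse first pass.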
Demonstrator: Data model – generalised. Extract citation info for datasets – from e.g. UniProt and full text. Derive persistent identifiers – from URLs (URNs and PMC IDs). Transfer of linked entities – community services and OpenAIRE infrastructure.
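Deriving persistent identifiers from URLs, as listed on this slide, might look roughly like the sketch below. It assumes that the identifier appears literally in the URL path (a PMC ID such as `PMC1234567`, or a URN after a resolver prefix). The function name, the example URLs and the simplified URN pattern are illustrative assumptions, not the demonstrator's actual rules.

```python
import re
from urllib.parse import urlparse, unquote

# "PMC" followed by digits, as used for PubMed Central article IDs.
PMC_ID = re.compile(r"PMC\d+", re.IGNORECASE)
# Simplified URN shape: "urn:", a namespace identifier, ":" and a namespace-specific string.
URN = re.compile(r"urn:[a-z0-9][a-z0-9-]{0,31}:[^\s?#]+", re.IGNORECASE)


def derive_pid(url):
    """Return (scheme, identifier) found in the URL path, or None if nothing matches."""
    path = unquote(urlparse(url).path)
    urn = URN.search(path)
    if urn:
        return ("urn", urn.group(0))
    pmc = PMC_ID.search(path)
    if pmc:
        return ("pmcid", pmc.group(0).upper())
    return None


# Illustrative inputs; the identifiers are invented.
pmc_pid = derive_pid("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1234567/")
urn_pid = derive_pid("https://nbn-resolving.org/urn:nbn:de:gbv:7-webdoc-1234-5")
no_pid = derive_pid("https://example.org/data/42")
```

URLs that carry no recognisable identifier return `None`, so they can be queued for manual curation instead of being linked on a guess.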
Use Cases: 1. Import EP created in DANS or SURF – Proof of services interoperability. 2. Manual composition of EP in OpenAIRE – Proof of tools: editor, discovery of research data in OpenAIRE. 3. Automatic generation of EP by extracting citation information (or mining), auto-linking – Proof that rich metadata can be represented in a user-friendly way – Possible Linked Open Data compliance.
Use Cases (continued): 4. Reuse and enrichment: annotations added by users to datasets or publications – An EP is used by a researcher in a publication – Adequate documentation – Test legal framework – Study into licensing of publications and data: analyse requirements for legal protection of research data; legal prototype of restraints.
Research Scenario 1: 1. You are an EC-project researcher – OA publication – Dataset with a DOI – Generate the link in OpenAIRE. 2. Researcher completes data output with paper – No data repository – Submit dataset to OpenAIRE ‘orphan’ repository.
Research Scenario 2: You search for ‘mouse genome literature’ in OpenAIRE – Find a citation for a publication – Funding details of the project – Related data, say a protein link to GenBank – Create your own links to this.
Service activities: For publication providers: OpenAIRE’s Guidelines for repository managers – Metadata (DC) and protocols (OAI etc.). For data providers: accessing (metadata of) datasets from providers while minimizing the effort to comply – Metadata: indications on minimal metadata about datasets (e.g., identifiers, date of creation, title, URLs) and best practices for interlinking datasets and publications – Access protocols: no requirement to adopt specific protocols (e.g., OAI, FTP) or ID/URL frameworks (e.g., OpenURL, DOI) to comply.
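The minimal dataset metadata named above (identifier, date of creation, title, URL) lends itself to a simple presence check. The sketch below is an assumption-laden illustration: the field names and the optional `related_publications` key are hypothetical and are not taken from the OpenAIREplus guidelines.

```python
# Hypothetical field names for the minimal metadata listed on the slide.
REQUIRED_FIELDS = ("identifier", "title", "date_created", "url")


def missing_fields(record):
    """Return the required fields that are absent or empty in a dataset record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def is_interlinked(record):
    """True if the record points to at least one related publication."""
    return bool(record.get("related_publications"))


# Invented example records.
complete = {
    "identifier": "doi:10.1234/example-dataset",
    "title": "Example dataset",
    "date_created": "2012-04-23",
    "url": "https://example.org/datasets/1",
    "related_publications": ["oai:example.org:1"],
}
partial = {"identifier": "hdl:0000/abc", "title": ""}

complete_gaps = missing_fields(complete)
partial_gaps = missing_fields(partial)
```

The check deliberately says nothing about protocols or identifier schemes, in line with the slide's point that none is mandated.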
Service activities, Users: Registered end-users (e.g., EC personnel, project coordinators, researchers, authors) – Search, browse and access statistics – Deposit files and metadata of publications and datasets into the Orphan Repository – Ingest (claim) metadata into the information space – Create EPs by combining datasets from different communities – Reuse of datasets as secondary data (with respect to IPR).
Service activities, Users: Content provider managers (e.g. dataset and publication repository managers) – Registration and validation (OpenAIREplus guidelines) of publication and dataset repositories. Data curators (administrative tasks) – Collect and aggregate publications, project data and dataset metadata. Third-party application developers – Bulk-fetch content from the (curated) information space.
The Future… “Forget PDFs, imagine an ideal publication where you click on tables to get through to raw data, where you can contribute and discuss some aspects and later update or correct parts of a paper in subsequent versions. The latter is similar to Wikipedia, actually.” – PhD Student, UGOE
Thank you…
Linking: Publication to Database
Author-supplied supplementary info: TIFF, MOV. PLoS: O’Toole, Greenan, Lange, Srayko, Müller-Reichert
Research Impact: OpenAIRE lays the foundations for measuring research impact per publication, researcher, project, institution, country, …
Data Management Issues: Good data practices. Data policies, standards. Drivers for deposit? What’s in it for researchers? Work with publishers, DOIs. Where do researchers deposit data? Figshare?
Potential issues: unstructured data with different kinds of media files. Persistent IDs: resolvable and managed by the originator of the resource. Preservation: responsibility lies with the trusted repositories.
Demonstrators: Demonstrators for Enhanced Publications – Explore how links are managed between publications and research data in Life Sciences and SSH – How data can be mutually complemented and exchanged in generic infrastructures – Example: how a publication ‘reported’ in OpenAIRE is enriched via UKPMC with links to databases. Report: “Connection Data and Publications through e-Infrastructure”