Data catalogues and the data repository ADMIRe JISC MRD
Dr Tom Parsons March 2013 Sunday, November 11, 2018 ADMIRe

A world-class university
One of the world’s top 100 universities, Nottingham is recognised globally for ground-breaking research and teaching excellence.
40,000 students from more than 150 countries, two overseas campuses and strong links with universities around the world
Heavily focused on research: Medical & Health Sciences, Sciences, Engineering, Social Sciences and Arts
Large research income (£100m) – primarily RCUK, UK/EU government, commercial and charities

Key priorities for ADMIRe:
RDM policy: “1.5. The University will provide mechanisms and services for storage, backup, registration, deposit, retention and preservation of research data assets in support of current and future access, during and after completion of research projects.”
Is the current provision good enough?
Where are the gaps?
What do we need to provide?

Understanding requirements
Approaches:
   Survey (summer 2012)
   Focus groups (November 2012)
   Interviews (May 2012 onwards)
   Mixture of ADMIRe, in-house, JISC MRD & Sero
Outputs: service model, detailed requirements catalogue, logical models & prototype
Institutional requirements: “Enterprise Architecture compliant”, use and integrate with existing systems

Survey results: Types of data

Survey results: Data storage

Survey results: Metadata…

Sharing data?

Survey results: Total research data estimates
From the survey’s 366 responses: 75 GB average (mean/frequency)

Total research data estimates
75 GB average × approx. number of PIs & post-grads (4,000) = 300 TB (±90%)
Large number of unknowns
A large amount of data, a large number of files and a good case for managing it
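The arithmetic behind this estimate can be checked with a short sketch. It assumes decimal units (1 TB = 1000 GB) and reads the ±90% figure as a symmetric bound around the central value; both readings are assumptions, not something the slide states, and the function name is illustrative only.

```python
# Back-of-envelope check of the slide's estimate.
# Assumptions (not from the slide): 1 TB = 1000 GB, and "+-90%" is a
# symmetric bound applied to the central estimate.
MEAN_GB_PER_RESEARCHER = 75   # survey average
RESEARCHERS = 4000            # approx. PIs and post-grads
UNCERTAINTY = 0.90            # the quoted +-90%


def estimate_tb(mean_gb, headcount, uncertainty):
    """Return (low, central, high) totals in TB."""
    central = mean_gb * headcount / 1000
    return central * (1 - uncertainty), central, central * (1 + uncertainty)


low, central, high = estimate_tb(MEAN_GB_PER_RESEARCHER, RESEARCHERS, UNCERTAINTY)
print(f"central {central:.0f} TB, range {low:.0f} to {high:.0f} TB")
```

Under these assumptions the plausible range is wide, which is consistent with the slide's "large number of unknowns".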

Focus groups to understand more
Five Faculty-based focus groups (30 people in total)
Based upon California Digital Library model

Active data

Archive data

Preservation activities
Columns: Function; Actors (R, S, A); Req.; Freq.
1 – Tag: Enter metadata describing a bag of research data assets (M)
2 – Bag: Zip the data files up in a bag (C)
3 – Transfer +: Transfer a bag to archival storage
4 – Ingest: Ingest a bag in to storage
5 – Update: Update (enhance, correct) metadata for a stored bag (O, L)
6 – GetDOI: Get (public, private) DOIs for designated assets
7 – Publish: Publish assets appropriately on landing pages
8 – Relocate: Relocate assets and update locators
9 – Search: Search for assets by keyword or field (H)
10 – Access: Access metadata and data according to permissions
11 – Notify: Notify actors automatically about data events (P)
12 – Annotate: Create notes about a bag or its contents
13 – Check: Check (verify) that the contents of a bag are in order
14 – Report: Run reports on aspects of the system (DOI, bag, user)
15 – Administer: Administer permissions and system parameters
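As an illustration of functions 1 (Tag), 2 (Bag) and 13 (Check), the sketch below zips files together with descriptive metadata and SHA-256 checksums, then verifies the result. It is a minimal stand-in using only the Python standard library; it does not follow the BagIt specification and is not the ADMIRe implementation. The manifest file name, the function names and the sample data are all hypothetical.

```python
import hashlib
import json
import tempfile
import zipfile
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical name, not taken from any spec


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_bag(files, metadata, bag_path):
    """Tag + Bag: zip the files with their metadata and checksums.

    Files are stored by base name, so two inputs with the same name
    would collide; a real tool would need to handle that.
    """
    checksums = {}
    with zipfile.ZipFile(bag_path, "w", zipfile.ZIP_DEFLATED) as bag:
        for f in files:
            data = Path(f).read_bytes()
            arcname = f"data/{Path(f).name}"
            bag.writestr(arcname, data)
            checksums[arcname] = sha256_of(data)
        manifest = {"metadata": metadata, "checksums": checksums}
        bag.writestr(MANIFEST_NAME, json.dumps(manifest, indent=2))
    return bag_path


def check_bag(bag_path):
    """Check: verify each payload file against its recorded checksum."""
    with zipfile.ZipFile(bag_path) as bag:
        recorded = json.loads(bag.read(MANIFEST_NAME))["checksums"]
        payload = {n for n in bag.namelist() if n != MANIFEST_NAME}
        if payload != set(recorded):
            return False  # a file is missing or unexpected
        return all(sha256_of(bag.read(n)) == d for n, d in recorded.items())


def demo():
    """Build a one-file bag in a temporary directory and verify it."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "readings.csv"
        sample.write_text("t,value\n0,1.5\n")
        bag = make_bag([sample], {"title": "Example dataset"}, Path(tmp) / "bag.zip")
        return check_bag(bag)
```

Recording checksums at bagging time is what makes the later Check step possible: transfer and ingest can then be validated without access to the original files.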

Mapping requirements

Where are we now?

Data Retention Platform
   Description: A storage platform that enables storage of “unstructured” data files. BPM Metastorm frontend.
   Scope: Storage of files and very basic metadata (file type, size, retention period, user)
   Interfaces/Integrations: AD to support access. (Note that Open Access will be supported by providing a persistent account used by the Research data web site server that has read-only access to all “Open” data sets.)
   Direct Users: Researchers

Research data search and retrieve web site
   Description: Web Site. Expected to be CMS or possibly SharePoint
   Scope: Web site with relevant information and screens to search and return results
   Interfaces/Integrations:
      1. Data Retention Platform via REST to enable http(s) data transfer.
      2. FAST (embedded function) to allow search from a web page.
      3. Equella (API) to expose metadata onto search results.
      4. Active Directory/LDAP to authenticate file access
   Direct Users: Those searching for data sets

Equella
   Description: Metadata Database
   Scope: Stores metadata
   Interfaces/Integrations: See Metastorm, FAST and Research Web Site
   Direct Users: N/A

FAST
   Description: Search Engine
   Scope: Provides search results and rich search functionality on the metadata
   Interfaces/Integrations:
      1. Potential federation to Primo
      2. Crawl of Equella
   Direct Users: Anyone

Baggit
   Description: File collection tool
   Scope: Tool to assist researchers in selecting and bringing files into a collection
   Interfaces/Integrations: Linked to from Metastorm
   Direct Users: PI

DMP Online
   Description: Online tool providing support for creating a Data Management Plan that is managed to ensure Research Council requirements are met
   Scope: Used to create Data Management Plan
   Interfaces/Integrations:
      1. Metastorm will link this within curation workflow
      2. Metastorm will take the XML output of this and read key fields directly to automate some metadata creation in Equella
      3. Metastorm will save the output file of this tool
   Direct Users: PI

DOI
   Description: Online tool for creating a unique digital object identifier
   Scope: Workflow to fork out to this system to allow researcher to create a persistent object identifier.
   Interfaces/Integrations: See Metastorm

Active File Services
   Description: File services primarily for storage of active (i.e. not curated) files
   Scope: The source of files for curation (“Bagging”).
   Interfaces/Integrations: Selectable by browsing using Baggit tool.

“Other Repository”
   Sometimes selectable by browsing using Baggit tool as the source of files for curation (“Bagging”). However these may be databases or alternative repositories that are used instead. If used, and where possible, the DOI will point to these.
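The DMP Online integration describes reading key fields from the plan's XML output to pre-populate metadata. A minimal sketch of that step follows. The XML layout, element names and field list are invented for illustration; the slides do not show the actual DMP Online export format, and Metastorm or Equella calls are not modelled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan export; the real DMP Online schema is not shown in the slides.
SAMPLE_PLAN = """<plan>
  <title>Example project</title>
  <principal_investigator>A. Researcher</principal_investigator>
  <funder>Example Research Council</funder>
</plan>"""

# Hypothetical set of "key fields" to carry over into the metadata record.
KEY_FIELDS = ("title", "principal_investigator", "funder")


def extract_key_fields(xml_text, fields=KEY_FIELDS):
    """Return a dict of the requested fields that are present and non-empty."""
    root = ET.fromstring(xml_text)
    record = {}
    for name in fields:
        value = root.findtext(name)  # None if the element is absent
        if value is not None and value.strip():
            record[name] = value.strip()
    return record


print(extract_key_fields(SAMPLE_PLAN))
```

Skipping absent or empty elements, rather than failing, leaves those fields for the researcher to complete by hand, which fits the stated aim of automating only some of the metadata creation.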

ADMIRe Phasing: Drop 1 (to June 2013)
Objective: Deliver key functions but without over-integration
Deliverables:
1. Instructions and links on web site on how and why to use DMP Online
2. Instructions and links on web site on how and why to use DOI
3. Implementation (but not integration) of Baggit for Research users
4. Delivery of Metadata in Equella
   Including instructions and links on web site on how and why to use
5. Creation of Research Data Search Page
   Implementation of FAST search crawl
   Embed of FAST in web page
   Delivery of Results page to include relevant information
6. Metastorm development that:
   Creates User (PI Researcher) interface to Equella
   Provides fields to add all metadata into Equella
      Including Research Project Information, Subject Specific Information, Technical Metadata
   Allows Researcher to choose when a page is searchable

ADMIRe Phasing: Drop 2 (to Dec 2013)
Deliverables:
1. Delivery of Retention platform
   Delivered outside of ADMIRe project
2. Delivery of Open Access Platform (subset of Retention platform)
3. Definition and delivery of end-to-end workflow automation and integration for data management process with a vision of “Input Once”
   Integrations of Baggit, Agresso Awards Management, DMP Online, DOI
4. Definition and delivery of a report for Research Councils that:
   Confirms project adherence (at Project close) to funding requirements for data management and access
   Enables non-conformance to be addressed

Reusable outputs
Focus groups/interview formats
Requirements catalogue
Use cases
Survey – questions, write-up etc
Software? No…

Questions?
ADMIRe Project Manager
