Long-term Digital Metadata Curation Arif Shaon University of Reading 16 April 2014.

Acknowledgements
- My PhD is jointly funded by the University of Reading and the CCLRC
- One of the contributors to the long-term metadata curation activities of the DCC

Presentation Overview
- The Problem Domain
- Introducing (Digital) Metadata
- Metadata Curation: Rationale & Definition
- Core Requirements of Metadata Curation
- Current State of Play
- Metadata Curation Record
- Metadata Schema Mapping Tool
- Future Plan

The Problem Domain
- Phenomenal data deluge over the past decade
- Main reason: exponential increase in computing power and communication bandwidth
- One of the major contributors is e-Science
- Examples:
  - Atlas Datastore of CCLRC's e-Science Centre
  - The Sanger Centre at Hinxton, near Cambridge

The Problem Domain - The Task
- Scientific data needs to be preserved and kept available over the long term, so that it can serve future generations of scientists and researchers.
- Benefits are manifold:
  - Efficient utilization of data
  - Avoiding the cost of data regeneration
  - High-quality future research and experiments, both within and across disciplines

The Problem Domain - Challenges & Solution
- Ensuring data accessibility and availability over time
- Ensuring data quality and integrity over time
- Both must hold despite rapid evolution and enhancements in related technologies and data formats
- Solution: Long-term Digital (Data) Curation (Preservation)

Introducing (Digital) Metadata
- "Data about data": the ubiquitous definition
- What counts as 'aboutness' depends on the application, which leads to a multiplicity of metadata classifications
- The prefix "meta" expresses reflexive application of a concept (here, data) to itself
- Importance of metadata in digital curation:
  - Discovery & accessibility of data
  - Appropriate & efficient use of data
  - Enrichment & preservation of data

Digital Metadata Defined
- Structured and standardized information
- Crafted specifically to describe another digital resource
- To aid in the intelligent, efficient and enhanced discovery, retrieval, use and preservation of that resource over time

Metadata Curation - Rationale
- To ascertain and/or enhance metadata quality & integrity, ensuring consistency with the data
- To ensure that metadata can be searched efficiently
- Intelligent and efficient metadata management, i.e. creation, updates etc.
- Long-term preservation of metadata
- To aid data curation

Metadata Curation Defined
- An inherent part of a digital curation process
- Continuous management of metadata (which involves its creation and/or capture as well as assuring its overall integrity)
- Over the life-cycle of the digital materials that the metadata describes
- Ensuring suitability of metadata for facilitating the intelligent, efficient and enhanced discovery, retrieval, use and preservation of digital materials over time

Core Requirements of Long-term Metadata Curation
- Metadata standard(s)
- Long-term metadata preservation
  - Migration or emulation?
  - Tracking & migrating changes to the metadata itself
- Metadata quality assurance
  - Syntactic validation
  - Semantic validation
  - Metadata authentication
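
The difference between syntactic and semantic validation can be sketched as follows. This is only an illustration: the record layout, the field names (`title`, `created`, `format`) and the controlled vocabulary are all hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical schema: required field names and their expected Python types.
REQUIRED_FIELDS = {"title": str, "created": str, "format": str}
# Hypothetical controlled vocabulary for the "format" field.
ALLOWED_FORMATS = {"text/xml", "text/csv", "application/pdf"}


def syntactic_errors(record):
    """Check structure only: required fields present and of the right type."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return errors


def semantic_errors(record):
    """Check meaning: values must make sense, not merely be present."""
    errors = []
    try:
        created = date.fromisoformat(record["created"])
        if created > date.today():
            errors.append("created date is in the future")
    except (KeyError, TypeError, ValueError):
        errors.append("created is not an ISO date")
    if record.get("format") not in ALLOWED_FORMATS:
        errors.append("format not in controlled vocabulary")
    return errors


# Made-up example records: both are structurally valid,
# but the second has an impossible month.
good = {"title": "Example dataset", "created": "2005-03-01", "format": "text/xml"}
bad = {"title": "Example dataset", "created": "2005-13-01", "format": "text/xml"}
```

Here `bad` passes the syntactic check, because every field is present and is a string, and fails only the semantic one. That is the kind of error a schema-conformance check alone would not catch.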

Core Requirements of Long-term Metadata Curation Contd.
- Metadata versioning
- Metadata curation policy
- Audit trailing & provenance tracking
- Access control & constraints
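
One possible way to combine versioning with an audit trail is to append a new version on every change instead of overwriting the record. The sketch below assumes this append-only design; the class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    number: int
    values: dict
    changed_by: str
    changed_at: str
    note: str


@dataclass
class VersionedRecord:
    """Keeps every version of a metadata record instead of overwriting it."""
    identifier: str
    versions: list = field(default_factory=list)

    def update(self, values, changed_by, note):
        version = Version(
            number=len(self.versions) + 1,
            values=dict(values),  # copy so later edits cannot alter history
            changed_by=changed_by,
            changed_at=datetime.now(timezone.utc).isoformat(),
            note=note,
        )
        self.versions.append(version)
        return version

    def current(self):
        """Return the values of the most recent version."""
        return self.versions[-1].values

    def audit_trail(self):
        """Return who changed what, in order, without the full values."""
        return [(v.number, v.changed_by, v.note) for v in self.versions]


# Made-up usage: two curators edit the same record.
record = VersionedRecord("ds-001")
record.update({"title": "Draft"}, "curator-a", "created")
record.update({"title": "Final"}, "curator-b", "title corrected")
```

After the two updates, `record.current()` holds the latest values while `record.versions[0]` still holds the original ones, so the provenance of each change stays recoverable.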

Current State of Play
- Recognised metadata standards
  - Main focus is on data preservation
  - Lack of appropriate elements to capture meta-metadata
  - Lack of sufficient elements to record metadata version information

Current State of Play Contd.
- Strategies for metadata migration
  - XSLT approach (IMS Metadata Group)
  - XML-specific
  - Short-term, i.e. the problem may recur when the XML version changes
- Semantic validation of metadata (automated)
  - Limited to automatically checking a metadata record's conformance against schema, vocabulary etc.
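
To illustrate what an XML-to-XML metadata migration does, the sketch below renames elements according to a mapping table and logs each change. It uses Python's standard `xml.etree.ElementTree` as a stand-in for XSLT and is not the IMS stylesheet; the element names and the mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from old element names to new ones.
RENAMES = {"creator": "author", "date": "created"}

# Made-up record in the old format.
OLD_RECORD = (
    "<record><creator>A. Curator</creator>"
    "<date>2005-03-01</date><title>Example</title></record>"
)


def migrate(xml_text, renames):
    """Return the record with elements renamed, plus a log of the changes made."""
    root = ET.fromstring(xml_text)
    changes = []
    for element in root.iter():
        if element.tag in renames:
            changes.append((element.tag, renames[element.tag]))
            element.tag = renames[element.tag]
    return ET.tostring(root, encoding="unicode"), changes


new_record, change_log = migrate(OLD_RECORD, RENAMES)
```

Like the XSLT approach, this is tied to XML. Unlike a bare transformation, it also returns `change_log`, which hints at how a migration step could record what it changed.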

Metadata Curation Record (MCR)
[Diagram: Metadata Curation Record with the components General, Availability, Preservation, Curation, ……, Life-Cycle, Annotation, Meta-Metadata]
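
As a rough illustration of such a record as a data structure, the sketch below uses the component names from the slide as sections. The flat layout, the use of plain dictionaries and the example contents are assumptions; the slide does not show how the components nest or what elements they hold.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class MetadataCurationRecord:
    # Section names follow the slide; layout and contents are assumptions.
    general: dict = field(default_factory=dict)
    availability: dict = field(default_factory=dict)
    preservation: dict = field(default_factory=dict)
    curation: dict = field(default_factory=dict)
    life_cycle: dict = field(default_factory=dict)
    annotation: dict = field(default_factory=dict)
    meta_metadata: dict = field(default_factory=dict)

    def empty_sections(self):
        """List the sections that have not been filled in yet."""
        return [name for name, content in asdict(self).items() if not content]


# Made-up example: only two sections are populated.
mcr = MetadataCurationRecord(
    general={"title": "Example dataset"},
    meta_metadata={"schema_version": "0.1"},
)
```

A helper such as `empty_sections()` shows how a curation system could flag records that describe the data but not yet the metadata itself.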

MCR - The Rationale
- Information is crucial and instrumental in long-term digital curation.
- MCR provides information about both digital objects and their associated metadata to aid long-term digital curation.
- Approach employed:
  - Examine a range of existing well-known metadata schemas, e.g. DC, DCC RI, IEEE LOM etc.
  - Import from them the elements most relevant to curation, preservation and accessibility
  - Avoid reinventing the wheel

MCR - Applicability
- Framework for metadata creation tools & search engines (within curation systems)
- Caters for both new (full version) and existing (customised version) standalone and distributed metadata systems
- My PhD proposes a standalone Metadata Curation System

MCR in a Metadata Curation System

Metadata Mapping Tool - Motivation & Rationale
- Long-term metadata preservation
  - Migration is currently the most viable approach; it involves mapping/copying metadata from an old format to a newer format
  - Classic migration issue: tracking or migrating changes to the metadata itself
  - Therefore, a curation-aware migration strategy is needed
- Existing schema mapping tools
  - E.g. Altova MapForce, SwissSQL etc.
  - Facilitate cross-database (e.g. Oracle to DB2) as well as cross-schema-type (e.g. XML to database schema) migration

Motivation & Rationale Contd.
- Existing tools are efficient at finding direct or obvious matches between two metadata schemas.
- However, they lack the ability to determine indirect or non-obvious matches between two metadata schemas.

Metadata Schema Mapping Tool - Overview
- Determines direct matches between schemas
- Employs a regular-expression-driven algorithm to find all possible indirect matches between two metadata schemas
- Calculates mapping rules based on the match results
- Finally, migrates metadata from the source schema to the destination schema
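
The four steps above can be sketched roughly as follows. This is not the tool's actual algorithm: the field names and regular expressions are hypothetical, and for brevity the sketch keeps the first indirect match for each field rather than collecting all possible ones.

```python
import re

# Hypothetical source and destination schemas, given as lists of field names.
SOURCE_FIELDS = ["title", "creator_name", "dateCreated", "subject"]
TARGET_FIELDS = ["title", "author", "creation-date", "keywords"]

# Hypothetical patterns: each regex describes source names that are taken
# to mean the same thing as the given target field.
INDIRECT_PATTERNS = {
    "author": re.compile(r"(creator|author)", re.IGNORECASE),
    "creation-date": re.compile(r"date.?creat|creat\w*.?date", re.IGNORECASE),
}


def build_mapping(source_fields, target_fields, patterns):
    """Derive mapping rules as a dict from source field to target field."""
    mapping = {}
    # Step 1: direct matches (identical names in both schemas).
    for name in source_fields:
        if name in target_fields:
            mapping[name] = name
    # Step 2: indirect matches, found with regular expressions.
    for name in source_fields:
        if name in mapping:
            continue
        for target, pattern in patterns.items():
            if pattern.search(name):
                mapping[name] = target
                break
    return mapping


def migrate(record, mapping):
    """Apply the mapping rules; also report the fields that could not be mapped."""
    migrated = {mapping[key]: value for key, value in record.items() if key in mapping}
    unmapped = [key for key in record if key not in mapping]
    return migrated, unmapped


mapping = build_mapping(SOURCE_FIELDS, TARGET_FIELDS, INDIRECT_PATTERNS)
migrated, unmapped = migrate(
    {"title": "Example", "creator_name": "A. Curator",
     "dateCreated": "2005-03-01", "subject": "physics"},
    mapping,
)
```

In this made-up example, `subject` ends up in `unmapped` because no pattern links it to `keywords`. Reporting such leftovers, rather than dropping them silently, is one way to keep the migration auditable.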

Metadata Schema Mapping Tool - Usefulness
- An easier and less labour-intensive means (than the commercial tools) of identifying and reconciling complex and non-obvious differences between schemas
- Facilitates more accurate migration of data
- More declarative accessibility of the datasets to the data users
- In a curation system, it would be used as a metadata migration tool to deal with metadata schema change

Metadata Schema Mapping Tool – Screen shot

Future Plan
- Design & development of the Metadata Curation Model:
  - A curation-aware metadata framework based on the MCR
  - Efficient post-creation metadata quality assurance mechanisms
  - Suitable metadata versioning techniques
- The first draft of the model has already been designed as an extension to the OAIS reference model.
- The model focuses only on the curation of metadata and does not assume responsibility for curating the data that the metadata describes.

Conclusions
- Efficient & effective long-term metadata curation is a key component of successful preservation and enrichment of, and access to, digital information in the long term.
- No accepted approach or method exists to date for long-term metadata curation.
- Emphasis is on the necessity of an appropriate metadata standard and an efficient system.