Data Provision and Aggregation: Mapping Culture Semantically with CIDOC-CRM & 3M
CRM SIG
Maria Theodoridou, Foundation for Research and Technology – Hellas

Data Provision and Aggregation: Mapping Culture Semantically with CIDOC-CRM & 3M
CRM SIG
Maria Theodoridou
Foundation for Research and Technology – Hellas (FORTH), Institute of Computer Science

Synergy Reference Model (CRM SIG, October 8, 2015)
- A reference model for better practice in data provisioning and aggregation processes.
- An initiative of the CIDOC CRM Special Interest Group.
- Based on experience with, and evaluation of, national and international information integration projects.
- Defines a consistent set of business processes, user roles, generic software components and open interfaces that form a harmonious whole.

Synergy Reference Model: Goals
- Describe the provision of data between providers and aggregators, including the associated data mapping components.
- Address the lack of functionality in current models.
- Incorporate the knowledge and input needed from providers to create high-quality, sustainable aggregations.
- Define a modular architecture that different developers can build and optimize with minimal inter-dependencies, without hindering integrated UI development for the different user roles involved.
- Identify, support or manage the processes that must be executed or maintained between a provider (the source) and an aggregator (the target) institution.
- Support the management of data between source and target models and the delivery of transformed data at defined times, including updates.

SYNERGY Workflow
[Figure: SYNERGY workflow diagram]

SYNERGY Process Hierarchy
[Figure: SYNERGY process hierarchy]

X3ML
We implemented the X3ML data exchange framework, which effectively and efficiently handles the following steps of the data provision and aggregation process:
- schema mapping
- URI definition and generation
- data transformation

X3ML Framework Features
X3ML mapping definition language:
- Schema mappings are expressed declaratively.
- X3ML can be understood by non-technical people.
- It keeps the schema mappings between different systems harmonized.
- Schema matching and URI generation policies are separate, distinct steps in the exchange workflow.
- X3ML is symmetric and potentially invertible.
X3ML engine (clean core design of both the engine and the X3ML language):
- Transparency
- Re-use of standards and technologies
- Facilitating instance matching
- Simplicity

X3ML Workflow
[Figure: X3ML workflow. Labels: source databases DB1 and DB2; CIDOC-CRM; Domain Experts, Schema Matching, Schema Matching Definition file; IT Experts, URI generation specification; Terminology Mapping.]

[Figure: components and data flows between the Provider Institution and the Aggregator Institution. Components: Syntax Normalizer, Source Analyzer, Source Schema Visualizer, Target Schema Visualizer, Source Schema Validator, Target Schema Validator, Schema Matcher, Mapping Suggester, Schema Mapping Viewer, Terminology Mapper, Instance Generation Rule Builder, Transformer, Metadata Validator, Target Analyzer. Artifacts: Raw Metadata, Provider Schema Definition, Effective Provider Schema, Normalized Provider Metadata, Target Schema Definition, Schema Matching Definition, Mapping Definition, Mapping Memory, Provider Terminology, Aggregator Terminology, Terminology Mapping, Source To Target URI Association Table, Aggregator Format Records, Source Syntax Report, Source Statistics, Aggregator Statistics Report, Mapping Validation Report.]

CRM SIG, October 8, 2015  X3ML is an XML based language designed on the basis of work that started in FORTH in 2006  X3ML emphasizes on establishing a standardized mapping description which lends itself to collaboration and the building of a mapping memory to accumulate knowledge and experience.  It was adapted primarily to be more according to the DRY principle (avoiding repetition) and to be more explicit in its contract with the URI Generating process.  X3ML separates schema mapping from generating proper URIs so that different expertise can be applied to these two very different responsibilities. X3ML Mapping Definition Language 10

X3ML Mapping Definition Language
The X3ML structure consists of:
- a header containing basic information (title, description, contact persons), the source and target schemata, and a sample record;
- a series of mappings, each containing:
  - a domain (the main entity being mapped), and
  - a number of links, each consisting of a path and a range. Each link describes the relation (path) between the domain entity and the corresponding range entity.
Each entity-relation-entity triple of the source schema is mapped individually to the target schema and can be read as a self-explanatory, context-independent proposition.
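As a minimal sketch, the header/mapping/domain/link structure described above can be modelled with a few Python dataclasses. All class names, field names and header values are hypothetical. They do not reflect the X3ML XML schema or the engine's Java classes. The example link reuses the coin-weight mapping shown on the slides, with the whole target path flattened into a single string for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical in-memory model of an X3ML mapping definition.
# Names are illustrative only; they are not the X3ML schema or engine API.

@dataclass
class Header:
    title: str
    description: str
    contact_persons: List[str]
    source_schema: str
    target_schema: str
    sample_record: str = ""

@dataclass
class Link:
    source_path: str   # relation in the source schema
    source_range: str  # entity the source relation points to
    target_path: str   # relation (possibly a chain) in the target schema
    target_range: str  # entity or literal at the end of the target path

@dataclass
class Mapping:
    source_domain: str  # main source entity being mapped
    target_domain: str  # target entity it corresponds to
    links: List[Link] = field(default_factory=list)

@dataclass
class X3MLDocument:
    header: Header
    mappings: List[Mapping] = field(default_factory=list)

    def propositions(self) -> Iterator[Tuple[Tuple[str, str, str], Tuple[str, str, str]]]:
        """Yield every (source triple, target triple) pair; each pair can be
        read on its own, independently of the other mappings."""
        for m in self.mappings:
            for link in m.links:
                yield ((m.source_domain, link.source_path, link.source_range),
                       (m.target_domain, link.target_path, link.target_range))

doc = X3MLDocument(
    header=Header(
        title="Coins to CIDOC CRM",  # hypothetical header values
        description="Example mapping",
        contact_persons=[],
        source_schema="coins.xsd",
        target_schema="CIDOC CRM",
    ),
    mappings=[Mapping(
        source_domain="Coin",
        target_domain="E22 Man-Made Object",
        links=[Link("weights", "WEIGHT",
                    "P43 has dimension > E54 Dimension > P90 has value", "Literal")],
    )],
)

for source, target in doc.propositions():
    print(source, "=>", target)
```

Yielding one pair per link mirrors the slide's point that each entity-relation-entity proposition stands on its own.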

X3ML Structure
[Figure: X3ML structure]

X3ML Structure (example)
[Figure: example mapping of a coin's weight.
Source domain: Coin; source path: weights; source range: WEIGHT.
Target domain: E22 Man-Made Object; target path: P43 has dimension, intermediate node E54 Dimension, P90 has value; target range: Literal.
The E54 Dimension node also carries two constant expression nodes: P91 has unit, E58 Measurement Unit "gr"; and P2 has type, E55 Type "weight".]
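The coin-weight example can be illustrated by the triples such a mapping would produce for one source record. This is a hypothetical sketch, not output of the X3ML engine. The `crm:` prefix stands for the CIDOC CRM namespace. The `urn:example:` URIs for the constant nodes, the UUID for the intermediate node, and the record values are invented for illustration.

```python
import uuid
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs for the two constant expression nodes.
UNIT_GR = "urn:example:unit/gr"
TYPE_WEIGHT = "urn:example:type/weight"

def map_coin_weight(coin_uri: str, weight: str) -> List[Triple]:
    dimension = f"urn:uuid:{uuid.uuid4()}"  # intermediate node: a new E54 per record
    return [
        # source domain Coin -> target domain E22 Man-Made Object
        (coin_uri, "rdf:type", "crm:E22_Man-Made_Object"),
        # source path "weights" -> target path P43 / E54 / P90
        (coin_uri, "crm:P43_has_dimension", dimension),
        (dimension, "rdf:type", "crm:E54_Dimension"),
        (dimension, "crm:P90_has_value", weight),  # source range WEIGHT -> literal
        # constant expression nodes attached to the intermediate node
        (dimension, "crm:P91_has_unit", UNIT_GR),
        (UNIT_GR, "rdf:type", "crm:E58_Measurement_Unit"),
        (UNIT_GR, "rdfs:label", "gr"),
        (dimension, "crm:P2_has_type", TYPE_WEIGHT),
        (TYPE_WEIGHT, "rdf:type", "crm:E55_Type"),
        (TYPE_WEIGHT, "rdfs:label", "weight"),
    ]

triples = map_coin_weight("urn:example:coin/1", "17.2")  # hypothetical record
for t in triples:
    print(t)
```

A single source link (Coin, weights, WEIGHT) expands into several target triples. This is the 1:N behaviour that intermediate and constant expression nodes make possible.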

X3ML Constructs
X3ML supports 1:N mappings and uses the following special constructs:
- Intermediate nodes: represent the mapping of a simple source path to a complex target path.
- Constant expression nodes: assign constant attributes to an entity.
- Conditional statements: placed within the target node and target relation, they check for the existence and equality of values and can be combined into Boolean expressions.
- "Same as" variables: identify a specific node instance for a given input record that is generated once but used in several places in the mapping.
- Join operator (==): used in the source path to denote relational database joins.
- Info and comment blocks: placed throughout the mapping specification, they bridge the gap between the human author and the machine executor.
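Two of these constructs can be sketched in a few lines of Python, assuming source records are flattened into dictionaries. The first is conditions built from existence and equality checks combined with Boolean operators. The second is "same as" variables that create a node once per record and reuse it. All class names and the `WEIGHT` field are hypothetical. This is not how X3ML expresses conditions in its XML syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple

Record = Dict[str, str]

# 1) Conditions: existence and equality checks combined into Boolean expressions.

@dataclass(frozen=True)
class Exists:
    field: str
    def holds(self, rec: Record) -> bool:
        return rec.get(self.field, "") != ""

@dataclass(frozen=True)
class Equals:
    field: str
    value: str
    def holds(self, rec: Record) -> bool:
        return rec.get(self.field) == self.value

@dataclass(frozen=True)
class And:
    parts: tuple
    def holds(self, rec: Record) -> bool:
        return all(p.holds(rec) for p in self.parts)

@dataclass(frozen=True)
class Or:
    parts: tuple
    def holds(self, rec: Record) -> bool:
        return any(p.holds(rec) for p in self.parts)

@dataclass(frozen=True)
class Not:
    part: object
    def holds(self, rec: Record) -> bool:
        return not self.part.holds(rec)

# 2) "Same as" variables: one node instance per (record, variable) pair,
#    created once and reused wherever the variable appears.

class SameAsVariables:
    def __init__(self) -> None:
        self._nodes: Dict[Tuple[Hashable, str], str] = {}

    def node(self, record_id: Hashable, variable: str, create: Callable[[], str]) -> str:
        key = (record_id, variable)
        if key not in self._nodes:
            self._nodes[key] = create()
        return self._nodes[key]

# Emit the weight link only when WEIGHT is present and not marked "unknown".
weight_known = And((Exists("WEIGHT"), Not(Equals("WEIGHT", "unknown"))))
print(weight_known.holds({"WEIGHT": "17.2"}))     # True
print(weight_known.holds({"WEIGHT": "unknown"}))  # False
print(weight_known.holds({}))                     # False
```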

CRM SIG, October 8, 2015  The definition of the URI generation policy is a separate step and follows the schema matching  It is performed usually by an IT expert who must ensure that the generated URIs match certain criteria such as consistency uniqueness  A set of predefined URI generators (UUIDs, literals) and templates are available but any URI generating function can be implemented and incorporated in the system  In the X3ML definition, the target domain and all range entities must contain functions that will generate URIs or literals The result of the schema matching and URI generation policy steps is a complete X3ML mapping definition file that will be fed to the X3ML engine for the transformation of the data. X3ML - URI generation policy 15

X3ML Engine
The X3ML engine transforms source records into the target format.
Input:
- source records (currently as an XML document);
- the description of the mappings in the X3ML mapping definition file;
- the URI generation policy file.
Output: the engine transforms the source records (XML document) into a valid RDF document that is equivalent to the XML input with respect to the given mappings and policy.
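The engine's contract takes three inputs (source XML, mapping definition, URI policy) and produces RDF. It can be illustrated with a much simplified Python sketch that uses the standard-library `xml.etree.ElementTree` and writes N-Triples lines. The dict-based mapping and policy formats, the `http://example.org/prop/weight` property and the sample records are hypothetical. Real X3ML definitions are XML files with a richer structure, and the real engine is written in Java.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, greatly simplified stand-ins for the three inputs.
SOURCE_XML = """<coins>
  <coin id="c1"><weight>17.2</weight></coin>
  <coin id="c2"><weight>8.5</weight></coin>
</coins>"""

MAPPING = {
    "domain": "coin",  # element that holds one source record
    "domain_type": "http://www.cidoc-crm.org/cidoc-crm/E22_Man-Made_Object",
    "links": {"weight": "http://example.org/prop/weight"},  # hypothetical property
}

POLICY = {"domain_uri": "http://example.org/coin/{id}"}  # URI template per record

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def literal(text: str) -> str:
    """Quote a plain N-Triples literal (only backslash and quote are escaped here)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def transform(xml_text: str, mapping: Dict, policy: Dict) -> List[str]:
    root = ET.fromstring(xml_text)
    lines: List[str] = []
    for record in root.findall(mapping["domain"]):
        subject = policy["domain_uri"].format(id=record.get("id"))
        lines.append(f"<{subject}> <{RDF_TYPE}> <{mapping['domain_type']}> .")
        for element, prop in mapping["links"].items():
            for node in record.findall(element):
                lines.append(f"<{subject}> <{prop}> {literal(node.text or '')} .")
    return lines

for line in transform(SOURCE_XML, MAPPING, POLICY):
    print(line)
```

Keeping the URI template in a separate `POLICY` input, rather than inside `MAPPING`, reflects the separation between schema matching and URI generation described earlier.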

CRM SIG, October 8, 2015  Implemented in Java, producing a single artifact in the form of a JAR file which contains the engine software  XStream8 for parsing XML-based documents  Handy URI Templates to support the generation of valid URIs  Jena10 for building the RDF output. The source code is available under the Apache license at: Originally implemented in the CultureBrokers project co-funded by the Swedish Arts Council and the British Museum. Implementation is partially supported by the projects PARTHENOS (H2020 RI ), ARIADNE (FP7 RI, ), and LifeWatch Greece (NSRF ) X3ML Engine 17

CRM SIG, October 8, 2015  The Input Reader component is responsible for reading the input data.  The X3ML Parser component is responsible for reading and manipulating the X3ML mapping definitions.  The component RDF Writer outputs the transformed data into RDF format.  The Instance Generator component produces the URIs and the labels based on the descriptions that exist in the mappings.  The Controller component coordinates the entire process. X3ML Engine - Components 18

CRM SIG, October 8, 2015  Support of other types of input (RDF):  RDF model (i.e. Jena, Sesame) as the basic construct  Usage of SPARQL  Enhancement of the Instance Generator component to carry the URIs from the source data to the target data.  Support of invertible X3ML mappings:  Regenerate the data in the source dataset that led to the creation of each piece of data in the target dataset.  X3ML mapping is viewed as an association between a “pattern” (Ps) in the source dataset with a “pattern” (Pt) in the target dataset.  An X3ML mapping is a pair (Ps, Pt) of SPARQL graph patterns.  A set of X3ML mappings M is invertible if and only if we can guarantee that whenever a pattern Pt is found in the target dataset, we can identify in a unique manner the pattern Ps that generated it. X3ML Engine - Extensibility 19

X3ML Engine - Usage
The X3ML engine is being used by several European projects:
- The ARIADNE project initiated several mapping activities using the X3ML engine to convert existing schemata of archaeological data to CIDOC CRM and its extension suite.
- The ResearchSpace project has been using X3ML to map and transform data from the Rijksmuseum, the British Museum, the Yale Center for British Art (YCBA), Getty, the Frick and the Canadian Heritage Information Network (CHIN).
- The X3ML engine is also used by the transformation services of the Greek national implementation of the European LifeWatch infrastructure for biodiversity, to transform biodiversity metadata and data (such as Darwin Core formats) into semantic models of the CIDOC CRM family.
- The PARTHENOS project

X3ML Engine - Evaluation
Synthetic data based on ARIADNE project data was provided as input to the X3ML engine:
- three X3ML mapping files containing 10, 100 and 1000 mappings;
- four XML input files containing 10, 100, 1000 and … records.
Conclusions:
- The overall time depends on both the number of mappings and the size of the input.
- As the size of the input increases, the overall time required increases as well.
- The total number of output records is the number of input records multiplied by the number of mappings (e.g. 10 input records with 10 mappings produce 100 output records).
- Execution time is affected equally by the number of mappings and the number of records, and is related to the number of links created during the transformation process.
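The output-size rule stated above (output records = input records × mappings) can be tabulated for the test configurations with a few lines of Python. The sketch covers only the three input sizes given in the transcript, because the fourth value is missing. It computes record counts, not execution times, which would require the measured results.

```python
from itertools import product

# Output-size rule from the evaluation slide: output = input records x mappings.
MAPPING_FILES = [10, 100, 1000]  # mappings per X3ML file (from the slide)
INPUT_FILES = [10, 100, 1000]    # fourth input size omitted: not given in the transcript

def expected_output_records(n_records: int, n_mappings: int) -> int:
    return n_records * n_mappings

grid = {(r, m): expected_output_records(r, m)
        for r, m in product(INPUT_FILES, MAPPING_FILES)}

for (r, m), out in sorted(grid.items()):
    print(f"{r:>5} records x {m:>5} mappings -> {out:>9,} output records")
```

The product grows multiplicatively. The largest listed configuration (1000 records × 1000 mappings) already implies one million output records, which is consistent with execution time being driven by the number of generated links.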

X3ML Engine - Evaluation
[Figure: X3ML engine evaluation results]

CRM SIG, October 8, 2015  X3ML Data Exchange Framework is based on the X3ML mapping definition language and the X3ML engine  X3ML Data Exchange Framework solves a number of problems that have to do with managing and aggregating heterogeneous data by: o Supporting the cognitive process of mapping and the schema mappings are expressed in a declarative way. o Keeping the schema mappings between different systems harmonized. o Separating the schema matching and the URI generation policies  X3ML Data Exchange Framework is being used by a significant number of European Projects  X3ML Engine will be extended in order to support other types of input and invertible X3ML mappings Conclusions 23

CIDOC CRM Mapping Repository
- Published schema matching definitions are available at:
- The schema matching definition format (Version 1.0) is available at:
- The Mapping Memory Manager (3M) is available at:
- Domain experts can easily understand and edit X3ML mapping files.
You are kindly invited to send us your schema matching definition.

ResearchSpace Workshops
- CIDOC CRM Mapping workshop for humanities scholars and cultural heritage professionals. Supported by the Yale Center for British Art and Yale University. 10th - 12th August 2015, Yale University, New Haven, USA.
- CIDOC CRM Mapping workshop at Oxford University. Inaugural European workshop hosted at the University of Oxford e-Research Centre, 9th - 10th November 2015.
Some feedback from the recent USA workshop:
- "This was SO helpful… I have already made better decisions this week as we develop our collections online presence."
- "Thank you so much! This was an excellent event. It came at the perfect time for my project, and has given me practical methods for moving forward with my data mapping and transformation."
- "I had a blast and learned a lot!"

Thank you!