“Data is the new Oil” (Ann Winblad) Keith G Jeffery 20140415-16JRC Workshop Big Open Data © Keith G Jeffery.

Slides:



Advertisements
Similar presentations
©euroCRIS/Keith G JefferyOA Workshop May 2010 CNR Roma The euroCRIS view of the Rome OA Workshop Keith G Jeffery President, euroCRIS
Advertisements

Resource description and access for the digital world Gordon Dunsire Centre for Digital Library Research University of Strathclyde Scotland.
© Keith G Jeffery, Anne G S Asserson GL6: New York: December 2004: IP & Corporate Context Relating Intellectual Property Products to the Corporate.
© Keith G Jeffery, Anne G S Asserson GL 11 Washington Keith G Jeffery Director, IT & International Strategy, STFC
© Keith G Jeffery, Anne G S Asserson GL 10 Amsterdam Keith G Jeffery Director, IT & International Strategy, STFC
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
UKOLN, University of Bath
Forest Markup / Metadata Language FML
A multi-level metadata approach for a Public Sector Information data infrastructure Nikos Houssos 1,2, Brigitte Jörg 1,3, Brian Matthews 4 1 euroCRIS 2.
Developing a Metadata Exchange Format for Mathematical Literature David Ruddy Project Euclid Cornell University Library DML 2010 Paris 7 July 2010.
CERIF: Common European Research Information Format An international standard relational data model for storage and interoperability of research information.
CERIF-CRIS Overview Keith G Jeffery
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
1 Metadata Data Foundation and Terminology RDA-P5 San Diego - Keith Jeffery.
CERIF-CRIS Overview Keith G Jeffery
Educause October 29, 2001 A GEM of a Resource: The Gateway to Educational Materials Copyright Nancy Virgil Morgan, This work is the intellectual.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
CRIS + Repositories: Setting the Scene Keith G Jeffery President euroCRIS
© Keith G Jeffery & Anne AssersonCERIF Course: Data Model CERIF COURSE Session3: DataModel 1 Keith G Jeffery, Director, IT CLRC
Advances in Technology and CRIS Nikos Houssos National Documentation Centre / National Hellenic Research Foundation, Greece euroCRIS Task Group Leader.
1 © Netskills Quality Internet Training, University of Newcastle Metadata Explained © Netskills, Quality Internet Training.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
An Overview of the Research Information Metadata Ecosystem Prof Keith G Jeffery ©Keith G JefferyAn Overview.
© Keith G Jeffery & Anne AssersonCERIF Course: Evolution CERIF COURSE Session 6: Evolution Keith G Jeffery, Director, IT CLRC
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
CERIF Common European Research Information Format Jan Dvořák, Jiří Souček, Tomáš Chudlarský  Institute of Information Studies & Librarianship Faculty.
JENN RILEY METADATA LIBRARIAN IU DIGITAL LIBRARY PROGRAM Introduction to Metadata.
©STFC/Keith G Jeffery The value of recording each step of the research process The Value of Recording each Step of the Research Process Keith.
Lifecycle Metadata for Digital Objects (INF 389K) September 18, 2006 The Big Metadata Picture, Web Access, and the W3C Context.
CERIF for Datasets: Background and Key Findings Workshop, London 26 th July 2013 CERIF slides reproduced from presentations by euroCRIS members : Keith.
19/10/20151 Semantic WEB Scientific Data Integration Vladimir Serebryakov Computing Centre of the Russian Academy of Science Proposal: SkTech.RC/IT/Madnick.
©Keith G Jeffery/ Anne AssersonSupporting the Research Process with a CRIS CRIS Supporting the Research Process with a CRIS Keith G Jeffery Director.
METADATA WORKSHOP Conclusions Keith Jeffery Peter Wittenburg.
1 24 September BREAKOUT :30 1)Review of Metadata Standards Directory (DCC version and GitHub) 2)Introduction of Metadata Standards Catalog.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
1 METADATA Plenary Session RDA P5 San Diego. 2 Agenda  Introduction to Metadata & Principles of Metadata Groups  Use Case Template  Standards Directory.
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
A Systemic Approach for Effective Semantic Access to Cultural Content Ilianna Kollia, Vassilis Tzouvaras, Nasos Drosopoulos and George Stamou Presenter:
© euroCRIS/Keith G Jeffery 1 euroCRIS and e-Infrastructure Keith G Jeffery President, euroCRIS Premium Members.
Evidence from Metadata INST 734 Doug Oard Module 8.
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
Metadata Registries Registry: authoritative, centrally controlled store of information – W3C Web Services Glossary, 2004
Keith G Jeffery Director, IT What does a CRIS add to Open Access Publications ?
Auditing Grey in a CRIS Environment
A centre of expertise in digital information managementwww.ukoln.ac.uk DCMI Affiliates: Implications for Institutions Rosemary Russell UKOLN University.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Title of the presentation | Date |1 Nikos Houssos National Documentation Centre (EKT/NHRF) CRIS for research information management.
Data in Context Co-chairs: Brigitte Jörg, Keith Jeffery RDA 3rd Plenary, March, 26th - 28th, 2014 Dublin.
1 Metadata Elements and Domain Groups - Keith G Jeffery.
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
A centre of expertise in digital information management UKOLN is supported by: Functional Requirements Eprints Application Profile Working.
Analysis of Use Cases (and to some extent, standards) - Keith G Jeffery, Rebecca Koskela.
1 The Metadata Groups - Keith G Jeffery. 2 Positioning  Raise profile of metadata  Data first  Also software, resources, users  Achieve outputs/outcomes.
Metadata Standards Directory Alex Ball, Jane Greenberg, Keith Jeffery, Rebecca Koskela.
©Keith G Jeffery/ Anne AssersonCRIS: Central Relating Information System CRIS CRIS: Central Relating Information System Keith G Jeffery Director.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
1 The Metadata Groups - Keith G Jeffery. 2 Positioning  Raise profile of metadata  Data first  Also software, resources, users  Achieve outputs/outcomes.
Metadata Schema Registries: background and context MEG Registry Workshop, Bath, 21 January 2003 Rachel Heery UKOLN, University of Bath Bath, BA2 7AY UKOLN.
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
CERIF for research evaluation including REF Keith G Jeffery STFC / euroCRIS 20/04/20111Ready 4 Ref End of Project Workshop
Session 3 Metadata & Workflow
The Metadata Groups Unpacking the Elements- Keith G Jeffery
Introduction to Metadata
Applications of IFLA Namespaces
Session 2: Metadata and Catalogues
Proposal of a Geographic Metadata Profile for WISE
Presentation transcript:

“Data is the new Oil” (Ann Winblad) Keith G Jeffery JRC Workshop Big Open Data © Keith G Jeffery 1

Data is the New Oil Like oil has been, data is – Abundant – Unrefined – Needs refining to extract value – Has great value when refined – Can be used in many ways So how do we gain value from data? – We manage it And what is required for that management? – Data management plan – Appropriate metadata covering all aspects of the data lifecycle JRC Workshop Big Open Data © Keith G Jeffery 2

STFC Rutherford Appleton Laboratory JRC Workshop Big Open Data © Keith G Jeffery 3

CAREER Late 60s First UK relational system: G-EXEC 70s Filematch: interoperation Early 80s Online grants, library, science Late80s IDEAS, EXIRPTS 90s CERIF 90s W3C Standards – CGI, SVG, SMIL, OWL, SKOS 00s e-Science – GRIDs, CLOUDs Late 60s First UK relational system: G-EXEC 70s Filematch: interoperation Early 80s Online grants, library, science Late80s IDEAS, EXIRPTS 90s CERIF 90s W3C Standards – CGI, SVG, SMIL, OWL, SKOS 00s e-Science – GRIDs, CLOUDs JRC Workshop Big Open Data © Keith G Jeffery 4

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 5

Data Characterisation Data Structured Semi-structured Unstructured Static Dynamic Streamed Secure / open Private / public Toll free/Toll Open government data Open data Big data JRC Workshop Big Open Data © Keith G Jeffery 6

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 7

Data about data (DCMI defintion) – Unhelpful! Analogy of user of library Somehow describes internet resources for the end-user Metadata Book on shelf Catalog card Library User Internet User Internet Resource Meta data JRC Workshop Big Open Data © Keith G Jeffery 8

Consider a library – Catalogue cards – Books on shelves To researcher or reader the catalogue cards are metadata – Describe the book and point to where it is on the shelf – Descriptive and navigational metadata To librarian catalogue cards are data – use catalogue cards to count number of books on ‘information technology So do not distinguish data and metadata except by how used Metadata Book on shelf Catalog card report User Librarian JRC Workshop Big Open Data © Keith G Jeffery 9

Data Lifecycle JRC Workshop Big Open Data © Keith G Jeffery 10 Acknowledgement DCC (Digital Curation Centre, UK

Metadata Description Location Contextualisation Preservation Provenance Schema Discovery Context Detail Re-use Interoperation JRC Workshop Big Open Data © Keith G Jeffery 11

Metadata Standards There are hundreds of specific formats used as a ‘standard’ within a specific community but ones used widely are: DC (Dublin Core): used to describe web pages  web resources CKAN (Comprehensive Knowledge Archive Network): used in government open data sites – based on DC eGMS; e-Government Metadata Standard – based on DC DCAT (Data Catalog): used for datasets on the web – based on DC INSPIRE : used for datasets with geospatial coordinates – EU Directive and standard; some overlap with DC but extended CERIF (Common European research Information Format): used for all research information All but CERIF are ‘flat’ or ‘linear’ There are hundreds of specific formats used as a ‘standard’ within a specific community but ones used widely are: DC (Dublin Core): used to describe web pages  web resources CKAN (Comprehensive Knowledge Archive Network): used in government open data sites – based on DC eGMS; e-Government Metadata Standard – based on DC DCAT (Data Catalog): used for datasets on the web – based on DC INSPIRE : used for datasets with geospatial coordinates – EU Directive and standard; some overlap with DC but extended CERIF (Common European research Information Format): used for all research information All but CERIF are ‘flat’ or ‘linear’ JRC Workshop Big Open Data © Keith G Jeffery 12

Contributor Coverage Creator Date Description Format Identifier Language Publisher Relation Rights Source Subject Title Type Contributor Coverage Creator Date Description Format Identifier Language Publisher Relation Rights Source Subject Title Type Text HTML XML RDF Namespaces Ontologies Text HTML XML RDF Namespaces Ontologies Metadata Standards: DC JRC Workshop Big Open Data © Keith G Jeffery 13

Title Unique Identifier Groups Description Revision History Licence Tags Multiple Formats API key Extra Fields Title Unique Identifier Groups Description Revision History Licence Tags Multiple Formats API key Extra Fields RDF ontologies RDF ontologies Metadata Standards: CKAN Black signifies same as DC JRC Workshop Big Open Data © Keith G Jeffery 14

Metadata Standards: e-GMS Accessibility Addressee Aggregation Audience Contributor Coverage Creator Date Description Digital signature Disposal Format Identifier Accessibility Addressee Aggregation Audience Contributor Coverage Creator Date Description Digital signature Disposal Format Identifier Language Location Mandate Preservation Publisher Relation Rights Source Status Subject Title Type Language Location Mandate Preservation Publisher Relation Rights Source Status Subject Title Type JRC Workshop Big Open Data © Keith G Jeffery 15 Black signifies same as DC

Metadata Standards: DCAT Same as DC are: Title, description, identifier, keyword, language JRC Workshop Big Open Data © Keith G Jeffery 16

Metadata Standards: INSPIRE EU Directive (2008, 2009) For Geospatial datasets – Initiated by ESA Essentially DC plus geospatial information Geospatial information very detailed – coordinate system, precision etc EU Directive (2008, 2009) For Geospatial datasets – Initiated by ESA Essentially DC plus geospatial information Geospatial information very detailed – coordinate system, precision etc JRC Workshop Big Open Data © Keith G Jeffery 17

Metadata Standards: CERIF Common European Research Information Format Data Model for exchange and storage of information about research CERIF91 ( ) quite like the later Dublin Core (late 1990s) CERIF2000 ( ) used full E-E-R modelling – Base entities – Linking entities with role and temporal interval 2002 EC requested euroCRIS to maintain, develop and promote CERIF Now in use in 43 countries and national standard for research information in 10 Common European Research Information Format Data Model for exchange and storage of information about research CERIF91 ( ) quite like the later Dublin Core (late 1990s) CERIF2000 ( ) used full E-E-R modelling – Base entities – Linking entities with role and temporal interval 2002 EC requested euroCRIS to maintain, develop and promote CERIF Now in use in 43 countries and national standard for research information in JRC Workshop Big Open Data © Keith G Jeffery 18

Metadata Comparison (1) [1] [1] After RDF version and introduction of DCMI abstract model (however the advanced features of this version are very rarely utilised in DC implementations). [2] [2] Very limited support for structures with separate entities. [3] [3] Literals, e.g. name, are allowed. [4] [4] For example the author and maintainer fields are allowed to take name strings as values. [5] [5] CERIF Entity cfMedium. [6] [6] CKAN Entity Resource. [7] [7] DCAT Entity Distribution [8] [8] In fact it is limited, using the relation element and its refinements (for example the isDerivedFrom relation – very useful for datasets - is not supported). [9] [9] In fact is is limited support, only between datasets and with predefined relationship semantics. [10] [10] Has very good support for vocabularies using SKOS, but not crosswalking. [1] [1] After RDF version and introduction of DCMI abstract model (however the advanced features of this version are very rarely utilised in DC implementations). [2] [2] Very limited support for structures with separate entities. [3] [3] Literals, e.g. name, are allowed. [4] [4] For example the author and maintainer fields are allowed to take name strings as values. [5] [5] CERIF Entity cfMedium. [6] [6] CKAN Entity Resource. [7] [7] DCAT Entity Distribution [8] [8] In fact it is limited, using the relation element and its refinements (for example the isDerivedFrom relation – very useful for datasets - is not supported). [9] [9] In fact is is limited support, only between datasets and with predefined relationship semantics. [10] [10] Has very good support for vocabularies using SKOS, but not crosswalking. #FeatureUse caseCERIF Dublin Core CKANDCAT 1 Representation of graph structures Realistic representati on of domain of discourse, Generation of Linked Open Data YES NOYES 2 Typed values enforced for values that are entity instances Unambiguo us identification of types and instances. YESNO YES 3 Explicit representation of resources (e.g. data files) Different physical embodiment s of what the metadata describes YESNOYES 4Time-stamping of relationships Accurate real-world representati on, provenance, versioning YESNO JRC Workshop Big Open Data © Keith G Jeffery 19

Metadata Comparison (2) 5 Capture both the dates and actors of events Accurate real-world representati on, provenance, versioning YES Only dates 6 Recursive relationships Compound objects, Derived objects YES NO 7 Extensible relationship semantics Complex objects, accurate semantics YESNO 8 Representation and crosswalking between vocabularies Co- existence of different vocabularies YESNO YES/NO 9 Multilingual values for the same metadata field Multi- lingual environment (e.g. Europe) YES 10Translated flag for multi-linguality Warn metadata consumers (including programs) for machine translated values YESNO JRC Workshop Big Open Data © Keith G Jeffery 20

The Problem with ‘flat’ metadata they violate basic principles of information integrity – elements do not depend functionally on the uniquely identified metadata record. they store event flags or dates in the metadata – e.g. ‘date of publication’, ’received (Y/N)’ they do not handle well multilinguality and multiple linguistic versions of the same text field; they do not manage well versioning and provenance – this requires time-stamped relationships between one research information entity and another they do not allow multiple classification schemes for the same entity or – more generally – multiple terminology schemes for the same attribute of an entity; they do not provide mechanisms for crosswalking between different vocabularies; they do not provide extension mechanisms that preserve interoperability; JRC Workshop Big Open Data © Keith G Jeffery 21

3-Layer Model Need to interoperate at discovery level with other commonly-used metadata standards Need to navigate user to detailed domain-specific metadata on datasets to allow further (re-)processing Between these two need to understand the CONTEXT of the described objects (not only data) So use CERIF as the middle contextual layer Generate discovery level (above) Point to detailed level (below) JRC Workshop Big Open Data © Keith G Jeffery 22

3-Layer Model JRC Workshop Big Open Data © Keith G Jeffery 23

3-Layer Model JRC Workshop Big Open Data © Keith G Jeffery 24

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 25

Project Person / CV Institution Event Equipment Books Journal/article Patent Research Group Publisher Information of Interest JRC Workshop Big Open Data © Keith G Jeffery 26

Contextual Metadata: CERIF JRC Workshop Big Open Data © Keith G Jeffery 27 CERIF An EU Recommendation to Member States

RESULT_PUBLICATION PROJECT ORGUNIT PERSON Result_Publication Can Express: Person A (DT1 - DT2) (is author of) Publication X Orgunit O(DT1 - DT2) (is owner of IPR in) Publication X Person A (DT1 - DT2) (is employee of ) Orgunit O Person A (DT1 - DT2) (is project leader of)Project P Person A (DT1-DT2) (is member of) Orgunit M Person A (DT1-DT2) (is member of) Orgunit N Orgunit M (DT1-DT2) (is part of) Orgunit O Orgunit N (DT1-DT2) (is part of) Orgunit O CERIF Expressiveness JRC Workshop Big Open Data © Keith G Jeffery 28

Result_Publication Instance Diagram Person A Publication X OrgUnit O OrgUnit M OrgUnit N Project P member employee Part of owns IPR author Project leader JRC Workshop Big Open Data © Keith G Jeffery 29 CERIF

CERIF Features Developed by international community – consensus Flexible and extensible Separation of base and link entities – Flexible / extensible – Rich semantics (role) – Temporal : it is the relationships that have duration Multi characterset Multilingual Formal Syntax – Efficient, accurate computer processing Declared Semantics – Including crosswalks for interoperation Developed by international community – consensus Flexible and extensible Separation of base and link entities – Flexible / extensible – Rich semantics (role) – Temporal : it is the relationships that have duration Multi characterset Multilingual Formal Syntax – Efficient, accurate computer processing Declared Semantics – Including crosswalks for interoperation CERIF JRC Workshop Big Open Data © Keith G Jeffery 30

Repositories and CERIF To view content (white or grey) in repositories through contextualised, structured metadata – E.g. Relate publication to: Persons Organisations Projects Funding Facilities Equipment Event Patent Product Repository metadata DC (Dublin Core) insufficient (as recognised by OpenAIREPlus when adopted CERIF) To view content (white or grey) in repositories through contextualised, structured metadata – E.g. Relate publication to: Persons Organisations Projects Funding Facilities Equipment Event Patent Product Repository metadata DC (Dublin Core) insufficient (as recognised by OpenAIREPlus when adopted CERIF) Allows the user to judge better relevance, quality JRC Workshop Big Open Data © Keith G Jeffery 31 CERIF

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 32

INSPIRE-CERIF Identifier (Title, ID, Abstract, Locator) Classification Keyword Geo B Box, Country Temporal (dates) Lineage Resolution Conformity Constraints (use) Responsible party Title, ID, Abstract, URI Class Scheme, Class Keyword Geo B Box, Country Linking relations start/end date/time Linking relations temporal/classification Measurement Linking relation to certifier Linking relation to licence Linking relation to OrgUnit/Person Fairly straighforward Already have CERIF-DC INSPIRE geospatial elements  CERIF: GeoBbox JRC Workshop Big Open Data © Keith G Jeffery 33

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 34

Metadata RDA Metadata Interest Group Metadata Standards Directory Working Group Data In Context Interest Group Working with Provenance Group and groups on repositories, types… An various domain-specific groups JRC Workshop Big Open Data © Keith G Jeffery 35

Agenda Kinds of data – Open government data, open data, big data Need for metadata – problems with exiting standards and experience of ENGAGE CERIF – for research information – wider CERIF / INSPIRE proposed mapping Metadata in RDA context Conclusion JRC Workshop Big Open Data © Keith G Jeffery 36

Conclusion Data is the new Oil Metadata is the catalyst to make it useful JRC Workshop Big Open Data © Keith G Jeffery 37