Geert Van Grootel
Flemish government, Economy, Science and Innovation dept., Knowledge Management Division
euroCRIS Treasurer, CERIF taskgroup member
Contact: Koning Albert II-laan, 35 bus 10, 1030 Brussel, firstname.lastname@example.org
The research information space
[Diagram: actors (researchers, research organisations, research institutions, government, financers, editors, libraries, data centres, industry) and information objects (projects, publications, patents, equipment, facilities, products, research data, investment opportunities)]
Vision and strategic goals
Vision: a simple, transparent and open research information space that contributes to the Flemish knowledge-based economy and strengthens the international competitive position of Flanders
Strategic goals:
- more efficient and effective policy (monitoring)
- a faster-working innovation value chain
- better customer services (e-government)
- improved and faster valorisation
- increased networking capacity
- finding expertise
- improved information flow
- maximal reuse of data
- simple and uniform processes
- enhanced strategic intelligence
- better information: complete, correct, up to date
- higher responsiveness of the policy domain
Central principle
Generate the required information directly from the data within the processes at and between the stakeholders:
- faster, with a reduced workload
- guaranteed data quality
- simultaneity: real-time data
- data, information and knowledge at the lowest process cost
Flanders Research Information Space (FRIS)
- Global (includes all stakeholders)
- Network of federated repositories
- For and by all stakeholders: researchers, educators and students, industry, management, the public
- Open Access via open standards (CERIF)
- Semantically rich environment (SBVR)
- Maximal formal interconnection of information
- Generate the required information directly from the data within the processes at and between the stakeholders: faster, with a reduced workload; guaranteed data quality; simultaneity (real-time data)
Project: Publication metadata in the Research Portal
- In line with the FRIS principles, repository content can be linked with Research Portal data:
  - preferably by identifiers (Person, OrgUnit)
  - if feasible, by author name
- Integration into the Research Portal and exposure via the portal interfaces
- Two scenarios:
  1. In-repository conversion, then harvesting and loading
  2. Harvesting in native format, then conversion and loading
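The linking rule above (identifiers first, author names only as a fallback) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Research Portal's implementation: the `Person` record, `link_author` and the exact-match name comparison are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical Research Portal person record."""
    person_id: str
    family_name: str
    given_name: str


def normalise_name(name: str) -> str:
    # Case-insensitive comparison with collapsed whitespace.
    return " ".join(name.lower().split())


def link_author(author_id: Optional[str], author_name: str,
                persons: list[Person]) -> Optional[str]:
    """Return the person_id a repository author should be linked to, or None."""
    known_ids = {p.person_id for p in persons}
    # 1. Preferred: the repository record already carries a known identifier.
    if author_id is not None and author_id in known_ids:
        return author_id
    # 2. Fallback: match on "Family, Given", but only if exactly one person matches.
    key = normalise_name(author_name)
    matches = [p.person_id for p in persons
               if normalise_name(f"{p.family_name}, {p.given_name}") == key]
    return matches[0] if len(matches) == 1 else None


# Example with hypothetical data.
persons = [Person("P1", "Janssens", "Jan"),
           Person("P2", "Peeters", "An"),
           Person("P3", "Peeters", "An")]
print(link_author("P1", "anything", persons))        # P1 (identifier wins)
print(link_author(None, "janssens,  jan", persons))  # P1 (name fallback)
print(link_author(None, "Peeters, An", persons))     # None (ambiguous)
```

An ambiguous name match deliberately returns `None`, so the record can be routed to manual review instead of being linked to the wrong person.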
Scenario 1: details
- University DSpace repository
- Metadata format: qualified DC (qDC)
- Not yet publicly accessible: data cleansing process ongoing
- Based on a large import from WoS (Web of Science)
- Merged with university data for Person and OrgUnit
- 10 person-days (@mire)
- Product: CERIF2006 publication metadata available at an OAI-PMH interface
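Harvesting from an OAI-PMH interface means issuing `ListRecords` requests and following `resumptionToken`s until the repository returns an empty token. The sketch below shows that loop with Python's standard library only. It is an assumption-laden illustration: the base URL, the `cerif2006` metadata prefix and the injected `fetch` function are placeholders, and OAI-PMH `<error>` responses are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    # The first request names a metadataPrefix; follow-up requests send only
    # the resumptionToken, which OAI-PMH treats as an exclusive argument.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text: str) -> tuple[list[ET.Element], Optional[str]]:
    """Return the <record> elements of one ListRecords page and the next token (or None)."""
    root = ET.fromstring(xml_text)
    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, (token or None)


def harvest(base_url: str, metadata_prefix: str,
            fetch: Callable[[str], str]) -> Iterator[ET.Element]:
    """Yield every record, following resumption tokens until an empty one."""
    url = list_records_url(base_url, metadata_prefix)
    while True:
        records, token = parse_page(fetch(url))
        yield from records
        if token is None:
            break
        url = list_records_url(base_url, resumption_token=token)
```

Against a live endpoint, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; injecting it keeps the paging logic testable without network access.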
Scenario 2: details
- 2 running university repositories, with different levels of integration:
  - DSpace: DC, qDC and MODS metadata formats available; metadata format of choice: MODS; workflow with controlled identifier insertion; correct author and OrgUnit identifiers in relations (for internal authors and OrgUnits)
  - Simple stand-alone interface: no identifiers available
- 10 person-days allocated (EWI)
- Product: CERIF2006 XML file
Scenario 1: execution
- Conversion of qDC into CERIF (@mire)
  - Steep learning curve: familiarisation with CERIF cost time
  - Several iterations were needed to produce a harvest in a digestible format, i.e. to get the CERIF format right (at least for the quite flexible portal harvesting module)
- Data error corrections
  - Date stored as a text field: 2001, 2001-01, 01-2001, jan-2001, ?2001 or maybe 2002
  - Identifiers sourced from different internal databases
  - Ambiguous references due to personnel status
  - Authors with more than one identifier
- Harvesting: possible
- Loading into the Research Portal database: failed due to persistent relation constraint violations on Person and OrgUnit
- Project aborted after evaluation
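The loading failure above came from Person and OrgUnit relations pointing at records the Research Portal did not know. A pre-load check can surface such rows, and authors carrying several identifiers, before the database rejects the load. A minimal sketch, assuming the known identifiers are available as Python sets; `AuthorLink` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthorLink:
    """Hypothetical publication-person-orgunit relation row from a harvest."""
    publication_id: str
    person_id: str
    orgunit_id: Optional[str] = None


def find_violations(links: list[AuthorLink], known_persons: set[str],
                    known_orgunits: set[str]) -> list[tuple[AuthorLink, str]]:
    """Return (link, reason) pairs that would break Person/OrgUnit constraints."""
    problems = []
    for link in links:
        if link.person_id not in known_persons:
            problems.append((link, "unknown Person"))
        if link.orgunit_id is not None and link.orgunit_id not in known_orgunits:
            problems.append((link, "unknown OrgUnit"))
    return problems


def split_loadable(links: list[AuthorLink], known_persons: set[str],
                   known_orgunits: set[str]):
    """Separate rows that can be loaded from rows that need correction first."""
    rejected = {link for link, _ in find_violations(links, known_persons, known_orgunits)}
    loadable = [link for link in links if link not in rejected]
    return loadable, rejected


def authors_with_several_ids(name_id_pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Flag author names that occur with more than one identifier."""
    ids_by_name: dict[str, set[str]] = {}
    for name, person_id in name_id_pairs:
        ids_by_name.setdefault(name, set()).add(person_id)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}
```

Rejected rows can then go back to the source for correction instead of blocking the whole load.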
Scenario 2: execution
- Elimination of the identifier-less repository
  - Name-to-identifier mapping was considered too time-consuming for the resources of this scenario
  - Schema mapping, though, was successful
- Harvesting of the repository content into a single XML file
- Creation of the following patterns in BS Studio:
  - generic Publication pattern
  - MODS pattern
  - CERIF2006 pattern
- Testing of the patterns against their sources by self-commitment
- Creation of the mappings: mods2generic, generic2CERIF2006
- Generation of CERIF2006 XML as output
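The two-step mapping (mods2generic, then generic2CERIF2006) was built with patterns in BS Studio; the Python sketch below only mimics the idea with plain functions, to show how an intermediate generic publication model decouples the source format from the CERIF side. The MODS paths (`titleInfo/title`, `name/namePart`, `originInfo/dateIssued`) are standard MODS elements; the output element names follow CERIF's `cf` naming style but are illustrative and have not been validated against the CERIF2006 XML schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def mods2generic(mods_xml: str) -> dict:
    """Map one MODS record onto a small, source-independent publication dict."""
    root = ET.fromstring(mods_xml)
    authors = []
    for name in root.findall("mods:name", MODS_NS):
        authors.append({
            "family": name.findtext("mods:namePart[@type='family']", "", MODS_NS),
            "given": name.findtext("mods:namePart[@type='given']", "", MODS_NS),
        })
    return {
        "title": root.findtext("mods:titleInfo/mods:title", "", MODS_NS).strip(),
        "issued": root.findtext("mods:originInfo/mods:dateIssued", "", MODS_NS).strip(),
        "authors": authors,
    }


def generic2cerif(pub: dict, pub_id: str) -> str:
    """Serialise the generic dict as CERIF-style XML (illustrative element names)."""
    root = ET.Element("CERIF")
    res = ET.SubElement(root, "cfResPubl")
    ET.SubElement(res, "cfResPublId").text = pub_id
    ET.SubElement(res, "cfResPublDate").text = pub["issued"]
    ET.SubElement(res, "cfTitle").text = pub["title"]
    for author in pub["authors"]:
        link = ET.SubElement(res, "cfPers_ResPubl")
        ET.SubElement(link, "cfFamilyNames").text = author["family"]
        ET.SubElement(link, "cfFirstNames").text = author["given"]
    return ET.tostring(root, encoding="unicode")
```

Adding another source format would only need a new `*2generic` function; `generic2cerif` stays unchanged, which mirrors the reuse of the generic and CERIF2006 patterns.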
Scenario 2: execution (continued)
- For several small test datasets this scenario was successful
- Loading the full repository content proved difficult:
  - identifier mismatches
  - data type support in the source: date fields, numeric fields, e.g. page: 103 vs p.103 vs pag. 103
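Free-text page fields like the ones above can be normalised before loading. A minimal sketch, assuming the only variants are an optional `p.`, `pp.`, `pag.` or `page` prefix before a single page number; anything else, including page ranges, returns `None` for manual review.

```python
import re
from typing import Optional

# Optional prefix ("p", "pp", "pag", "page", with or without a dot) then one page number.
_PAGE = re.compile(r"\s*(?:pp?\.?|pag\.?|page)?\s*(\d+)\s*", re.IGNORECASE)


def normalise_page(raw: str) -> Optional[int]:
    """Return the page as an integer, or None if the value needs manual review."""
    match = _PAGE.fullmatch(raw)
    return int(match.group(1)) if match else None


for value in ["103", "p.103", "pag. 103", "pp. 103-110"]:
    print(repr(value), "->", normalise_page(value))
```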
Conclusion: scenarios
- The end goal is too ambitious
- Scenario 1 is the less desirable:
  - scalability problem
  - ad hoc per repository and per repository software
  - learning curve
- Scenario 2 is the more systematic approach:
  - requires tools and semantic mapping knowledge
  - modification only in details of the MODS patterns
  - other patterns and the commitments are reusable
Conclusions: metadata schemes
- DC and qDC have too little formality and normalisation, which considerably hampers machine-to-machine (m2m) implementation
- MODS is more robust but still needs more formality
- CERIF has a steep learning curve and is less human-friendly
- Need for support for date-time fractions, e.g. only the year of a date-time for the publication year
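As an illustration of date-time fractions, free-text dates such as those met in scenario 1 can be mapped to ISO 8601 reduced-precision values (`YYYY` or `YYYY-MM`), which correspond to the XML Schema `gYear` and `gYearMonth` types. A minimal sketch; the accepted patterns and the English month abbreviations are assumptions, and an input like `?2001 or maybe 2002` is deliberately rejected rather than guessed.

```python
import re
from typing import Optional

_MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def normalise_date(raw: str) -> Optional[str]:
    """Map a free-text date to ISO 8601 'YYYY' or 'YYYY-MM'; None means manual review."""
    s = raw.strip().lower()
    if re.fullmatch(r"\d{4}", s):                             # 2001
        return s
    if m := re.fullmatch(r"(\d{4})-(\d{1,2})", s):            # 2001-01
        year, month = int(m[1]), int(m[2])
    elif m := re.fullmatch(r"(\d{1,2})-(\d{4})", s):          # 01-2001
        month, year = int(m[1]), int(m[2])
    elif m := re.fullmatch(r"([a-z]{3})[a-z]*-(\d{4})", s):   # jan-2001, january-2001
        if m[1] not in _MONTHS:
            return None
        month, year = _MONTHS[m[1]], int(m[2])
    else:
        return None                                           # e.g. "?2001 or maybe 2002"
    if not 1 <= month <= 12:
        return None
    return f"{year:04d}-{month:02d}"
```

Keeping the reduced precision, rather than padding `2001` to `2001-01-01`, avoids inventing a month and day the source never contained.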
Conclusions: Workflow & process control
- Data inconsistencies need an architectural, or at least a systems, approach:
  - integration of information sources with workflow control
  - strong integration in the researcher's workbench
- Responsibilities
  - Managing its own production is a core responsibility of each organisation. Why is research the exception?
  - Is it wise to buy back one's own production data from a third party?