DAITF Data Access and Interoperability Task Force DAITF Preparation Group status report Daan Broeder EUDAT / TLA - MPI for Psycholinguistics.


Context
EUDAT and OpenAIREplus projects were invited to collaborate on helping prepare DAITF
… an organization dedicated to discussing and suggesting solutions for data access and interoperability in the context of e-infrastructures
A ‘DAITF preparation group’ of experts was approached to discuss this: Michael Lautenschlager, Reinhard Budich, Johannes Reetz, Stefan Heinzel, Maurice Bouwhuis, Marc van der Sanden, Alberto Michelini, Daan Broeder, Willem Elbers, Peter Wittenburg, Giuseppe Fiameni, Jens Jensen, Adrian Burton, Ross Wilkinson, Andrew Treloar, Donatella Castelli, Yannis Ioannidis, Paolo Manghi, Herbert van de Sompel, Ken Galluppi, Reagan Moore, Bob Kahn, Larry Lannom, …
A ‘DAITF Preparation Note’ was drafted and is currently being discussed (ed. Peter Wittenburg, Michael Lautenschlager, Daan Broeder, …)
A DAITF forum is in preparation and will be functional in a few weeks
Some DAITF workshops will be organized by EUDAT and OpenAIREplus
However, it should be stressed that everything in the preparation note is subject to further discussion

Motivation
Ever-increasing amounts of (scientific) data, and the need to manage them properly.
That data should be made available for reuse and recombination in new research contexts.
A proper trust relation should be established between data creators, managers and users.
Then there is the large variety and fragmentation that prevents easy solutions:
 – Data types
 – Design: data models, formats, semantics
 – Implementations: repository systems, tools, etc.
 – This diversity will probably increase

Collaborative Data Infrastructure
Integrate existing data solutions from different communities
A ‘common data model’ or ‘abstract architecture’ would facilitate this
Will require considerable work and collaboration from all actors
Most problems concerning heterogeneity and fragmentation lie in the interfacing between community services and common data services

Scope
Many aspects to DAITF, but the core points are:
 – Terminology synchronization between different participating communities
 – Support for the scientific workflow incl. enriched publications
 – Improving access and interoperability while accepting heterogeneous community solutions
 – Long-term preservation and curation policies
 – Trust framework for data creators, managers and users
 – Abstract architecture that the different communities can relate to
   … including an analysis of the necessary primitives: data object, metadata, resource, identifier
But only registered data: data that is visible and that some organization takes responsibility for

Abstract Architecture
Needs a community-driven discussion, but some guidance for harmonization could help (std. orgs?)
Analyzed existing community solutions and general models and implementations:
 – Kahn/Wilensky DOA (CNRI), CMIS (OASIS), iRODS (Data Grid)
 – DOBES/CLARIN DOA (Linguistics/Humanities), ENES DOA (Climate), EPOS (Earth Sciences)
Need to increase this number: more research communities, and more models such as those from W3C a.o.

Possible DAITF Governance Structure
Focus should be on discussions by experts, not on governance, so start lightweight
Comparing ISO, IETF and OASIS: IETF seems closest to the grass-roots approach we need
However, DAITF scope is more heterogeneous: different disciplines, funding organizations, global initiatives, software developers and vendors.

Next steps?
Suggestion would be to first collect a core group of experts from a few domains, continue working on preparatory docs + forum discussions
Should organize a series of well-prepared workshops
Starting Apr/May 2012 as planned by EUDAT & OpenAIREplus
[Diagram labels: Communities, Expert group, Steering board, "elects", "?", "?"]
Expert group:
 – From different communities worldwide
 – Understand & experienced with scientific workflow
 – Participated in design & implementation of solutions
 – Willing to abstract from their base
Steering board:
 – Continuity
 – Summarizing
 – Initiating topics
 – Encourage convergence
 – Start special working groups

Thank you for your attention

Scientific workflow
This is an essential use case that such an ‘abstract architecture’ should support
Also where cooperation between projects such as EUDAT and OpenAIRE(plus) will become most visible and needed
[Diagram comparing a traditional workflow with an eScience workflow. Labels, each appearing twice: analysis, enrichment, raw data and descriptions, registration, preservation, citable publication, temporary data. Additional label: referable & citable data.]

Non-EU Contacts
CNRI (Bob Kahn, Larry Lannom): DOA, Handle System
 – Bob Kahn visited in August, some talks about CNRI DOA
RENCI, National Climate Data Center in Renaissance Computing Institute; Ken Galluppi
DICE, Data Intensive Cyber Environments; Reagan Moore (SRB, iRODS)
 – Reagan Moore will visit Oct 6/7 for consultations with EUDAT
 – Still possible to invite a few other experts
OAI: Herbert van de Sompel
DataONE (DataNet): Bill Michener
 – We will visit the DataONE hands-on meeting
DataConservancy (DataNet): Sayeed Choudhury

EUDAT / OpenAIRE
[Diagram labels: datasets & metadata (twice), publications, API; roles: data depositor, data curator, reviewer, author, editor]
Identifiers for actors (ORCID?)
Identifiers for data & publications (HS, DOI, URN)

D… Access and Interoperability T… F..
Access:
 – visibility (metadata)
 – access (AAI via federated identities)
 – deposits (organizational guarantees)
 – interpretability (syntax, semantics)
 – preservation (copies)
 – curation (adapt data to new standards & technology)
 – verifiability (checksums)
 – quality assessment (trust your archive)
 – policy framework covering also legal & ethical aspects (licences)
Interoperability:
 – integration of data in a single virtual domain and support for joint operations
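The "verifiability (checksums)" and "preservation (copies)" points can be illustrated with a small fixity check over replicated copies. This is only a sketch under stated assumptions: the slide does not name a hash algorithm, so SHA-256 is assumed here, and the helper names `sha256_hex` and `verify_copies` are hypothetical.

```python
import hashlib
from typing import List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(reference: bytes, copies: List[bytes]) -> List[bool]:
    """Check each preserved copy against the checksum of the reference object.

    Assumption: the checksum recorded at deposit time is the one computed
    from `reference`; a real archive would store it alongside the metadata.
    """
    expected = sha256_hex(reference)
    return [sha256_hex(copy) == expected for copy in copies]


# Example: one intact replica and one replica with a corrupted byte.
original = b"raw data and descriptions"
replicas = [b"raw data and descriptions", b"raw data and descriptionz"]
results = verify_copies(original, replicas)
print(results)
```

A mismatch only signals that a copy differs from the reference; deciding which copy to trust and how to repair it would be a matter for the preservation policy.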

DAITF workshops
Two DAITF workshops should be planned
Proposal is to have a preparatory OpenAIRE/EUDAT meeting in Jan/Feb 2012 to synchronize terminology, ideas and ambitions
First DAITF workshop would then be in April/May 2012

US Cooperation
PhD exchange, expert hosting
Joint (data management) API development
Joint development of data management middleware: possibly (based on) iRODS

EUDAT Consortium

The Data Conservancy is one of two initial awards through the National Science Foundation's DataNet Program. The Data Conservancy shares a common vision that data curation is not an end, but rather a means to provide persistent access to a variety of scientific data for addressing grand challenge research problems. In addition to the infrastructure development that lies at the core of the Data Conservancy, the project team is directly focusing on a semantic view of data and other forms of content as compound objects that describe a full picture of the scientific process. This presentation will feature an overview of the Data Conservancy with an emphasis on the data framework aspects of the project.

WP4 First Analysis Results
Kahn’s Digital Object Architecture (1995, 2006)
Diagram labels:
 – originator, depositor, repository A, user, repository B
 – registered DO: data, metadata (Key-MD)
 – handle generator, PID
 – property record: rights, type (from central registry), ROR flag, mutable flag
 – transaction record
 – work, ownership, data, metadata (Key-MD), PID, access rights
 – hands over, requests, deposits via RAP, requests, stores, maintains, receives, disseminations via RAP, replicates
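The labels on this slide (registered DO, key metadata, PID, property record, transaction record, deposit, replication) can be read as a simple data structure. The following is a minimal sketch of that reading, not the Kahn/Wilensky specification or any CNRI API: all class and method names are hypothetical, the PID format is invented for illustration, RAP itself is not modelled, and the "ROR flag" is assumed to mean "repository of record".

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass
class PropertyRecord:
    rights: str
    object_type: str  # "type (from central registry)" on the slide
    ror_flag: bool    # assumption: True at the repository of record
    mutable: bool


@dataclass
class RegisteredDO:
    pid: str
    data: bytes
    key_metadata: Dict[str, str]
    properties: PropertyRecord
    transactions: List[Tuple[str, str]] = field(default_factory=list)


class Repository:
    """Toy repository that stores, disseminates and replicates objects."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._store: Dict[str, RegisteredDO] = {}
        self._counter = 0

    def deposit(self, data: bytes, key_metadata: Dict[str, str],
                properties: PropertyRecord) -> str:
        # Stand-in for a handle generator: the PID format is made up.
        self._counter += 1
        pid = f"{self.name}/{self._counter}"
        obj = RegisteredDO(pid, data, dict(key_metadata), properties)
        obj.transactions.append(("deposit", self.name))
        self._store[pid] = obj
        return pid

    def access(self, pid: str) -> RegisteredDO:
        # Every dissemination is logged in the transaction record.
        obj = self._store[pid]
        obj.transactions.append(("access", self.name))
        return obj

    def replicate_to(self, pid: str, other: "Repository") -> None:
        # The replica keeps the same PID but is not the repository of record.
        source = self._store[pid]
        copy = RegisteredDO(
            pid=source.pid,
            data=source.data,
            key_metadata=dict(source.key_metadata),
            properties=replace(source.properties, ror_flag=False),
            transactions=list(source.transactions),
        )
        copy.transactions.append(("replicate", other.name))
        other._store[pid] = copy


repo_a = Repository("repoA")
repo_b = Repository("repoB")
props = PropertyRecord(rights="open", object_type="audio",
                       ror_flag=True, mutable=False)
pid = repo_a.deposit(b"recording", {"title": "session 1"}, props)
repo_a.replicate_to(pid, repo_b)
replica = repo_b.access(pid)
print(replica.pid, [event for event, _ in replica.transactions])
```

Keeping the PID stable across replicas while clearing the flag on the copy is one possible design choice; the slide itself does not say how replicas are distinguished.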

WP4 First Analysis Results
right level of abstraction? ignore DO content - analogue
DOBES Object Architecture (2002)
Diagram labels:
 – originator, depositor, repository A, user, repository B
 – registered DO: data, metadata, PID
 – handle generator
 – rights, type (open vocabulary), ROR flag, transaction record
 – work, ownership, data, metadata, access rights
 – hands over, deposits via LAMUS, requests, stores, maintains, receives, disseminations via Apps, replicates
ENES Object Architecture (2006)
Diagram labels:
 – depositor, repository A, user
 – registered DO: data, metadata
 – handle generator to come
 – rights, type (open vocabulary), transaction record
 – data, metadata, access rights
 – deposits via NETCDF, requests to come, stores, maintains
 – users build arbitrary virtual collections
 – Virtual Collection Object: metadata, mutable flag, DOI
 – stores data publication
 – users register collections with publications

WP4 DataONE / DataConservancy
not too bad to see what others are doing
NSF DataNet initiative:
 – two research-driven projects: DataONE, DataConservancy
 – one horizontal project to come: DICE?
DataONE:
 – combination of biological (genome to ecosystem) and environmental (atmosphere, ecology, hydrology, oceanography) researchers (want to reach out)
 – coordinating nodes (basic indexing, replication, etc.), member nodes with utilization software to interact with researchers and citizen scientists
 – research institutions, libraries, information science, IT
DataConservancy:
 – combination of astronomy, earth sciences, life sciences, and anthropology
 – infrastructure for data preservation & curation, capacity building, libraries as cornerstones for sustainability
 – all based on data modeling, data management support and capacity building
 – research institutions, libraries, information science, service providers

WP4 Start simple - but that’s not the end
HLEG:
 – there will be NO ONE solution (technology, organization, etc.)
 – there will be heterogeneity and dynamics driven by research
Trust/Usability:
 – this is a sensitive issue in terms of risks, benefits, gain, etc.
 – are we sensitive enough to connect with researchers’ workflows?
No Stupid Questions:
 – “please say what you need and we will do it”
 – no big questionnaires
real progress is a matter of interaction, evolving ideas, potentials, sensitivity, changing research paradigms, etc.
AND some sensitive and experienced people need to dare to take some risks like entrepreneurs and design simple start-up services

WP4 Startup Plan (first 6 months)
31. August: Produce a set of important questions for the interviews
31. August: Plan a first SAF meeting in second half November
12. September: make first round of interviews with core communities
15. September: Finish CDI understanding work and include all core communities (this is in line with the DAITF DOA analysis work)
17. October: Get a charter done for SAF (WP2/4)
17. October: Plan a user forum for January
31. October: Structured interviews with all core communities
 - covering data organization, architectures, standards, registries, etc.
 - covering wishes and requirements for use/service cases
31. October: Create a Requirements Spec. Doc. for comm. requirements (4/5/6)
14. November: Analyze interviews and iterate to fill gaps etc.
 - transform results into RSD (WP4/5/6)
14. November: in parallel improve DAITF / DCI document
24. November: Present & discuss all results in first SAF Meeting
 - decide about first 2 or 3 service cases to implement at SAF meeting
 - consider WP5/6 technology watch in discussion/selection process
December: Make a full plan/roadmap for selected service cases
January 12: Run a first User Forum extending to other EUDAT communities
January 12: Prepare a first DAITF workshop
Jan-March 12: Extending interviews and analysis to other EUDAT comm.
March 12: Write Del Analysis of CDI

WP4 Update
Meetings:
22/23. September: Lyon Meeting (DAITF)
6/7. October 2011: Meeting with Reagan Moore
12/13. October: e-IRG Meeting (DAITF)
17/19. October 2011: Hands-On Meeting with DataONE
17/18. November 2011: SSH Meeting
24. November 2011: first SAF Meeting
2/3. December 2011: Meeting with Bill Michener DataONE
January 12: first User Forum
March/April 2012: first DAITF workshop
Past Meetings:
2. August: Meeting with Bob Kahn
23. August: WP4/5/6 Meeting
August: several EPIC PID Meetings