DAITF Data Access and Interoperability Task Force DAITF Preparation Group status report Daan Broeder EUDAT / TLA - MPI for Psycholinguistics.


1 DAITF Data Access and Interoperability Task Force DAITF Preparation Group status report Daan Broeder EUDAT / TLA - MPI for Psycholinguistics

2 Context
EUDAT and OpenAIREplus projects were invited to collaborate on helping prepare DAITF … an organization dedicated to discussing and suggesting solutions for data access and interoperability in the context of e-infrastructures
A ‘DAITF preparation group’ of experts was approached to discuss this: Michael Lautenschlager, Reinhard Budich, Johannes Reetz, Stefan Heinzel, Maurice Bouwhuis, Marc van der Sanden, Alberto Michelini, Daan Broeder, Willem Elbers, Peter Wittenburg, Giuseppe Fiameni, Jens Jensen, Adrian Burton, Ross Wilkinson, Andrew Treloar, Donatella Castelli, Yannis Ioannidis, Paolo Manghi, Herbert van de Sompel, Ken Galluppi, Reagan Moore, Bob Kahn, Larry Lannom, …
A ‘DAITF Preparation Note’ was drafted and is currently being discussed (ed. Peter Wittenburg, Michael Lautenschlager, Daan Broeder, …)
A DAITF forum is in preparation and will be functional in a few weeks
Some DAITF workshops will be organized by EUDAT and OpenAIREplus
However, it should be stressed that everything in the preparation note is subject to further discussion

3 Motivation
Ever-increasing amounts of (scientific) data and the need to manage them properly
That data should be made available for reuse and recombination in new research contexts
A proper trust relation should be established between data creators, managers and users
Then there is the large variety and fragmentation preventing easy solutions
– Data types
– Design: data models, formats, semantics
– Implementations: repository systems, tools, etc.
– This diversity will probably increase

4 Collaborative Data Infrastructure
Integrate existing data solutions from different communities
A ‘common data model’ or ‘abstract architecture’ would facilitate this
Will require considerable work and collaboration from all actors
Most problems concerning heterogeneity and fragmentation arise at the interfaces between community services and common data services

5 Scope
Many aspects to DAITF, but the core points are:
– Terminology synchronization between different participating communities
– Support for the scientific workflow incl. enriched publications
– Improving access and interoperability while accepting heterogeneous community solutions
– Long-term preservation and curation policies
– Trust framework for data creators, managers and users
– Abstract architecture that the different communities can relate to … including an analysis of the necessary primitives: data object, metadata, resource, identifier
But only registered data: data that is visible and that some organization takes responsibility for

6 Abstract Architecture
Needs a community-driven discussion, but some guidance for harmonization could help (std. orgs?)
Analyzed existing community solutions and general models and implementations:
– Kahn/Wilensky DOA (CNRI), CMIS (OASIS), iRODS (Data Grid)
– DOBES/CLARIN DOA (Linguistics/Humanities), ENES DOA (Climate), EPOS (Earth Sciences)
Need to increase this number: more research communities, and more models such as those from W3C, among others

7 Possible DAITF Governance Structure
Focus should be on discussions by experts, not on governance, so start lightweight
Comparing ISO, IETF, OASIS: IETF seems closest to the grass-roots approach we need
However, DAITF scope is more heterogeneous: different disciplines, funding organizations, global initiatives, software developers and vendors

8 Next steps?
Suggestion would be to first collect a core group of experts from a few domains and continue working on preparatory docs + forum discussions
Should organize a series of well-prepared workshops, starting Apr/May 2012 as planned by EUDAT & OpenAIREplus
[Diagram: Communities, Expert group and Steering board; the "elects" relations are marked with question marks]
Expert group
– From different communities worldwide
– Understand & experienced with scientific workflow
– Participated in design & implementation of solutions
– Willing to abstract from their base
Steering board
– Continuity
– Summarizing
– Initiating topics
– Encourage convergence
– Start special working groups

9 Thank you for your attention

10 Scientific workflow
This is an essential use case that such an ‘abstract architecture’ should support
Also where cooperation of projects such as EUDAT and OpenAIRE(plus) will become most visible and needed
[Diagram: traditional workflow compared with eScience workflow. Labels shown for both: raw data and descriptions, temporary data, analysis, enrichment, registration, preservation, citable publication. Additional label: referable & citable data]

11 Non-EU Contacts
CNRI (Bob Kahn, Larry Lannom): DOA, Handle System
– Bob Kahn visited in August, some talks about CNRI DOA
RENCI, National Climate Data Center in Renaissance Computing Institute; Ken Galluppi
DICE, Data Intensive Cyber Environments; Reagan Moore (SRB, iRODS)
– Reagan Moore will visit Oct 6/7 for consultations with EUDAT
– Still possible to invite a few other experts
OAI: Herbert van de Sompel
DataONE (DataNet): Bill Michener
– We will visit the DataONE hands-on meeting
DataConservancy (DataNet): Sayeed Choudhury

12 EUDAT OpenAIRE
[Diagram labels: EUDAT, OpenAIRE, datasets & metadata (twice), publications, data depositor, data curator, reviewer, author, editor, API]
Identifiers for actors (ORCID?)
Identifiers for data & publications (HS, DOI, URN)
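
The identifier families named on this slide (HS, DOI, URN, and possibly ORCID for actors) have recognizably different surface forms. The following is a minimal sketch of telling them apart, not anything taken from the presentation: the function name `guess_scheme` and the regular expressions are assumptions for illustration, and the patterns are rough heuristics rather than full validators. DOIs are themselves handles with a `10.` prefix, so the DOI check runs before the generic Handle check.

```python
import re

# Heuristic patterns (assumptions for illustration, not official grammars).
_URN = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:.+$", re.IGNORECASE)
_ORCID = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
_DOI = re.compile(r"^10\.\d{4,9}/\S+$")
_HANDLE = re.compile(r"^[^/\s]+/\S+$")  # generic "prefix/suffix" form


def guess_scheme(identifier: str) -> str:
    """Return a best-guess identifier scheme for a bare identifier string."""
    s = identifier.strip()
    if _URN.match(s):
        return "URN"
    if _ORCID.match(s):
        return "ORCID"
    if _DOI.match(s):  # checked before Handle because DOIs are handles too
        return "DOI"
    if _HANDLE.match(s):
        return "Handle"
    return "unknown"


if __name__ == "__main__":
    # The sample identifiers below are made up for the example.
    for sample in ["urn:nbn:de:0000-example", "10.1234/abc", "11858/00-XXXX"]:
        print(sample, "->", guess_scheme(sample))
```

A real service would resolve the identifier against the corresponding resolver instead of relying on its shape alone.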

13 D… Access and Interoperability T… F..
Access
– visibility (metadata)
– access (AAI via federated identities)
– deposits (organizational guarantees)
– interpretability (syntax, semantics)
– preservation (copies)
– curation (adapt data to new standards & technology)
– verifiability (checksums)
– quality assessment (trust your archive)
– policy framework covering also legal & ethical aspects (licences)
Interoperability
– integration of data in a single virtual domain and support for joint operations
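
The "verifiability (checksums)" and "preservation (copies)" points can be illustrated with a small sketch: a checksum is recorded when the object is registered, and every replica is later compared against it. This is only an illustration under assumptions; the slide does not name a hash algorithm (SHA-256 is chosen here), and the function names and site labels are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Checksum recorded at registration time (algorithm choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify_copies(registered_checksum: str, copies: Dict[str, bytes]) -> Dict[str, bool]:
    """Map each replica location to True if its content still matches the checksum."""
    return {
        location: sha256_hex(content) == registered_checksum
        for location, content in copies.items()
    }


if __name__ == "__main__":
    original = b"raw data and descriptions"
    checksum = sha256_hex(original)
    # Hypothetical replica sites; the second copy has been altered.
    report = verify_copies(checksum, {"site-a": original, "site-b": original + b"!"})
    print(report)
```

Storing the checksum alongside the identifier lets any party holding a copy check it without contacting the original depositor.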

14 DAITF workshops
Two DAITF workshops should be planned
Proposal is to have a preparatory OpenAIRE/EUDAT meeting in Jan/Feb 2012 to synchronize terminology, ideas and ambitions
First DAITF workshop would then be in April/May 2012

15 US Cooperation
PhD exchange, expert hosting
Joint (data management) API development
Joint development of data management middleware: possibly (based on) iRODS

16 EUDAT Consortium

17 The Data Conservancy is one of two initial awards through the National Science Foundation's DataNet Program. The Data Conservancy shares a common vision that data curation is not an end, but rather a means to provide persistent access to a variety of scientific data for addressing grand challenge research problems. In addition to the infrastructure development that lies at the core of the Data Conservancy, the project team is directly focusing on a semantic view of data and other forms of content as compound objects that describe a full picture of the scientific process. This presentation will feature an overview of the Data Conservancy with an emphasis on the data framework aspects of the project.

18 WP4 First Analysis Results
Kahn’s Digital Object Architecture (1995, 2006)
[Diagram. Actors: originator, depositor, repository A, repository B, user. Registered DO: data, metadata (Key-MD), PID from a handle generator. Property record: rights, type (from central registry), ROR flag, mutable flag. Transaction record. Relation labels: work ownership, hands-over, deposits via RAP, requests, stores, maintains, access rights, receives disseminations via RAP, replicates]
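
The elements named for Kahn's Digital Object Architecture on this slide (registered DO with data, key metadata and PID; property record; transaction record; deposit, dissemination and replication) can be sketched as a small data model. This is a hedged illustration, not CNRI's implementation or the actual RAP: all class and method names are hypothetical, the handle prefix is made up, and "ROR" is read here as "repository of record", which is an assumption about the slide's abbreviation.

```python
from dataclasses import dataclass, field, replace
from itertools import count
from typing import Dict, List


@dataclass
class PropertyRecord:
    rights: str
    object_type: str   # slide: type taken from a central registry
    ror_flag: bool     # assumed meaning: this copy sits in the repository of record
    mutable: bool


@dataclass
class DigitalObject:
    pid: str
    data: bytes
    key_metadata: Dict[str, str]
    properties: PropertyRecord
    transactions: List[str] = field(default_factory=list)


class HandleGenerator:
    """Stand-in for a PID service; the prefix is invented for the example."""

    def __init__(self, prefix: str = "example-prefix"):
        self._prefix = prefix
        self._counter = count(1)

    def mint(self) -> str:
        return f"{self._prefix}/{next(self._counter)}"


class Repository:
    def __init__(self, name: str, handles: HandleGenerator):
        self.name = name
        self._handles = handles
        self._objects: Dict[str, DigitalObject] = {}

    def deposit(self, data: bytes, key_metadata: Dict[str, str],
                rights: str, object_type: str) -> str:
        """Register a new object, mint its PID and log the transaction."""
        pid = self._handles.mint()
        props = PropertyRecord(rights, object_type, ror_flag=True, mutable=False)
        obj = DigitalObject(pid, data, dict(key_metadata), props)
        obj.transactions.append(f"deposit at {self.name}")
        self._objects[pid] = obj
        return pid

    def disseminate(self, pid: str) -> DigitalObject:
        """Return the stored object and log the dissemination."""
        obj = self._objects[pid]
        obj.transactions.append(f"dissemination by {self.name}")
        return obj

    def replicate_to(self, pid: str, other: "Repository") -> None:
        """Copy an object to another repository under the same PID."""
        src = self._objects[pid]
        copy_props = replace(src.properties, ror_flag=False)
        other._objects[pid] = DigitalObject(
            src.pid, src.data, dict(src.key_metadata), copy_props,
            [f"replica received at {other.name} from {self.name}"],
        )
        src.transactions.append(f"replicated to {other.name}")


if __name__ == "__main__":
    handles = HandleGenerator()
    repo_a, repo_b = Repository("A", handles), Repository("B", handles)
    pid = repo_a.deposit(b"raw data", {"title": "sample"}, "open", "dataset")
    repo_a.replicate_to(pid, repo_b)
    print(pid, repo_b.disseminate(pid).properties)
```

Keeping the same PID on the replica while clearing the flag mirrors the diagram's split between repository A and repository B; how access rights are enforced is left out of the sketch.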

19 WP4 First Analysis Results
right level of abstraction? ignore DO content - analogue email
DOBES Object Architecture (2002)
[Diagram. Actors: originator, depositor, repository A, repository B, user. Registered DO: data, metadata, PID from a handle generator; rights, type (open vocabulary), ROR flag, transaction record. Relation labels: work ownership, hands-over, deposits via LAMUS, requests, stores, maintains, access rights, receives disseminations via Apps, replicates]
ENES Object Architecture (2006)
[Diagram. Actors: depositor, repository A, user. Registered DO: data, metadata; handle generator to come; rights, type (open vocabulary), transaction record. Relation labels: deposits via NETCDF, requests (to come), stores, maintains, access rights. Virtual Collection Object: metadata, mutable flag, DOI. Notes: users build arbitrary virtual collections; users register collections with publications; stores data publication]

20 WP4 DataONE / DataConservancy
not too bad to see what others are doing
NSF DataNet initiative
– two research driven projects: DataONE, DataConservancy
– one horizontal project to come: DICE?
DataONE:
– combination of biological (genome to ecosystem) and environmental (atmosphere, ecology, hydrology, oceanography) researchers (want to reach out)
– coordinating nodes (basic indexing, replication etc), member nodes with utilization software to interact with researchers and citizen scientists
– research institutions, libraries, information science, IT
DataConservancy:
– combination of astronomy, earth sciences, life sciences, and anthropology
– infrastructure for data preservation & curation, capacity building, libraries as cornerstones for sustainability
– all based on data modeling, data management support and capacity building
– research institutions, libraries, information science, service providers

21 WP4 Start simple - but that’s not the end
HLEG:
– there will be NO ONE solution (technology, organization, etc)
– there will be heterogeneity and dynamics driven by research
Trust/Usability:
– this is a sensitive issue in terms of risks, benefits, gain, etc.
– are we sensitive enough to connect with researchers’ workflows?
No Stupid Questions
– “please say what you need and we will do it”
– no big questionnaires
real progress is a matter of interaction, evolving ideas, potentials, sensitivity, changing research paradigms, etc.
AND some sensitive and experienced people need to dare to take some risks like entrepreneurs and design simple start-up services

22 WP4 Startup Plan (first 6 months)
31. August: Produce a set of important questions for the interviews
31. August: Plan a first SAF meeting in second half November
12. September: Make first round of interviews with core communities
15. September: Finish CDI understanding work and include all core communities (this is in line with the DAITF DOA analysis work)
17. October: Get a charter done for SAF (WP2/4)
17. October: Plan a user forum for January
31. October: Structured interviews with all core communities
– covering data organization, architectures, standards, registries, etc.
– covering wishes and requirements for use/service cases
31. October: Create a Requirements Spec. Doc. for comm. requirements (4/5/6)
14. November: Analyze interviews and iterate to fill gaps etc
– transform results into RSD (WP4/5/6)
14. November: In parallel improve DAITF / DCI document
24. November: Present & discuss all results in first SAF Meeting
– decide about first 2 or 3 service cases to implement at SAF meeting
– consider WP5/6 technology watch in discussion/selection process
December: Make a full plan/roadmap for selected service cases
January 12: Run a first User Forum extending to other EUDAT communities
January 12: Prepare a first DAITF workshop
Jan-March 12: Extending interviews and analysis to other EUDAT comm.
March 12: Write Del. 4.1.1 - Analysis of CDI

23 WP4 Update
Meetings
22/23. September: Lyon Meeting (DAITF)
6/7. October 2011: Meeting with Reagan Moore
12/13. October: e-IRG Meeting (DAITF)
17/19. October 2011: Hands-On Meeting with DataONE
17/18. November 2011: SSH Meeting
24. November 2011: first SAF Meeting
2/3. December 2011: Meeting with Bill Michener, DataONE
January 12: first User Forum
March/April 2012: first DAITF workshop
Past Meetings
2. August: Meeting with Bob Kahn
23. August: WP4/5/6 Meeting
August: several EPIC PID Meetings

