1 Data Fabric IG Use Case Analysis

2 Data Fabric Analysis: how to come to essential components & services? Analyze Data Practices

3 Data Practices I (120 interviews etc.)

4 Data Practices II – EUDAT federation: Community Centers, Common Data Centers; projects to push limits and raise awareness

5 Data Practices II – split of functions
- physical layer operations are trivial – know how to do it
- "logical layer" (LL) operations are complex due to relations, etc.
- all LL information needs to be aggregated and we need to have a secure access layer around it

6 Data Fabric Analysis: how to come to essential components & services? Analyze Use Cases

7 10 (+5) Use Cases so far (2 in development, others mature)
Map legend: environmental science; natural science; life science; humanities, soc. sciences; IT, various
All indicated nodes are centers of national, regional and even worldwide federations.

8 10 (+5) Use Cases so far (2 in development, others mature)
All indicated nodes are centers of national, regional and even worldwide federations.

No.  Name                                  Institute               State
1    Language Archive                      Max Planck Institute NL In operation
2    Geodata Sharing Platform              Academy of China        In operation
3    Datanet Federation Consortium         RENCI US                In operation
4    ADCIRC Storm Forecasting              RENCI US                In operation
5    EPOS Plate Observation                INGV/CINECA Italy       In operation
6    ENVRI Environment Observation         U Helsinki, Finland     In design
7    Nanoscopy Repository Cell structures  KIT, Germany            In design
8    Human Brain Neuroinformatics          EPFL Switzerland        In testing
9    ENES Climate Modeling                 DKRZ Germany            In operation
10   LIGO Gravitation Physics              NCSA US                 In operation
11   ECRIN Medical Trial Interoperation    U Düsseldorf Germany    In testing
12   VPH Physiology Simulation             U London UK             In operation
13   Species Archive                       Nature Museum Germany   In operation
14   International NeuroI Facility         INCF Sweden             In operation
15   Molecular Genetics                    MPI Germany             In operation

9 10 (+5) Use Cases so far (2 in development, others mature) (table as on slide 8)
A few side remarks:
- these are all federated approaches
- some have various use cases (one selected)
- 3 is more of an IT framework applied by many
- the description of state is a very vague indication
- the 5 marked red need another round of interaction

10 Issues of Relevance (diagram labels): sensors, simulations, crowd, etc.; PID, Metadata; Rights; Syntax, Types; Semantics; Relations; FS, Cloud, DB; Repository System; virtual collection builder; management, analytics, conversion; provenance – reproducibility; workflows, policies, deployment; new collection; new metadata; temp store; highly distributed in federations; AAI/FIM

11 How do WGs/IGs fit? (diagram labels) CIT DD PROV BROK CERT BDA REP REPRO DMP DOM FIM PP

12 Components I
- domain of registered digital objects (DO) incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
- domain of registered actors -> worldwide ID system (ORCID)
- domain of trusted repositories for DOs -> worldwide Rep Registry
- proper DFT/DSA/WDS compliant repository systems
- accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
- policy/services -> service registry
- authentication system -> various in place (ORCID just number)
- authorization system -> authorization registry
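
The component list on slide 12 can be read as a set of registries that each digital object record points into. The following is a minimal sketch of that reading, not something shown in the slides: the class names, field names, the in-memory `Registry`, and all identifier values are hypothetical placeholders, and a real deployment would resolve Handles/DOIs and ORCID iDs through the respective services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """A registered digital object (DO); all field names are illustrative."""
    pid: str            # persistent identifier, e.g. a Handle or DOI
    creator_id: str     # registered actor, e.g. an ORCID iD
    repository_id: str  # entry in a repository registry
    kind: str           # "data", "code", or "knowledge"


class Registry:
    """Hypothetical in-memory stand-in for one worldwide registry."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def register(self, key, record):
        # Refuse duplicates so that a key always identifies one record.
        if key in self._entries:
            raise ValueError(f"{key!r} already registered in {self.name}")
        self._entries[key] = record

    def __contains__(self, key):
        return key in self._entries

    def get(self, key):
        return self._entries[key]


def missing_registrations(obj, actors, repositories):
    """Return the names of the registries in which obj's references are not found."""
    missing = []
    if obj.creator_id not in actors:
        missing.append(actors.name)
    if obj.repository_id not in repositories:
        missing.append(repositories.name)
    return missing


actors = Registry("actor registry")
repositories = Registry("repository registry")
objects = Registry("PID registry")

# Placeholder identifiers, not real ORCID iDs or Handles.
actors.register("0000-0000-0000-0000", {"name": "example researcher"})
repositories.register("repo:example", {"certified": True})

do = DigitalObject(
    pid="hdl:00000/example",
    creator_id="0000-0000-0000-0000",
    repository_id="repo:example",
    kind="data",
)

# Register the object only if its actor and repository are already known.
if not missing_registrations(do, actors, repositories):
    objects.register(do.pid, do)
```

The point of the sketch is only that a DO becomes usable by machines once every reference it carries (actor, repository) can be looked up in a registry; policy, service and authorization registries would follow the same pattern.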

13 Components II
- MD components/schemas -> metadata schema registry
- data types/schemas/formats -> data type registry
- semantic categories -> category registry
- vocabularies -> vocabulary registry
- what about complex ontologies (thesauri, ontologies, etc.)?
- what about mapping relations?

14 Components II (list as on slide 13)
Much already out there but...
- why does it cost months to federate and integrate data to make data interoperable?
- need to harmonize, raise trust & value
- make it ready for machines

15 What to do today
- 4 use cases (max 10 min) with the following goals
  - understand whether we get what we want to get (common components/services)
  - discuss whether we need to adapt the template
  - presenters: Zhu, Dieter, Sean, Giuseppe, Ed
- discuss how to move on with use cases & analysis
  - discuss my first look on C/S (?)
  - update of existing and appearance on wiki (deadline)
  - deadline for first round (when, whom to motivate, ?)
  - virtual meeting for a discussion on analysis (when?)
  - at P6 (September) a first document with analysis

16 Did we forget something?

17 Data Practices I – Survey
- ~120 Interviews/Interactions
- 2 Workshops with Leading Scientists (EU, US)
- too much is done manually or via ad hoc scripts
- too much is in legacy formats (no PID & MD)
- there are lighthouse projects etc. but...
- DM and DP are not efficient and too expensive (a biologist spends 75% of his time as data manager)
- federating data incl. logical information is much too expensive
- hardly any usage of automated workflows, and lack of reproducibility

18 Data Practices I – Survey (findings as on slide 17, with added remarks)
- is DI research only available for Power-Institutes?
- pressure towards DI research is high, but only some departments are fit for the challenges
- Senior Researchers: can't continue like this!
- the need to move towards proper data organization and automated workflows is evident
- but changes now are risky: lack of trained experts, guidelines and support

