Repositories, Workspaces, Web Services - some ideas - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure Nijmegen,


1 Repositories, Workspaces, Web Services - some ideas - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure Nijmegen, The Netherlands

2 scope of workshop
- clear focus on technology and architecture issues for preservation and access
- many other issues not in focus although relevant:
  - IPR, license issues only partially
  - quality of data & metadata
  - certification (RAC, DSA, etc.)
  - AAI
  - cost aspects
  - etc.
- let's have interactive presentations
- should be able to extract essentials

3 Definitions?

4 so simple repository

5 + Metadata
Diagram labels: repository; metadata registry; objects orange, plum, pear, apple (each 2010)
? ? dangerous since physical paths may change, etc.

6 + replication due to preservation
Diagram labels: repository; metadata registry; repository; objects orange, plum, pear, apple (each 2010); transfer at physical level
? ? dangerous since metadata records can be re-used
metadata should be stable

7 + replication and PIDs
Diagram labels: repository; metadata registry; repository; PID registry; objects orange, plum, pear, apple (each 2010); PID1 to PID4 (each 2010); PID1 - URL1 - URL 2; transfer at physical level
? dangerous: another indirection layer
access possible: which rights? same access rights
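The indirection that slide 7 describes can be sketched in a few lines. This is a minimal illustration, not the registry used in the talk: the class `PidRegistry`, its method names, and the example URLs are all hypothetical. The point it tries to show is that metadata records keep only the PID, so a physical path can change or a copy can be added without touching the metadata.

```python
from typing import Dict, List


class PidRegistry:
    """Toy PID registry (hypothetical): maps one PID to the URLs of all known copies."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, pid: str, url: str) -> None:
        """Record that a copy of the object identified by `pid` lives at `url`."""
        urls = self._locations.setdefault(pid, [])
        if url not in urls:
            urls.append(url)

    def relocate(self, pid: str, old_url: str, new_url: str) -> None:
        """Update one physical location; the PID itself stays stable."""
        urls = self._locations[pid]
        urls[urls.index(old_url)] = new_url

    def resolve(self, pid: str) -> List[str]:
        """Return every registered URL for `pid` (original plus replicas)."""
        if pid not in self._locations:
            raise KeyError(f"unknown PID: {pid}")
        return list(self._locations[pid])


# Metadata refers to PIDs only, never to physical paths (assumed record layout).
metadata = {"apple": "PID1", "pear": "PID2"}

registry = PidRegistry()
registry.register("PID1", "https://repo-a.example/objects/apple")
registry.register("PID1", "https://mirror-x.example/copies/apple")

# The original repository reorganises its storage; metadata is unaffected.
registry.relocate(
    "PID1",
    "https://repo-a.example/objects/apple",
    "https://repo-a.example/archive/2010/apple",
)
```

The cost is the one the slide names: every access now passes through one more layer, which has to be kept available and consistent.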

8 what about collections
Diagram labels: repository; metadata registry; repository; PID registry; objects orange, plum, pear, apple (each 2010); collection (2010); PID1 to PID4 (each 2010); PID1 - URL1 - URL 2; PIDx - URL; transfer at physical level
PS: collections are dynamic

9 topic of high relevance
- ESFRI Task Force on Repositories (report)
- e-IRG/ESFRI Task Force on Data Management (report)
- Blue Ribbon Task Force on Sustainable Digital Preservation and Access (report)
- EC High Level Expert Group on Scientific Data (report)
- ASIS&T Summit Phoenix on Research Data and Access (slides & summary)
- T. Hey et al., The Fourth Paradigm: Data-Intensive Scientific Discovery (book)

10 summarizing the challenges
- how to manage the data tsunami
- maintain data visibility
- preserve the data (just seen one solution)
- protect the data integrity
- ensure that we get the object we wanted
- guarantee data authenticity (how to present)
- maintain context and provenance information
- protect privacy and rights in complex data world
- maintain trust in data
- federate repositories to (virtually) integrate data
- achieve (partial) interoperability
- exploit distributed data without copying
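One common way to approach the integrity item ("ensure that we get the object we wanted") is to record a checksum when an object is ingested and to recompute it whenever a copy is delivered. The sketch below assumes SHA-256 and two hypothetical helper names, `checksum` and `verify_copy`; the slides do not say which mechanism the speaker had in mind.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the object content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_copy(data: bytes, recorded_checksum: str) -> bool:
    """True if the delivered bytes match the checksum recorded at ingest."""
    return checksum(data) == recorded_checksum


# At ingest: store the checksum next to the PID (assumed bookkeeping).
original = b"recording of session 42"
recorded = checksum(original)

# At access: a mirror delivers a copy, and the client checks it before use.
intact_copy = b"recording of session 42"
damaged_copy = b"recording of session 4"
```

A matching checksum only shows that the bytes are unchanged; authenticity and provenance, which are separate items on the slide, need additional information.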

11 speaking about metadata harvesting
Diagram labels: Search Shared Catalog; View Information on Data Through Catalog; Link to Data at Partner Site; Access Data With Extraction and Analysis, Through Catalog Direct to Partner Sites; Data Mirror; Metadata Catalog; Harvester; Online Catalog; Online Analysis

12 speaking about architectures

13 speaking about federations

14

15

16 general configuration
Diagram: repositories A, B, C and mirror repositories X, Y, Z, each with its own architecture, rights domain, access paths, etc., connected through adapters
- adapters can be special
- does not scale

17 general configuration
Diagram: repositories A, B, C and mirror repositories X, Y, Z, each with its own architecture, rights domain, access paths, etc., each exposing an API to a common replication layer
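The difference between slides 16 and 17 can be illustrated with a small sketch: if every repository exposes the same minimal API, one replication routine works for any pair, and no pair-specific adapter is needed. The interface `RepositoryAPI` and both repository classes are hypothetical and kept in memory; real repositories would sit behind network services and rights checks.

```python
from typing import Dict, Iterable, Protocol


class RepositoryAPI(Protocol):
    """Minimal common interface assumed for this sketch."""

    def list_pids(self) -> Iterable[str]: ...
    def get(self, pid: str) -> bytes: ...
    def put(self, pid: str, data: bytes) -> None: ...


class FlatRepository:
    """Local architecture 1: one flat dictionary of objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def list_pids(self) -> Iterable[str]:
        return list(self._objects)

    def get(self, pid: str) -> bytes:
        return self._objects[pid]

    def put(self, pid: str, data: bytes) -> None:
        self._objects[pid] = data


class ShardedRepository:
    """Local architecture 2: objects grouped in buckets by last PID character."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def list_pids(self) -> Iterable[str]:
        return [pid for bucket in self._buckets.values() for pid in bucket]

    def get(self, pid: str) -> bytes:
        return self._buckets[pid[-1:]][pid]

    def put(self, pid: str, data: bytes) -> None:
        self._buckets.setdefault(pid[-1:], {})[pid] = data


def replicate(source: RepositoryAPI, mirror: RepositoryAPI) -> int:
    """Copy every object the mirror lacks; return how many were copied."""
    known = set(mirror.list_pids())
    copied = 0
    for pid in source.list_pids():
        if pid not in known:
            mirror.put(pid, source.get(pid))
            copied += 1
    return copied
```

`replicate` never looks inside either repository, so local architectures stay untouched, which matches the requirement that the intermediate layer interfere only minimally with local solutions.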

18 generic HLEG figure Data generators Users Common Data Services Community Support Services Data Curation User functionalities Data capture & transfer Virtual Research Environments Data discovery & navigation Workflow generation Annotation, Interpretability Safe & persistent storage Identifiers, Authenticity, Workflow execution, Mining Trust

19 requirements for intermediate layer
- needs to cope with large diversity of solutions and architectures
- may only minimally interfere with local repository solutions (too much has been invested along community traditions)
- needs to respect rights domains and preserve access rights
- needs to be transparent to proven utilization mechanisms
- needs to operate at logical level (canonical collections)
- needs to scale with number of (community) data centers
only one way to go:
- separate functionality into independent components (data, metadata, PIDs, etc.)
- specify proper interfaces (of course)

20 requirements for layer
- how to manage procedures/workflows in complex landscape
- how to assess quality and correctness of all workflows
- how to maintain provenance information
only one way to go:
- make use of an easy-to-interpret declarative language
- establish proper "policy rules on all levels"
- map these rules to robust and proven activities
- separate declarative language from interpretation engine
iRODS is an attempt in this direction (respect to Reagan Moore and his team)
at MPI, such a declarative language has been in use for some years to manage access rights for the million objects which need to be treated individually and which are part of collections
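The separation of declarative rules from the interpretation engine can be sketched as follows. This is not the iRODS rule language and not the MPI access-rights system; the rule format (a list of dictionaries with `collection`, `role`, `action`, `allow`) and the function `is_allowed` are assumptions made only for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Union

Rule = Dict[str, Union[str, bool]]

# Declarative part: plain data that can be read, reviewed, and audited
# without knowing how the engine works. First matching rule wins.
RULES: List[Rule] = [
    {"collection": "corpus/public/*", "role": "*", "action": "read", "allow": True},
    {"collection": "corpus/restricted/*", "role": "researcher", "action": "read", "allow": True},
    {"collection": "corpus/restricted/*", "role": "*", "action": "read", "allow": False},
]


def is_allowed(rules: List[Rule], path: str, role: str, action: str) -> bool:
    """Interpretation engine: evaluate the rules in order, deny by default."""
    for rule in rules:
        if (
            fnmatchcase(path, str(rule["collection"]))
            and fnmatchcase(role, str(rule["role"]))
            and rule["action"] == action
        ):
            return bool(rule["allow"])
    return False
```

Because the rules are attached to collection patterns rather than to single files, one rule can cover many objects, while an individual object can still be given its own, more specific rule placed earlier in the list.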

21 Reagan's data environments
- moving not bytes but collections
- need to maintain integrity of collections (incl. relations)
- collections are assembled for a certain purpose
- collections have properties to ensure their purpose
- policies ensure maintenance of properties
- procedures implement policies
- procedures result in state information
- assessment step to validate state
purpose, properties, policies, procedures, state info
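The chain from policy to procedure to state information to assessment can be made concrete with a toy example. Everything here is assumed for illustration: the policy ("every object has at least `MIN_COPIES` copies"), the function names, and the representation of a collection as a mapping from object name to the sites holding it.

```python
from typing import Dict, List

MIN_COPIES = 2  # policy parameter: required number of copies per object (assumed)


def replicate_procedure(
    collection: Dict[str, List[str]], sites: List[str]
) -> Dict[str, int]:
    """Procedure: add copies until each object meets the policy.

    Returns state information: the number of copies per object afterwards.
    """
    state: Dict[str, int] = {}
    for obj, holders in collection.items():
        for site in sites:
            if len(holders) >= MIN_COPIES:
                break
            if site not in holders:
                holders.append(site)
        state[obj] = len(holders)
    return state


def assess(state: Dict[str, int]) -> List[str]:
    """Assessment: list the objects whose recorded state violates the policy."""
    return [obj for obj, copies in state.items() if copies < MIN_COPIES]
```

If fewer sites are available than the policy requires, the procedure still returns honest state information, and the assessment step reports the violation instead of hiding it.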

22 program - 1st part
- Larry Lannom (CNRI): about a digital object architecture
- Alex Wade (MS): approach from MS
- Malte Dreyer: thoughts about generic API
- John Kennedy: heterogeneity of repositories in DEISA
- Ken Galluppi: federating several repositories
- Willem Elbers: federation tests with iRODS
- Jean-Yves Nief: iRODS in professional use
- Peter and Johannes: summary + discussion

23 utilization challenge
- utilization software may not be affected by replication
- utilization software should also make use of copies
- any replication solution needs to demonstrate this!
Diagram label: existing utilization software

24 work spaces and profiles
- users want to: store data, protect data, share data, enrich data, change data, etc.
- data is somewhere in this complex domain
- users want transparent access
- how to get this done?
- profiles: attributes, quotas, etc.

25 processing chains - specification
Diagram labels: data metadata registries; tool metadata registries; data -> operation -> data* -> operation; workflow specification framework
- this is very discipline specific - various possibilities
- curation/annotation/enrichment/visualization pipelines, etc.

26 processing chains - execution workflow execution framework
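The split between workflow specification (slide 25) and workflow execution (slide 26) can be sketched as a chain that is described as data and then run by a small engine. The tool registry `TOOLS`, the three toy operations, and `run_workflow` are hypothetical; real tool metadata registries would describe web services, formats, and invocation details.

```python
from typing import Callable, Dict, List

# Stand-in for a tool metadata registry: operation name -> callable (assumed).
TOOLS: Dict[str, Callable[[str], str]] = {
    "normalize": lambda text: " ".join(text.split()),
    "lowercase": lambda text: text.lower(),
    "tokenize": lambda text: "|".join(text.split(" ")),
}


def run_workflow(spec: List[str], data: str) -> str:
    """Execution framework: apply each named operation to the previous result."""
    for name in spec:
        if name not in TOOLS:
            raise KeyError(f"unknown operation: {name}")
        data = TOOLS[name](data)
    return data


# Specification: an ordered list of operation names, kept separate from the code
# that executes it, so the same chain can be sent to whichever site holds the data.
SPEC = ["normalize", "lowercase", "tokenize"]
result = run_workflow(SPEC, "  The  Language Archive ")
```

Here `result` is `"the|language|archive"`. Keeping the specification as plain data is what would allow it to be shipped to a mirroring site and executed there, which is the direction slide 27 points to.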

27 the challenges
- large amounts of data are at mirroring repositories
- let's execute operations on the mirroring sites
- how to easily deploy operators
- how to inform the execution environment about the way of invocation
- how to let them act on the user's behalf
- etc.

28 program - 2nd part
- SARA colleagues: workspace in NL
- Morris Riedel (FZJ): workspace ideas
- Johannes & John (RZG): operational aspects
- Thomas & Erhard (U Tübingen): WebLicht example
- Mike Papazoglou (U Tilburg): generic SOA aspects
- Peter: wrap up and discussion

29 thanks for the attention

