Towards a Persistent Identifier Infrastructure for European e-Research Daan Broeder CLARIN / MPG 2008 CNRI Handle System Workshop.


1 Towards a Persistent Identifier Infrastructure for European e-Research
Daan Broeder, CLARIN / MPG
2008 CNRI Handle System Workshop

2 Content
Domain & Scope
Organizational embedding
Further requirements
Services for e-research with PIDs

3 Domain & Scope
Reliable references & citations of web-accessible resources
Language resource domain
  – Audio & video recordings, pictures, primary texts, annotations
  – Lexica, grammar descriptions, …
  – Concepts in terminology registries and ontologies
  – …
The number of resources is very large, depending on how you approach the granularity issue
References and citations
  – Embedded in (web) documents
  – In data structures
  – In DBs
  – …

4 CLARIN Common Language Resources and Technology Infrastructure
The CLARIN project is a large-scale pan-European collaborative effort to create, coordinate and make language resources and technology available and readily usable.
As one of its goals, CLARIN will create a federation of LR repositories and aims to create a unified resource registry using persistent identifiers.

5 CLARIN Common Language Resources and Technology Infrastructure
Preparatory phase 2008-2011 (construction phase 2011-2020)
European dimension (ICT FP7)
  – 112 members from 35 countries
  – Preparatory phase funded with 4.2 M€
National dimension
  – Funding until now 6.5 M€, more to come
  – …

6 DAM-LR Distributed Access Management for Language Resources
Small (4 partners) European project aimed at federation building in the LR repository domain, 2005-2007
Unified metadata catalogue
Identity federation using Shibboleth
Single resource identifier system for all published resources using the Handle System

7 DAM-LR HS infrastructure
Developed special tools
Mover
  – Updates Handle DB + catalogue
  – Updates metadata XML files*
Restore operations
  – Recreate the Handle DB (and others) from scratch
Lessons learned
  – Federation technology is not for all organizations
[Diagram: the Lund, MPI and INL archives, each holding resources (R) and hosting primary and secondary (sec.) handle services for the prefixes 1839, 10050 and 10032.]
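The Mover and Restore tools are only named on the slide, so the following is a minimal sketch of the idea rather than the DAM-LR implementation. It assumes the handle DB and the catalogue can be modelled as dictionaries keyed by handle, and that the catalogue is the source from which the handle DB is rebuilt. The `Archive` class, both function names and the URL `http://mpi/old_url` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Toy stand-in for one archive's local state (hypothetical structure)."""
    handle_db: dict = field(default_factory=dict)   # handle -> current URL
    catalogue: dict = field(default_factory=dict)   # handle -> catalogue entry


def move_resource(archive: Archive, handle: str, new_url: str) -> None:
    """Point a handle at a new URL and keep the catalogue entry in step."""
    if handle not in archive.handle_db:
        raise KeyError(f"unknown handle: {handle}")
    archive.handle_db[handle] = new_url
    archive.catalogue.setdefault(handle, {})["url"] = new_url


def restore_handle_db(catalogue: dict) -> dict:
    """Rebuild the handle table from scratch, assuming the catalogue is authoritative."""
    return {handle: entry["url"] for handle, entry in catalogue.items()}


archive = Archive(
    handle_db={"1839/R1": "http://mpi/old_url"},
    catalogue={"1839/R1": {"title": "example recording", "url": "http://mpi/old_url"}},
)
move_resource(archive, "1839/R1", "http://mpi/mpi_url")
```

Updating both stores in one function keeps them consistent, which is what makes a later restore from the catalogue possible.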

8 User benefits

9 MPG Max-Planck Society
Proposal within the MPG to support an MPG-wide PID registration service based on the HS
Run by the MPG computing center GWDG
Will also give support to non-MPG German scientific organizations and (hopefully) CLARIN

10 Requirements
(Political) independence: European GHR mirror & proxy + no single point of failure
Wide(r) acceptance of the PID scheme
Support for object part addressing, from ISO TC37/SC4 CITER work
Support for (secure) management of resource copies

11 CLARIN PID Infrastructure
[Diagram labels: proxy; GHR mirror; PID registration service (MPG/CLARIN @GWDG); MPI archive (Class A) with primary 1839; archives (Class C); primary 1111..; sec. 1839, sec. …; example handles 1839/R1 and 1111/R5.]

12 PID Scheme
Difficult to gain acceptance
  – Without the PID syntax being official
  – W3C seems to have problems with anything other than HTTP (see recent XRI events)
Can the HS user community help?
Possibly acceptance only via urlified handles: http://hdl.handle.net/1039/R5
Perhaps follow ARK for elegance:
  – http://hdl.handle.net/hdl:/1039/R5
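The two URL forms on this slide can be produced mechanically from a handle. The sketch below assumes only the proxy address shown on the slide; the function name `urlify` and the `ark_style` flag are hypothetical, and the `hdl:/` form is the slide's proposal, not a confirmed proxy feature.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net"  # proxy address as given on the slide


def urlify(handle: str, ark_style: bool = False) -> str:
    """Turn a handle such as '1039/R5' into an HTTP URL on the proxy."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"expected 'prefix/suffix', got {handle!r}")
    # Percent-encode anything that is not URL-safe, but keep the slash.
    path = quote(handle, safe="/")
    if ark_style:
        # Proposed ARK-like form from the slide (an assumption, not proxy behaviour).
        return f"{PROXY}/hdl:/{path}"
    return f"{PROXY}/{path}"


print(urlify("1039/R5"))
print(urlify("1039/R5", ark_style=True))
```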

13 PIDs & Resource Parts
Wasteful to issue a PID for each part (think of 100k entries in a lexicon), so use part identifiers.
The resolver can make an adequate translation: A#z -> objectA?part=z
This requires enough flexibility from the resolver to accommodate the object server.
The syntax of z should be standard for the specific data type; borrow from existing fragment identifier syntax standards.
[Diagram: object A with parts x, y and z. Instead of separate handles 1839/A, 1839/x, 1839/y and 1839/z, use 1839/A plus 1839/A#x, 1839/A#y and 1839/A#z. The PID resolver maps 1839/A to http://oserver/objectA and 1839/A#z to http://oserver/objectA?part=z on the object server.]
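The translation A#z -> objectA?part=z can be sketched as a small lookup plus a URL rewrite. This is an illustration under assumptions, not Handle System resolver behaviour: the in-memory table, the function name `resolve` and the fixed query parameter name `part` are taken from or modelled on the slide's example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Toy handle table holding the single record shown on the slide.
HANDLE_TABLE = {"1839/A": "http://oserver/objectA"}


def resolve(pid: str, table: dict = HANDLE_TABLE, part_param: str = "part") -> str:
    """Resolve 'handle' or 'handle#part' to a URL on the object server."""
    handle, sep, part = pid.partition("#")
    base_url = table[handle]          # raises KeyError for unknown handles
    if not (sep and part):
        return base_url               # whole object requested
    pieces = urlsplit(base_url)
    # Append the part identifier to any query the registered URL already has.
    query = parse_qsl(pieces.query, keep_blank_values=True) + [(part_param, part)]
    return urlunsplit(pieces._replace(query=urlencode(query)))


print(resolve("1839/A"))
print(resolve("1839/A#z"))
```

Only one handle is registered per object; the part identifier is passed through to the object server, which has to understand its syntax.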

14 Resource duplicates
What if MPI moves the resource copy?
MPI should have write access to the Lund handle record
This would enable changing the Lund URL record too!
[Diagram: the Lund archive (primary 10050) holds resource R and the MPI archive (primary 1839) holds a copy. The handle 10050/R has the values -> http://lund/lund_url and -> http://mpi/mpi_url. The MPI manager moves the copy; an access monitor guards the Lund LHS.]

15 Resource duplicates
Indirect handles*
TYPE = URL
  – IE plug-in: OK
  – HS proxy: not OK
TYPE = HS_ALIAS (problem*)
  – IE plug-in: OK
  – HS proxy: OK
Status of the 1839/Rcpy handle?
  – Use in documents?
[Diagram: the Lund archive (primary 10050) and the MPI archive (primary 1839). The handle 10050/R has the values -> http://lund/lund_url and -> hdl:1839/Rcpy; the handle 1839/Rcpy -> http://mpi/mpi_url. The MPI manager moves the copy.]
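The point of the indirect handle is that MPI only ever edits its own record. The toy model below illustrates that idea; it does not reproduce how the Handle System, its proxy or the browser plug-in actually treat HS_ALIAS values. The record layout and both function names are hypothetical, and `http://mpi/new_url` is an invented target for the move.

```python
# Toy records: handle -> list of (type, value) pairs, as in the slide's diagram.
RECORDS = {
    "10050/R": [("URL", "http://lund/lund_url"), ("HS_ALIAS", "1839/Rcpy")],
    "1839/Rcpy": [("URL", "http://mpi/mpi_url")],
}


def locations(handle: str, records: dict, max_hops: int = 5) -> list:
    """Collect all URLs reachable from a handle, following alias values."""
    urls, seen = [], set()

    def visit(current: str, hops: int) -> None:
        if current in seen or hops > max_hops:
            return                      # guard against alias loops
        seen.add(current)
        for value_type, value in records.get(current, []):
            if value_type == "URL":
                urls.append(value)
            elif value_type == "HS_ALIAS":
                visit(value, hops + 1)

    visit(handle, 0)
    return urls


def move_copy(records: dict, copy_handle: str, new_url: str) -> None:
    """MPI updates only its own handle; the Lund record is left untouched."""
    records[copy_handle] = [("URL", new_url)]


before = locations("10050/R", RECORDS)
move_copy(RECORDS, "1839/Rcpy", "http://mpi/new_url")
after = locations("10050/R", RECORDS)
```

After the move, resolving 10050/R yields the new MPI location although the Lund record itself was never modified, so MPI needs no write access to it.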

16 Possible Added PID Services
Establishing resource authenticity
Resource Collection Registration
Resource Citation Information
Lost Resource Detective
…

17 Collection Registration Service
Much scientific work depends on seemingly accidental, distributed collections of material that have no independent embodiment.
Such a collection needs to be citable with one single PID
  – Encode the collection's resource URIs directly in a handle record
  – Attach a link to a map of the collection's URIs
Compare the recent Aggregation Map concept from ORE
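The two options above differ in where the member list lives. The sketch below builds both kinds of record as lists of index/type/data values. The type names `X_COLLECTION_MEMBER` and `X_COLLECTION_MAP` and the map address `http://mpi/collection_map` are hypothetical; the slide does not define them.

```python
def inline_collection_record(member_uris: list) -> list:
    """Option 1: store every member URI as its own value in the handle record."""
    return [
        {"index": i, "type": "X_COLLECTION_MEMBER", "data": uri}  # hypothetical type
        for i, uri in enumerate(member_uris, start=1)
    ]


def linked_collection_record(map_uri: str) -> list:
    """Option 2: store one link to an external map that lists the members."""
    return [{"index": 1, "type": "X_COLLECTION_MAP", "data": map_uri}]  # hypothetical type


members = ["http://lund/lund_url", "http://mpi/mpi_url"]
inline = inline_collection_record(members)
linked = linked_collection_record("http://mpi/collection_map")
```

The inline form keeps the collection self-contained in the handle record but grows with every member; the linked form stays small but depends on the external map remaining available.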

18 Citation Information Service
(Collections of) resources need to be cited in documents.
Acknowledgement & credit are also important for primary scientific data
E.g. Dutch Spoken Corpus, © Institute for Dutch Lexicography, …
Make this citation information part of the metadata associated with the PID.

19 Lost Resource Detective
Establishing provenance
If the handle-to-URI mapping was accidentally not properly maintained, special metadata could be available from the handle record to establish the resource's location or find a copy.
  – URI history, repository, depositor, …
Labor intensive
Only feasible for a limited number of resources unless there is a pattern
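One way to read the "URI history" idea is as a list of candidate locations to try when the current mapping is dead. The sketch below assumes a record with hypothetical field names (`url`, `uri_history`, `repository`, `depositor`) and takes the reachability check as a caller-supplied function, so no network access is implied.

```python
from typing import Callable, Optional


def find_copy(record: dict, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable URL: current mapping first, then history, newest first."""
    candidates = [record.get("url"), *reversed(record.get("uri_history", []))]
    for url in candidates:
        if url and is_reachable(url):
            return url
    return None  # nothing found; repository and depositor remain as manual leads


record = {
    "url": "http://mpi/mpi_url",               # current, possibly stale mapping
    "uri_history": ["http://lund/lund_url"],   # earlier locations, oldest first
    "repository": "MPI archive",
    "depositor": "unknown",
}
still_online = {"http://lund/lund_url"}        # stand-in for a real reachability check
found = find_copy(record, lambda url: url in still_online)
```

When no candidate responds, what remains is the labor-intensive manual search the slide warns about.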

20 The End

21 DAM-LR HS infrastructure
Integration: it should be an optional extension
Make sure the HS is not a single point of failure (SPF)
IMDI/LAT software also functions without the HS
Issue handles for objects
Only for local resources
Need special tools
Mover
  – Updates Handle DB + catalogue
  – Updates IMDI XML files*
Restore operations
  – Recreate the Handle DB (and others) from scratch
[Diagram labels: LHS, Handle DB, catalogue, mover, IMDI harvester, LAT webapps, sync; example entries MPI1001# mpi_url and 1839/087-D mpi_url.]

