The UPS protoproto project. Herbert Van de Sompel, Michael Nelson, Thomas Krichel. UPS 1 Meeting, Santa Fe, October 21st 1999.


1 the UPS protoproto project
herbert van de sompel, michael nelson, thomas krichel
UPS 1 Meeting, Santa Fe, October 21st 1999

2 project: description; demo: the UPS protoproto; dex: the data exchange framework

3 project: why a protoproto?
UPS: enable cross-archive end-user services
protoproto:
– facilitate discussions
– identify issues involved in creating cross-archive services
– experiment with digital object concepts for archive material
– does not claim to be a solution
protoproto is multi-disciplinary:
– a special instance of cross-archive
– there is a market
– promotional value

4 project: who?
coordination: herbert van de sompel, michael nelson, thomas krichel
involvement of:
– Old Dominion U & NASA Langley
– U of Surrey
– U of Ghent
– Los Alamos National Laboratory - Library
– Russian Academy of Science - Siberian branch

5 project: sponsors
– Los Alamos National Laboratory - Research Library
– JISC eLib WoPEc project

6 project: datasets
– metadata only
– full text remains at archives
– static dumps obtained ca. July 99

               objects  full-text  !organization
the arXiv       85,223     85,223         17,983
CogPrints          742        659             14
NACA             3,036      3,036            100
NCSTRL          29,184      9,084             93
NDLTD            1,590        951              1
RePEc           73,367     13,582          2,453
Total          193,142    112,535
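
The totals on the datasets slide can be re-derived from the per-archive counts. The following is a small Python check that uses only the figures from the slide; the dictionary layout and the function names are illustrative and not part of the original presentation.

```python
# Per-archive counts copied from the datasets slide: (objects, full-text).
DATASETS = {
    "arXiv": (85_223, 85_223),
    "CogPrints": (742, 659),
    "NACA": (3_036, 3_036),
    "NCSTRL": (29_184, 9_084),
    "NDLTD": (1_590, 951),
    "RePEc": (73_367, 13_582),
}


def totals(datasets):
    """Return (total objects, total full-text) across all archives."""
    objects = sum(o for o, _ in datasets.values())
    full_text = sum(f for _, f in datasets.values())
    return objects, full_text


def full_text_share(datasets):
    """Fraction of each archive's records that have full text available."""
    return {name: f / o for name, (o, f) in datasets.items()}


if __name__ == "__main__":
    objects, full_text = totals(DATASETS)
    print(objects, full_text)
    for name, share in full_text_share(DATASETS).items():
        print(f"{name}: {share:.0%}")
```

The sums reproduce the slide's totals (193,142 objects, 112,535 with full text), so roughly 58% of the collected records point at a full-text document.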

7 project: metadata formats
archives: the arXiv, CogPrints, NACA, NCSTRL, NDLTD, RePEc
formats: internal, Refer, RFC1807, MARC, ReDIF

8 project: metadata extraction
Getting metadata out of archives:
– not all archives support metadata extraction
  some archives have undocumented metadata extraction procedures
– not all archives support rich criteria for extraction
  single dump concept only
Intellectual property and use rights not always clear

9 project: metadata quality
Metadata has problems with:
– record duplication
– crucial missing fields
– internal errors
– ambiguous references to people, places and publications

10 project: metadata conversion
data enhancements:
– creation of unique identifier
– addition of raw subject-classification
– normalization of publication types
all datasets converted to ReDIF:
– essential to have a single format for the creation of services
– supply by archives in a single format was not realistic
– no downgrading of data
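
The conversion steps on this slide (unique identifier, raw subject classification, normalized publication type, single target format, no downgrading) can be sketched as below. This is a minimal illustration, not the project's converter: the type vocabulary, the `ups:archive:id` handle scheme, the `X-` field names and the shape of the input record are all assumptions. Only the general "Key: value" template layout and the `Template-Type`, `Author-Name`, `Title` and `Handle` field names are taken from ReDIF.

```python
# Hypothetical mapping from archive-specific labels to one shared vocabulary.
PUBLICATION_TYPES = {
    "tech report": "techreport",
    "technical report": "techreport",
    "tr": "techreport",
    "preprint": "preprint",
    "e-print": "preprint",
    "thesis": "thesis",
    "dissertation": "thesis",
}


def normalize_type(raw):
    """Map a raw type label to the shared vocabulary; unknown labels become 'other'."""
    return PUBLICATION_TYPES.get(raw.strip().lower(), "other")


def make_handle(archive, local_id):
    """Build an identifier that is unique across archives (hypothetical scheme)."""
    return f"ups:{archive.lower()}:{local_id}"


def to_redif(archive, record):
    """Render one source record (a dict) as ReDIF-style 'Key: value' lines.

    Fields the converter does not understand are not dropped; they are carried
    along under an X- prefix so that no data is downgraded.
    """
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    for author in record.get("authors", []):
        lines.append(f"Author-Name: {author}")
    lines.append(f"Title: {record['title']}")
    # The X- field names below are placeholders chosen for this sketch.
    lines.append(f"X-Publication-Type: {normalize_type(record.get('type', ''))}")
    if "subject" in record:
        lines.append(f"X-Raw-Subject: {record['subject']}")
    known = {"authors", "title", "type", "subject", "id"}
    for key in sorted(set(record) - known):
        lines.append(f"X-{key}: {record[key]}")
    lines.append(f"Handle: {make_handle(archive, record['id'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {"id": "0001", "title": "An example", "authors": ["A. Author"],
              "type": "E-Print", "subject": "hep-th"}
    print(to_redif("arXiv", sample))
```

Keeping unrecognized fields, rather than discarding them, is one way to honor the "no downgrading of data" requirement while still giving services a single format to index.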

11 project: re-creation of archives
creation of archives for ReDIF-ed metadata using intelligent digital objects: “buckets”
arXiv, RePEc, NCSTRL

12 project: buckets
Buckets were chosen to study the implications of using rich, intelligent objects in UPS
Buckets are:
– DL protocol / system independent
– self-contained and mobile
– handle their own display, enforcement of terms and conditions, and dissemination of their contents
– designed for bundling multiple data representations and data instance types
The aggregative nature of buckets is well suited for adding value-added services at the object level

13 project: creation of end-user service
NCSTRL+ digital library service:
– indexing buckets in archives by requesting their metadata
– enhanced user-interface
NCSTRL+ search results point at buckets:
– buckets auto-display
– buckets provide link to full-text in native archive

14 project: scaling problems
UPS contains 193K objects
– using buckets consumed inodes (~60 inodes per bucket)
  filesystem reformatted with more generous amount of inodes
– Solaris and Dienst conflict
  Dienst wants each object in a publishing authority to be in a single directory
  Solaris has a hard limit of 32K objects in a directory
  resolution: use many (100+) authorities for UPS
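
The two scaling problems on this slide come down to simple arithmetic, sketched below. The object count and the ~60 inodes per bucket are the slide's figures; reading "32K" as 32 × 1024, the choice of 128 authorities, the `upsNNN` naming and the hash-based assignment are assumptions made for illustration. The slide does not say how objects were actually distributed over authorities.

```python
import math
import zlib

OBJECTS = 193_142        # total objects, from the datasets slide
INODES_PER_BUCKET = 60   # approximate figure from the scaling slide
DIR_LIMIT = 32 * 1024    # "32K objects in a directory", read as 32,768 (assumption)


def inodes_needed(objects=OBJECTS, per_bucket=INODES_PER_BUCKET):
    """Rough number of inodes consumed when every object is stored as a bucket."""
    return objects * per_bucket


def min_authorities(objects=OBJECTS, limit=DIR_LIMIT):
    """Fewest authorities (one directory each) that keeps every directory under the limit."""
    return math.ceil(objects / limit)


def authority_for(object_id, n_authorities=128):
    """Assign an object to an authority with a stable hash (hypothetical scheme)."""
    index = zlib.crc32(object_id.encode("utf-8")) % n_authorities
    return f"ups{index:03d}"


if __name__ == "__main__":
    print(inodes_needed())      # inodes required for the whole collection
    print(min_authorities())    # lower bound on the number of authorities
    print(authority_for("arXiv:hep-th/9910001"))
```

With these figures the collection needs on the order of 11.6 million inodes, and a perfectly even split would require at least 6 authorities; using 100+ leaves each directory far below the limit.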

15 project: addition of linking service
– integrate the archives with the traditional communication mechanism
– context-sensitive linking to deliver extended services via SFX technology

16 project: SFX linking service
[diagram: system A, metadata, evaluate metadata, extended services, system B]

17 project: SFX linking database

18 project: addition of linking service
– buckets for arXiv, NCSTRL and RePEc are SFX-aware
– CogPrints, NACA, NDLTD not SFX-aware
– SLAC/SPIRES is SFX-aware
– linking services for preprint metadata + for published version

19 demo: the UPS protoproto
http://ups.cs.odu.edu:8000/
– will be available starting beginning of November
– UPS list will be notified
– disclaimer: “not a production system”
http://ups.cs.odu.edu

20 dex: some issues (I)
data exchange framework
– data provision vs. data implementation
– central searching, distributed archives
– need for a framework by which archives can describe themselves:
  content
  terms and conditions
  protocols, criteria supported to extract (meta)data
  metadata scheme, subject classification scheme, material-type scheme, ...

21 dex: some issues (II)
– need for an identifier scheme for archives and archive objects (cf. ISSN, ISBN, DOI)
– metadata quality obstructs the creation of services
– desirable to extend metadata with citation information
– smart objects: archived objects that are active, not passive

22 dex: providing vs. implementing data
Providing data:
– publishing into an archive
– providing methods for metadata “harvesting”
  provide non-technical context for sharing information also
Implementing data:
– harvest metadata from providers
– implement user interface to data
Even if provided by the same DL, these are distinct functions

23 dex: providing vs. implementing data
– Provider (input interface, native end-user interface): no machine-based way to extract metadata…
– Provider (input interface, native end-user interface, native harvesting interface): machine and user interfaces for extracting metadata…

24 dex: providing vs. implementing data
– Provider: input interface, native harvesting interface
– Provider: input interface, native end-user interface, native harvesting interface
– Implementor: native end-user interface
– input and harvesting interfaces optional
– native end-user interface optional (e.g., RePEc)

25 dex: self-describing archives
– Much of the learning about the constituent UPS archives occurred out of band…
– Given an unknown archive, we should be able to algorithmically determine the archive’s metadata...
– Where possible, the harvesting interface should provide the same criteria as the end-user interface
[diagram: Provider with input interface, native end-user interface, native harvesting interface]

26 dex: self-describing archives
Recommended criteria for metadata extraction:
– subject classification
– accession date
– publication date
Criteria for archive description:
– metadata formats employed
– contact information for archive
– publication type scheme
– identifier scheme
– subject classification scheme
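
One way to picture the two lists on this slide is as a machine-readable archive description plus a harvesting call that accepts the three recommended criteria. The sketch below is only an illustration under assumptions: the class names, field names and the `harvest` function are hypothetical and do not correspond to any interface defined in the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArchiveDescription:
    """What an archive says about itself (fields follow the slide's list)."""
    name: str
    contact: str
    metadata_formats: List[str]
    publication_type_scheme: str
    identifier_scheme: str
    subject_classification_scheme: str


@dataclass
class Record:
    """Minimal metadata record carrying the three extraction criteria."""
    identifier: str
    subject: str
    accession_date: date
    publication_date: date


def harvest(records: List[Record],
            subject: Optional[str] = None,
            accessioned_since: Optional[date] = None,
            published_since: Optional[date] = None) -> List[Record]:
    """Return the records matching every supplied criterion; None means 'no constraint'."""
    selected = []
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if accessioned_since is not None and record.accession_date < accessioned_since:
            continue
        if published_since is not None and record.publication_date < published_since:
            continue
        selected.append(record)
    return selected


if __name__ == "__main__":
    description = ArchiveDescription(
        name="example archive",                      # placeholder values throughout
        contact="admin@example.org",
        metadata_formats=["ReDIF"],
        publication_type_scheme="local",
        identifier_scheme="local",
        subject_classification_scheme="local",
    )
    print(description)
```

Filtering on accession date is what would let a service fetch only what is new since its last visit, instead of relying on the single static dump described earlier.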

27 dex: identifiers
Useful in:
– reference linking
– can be used in citations
– resolving duplications
  UPS duplications were removed by hand
– tracking publication lifecycle
Need the ability for an object to have multiple unique identifiers
– organization, discipline, etc.
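
If each object can carry several unique identifiers, duplicate resolution can in principle be automated by grouping records that share any identifier. The following sketch shows that idea with a small union-find structure; it is an assumption about how such matching could work, not a description of the UPS process (where duplicates were removed by hand), and the identifier strings in the example are made up.

```python
def group_duplicates(records):
    """Group records that share at least one identifier.

    `records` maps a record name to the set of identifiers attached to it.
    Returns a sorted list of groups, each group a sorted list of record names.
    """
    parent = {name: name for name in records}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # identifier -> first record that carried it
    for name, identifiers in records.items():
        for identifier in identifiers:
            if identifier in first_seen:
                union(first_seen[identifier], name)
            else:
                first_seen[identifier] = name

    groups = {}
    for name in records:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    example = {
        "rec1": {"archiveA:1", "org:TR-1"},   # hypothetical identifiers
        "rec2": {"archiveB:9", "org:TR-1"},
        "rec3": {"archiveC:3"},
    }
    print(group_duplicates(example))
```

Matching is transitive here: two records with no identifier in common still end up in one group if a third record links them.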

28 dex: smart objects
Premise: Objects are more important than the archives that hold them
SODA: Smart Objects, Dumb Archives
Objects should be the canonical authority for:
– metadata
– contents
– use
Objects should be able to grow and change:
– correct metadata
– add new formats
– add new services
– reflect the lifecycle of the object

29 dex: smart objects
It would be beneficial if the archived objects could be heterogeneous:
– with their own “look-and-feel”
– unique functionality / services
  e.g., the data archiving needs of an atmospheric scientist can be different from those of a computer scientist, engineer or medical researcher
yet maintain a standard API for:
– extracting metadata
– content retrieval
– resource discovery on the object
– terms and conditions
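
The "standard API" listed on this slide can be pictured as a small object interface. The class below is a toy stand-in written for illustration; its method names and the user-list form of terms and conditions are assumptions, and it is not the bucket API used in the project.

```python
class SmartObject:
    """Toy object that answers for its own metadata, contents and terms of use."""

    def __init__(self, identifier, terms="", allowed_users=None):
        self.identifier = identifier
        self.terms = terms
        # None means the contents are open to everyone.
        self.allowed_users = None if allowed_users is None else set(allowed_users)
        self._metadata = {}   # format name -> serialized metadata
        self._contents = {}   # element name -> bytes

    def methods(self):
        """Resource discovery on the object: list the operations it supports."""
        return ["add_content", "add_metadata", "get_content",
                "get_metadata", "get_terms", "metadata_formats"]

    def metadata_formats(self):
        """Names of the metadata formats the object can currently supply."""
        return sorted(self._metadata)

    def add_metadata(self, fmt, text):
        """Objects can grow: add a new format or correct an existing one."""
        self._metadata[fmt] = text

    def get_metadata(self, fmt):
        """Extract metadata in the requested format."""
        return self._metadata[fmt]

    def add_content(self, name, data):
        """Store one content element (for example a file) inside the object."""
        self._contents[name] = data

    def get_terms(self):
        """Return the human-readable terms and conditions."""
        return self.terms

    def get_content(self, name, user=None):
        """Content retrieval, with the object enforcing its own terms."""
        if self.allowed_users is not None and user not in self.allowed_users:
            raise PermissionError(f"{user!r} may not retrieve {name!r}")
        return self._contents[name]


if __name__ == "__main__":
    obj = SmartObject("example:1", terms="open access")
    obj.add_metadata("redif", "Title: An example")
    obj.add_content("paper.txt", b"full text")
    print(obj.methods(), obj.metadata_formats(), obj.get_content("paper.txt"))
```

Because access control lives in the object rather than in the archive, the same object behaves identically wherever it is stored, which is the point of the "smart objects, dumb archives" premise.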

30 dex: lessons learned
– A strong distinction between the provision of data, and the implementation of data
  also, a socio-legal context for sharing metadata
– Open, “self-describing” archives
– A universal, unique identifier name space
– Archived objects with more intelligence and flexibility

