Subject Repositories European collaboration in the international context 28-29 January 2010 Workshop Technical infrastructure & interoperability Benoit.


1 Subject Repositories: European collaboration in the international context, 28-29 January 2010
Workshop: Technical infrastructure & interoperability
Benoit Pauwels, Université Libre de Bruxelles, Belgium

2 Workshop plan
Theme 1: The Economists Online network of data providers
  General infrastructure of the EO solution
  DIDL/MODS: the EO metadata exchange format
  RDF/XML Admin file: decentralized administration
  Enrichment of metadata
Theme 2: Economists Online and RePEc
  Pulling metadata from RePEc
  Pushing metadata to RePEc
  Contribute to LogEc
  Use CitEc

3 Workshop plan
Theme (45’)
  Introduction (BP, 20’)
  3 topics for brainstorming (breakout groups, 10’)
  Breakout groups reporting back (all, 15’)

4 The Economists Online network of data providers
Theme 1: The Economists Online network of data providers
  General infrastructure of the EO solution
  DIDL/MODS: the EO metadata exchange format
  RDF/XML Admin file: decentralized administration
  Enrichment of metadata

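5 [Diagram: general infrastructure of the EO solution. Labels, in order of appearance: Meresco; Metadata Harvester; Objects; HTTP Crawler; Metadata; Lucene; EO portal; Homemade - FOSS; Exporter engine; Homemade - FOSS; Logs; OAI-PMH; RSS; Other portals; SRU; RePEc]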

6 [Diagram: the infrastructure diagram of slide 5, annotated with the exchange formats]
Metadata exchange format: DIDL / MODS, NEEO specs
Usage metadata exchange format: SWUP OFI Comm Profile

7 Technical decisions
Desired EO functionality | Technical decision
Facetted search & find experience | Normalized/normalizable metadata
APA formatted citations | Granular metadata
Publication list per author | Unambiguous identification of authors
Full text indexing/searching | Unambiguous links to full texts
Enrichment of metadata (JEL, datasets, citations, ReDIF) | Extensible metadata format

8 Metadata exchange format
XML container structure that can hold semantically distinct metadata → DIDL
  descriptive metadata
  object files (by-ref)
  splash page
  enriched metadata
    JEL
    full text (by-ref)
    datasets (by-ref)
    [ references ]
    RePEc handle and metadata (by-ref)
Based on existing container structure defined by SurfShare
“info:eu-repo” vocabularies (objectfile accessRights, version, ...)

9 Metadata exchange format
Granular descriptive metadata → MODS (3.2)
  Based on existing metadata structure defined by SurfShare
  “info:eu-repo” vocabularies (publication type, ...)
Unambiguous identification of authors → DAI: Digital Author Identifier
  National or institution-unique persistent identifier
Solutions not specific to the NEEO project; continuous aim of standardization at a level that surpasses the project

10 EO Data model
DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
      Descriptor/type (« descriptiveMetadata »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by value (XML)
    Item[0..∞] (of type objectFile)
      Descriptor/type (« objectFile »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by ref. (URL)
    Item[0..1] (of type humanStartPage)
      Descriptor/type (« humanStartPage »)
      Component/Resource -- representation by ref. (URL)
Publication is described as a complex (compound) object
  persistent identifier
Aggregation of 3 types of components
  descriptiveMetadata (MODS)
  objectFiles
  humanStartPage
Extensible
  additional items can be stored within the complex object
MODS
  contains Digital Author Identifier (DAI) of EO author
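
As an illustration of the cardinalities on this slide (at least one descriptiveMetadata item, any number of objectFile items, at most one humanStartPage), here is a minimal Python sketch. It models the compound object in memory only; it does not read or write real DIDL XML, and the class and field names are invented for the example rather than taken from the NEEO specifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One typed component of the compound object (hypothetical in-memory model)."""
    item_type: str                    # "descriptiveMetadata", "objectFile" or "humanStartPage"
    identifier: Optional[str] = None  # persistent identifier, where the model lists one
    modified: Optional[str] = None    # modification timestamp
    by_value: Optional[str] = None    # inline XML, e.g. a MODS record
    by_ref: Optional[str] = None      # URL of an object file or start page


@dataclass
class Publication:
    """The top-level Item[1]: a publication described as a compound object."""
    identifier: str
    modified: str
    items: List[Item] = field(default_factory=list)

    def count(self, item_type: str) -> int:
        return sum(1 for item in self.items if item.item_type == item_type)

    def cardinality_errors(self) -> List[str]:
        """Check Item[1..∞] descriptiveMetadata and Item[0..1] humanStartPage."""
        errors = []
        if self.count("descriptiveMetadata") < 1:
            errors.append("at least one descriptiveMetadata item is required")
        if self.count("humanStartPage") > 1:
            errors.append("at most one humanStartPage item is allowed")
        return errors


# A publication with an object file but no descriptive metadata is rejected.
incomplete = Publication(
    identifier="urn:example:pub-1",  # made-up identifier
    modified="2010-01-28T00:00:00Z",
    items=[Item("objectFile", by_ref="http://example.org/paper.pdf")],
)
print(incomplete.cardinality_errors())
```

Because the model is extensible, further item types can be appended to `items` without touching the cardinality check.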

11 Metadata exchange format
Implementations in NEEO
  DIDL application profile
  MODS application profile
  Vocabularies in DIDL and MODS
  Technical guidelines for project partners
Solutions: home-made or with external support
  ARNO: home-made
  DSpace: home-made, AtMire
  EPrints: home-made, ECS-University of Southampton
  Fedora: METS/MODS -> DIDL/MODS
  DigiTool: METS/MARC -> DIDL/MODS

12 Decentralized registry service
XML-RDF file
  FOAF + NEEO-specific vocabulary
  maintained by each data provider on a local web server
  information on the institution: name, description, ...
  OAI baseURL + OAI sets to harvest
  EO authors: photograph, full name, affiliation, DAI
Fetched over HTTP GET and validated by the EO Gateway at regular intervals
  Automated harvesting process
  Made visible through portal
New partner
  Create admin file
  Ask for registration at economistsonline@uvt.nl, declaring the location of the admin file and validating it
  If valid, you’re in
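
To make the admin file idea concrete, the following Python sketch parses a small RDF/XML document and pulls out the institution name, the OAI base URL, the OAI sets and the author identifiers. The RDF and FOAF namespaces are the standard ones. Everything in the `neeo` namespace (its URI and the element names `oaiBaseURL`, `oaiSet`, `dai`), the sample URLs and the sample DAI value are hypothetical stand-ins, since the NEEO-specific vocabulary is not given on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "neeo": "http://example.org/neeo#",  # hypothetical stand-in for the NEEO vocabulary
}

# Invented sample; a real admin file would be fetched from the partner's web server.
SAMPLE_ADMIN_FILE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:neeo="http://example.org/neeo#">
  <foaf:Organization>
    <foaf:name>Example University</foaf:name>
    <neeo:oaiBaseURL>http://repository.example.org/oai</neeo:oaiBaseURL>
    <neeo:oaiSet>economics</neeo:oaiSet>
  </foaf:Organization>
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <neeo:dai>example-dai-0001</neeo:dai>
  </foaf:Person>
</rdf:RDF>
"""


def parse_admin_file(xml_text):
    """Extract harvesting instructions and author identifiers from an admin file."""
    root = ET.fromstring(xml_text)
    org = root.find("foaf:Organization", NS)
    if org is None:
        raise ValueError("admin file has no institution description")
    base_url = org.findtext("neeo:oaiBaseURL", default="", namespaces=NS)
    if not base_url:
        raise ValueError("admin file has no OAI base URL")
    return {
        "institution": org.findtext("foaf:name", default="", namespaces=NS),
        "oai_base_url": base_url,
        "oai_sets": [e.text for e in org.findall("neeo:oaiSet", NS)],
        "authors": [
            {
                "name": p.findtext("foaf:name", default="", namespaces=NS),
                "dai": p.findtext("neeo:dai", default="", namespaces=NS),
            }
            for p in root.findall("foaf:Person", NS)
        ],
    }


info = parse_admin_file(SAMPLE_ADMIN_FILE)
print(info["institution"], info["oai_base_url"], info["oai_sets"])
```

A gateway could run a check like this at each polling interval and refuse to harvest when `parse_admin_file` raises.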

13 [Diagram: the infrastructure diagram of slide 5, repeated]

14 [Diagram: the infrastructure diagram of slide 5, with RSS shown as RSS/Atom and an added Enrichment service linked through SRU and OAI-PMH]

15 Metadata enrichment
“Automated” enrichment: JEL, full text
  1. ES (the Enrichment service) gets records to be enriched from EO, over SRU
     1. Based on date of request for enrichment of a certain type and version
     2. Based on a flag set in the EO record
  2. ES creates enrichment record(s)
  3. ES makes enrichment records available to EO, over OAI-PMH
  4. EO harvests enrichment records from ES and integrates them into the original record
  5. EO reuses enrichment information in its services: index & present
“Manual” enrichment: datasets
  1) Partner enters permalink of publication on DVN platform
  2) EO PMH-harvests DDI from DVN, and stores by-ref information
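
Steps 1 and 4 of the automated flow are plain HTTP requests, which the sketch below builds in Python. The parameter names (`operation`, `version`, `query`, `maximumRecords` for SRU; `verb`, `metadataPrefix`, `from` for OAI-PMH) come from those protocols. The two endpoint URLs, the CQL index name `enrichment.flag` and the metadata prefix `didl` are assumptions made for the example, not values documented for EO.

```python
from urllib.parse import urlencode

EO_SRU_ENDPOINT = "http://eo.example.org/sru"  # hypothetical EO search endpoint
ES_OAI_ENDPOINT = "http://es.example.org/oai"  # hypothetical Enrichment service endpoint


def sru_request_url(cql_query, maximum_records=50):
    """Step 1: the URL ES would call to fetch records awaiting enrichment."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return EO_SRU_ENDPOINT + "?" + urlencode(params)


def oai_list_records_url(metadata_prefix, from_date=None):
    """Step 4: the URL EO would call to harvest enrichment records from ES."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvest since this date
    return ES_OAI_ENDPOINT + "?" + urlencode(params)


# Hypothetical CQL index name; the real EO index names are not given here.
pending_url = sru_request_url('enrichment.flag = "jel"')
harvest_url = oai_list_records_url("didl", from_date="2010-01-01")
print(pending_url)
print(harvest_url)
```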

16 Enriched publication
DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
    Item[0..∞] (of type objectFile)
    Item[0..1] (of type humanStartPage)
    Item[0..∞] (of type text)
    Item[0..∞] (of type enrichedMetadata)
    Item[0..∞] (of type dataset)
    Item[0..∞] (of type review)
[Diagram labels: EOIR / ES; PDF; HTML; TXT; Dataset; DDI; Review]
[Nested object shown in the diagram: Descriptor/Identifier (persistent identifier); Descriptor/modified; Item[1..∞] (of type descriptiveMetadata); Item[0..∞] (of type objectFile)]
LinkedData / SemanticWeb / ORE ready

17 Theme 1: The Economists Online network of data providers
» BO Group 1: DIDL/MODS
  » Scalable? Implementation by 100s of partners
  » Local experiences from existing partners: implementation issues you want to share?
  » Can this become a standard for exchange of metadata of IR contained publications? Where does this stand next to (flavours of) DC, SWAP, ...?
» BO Group 2: XML Admin file
  » Scalable? Implementation by 100s of partners
  » Local experiences from existing partners: implementation issues you want to share?
  » DAI?
» BO Group 3: Enrichment model
  » Extensibility: vocabulary for semantics of components
  » Manual enrichment: need for enriched submission form, making it easy for people to make enriched publications
  » Automated (JEL, full text): sustainable?

18 Workshop plan
Theme 2: Economists Online and RePEc
  Pulling metadata from RePEc
  Pushing metadata to RePEc
  Contribute to LogEc
  Use CitEc

19 RePEc model
RePEc archives contain RePEc series, which contain working papers, articles, books, book chapters, software
Manually maintained by research centres, journal publishers, university departments all over the world
+/- 900 archives, more than 4000 series
ReDIF metadata format
Network accessible over FTP or HTTP
Aggregation by RePEc services:
  EconPapers
  IDEAS
  Central PMH-accessible aggregated archive of AMF formatted metadata

20 RePEc model
Template-type: ReDIF-Paper 1.0
Author-Name: Capron, Henri
Author-Email: hcapron@ulb.ac.be
Author-Name: Meeusen, Wim
Author-Email: wim.meeusen@ua.ac.be
Author-Name: Dumont, Michel
Author-Person: pdu51
Author-Name: Cincera, Michele
Author-Person: pci5
Title: National innovation systems: pilot study of the Belgian innovation system
Creation-Date: 1998
Publication-Status: Published as a report for the Belgian Federal Office for Scientific, Technical and Cultural Affairs (OSTC)
File-URL: http://bib17.ulb.ac.be:8080/dspace/bitstream/2013/941/1/mc-0048.pdf
File-Format: application/pdf
Handle: RePEc:dul:ecoulb:2013-941
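
A ReDIF template like the one on this slide is a sequence of "Field: value" lines in which fields may repeat, and each Author-Name opens a new author cluster. The Python sketch below reads such a record under two simplifying assumptions that are mine, not taken from the ReDIF specification: one field per line, and an indented line continues the previous value. The function names are invented for the example.

```python
def parse_redif(text):
    """Return the record as an ordered list of (field, value) pairs."""
    record = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and record:
            # Assumption: an indented line continues the previous field's value.
            key, value = record[-1]
            record[-1] = (key, value + " " + line.strip())
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError("not a 'Field: value' line: " + line)
        record.append((key.strip(), value.strip()))
    return record


def author_clusters(record):
    """Group Author-* fields under the Author-Name that precedes them."""
    authors = []
    for key, value in record:
        if key == "Author-Name":
            authors.append({"Name": value})
        elif key.startswith("Author-") and authors:
            authors[-1][key[len("Author-"):]] = value
    return authors


# Abridged from the record on the slide.
SAMPLE = """Template-type: ReDIF-Paper 1.0
Author-Name: Capron, Henri
Author-Email: hcapron@ulb.ac.be
Author-Name: Dumont, Michel
Author-Person: pdu51
Title: National innovation systems: pilot study of the Belgian innovation system
Handle: RePEc:dul:ecoulb:2013-941
"""

record = parse_redif(SAMPLE)
print(author_clusters(record))
print(dict(record)["Handle"])
```

Splitting on the first colon only matters here: both the title and the handle contain further colons that belong to the value.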

21 RePEc model compared to IR model
Very similar BUT
RePEc model:
  Harvests only from “official” publisher repositories
  Therefore: 1 work exists once in RePEc and it is guaranteed to be the one and only “official” manifestation of the work
IR model:
  holds publications for which the institution is typically not the publisher
  1 work → 1 official manifestation + multiple author manifestations
  one work can exist in:
    o one or more repositories
    o as different publication types
    o with different descriptive metadata
    o with different object files attached
    o with different object file metadata
→ Pushing and pulling metadata records from RePEc and IR into one system is bound to raise problems

22 Pull metadata from RePEc
EO harvests AMF formatted metadata records from http://oai.repec.openlib.org/
Overlap !! Same records are harvested from IR and RePEc
Solution: XML Admin file contains a directive
  Specifies which RePEc series need not be harvested from RePEc, since they are already delivered through the IR
BUT:
  IR contains articles produced by its authors
  These articles are contained in a journal RePEc series
  Overlap in EO cannot be avoided
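
The exclusion directive amounts to a filter on the series part of each RePEc handle. Here is a minimal Python sketch of that filter, assuming handles of the form RePEc:archive:series:item as in the examples on these slides. The function names, the excluded-series set and the second handle are invented for the example, and the real directive syntax in the admin file is not shown here.

```python
def series_of(handle):
    """'RePEc:dul:ecoulb:2013-941' -> 'RePEc:dul:ecoulb'."""
    parts = handle.split(":")
    if len(parts) < 4 or parts[0] != "RePEc":
        raise ValueError("not a RePEc item handle: " + handle)
    return ":".join(parts[:3])


def records_to_keep(harvested_handles, excluded_series):
    """Drop records whose series the partner already delivers through its IR."""
    return [h for h in harvested_handles if series_of(h) not in excluded_series]


harvested = [
    "RePEc:dul:ecoulb:2013-941",
    "RePEc:xxx:journl:v2y2004i1p1-10",  # made-up handle in a journal series
]
excluded = {"RePEc:dul:ecoulb"}  # series named in a partner's admin file
kept = records_to_keep(harvested, excluded)
print(kept)
```

The record in the journal series survives the filter, which is the residual overlap the slide warns about: the same article may also arrive through an IR.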

23 Push metadata to RePEc
EO sets up “RePEc:ner” archive, containing ReDIF-X formatted records
ReDIF-X
  All records are delivered as “ReDIF-Paper”, but with extra fields denoting the “real” publication status and version of text
Overlap !! Most institutions already maintain RePEc series: these records must not be pushed by EO
XML Admin file controls which series to feed in this “ner” archive
  boolean: to feed or not to feed
  If not given: all records with fulltext that are not working papers are mapped to one series for that institution
  RePEc series → OAI setspec of DIDL/MODS record
BUT
  IR inherent problem of multiple copies/versions is pushed to RePEc

24 Push metadata to RePEc: ReDIF-X
Template-type: ReDIF-Paper 1.0
Title: Block investments and the race for corporate control in Belgium
Author-Name: Chapelle, Ariane
Language: en
Note: info:eu-repo/semantics/published
X-PublishedAs-Type: article
X-PublishedAs-Article-Year: 2004
X-PublishedAs-Article-Journal: Corporate Ownership & Control
X-PublishedAs-Article-Volume: 2
X-PublishedAs-Article-Issue: 1
Order-URL: http://dipot.ulb.ac.be:8080/dspace/handle/2013/9943
File-URL: http://dipot.ulb.ac.be:8080/dspace/bitstream/2013/9943/1/ac-0007.pdf
File-Format: application/pdf
File-Version: authorVersion
Handle: RePEc:ulb:ecoulb:2013/9943
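
The record above suggests how the extra X-PublishedAs-* fields carry the "real" publication type alongside the ReDIF-Paper template. As a rough sketch, the Python function below serializes a record in that shape. The field names are copied from the slide, but the function, its parameters and the rule that derives the "Article" infix from the type value are my own assumptions. This is not the EO exporter.

```python
def to_redif_x(handle, title, authors, file_url, file_version, published_as=None):
    """Serialize one record as ReDIF-Paper with X-PublishedAs-* extension fields.

    published_as, if given, is a dict such as
    {"type": "article", "details": {"Year": "2004", "Volume": "2"}}.
    """
    lines = ["Template-type: ReDIF-Paper 1.0", "Title: " + title]
    lines += ["Author-Name: " + name for name in authors]
    if published_as:
        pub_type = published_as["type"]
        lines.append("X-PublishedAs-Type: " + pub_type)
        # Assumption: the infix is the capitalized type, as in X-PublishedAs-Article-Year.
        prefix = "X-PublishedAs-" + pub_type.capitalize() + "-"
        for key, value in published_as.get("details", {}).items():
            lines.append(prefix + key + ": " + value)
    lines += [
        "File-URL: " + file_url,
        "File-Format: application/pdf",  # fixed here for brevity
        "File-Version: " + file_version,
        "Handle: " + handle,
    ]
    return "\n".join(lines) + "\n"


text = to_redif_x(
    handle="RePEc:ulb:ecoulb:2013/9943",
    title="Block investments and the race for corporate control in Belgium",
    authors=["Chapelle, Ariane"],
    file_url="http://dipot.ulb.ac.be:8080/dspace/bitstream/2013/9943/1/ac-0007.pdf",
    file_version="authorVersion",
    published_as={"type": "article", "details": {"Year": "2004", "Volume": "2", "Issue": "1"}},
)
print(text)
```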

25 LogEc
Aim: track abstract views and download clicks of publications presented through RePEc services (EconPapers, IDEAS, ... Economists Online)
NOT: tracking of usage at the level of the archives
  Downloads of publications contained in RePEc archives that are initiated by a user arriving through Google do not show up in LogEc
How:
  EO logs abstract views and download clicks of object files
  On a monthly basis, EO transforms these log entries into the requested LogEc format, using “rstat.pl”
    2009-10 EconomistsOnline RePEc:aah:aarhec:1987-21 a: 65.55.207.69 66.235.124.10 d: 66.235.124.10
RePEc handle of publication is necessary
  → EO partners delivering content to RePEc directly (and that EO therefore doesn’t harvest from RePEc but from the IR) must include the RePEc handle in the DIDL/MODS record
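
The monthly transformation can be pictured as grouping click events per RePEc handle. The Python sketch below does that and prints lines that mirror the single sample shown on this slide. It is not rstat.pl: the exact LogEc file layout is not specified here, so the output format is inferred from that sample only, and the function name and the event tuple shape are assumptions.

```python
from collections import defaultdict


def logec_lines(month, service, clicks):
    """Group click events per handle.

    clicks is an iterable of (handle, kind, ip) tuples, where kind is
    "a" for an abstract view and "d" for a download.
    """
    per_handle = defaultdict(lambda: {"a": [], "d": []})
    for handle, kind, ip in clicks:
        if kind not in ("a", "d"):
            raise ValueError("unknown click kind: " + kind)
        per_handle[handle][kind].append(ip)

    lines = []
    for handle in sorted(per_handle):
        parts = [month, service, handle]
        for kind in ("a", "d"):
            ips = per_handle[handle][kind]
            if ips:
                parts.append(kind + ": " + " ".join(ips))
        lines.append(" ".join(parts))
    return lines


events = [
    ("RePEc:aah:aarhec:1987-21", "a", "65.55.207.69"),
    ("RePEc:aah:aarhec:1987-21", "a", "66.235.124.10"),
    ("RePEc:aah:aarhec:1987-21", "d", "66.235.124.10"),
]
for line in logec_lines("2009-10", "EconomistsOnline", events):
    print(line)
```

Events without a RePEc handle cannot be grouped this way, which is why the handle has to be present in the DIDL/MODS record.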

26 LogEc
DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
    Item[0..∞] (of type objectFile)
    Item[0..1] (of type humanStartPage)
    Item[0..∞] (of type descriptiveMetadata)
      Descriptor/modified
      RePEc handle
      byRef RePEc (AMF metadata)
[Diagram labels: EO; RePEc]

27 CitEc
Aim: citation analysis for RePEc publications
How:
  Analyze text: extract and parse list of references from publications
  Each reference is checked for availability in RePEc
    Cites: references to other RePEc publications
    Textual references
    CitedBy
    Co-citations
EO publications (from our IRs) are pushed to RePEc and are therefore pulled through the CitEc processing
EO has access to the resulting CitEc data, and presents this through the EO portal (not yet, will be in Feb 2010)
RePEc handle of publication is necessary
  → EO partners delivering content to RePEc directly (and that EO therefore doesn’t harvest from RePEc but from the IR) must include the RePEc handle in the DIDL/MODS record

28 Theme 2: Economists Online and RePEc
» BO Group 1: Push/pull to/from RePEc
  » ReDIF-X data structure
  » Duplicates; different versions of identical publication
» BO Group 2: Publishing models
  » Advantages/disadvantages of RePEc publishing model as opposed to IR publishing model
  » Push the two models together? Do we need to foresee specific services in the gateway or portal to make these two live together in peace?
» BO Group 3: Future RePEc/EO services
  » What services should EO and RePEc jointly be looking at in the future in the interest of the economics researcher?

