Programmatic Interaction with Open Access Repositories

1 Programmatic Interaction with Open Access Repositories
Roberto Barbera and Carla Carrubba, University of Catania, Italy
e-Research Summer Hackfest, Catania (Italy)

2 Outline
Part 1
- Introduction: definitions and context
Part 2
- Manual resource upload through the submit interface
- Programmatic interaction with an Open Access Repository using APIs for data searching, downloading and uploading
- MARCXML tags overview
- Programmatic interaction with an Open Access Repository using the OAI-PMH standard protocol
Part 3
- Get authorship of research products

3 Part 1

4 Concepts and definitions (Source: Wikipedia)
Open Access repositories are powered by Digital Asset Management Systems (DAMSes), which are “intertwined structures incorporating both software and hardware that take care of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets”
A digital asset in essence is “anything that exists in a binary format and comes with the right to use”
“Types of digital assets include, but are not exclusive to, photography, logos, illustrations, animations, audio-visual media, presentations, spreadsheets, Word and/or PDF documents, data and a multitude of other digital formats and their respective metadata”

5 Some of the most common DAMSes
DAMS (license, where stated):
- CKAN (Free)
- CONTENTdm (Commercial)
- Digibib
- Digital Commons (Commercial, hosted service)
- DigiTool
- DiVA-Portal (Free, hosted service)
- dLibra
- Drupal
- DSpace
- Earmas
- EPrints
- EQUELLA Repository
- ETD-db
- Fedora
- Fez
- Greenstone
- HAL
- Invenio
- Islandora/Fedora
- intraLibrary
- MyCoRe
- Open Repository
- OPUS
- PURE
- SciELO
- VITAL
- WEKO
- XooNIps
Others, more business or social oriented, are listed at

6 Sci-GaIA Task 3.1: Support the creation of federated and interoperable Open Access Document and Data Repositories in Africa, compliant with EU and other international guidelines
Planned activities:
- Identification of already existing Open Access Document and Data Repositories in the region and their inclusion in web-based directories such as OpenDOAR and the CHAIN-REDS Knowledge Base
- Promotion of the Open Archives Initiative (OAI) standards and of the OpenAIRE guidelines to make contents (both papers and data) stored in the African repositories more discoverable, searchable and hence visible worldwide
- Federation, through the use of Linked Data standards and Semantic Web technologies, of African Open Access Document and Data Repositories, to make them accessible and searchable from a single entry point included in the project website
- Feasibility study for the creation of a pilot service to issue Persistent Identifiers (PIDs), compliant with the Handle System, to be associated with documents and data
- Provision of a ready-to-install-and-configure appliance to quickly build and populate Open Access Repositories compliant with OAI, OpenDOAR and OpenAIRE standards/guidelines

7 The Sci-GaIA Open Access Repository
Requirements:
- Open source
- Distributed under a free license
- Deployable on a local infrastructure (i.e., not a hosted service)
- Standards-compliant
- Well supported
- Scalable up to O(10^6)-O(10^7) resources (to begin with)
Choice: Invenio (latest stable version: v Sci-GaIA add-ons)
Motivations:
- Fully compliant with all the most important library standards, e.g. DCMI, Marc21 and OAI-PMH;
- Co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL and SLAC, and used as institutional repository by about 30 scientific institutions worldwide;
- The INSPIRE, SCOAP3 and ZENODO (the OpenAIRE flagship archive) repositories are based on Invenio;
- The CERN Document Server has been operating since 2002 and manages about 1.3 million records;
- UNESCO and UEMOA are leading an initiative to create a virtual library based on Invenio in 8 African countries (Benin, Burkina Faso, Côte d’Ivoire, Guinea Bissau, Mali, Niger, Senegal and Togo).

8 The Sci-GaIA Open Access Repository (http://oar.sci-gaia.eu/)
Federated authentication
Resources can be:
- Manually uploaded
- Automatically harvested and ingested from external sources
Sci-GaIA add-ons to Invenio:
- The possibility to mint DataCite Digital Object Identifiers (DOIs) and assign them to the records stored in the OAR
- Where they exist, direct links to the altmetrics of each record contained in the OAR
- The correct metadata structure and the right OAI-PMH endpoint configuration to make the OAR compliant with version 3.0 of the OpenAIRE Guidelines

9 Compliance with standards (Fully conforming with the Open Archives Initiative’s standards and registered as an OpenDOAR data provider)

10 The Knowledge Workflow
First ICT2015

11 Research packages

12 The Sci-GaIA OAR itself as a research package
6 clones of the Sci-GaIA OAR are being deployed, both in Africa and Europe

13 Part 2

14 Submit a resource

15

16

17 Image submit Item 1 Item 2

18

19

20

21 Programmatic Interaction with an (Invenio-based) Open Access Repository
Search Engine API
There are three kinds of APIs you can use:
- XML API
- JSON API
- Python API

22 Programmatic Interaction
XML API
Syntax: GET /search?param1=value1&param2=value2&param3=value3…
Example: Get the first 10 records in XML format
where
- jrec = jump to record ID (e.g. 1 for first hit)
- rg = records-in-group-of (e.g. 10 hits per page)
- of = output format (e.g. xm for XML format)
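The query syntax above can be sketched in Python with the standard library alone. This is a minimal illustration, not part of the original slides: the base URL is the search address shown elsewhere in the deck, the helper name `build_search_url` is made up, and the lowercase format code `xm` is assumed.

```python
from urllib.parse import urlencode

# Search endpoint as it appears on the "Output of" slides.
SEARCH_URL = "http://oar.sci-gaia.eu/search"


def build_search_url(jrec=1, rg=10, of="xm", **extra):
    """Return a search URL (hypothetical helper, not an Invenio function).

    jrec: first record to return, rg: records per page, of: output format.
    Any further keyword arguments are appended as extra query parameters.
    """
    params = {"jrec": jrec, "rg": rg, "of": of}
    params.update(extra)
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # First 10 records in XML format.
    print(build_search_url())
```

The resulting URL can be fetched with any HTTP client, for instance `urllib.request.urlopen`.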

23 Programmatic Interaction
XML API
Set ‘jrec’ and ‘rg’ appropriately to paginate the output
Example:
Do not set “rg” too high: there is a server-wide safety limit for it
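As a rough sketch of the pagination described above, the following generator yields one URL per page by stepping `jrec` forward in increments of `rg`. The function name `paged_urls` and the idea that the total number of hits is known in advance are assumptions made for illustration only.

```python
from urllib.parse import urlencode

# Search endpoint as it appears on the "Output of" slides.
SEARCH_URL = "http://oar.sci-gaia.eu/search"


def paged_urls(total_hits, rg=10, of="xm"):
    """Yield one search URL per page (hypothetical helper).

    total_hits is assumed to be known beforehand; jrec starts at 1
    and advances by rg so that consecutive pages do not overlap.
    """
    for jrec in range(1, total_hits + 1, rg):
        yield SEARCH_URL + "?" + urlencode({"jrec": jrec, "rg": rg, "of": of})


if __name__ == "__main__":
    for url in paged_urls(25, rg=10):
        print(url)
```

Keeping `rg` small and issuing several requests stays within the server-wide limit mentioned on the slide.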

24 Programmatic Interaction
XML API
Example: Get the first 10 records that contain the string “Sci-GaIA Winter School” in the title:
where:
- p = pattern (e.g. your query)
- f = field to search within (e.g. “title”, “authors”..)
Get a record from a given DOI
Get all records uploaded from a given date to another given date (e.g. today)
where
- d1 = first date, YYYY-mm-dd format
- d2 = second date, YYYY-mm-dd format
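The two kinds of query on this slide might be assembled as follows. This is a sketch under stated assumptions: the helper names are invented, the lowercase `xm` format code is assumed, and the dates in the usage example are arbitrary illustrative values rather than the ones on the original slide.

```python
from datetime import date
from urllib.parse import urlencode

# Search endpoint as it appears on the "Output of" slides.
SEARCH_URL = "http://oar.sci-gaia.eu/search"


def title_search_url(pattern, rg=10):
    """URL for the first rg records whose title matches pattern (hypothetical helper)."""
    params = {"p": pattern, "f": "title", "rg": rg, "of": "xm"}
    return SEARCH_URL + "?" + urlencode(params)


def date_range_url(first, last=None):
    """URL for records uploaded between two dates (hypothetical helper).

    Dates are formatted as YYYY-mm-dd; last defaults to today.
    """
    last = last or date.today()
    params = {"d1": first.isoformat(), "d2": last.isoformat(), "of": "xm"}
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(title_search_url("Sci-GaIA Winter School"))
    # Illustrative dates only.
    print(date_range_url(date(2016, 1, 1), date(2016, 6, 30)))
```

`urlencode` takes care of escaping the spaces in the search pattern.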

25 Output of: http://oar.sci-gaia.eu/search

26 Programmatic Interaction
JSON API
Request the JSON output format with “of=recjson”
Use the same parameters as the XML API
Example:
Get a record from a DOI:
Get all records uploaded from a given date to another given date (e.g. today):
where
- d1 = first date, YYYY-mm-dd format
- d2 = second date, YYYY-mm-dd format

27 Output of: http://oar.sci-gaia.eu/search

28 Programmatic Interaction
JSON API Example: Get only the abstract, title and authors of resources: where ot=output tags (e.g. ‘’ to get all fields, ‘title’ to get titles only)
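A minimal sketch of the JSON request above: `of=recjson` selects JSON output and `ot` lists the wanted fields, assumed here to be comma-separated. The helper name is invented, and the sample response string is made up purely to show decoding with `json.loads`; the real response layout may differ.

```python
import json
from urllib.parse import urlencode

# Search endpoint as it appears on the "Output of" slides.
SEARCH_URL = "http://oar.sci-gaia.eu/search"


def json_search_url(fields, **extra):
    """URL asking for JSON output restricted to the given fields (hypothetical helper)."""
    params = {"of": "recjson", "ot": ",".join(fields)}
    params.update(extra)
    # Keep commas unescaped so the field list stays readable.
    return SEARCH_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(json_search_url(["abstract", "title", "authors"]))

    # Invented sample body, only to illustrate decoding a JSON response.
    sample_body = '[{"title": "Example title"}]'
    for record in json.loads(sample_body):
        print(record["title"])
```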

29 Programmatic Interaction
Python API
The Invenio Search Engine can be called from within your Python programs via both a high-level and a low-level API.
Use the same parameters as the XML and JSON APIs
To know more about the Python, XML and JSON APIs visit this guide:

30 Programmatic Interaction
Download records
We need:
- PUBLIC KEY (provided by the system)
- PRIVATE KEY (provided by the system)
- SIGNATURE (we have to calculate it)
Calculate signature:
myquery=
Signature=HMAC-SHA1(myquery,Private-Key)
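The signature formula above, HMAC-SHA1 over the query with the private key, can be computed with Python's standard `hmac` and `hashlib` modules. This is only a sketch: the function name, the hexadecimal encoding of the result and the example query and key are assumptions, and how the signature is attached to the request is not shown on the slide.

```python
import hashlib
import hmac


def sign_query(myquery, private_key):
    """Return HMAC-SHA1(myquery, private_key) as a hex string.

    The hex encoding is an assumption; the slide only gives the formula.
    """
    digest = hmac.new(
        private_key.encode("utf-8"),  # key
        myquery.encode("utf-8"),      # message to sign
        hashlib.sha1,
    )
    return digest.hexdigest()


if __name__ == "__main__":
    # Placeholder query and key, not real credentials.
    print(sign_query("example-query-string", "example-private-key"))
```

The private key never leaves the client; only the computed signature is sent along with the query.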

31 Programmatic Interaction
Upload records
We have to:
- Send an authorisation request for your IP address to
- Create a MARCXML file as input (e.g. your_file.xml)
Example: curl -T your_file.xml -A invenio_webupload -H "Content-Type: application/marcxml+xml"
To know more about Upload:
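The curl call above could be mirrored in Python roughly as follows: `-T` becomes an HTTP PUT with the file as body, `-A` sets the User-Agent and `-H` sets the content type. The upload address is not given on the slide, so `upload_url` is a placeholder the caller must supply, and the helper name is invented.

```python
from urllib.request import Request


def build_upload_request(upload_url, marcxml_bytes):
    """Build (but do not send) a PUT request carrying a MARCXML file.

    upload_url is a placeholder: the real address is not on the slide.
    """
    return Request(
        upload_url,
        data=marcxml_bytes,
        method="PUT",  # curl -T uploads with PUT over HTTP
        headers={
            "User-Agent": "invenio_webupload",          # curl -A
            "Content-Type": "application/marcxml+xml",  # curl -H
        },
    )


if __name__ == "__main__":
    body = b'<?xml version="1.0" encoding="UTF-8"?><collection></collection>'
    request = build_upload_request("http://repository.example/upload", body)
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

The request only succeeds once the client's IP address has been authorised, as noted in the first step.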

32 YOUR_FILE.XML
MARC format is the standard in the library world
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="…">
  <record xmlns="…">
  </record>
</collection>

33 your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="…">
  <record xmlns="…">
    <datafield tag=" " ind1=" " ind2=" ">
      <subfield code=""></subfield>
      …..
    </datafield>
    ……
  </record>
</collection>

34 your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="…">
  <record xmlns="…">
    <datafield tag="024" ind1="7" ind2=" ">
      <subfield code="a">DOI identifier</subfield>
      <subfield code="2">Type of identifier</subfield>
    </datafield>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">First author</subfield>
      <subfield code="v">Affiliation</subfield>
      <subfield code="w">Country</subfield>
      <subfield code="j">orcid</subfield>
    </datafield>
    ……
  </record>
</collection>

35 your_file.xml
To know more about MARCXML tags: http://oar.sci-gaia.eu/help/admin/howto-marc
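A file with the record layout shown on the preceding slides could be generated with `xml.etree.ElementTree`, as in this minimal sketch. Several things are assumptions: the namespace is the standard MARC21 slim one (the value on the slides is truncated), the identifier type is written as "DOI", and the helper names and sample values are invented.

```python
import xml.etree.ElementTree as ET

# Assumed: the standard MARC21 "slim" namespace (truncated on the slides).
MARC_NS = "http://www.loc.gov/MARC21/slim"


def add_datafield(record, tag, ind1, ind2, subfields):
    """Append a datafield with (code, value) subfields to a record element."""
    field = ET.SubElement(record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


def build_collection(doi, author, affiliation, country, orcid):
    """Return a MARCXML collection with one record, as a string (hypothetical helper)."""
    collection = ET.Element("collection", {"xmlns": MARC_NS})
    record = ET.SubElement(collection, "record")
    # Tag 024: identifier and its type, as on the slide.
    add_datafield(record, "024", "7", " ", [("a", doi), ("2", "DOI")])
    # Tag 100: first author with affiliation, country and ORCID.
    add_datafield(record, "100", " ", " ",
                  [("a", author), ("v", affiliation), ("w", country), ("j", orcid)])
    return ET.tostring(collection, encoding="unicode")


if __name__ == "__main__":
    # All values below are placeholders.
    print(build_collection("10.1234/example", "Doe, Jane", "Example University",
                           "Italy", "0000-0000-0000-0000"))
```

Building the tree programmatically avoids the unbalanced tags that are easy to introduce when editing MARCXML by hand.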

36 Programmatic interaction
Search Engine based on the OAI-PMH standard protocol
The Sci-GaIA OAR OAI-PMH endpoint is publicly available at:
Get detailed information about the available sets:
Get the list of Dublin Core records:
Get a record from the oai-identifier:
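The three requests listed above correspond to the standard OAI-PMH verbs `ListSets`, `ListRecords` and `GetRecord`, with `oai_dc` as the Dublin Core metadata prefix. The sketch below only builds the request URLs; the endpoint address and the record identifier are placeholders, since the real values are not legible on the slide, and the helper name is invented.

```python
from urllib.parse import urlencode


def oai_url(endpoint, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb (hypothetical helper)."""
    params = {"verb": verb}
    params.update(arguments)
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint: substitute the repository's real OAI-PMH address.
    endpoint = "http://repository.example/oai"

    # Information about the available sets.
    print(oai_url(endpoint, "ListSets"))
    # List of Dublin Core records.
    print(oai_url(endpoint, "ListRecords", metadataPrefix="oai_dc"))
    # A single record from its OAI identifier (placeholder identifier).
    print(oai_url(endpoint, "GetRecord", metadataPrefix="oai_dc",
                  identifier="oai:repository.example:1"))
```

Each URL returns an XML document that can be parsed with any XML library.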

37 Output of :

38 Part 3

39 “Whose science is this?”
How can authorship be attributed to research products?

40 ORCID (www.orcid.org – becoming a “de facto” standard)
More than 2.2 million ORCID IDs so far

41 Digital Object Identifiers
Thanks to UNICT, the Sci-GaIA OAR has an official prefix of:
An unlimited number of sub-prefixes/DOIs can be created/minted
All records in the OAR can be “claimed” in the ORCID profiles of their authors

42 Authorship of research products with OAR and ORCID (www.orcid.org)

43 Altmetrics (www.altmetrics.com)
The Sci-GaIA OAR automatically links its records to their altmetrics

44 Thank you! sci-gaia.eu


