Counting on OpenDOAR Peter Millington SHERPA Technical Development Officer CRC, University of Nottingham


1 Counting on OpenDOAR Peter Millington SHERPA Technical Development Officer CRC, University of Nottingham

2 Background to OpenDOAR
Created in 2005
– Lists over 2320 repositories (2013-07-02)
Manually validated
– High quality…
– …but we didn't like to talk about the record counts
Counts not updated after the initial entry
– Unless prompted by users
Fixed in 2012
– Record counts updated about every 2 weeks

3 Established counting methods
Manual inspection
– Labour-intensive
Counting OAI-PMH record identifiers
– Inefficient: handling big files, iterative
– Unreliable: file size limits and timeouts
– Inaccurate: need to account for deleted records

4 How difficult can it be?
SELECT COUNT(*) FROM repository;
– Still fast even with added complexity
– Statuses, breakdown by date, etc.
The number is often there on the web page
– Headline number, or
– "x to y of z" tally, or
– Adding up numbers on a "Browse by year" page
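
As a rough illustration of why a count is cheap for the repository software itself, here is a minimal sketch using an in-memory SQLite database. Only the table name `repository` comes from the slide; the `status` and `year` columns and the demo rows are assumptions made up for the example.

```python
import sqlite3


def build_demo_db():
    # Hypothetical schema: the slide names only the table `repository`.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repository (id INTEGER PRIMARY KEY, status TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO repository (status, year) VALUES (?, ?)",
        [("live", 2011), ("live", 2012), ("deleted", 2012), ("live", 2013)],
    )
    return conn


def total_count(conn):
    # The headline number: every row, whatever its status.
    return conn.execute("SELECT COUNT(*) FROM repository").fetchone()[0]


def live_count_by_year(conn):
    # "Added complexity": filter on status and break down by date.
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM repository "
        "WHERE status = 'live' GROUP BY year ORDER BY year"
    )
    return dict(rows.fetchall())
```

Calling `total_count(build_demo_db())` returns the number of demo rows, and `live_count_by_year` returns a year-to-count mapping that excludes deleted rows.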

5 OpenDOAR's Strategy
Avoid OAI-PMH whenever possible
Use other machine-to-machine (m2m) interfaces, if available/suitable
Screen scrape numbers from web pages
If all else fails, use manual methods
Counts for full texts as well, where possible
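
The strategy above amounts to trying methods in order of preference and keeping the first one that yields a number. A minimal sketch of that idea follows; the function name and the way methods are passed in are assumptions for illustration, not OpenDOAR's actual code.

```python
def first_successful_count(methods):
    """Try (name, callable) pairs in order; return the first usable count.

    Each callable is a hypothetical count-harvesting method that returns
    an int, or None when it finds nothing.
    """
    for name, method in methods:
        try:
            count = method()
        except Exception:
            # A timeout, HTTP error or parse failure just means
            # "move on to the next method" in this sketch.
            continue
        if count is not None:
            return name, count
    # Nothing worked: fall back to manual counting.
    return None, None
```

The order of the list encodes the preference, so cheap page scraping would be listed before iterative OAI-PMH harvesting.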

6 Some examples…

7 Generic n records Documents avec texte intégral 229181
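
A "generic n records" scrape can be sketched as a search for the first number that follows a known label in the page source. The function name, the 40-character gap limit and the handling of thousands separators below are assumptions chosen for the example.

```python
import re

# A number is either grouped in thousands (1,234,567 / 229 181) or a plain digit run.
NUMBER = r"(\d{1,3}(?:[,.\u00a0 ]\d{3})+(?!\d)|\d+)"


def count_after_label(page_text, label):
    # Allow up to 40 non-digit characters (markup, colons, spaces)
    # between the label and the number.
    match = re.search(re.escape(label) + r"\D{0,40}?" + NUMBER, page_text)
    if match is None:
        return None
    # Strip thousands separators before converting.
    return int(re.sub(r"\D", "", match.group(1)))
```

For the example on this slide, the label would be `Documents avec texte intégral` and the function would pick up the number that follows it.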

8 Generic x to y of z counters DSpace Browse Counter is a special case Showing results 1 to 20 of 6727
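
The "x to y of z" pattern lends itself to a regular expression that keeps only z. This is a minimal sketch; the exact wording accepted ("to", a hyphen or an en dash between x and y) is an assumption, and real pages may need a per-repository pattern.

```python
import re

X_TO_Y_OF_Z = re.compile(
    r"(\d[\d,]*)\s*(?:to|-|–)\s*(\d[\d,]*)\s+of\s+(\d[\d,]*)",
    re.IGNORECASE,
)


def total_from_tally(page_text):
    match = X_TO_Y_OF_Z.search(page_text)
    if match is None:
        return None
    # Group 3 is z, the total; drop any thousands separators.
    return int(match.group(3).replace(",", ""))
```

Applied to the text on this slide, `total_from_tally("Showing results 1 to 20 of 6727")` yields the total of 6727.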

9 DSpace totalCnt Add-on NCKUR [40782/74662] [ / ] -

10 Generic Sum of List Counters EPrints count Browse List is a special case Add up the numbers in brackets
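
Adding up the numbers in brackets can be sketched as follows. The sample markup in the usage note is invented for illustration; it only mimics a browse-by-year list where each entry is followed by a count in parentheses.

```python
import re


def sum_bracketed_counts(page_text):
    # Collect every "(123)" or "(1,234)" and add them up.
    # Numbers outside brackets, such as the years themselves, are ignored.
    counts = re.findall(r"\((\d[\d,]*)\)", page_text)
    return sum(int(c.replace(",", "")) for c in counts)
```

For example, a list reading `2013 (120)`, `2012 (95)`, `2011 (1,007)` sums to 1222. Pages that put other bracketed numbers in the navigation would need the search restricted to the list itself.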

11 Number of items EPrints V.3 Counter

12 Generic Sum of Numbers Add up the numbers

13 Generic HTML tag counting Count item tags in HTML source code
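
Counting item tags in the HTML source can be done with the standard-library parser. This is a sketch under the assumption that each record is rendered as one tag, optionally marked with a CSS class; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name and, optionally, a CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.css_class is None:
            self.count += 1
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_item_tags(html_source, tag, css_class=None):
    parser = TagCounter(tag, css_class)
    parser.feed(html_source)
    parser.close()
    return parser.count
```

Filtering on a class helps avoid counting navigation or layout elements that happen to use the same tag.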

14 Counting multiple pages Separate pages per letter, document type, etc Issues with Greenstone – lack of predictability

15 OAI-PMH ListIdentifiers: Simple http://... /oai?verb=ListIdentifiers&metadataPrefix=oai_dc Count these No resumptionToken
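
For the simple case, where the whole list arrives in one response, counting means counting `<header>` elements in the XML. The sketch below also separates out headers marked `status="deleted"`, which the OAI-PMH protocol uses for deleted records. The function name is an assumption, and the response text is expected to have been fetched already.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def count_identifiers(xml_text):
    """Return (live, deleted) header counts from one ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    live = deleted = 0
    for header in root.iter(OAI_NS + "header"):
        if header.get("status") == "deleted":
            deleted += 1
        else:
            live += 1
    return live, deleted
```

Only the first number of the returned pair would count towards the repository's record total.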

16 OAI-PMH ListIdentifiers: Iterative resumptionToken for blocks of identifiers 193114FUS
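
The iterative case repeats the request with each `resumptionToken` until the server returns an empty one. A minimal sketch follows, assuming a plain HTTP GET is enough; the function names and the injectable `fetch_page` parameter are assumptions for the example, and there is no retry or rate limiting.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url):
    # Plain HTTP GET; real harvesting would want retries and a polite delay.
    with urlopen(url, timeout=60) as response:
        return response.read()


def count_all_identifiers(base_url, fetch_page=fetch, metadata_prefix="oai_dc"):
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    total = 0
    while True:
        root = ET.fromstring(fetch_page(base_url + "?" + urlencode(params)))
        # Count live records only; deleted headers are skipped.
        total += sum(
            1 for h in root.iter(OAI_NS + "header") if h.get("status") != "deleted"
        )
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return total  # no token, or an empty one: this was the last block
        # resumptionToken is an exclusive argument: send it without metadataPrefix.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}
```

Each block costs one round trip, which is why this method is so much slower than scraping a single number from a page.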

17 OAI-PMH completeListSize <resumptionToken completeListSize="89805" Bingo!
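
When the server reports `completeListSize`, the count can be read from the first response without any iteration. The sketch below assumes the response text is already in hand; the function name is made up. Note that the attribute is optional in OAI-PMH and, per the protocol, may be only an estimate.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def complete_list_size(xml_text):
    """Return completeListSize from a ListIdentifiers response, or None."""
    root = ET.fromstring(xml_text)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None:
        return None
    size = token.get("completeListSize")
    # The attribute is optional, so many repositories will yield None here.
    return int(size) if size and size.isdigit() else None
```

A `None` result means falling back to counting headers, block by block.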

18 Twelve count harvesting methods
Generic
– Generic n records
– Generic x to y of z counters
– Generic Sum of List Counters
– Generic HTML tag counting
– Generic Sum of Numbers
DSpace
– DSpace Browse Counter
– DSpace totalCnt Add-on
EPrints
– EPrints count Browse List
– EPrints V.3 Counter
OAI-PMH ListIdentifiers
– Simple
– Iterative
– completeListSize
Manual counting

19 Efficiency of the methods Iterative OAI-PMH so much slower

20 Relative Frequency of Methods

21 Ugent Numbers galore DSpace and EPrints Easily scrapeable counts

22 Count harvesting issues
No counts visible or harvestable
Static counts
– Often approximate, e.g. "over 2m items"
Connectivity issues
– Infrastructure limitations, e.g. heavy internet traffic
– HTTP 401 (unauthorised) & 403 (forbidden) errors
Data hidden in include files (e.g. JavaScript)
– Not visible in "View Source" code
No direct URL known for the pages with counts
– Only accessible to human navigators
Remodelled websites
– Requiring updated settings

23 Help OpenDOAR count your repository
Display record counts on your home page
– Using distinctive wording & highlighting
– Ideally in or tags
Ensure numbers can be seen in "View Source" code
Ensure pages & files are not blocked to robots
– Grant read-only access if necessary
Implement OAI-PMH properly
– Return ListIdentifiers in chunks, not one big file
– Include completeListSize in the resumptionToken
Tell us about any changes, so we can update settings
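
A repository manager can check whether a page carrying counts is blocked to robots by testing the site's robots.txt rules with Python's standard `urllib.robotparser`. This is only a sketch: the user-agent string `ExampleCountBot` and the URLs in the usage note are placeholders, not the name or addresses OpenDOAR's harvester actually uses.

```python
from urllib.robotparser import RobotFileParser


def is_page_allowed(robots_txt, page_url, user_agent="ExampleCountBot"):
    """Check robots.txt text (already downloaded) against one page URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

With a robots.txt containing `User-agent: *` and `Disallow: /stats/`, a page under `/stats/` is reported as blocked while a page under `/browse/` is allowed.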

24 Ideas for the Future
Comparing counts from OpenDOAR & ROAR
– E.g. Nottm ePrints: 1,239 < 1,277
– E.g. HAL-Inserm: 7,498 > 2,773
OpenDOAR
– Growth charts
– Full text counts
Extending OAI-PMH
– Statistical features
– Trial PSH
