Counting on OpenDOAR Peter Millington SHERPA Technical Development Officer CRC, University of Nottingham


1 Counting on OpenDOAR Peter Millington SHERPA Technical Development Officer CRC, University of Nottingham peter.millington@nottingham.ac.uk

2 http://www.opendoar.org/ Background to OpenDOAR Created in 2005 – Lists over 2320 repositories (2013-07-02) Manually validated – High quality… – …but we didn't like to talk about the record counts Counts not updated after the initial entry – Unless prompted by users Fixed in 2012 – Record counts updated about every 2 weeks

3 http://www.opendoar.org/ Established counting methods Manual inspection – Labour-intensive Counting OAI-PMH record identifiers – Inefficient Handling big files Iterative – Unreliable File size limits and timeouts – Inaccurate Need to account for deleted records

4 http://www.opendoar.org/ How difficult can it be? SELECT COUNT(*) FROM repository; – Still fast even with added complexity – Statuses, Breakdown by date, etc. The number is often there on the web page – Headline number, or – x to y of z tally, or – Adding up numbers on a Browse by year page
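The slide's point that a database count is cheap, even with a breakdown added, can be illustrated with a minimal sketch. Only the `repository` table name comes from the slide; the `year` column, the sample rows and the choice of an in-memory SQLite database are assumptions made purely for illustration.

```python
import sqlite3


def count_records(conn):
    """Return (total, per-year breakdown) for a hypothetical `repository` table."""
    total = conn.execute("SELECT COUNT(*) FROM repository").fetchone()[0]
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM repository GROUP BY year ORDER BY year"
    ).fetchall()
    return total, dict(rows)


# Assumed schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repository (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany(
    "INSERT INTO repository (year) VALUES (?)", [(2011,), (2012,), (2012,)]
)
total, by_year = count_records(conn)
```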

5 http://www.opendoar.org/ OpenDOAR's Strategy Avoid OAI-PMH whenever possible Use other m2m interfaces, if available/suitable Screen scrape numbers from web pages If all else fails, use manual methods Counts for full texts as well, where possible
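One way to read this strategy is as an ordered fallback chain: try the preferred method first and only move on when it yields nothing. The sketch below is not OpenDOAR's actual code; the function name, the method names and the stand-in callables are all hypothetical.

```python
def first_successful_count(methods):
    """Try (name, callable) pairs in order of preference.

    Each callable returns an int, or None if no count could be found.
    Errors are swallowed deliberately so that the next method gets a chance;
    a real harvester would presumably log them.
    """
    for name, method in methods:
        try:
            count = method()
        except Exception:
            continue
        if count is not None:
            return name, count
    return None, None


def _broken_interface():
    # Stand-in for an m2m interface that is unavailable.
    raise ConnectionError("interface not available")


# Hypothetical ordering: m2m interface, then screen scraping, then OAI-PMH.
methods = [
    ("m2m", _broken_interface),
    ("scrape", lambda: 6727),
    ("oai-pmh", lambda: 6700),
]
chosen = first_successful_count(methods)
```

If every method fails, the function returns `(None, None)`, which corresponds to the slide's last resort of counting manually.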

6 Some examples…

7 http://www.opendoar.org/ Generic n records Documents avec texte intégral 229181
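A "generic n records" scrape can be sketched as a search for the first number that follows a known label. The label below is the one shown on the slide; the surrounding HTML, the function name and the 40-character gap limit are assumptions, and thousands separated by spaces are not handled.

```python
import re


def count_after_label(html, label):
    """Return the first integer that follows `label` in the page, or None."""
    # Allow a short run of non-digits (punctuation, tags) between label and number.
    pattern = re.escape(label) + r"\D{0,40}?(\d[\d,.]*)"
    match = re.search(pattern, html)
    if match is None:
        return None
    # Drop thousands separators such as "," or "." before converting.
    return int(re.sub(r"\D", "", match.group(1)))


# Hypothetical page fragment built around the slide's example.
sample = "<p>Documents avec texte intégral : <b>229181</b></p>"
full_texts = count_after_label(sample, "Documents avec texte intégral")
```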

8 http://www.opendoar.org/ Generic x to y of z counters DSpace Browse Counter is a special case Showing results 1 to 20 of 6727
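The "x to y of z" tally lends itself to a regular expression that keeps only z. This is a minimal sketch; the pattern and function name are assumptions and cover only the English wording shown on the slide (plus a hyphenated "1-20 of" variant).

```python
import re

# Matches e.g. "1 to 20 of 6727" or "1-20 of 1,234" and captures x, y and z.
TALLY = re.compile(
    r"(\d[\d,]*)\s*(?:to|-)\s*(\d[\d,]*)\s+of\s+(\d[\d,]*)", re.IGNORECASE
)


def total_from_tally(text):
    """Return z from an 'x to y of z' tally, or None if no tally is found."""
    match = TALLY.search(text)
    if match is None:
        return None
    return int(match.group(3).replace(",", ""))


total = total_from_tally("Showing results 1 to 20 of 6727")
```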

9 DSpace totalCnt Add-on NCKUR [40782/74662] [ / ] -

10 Generic Sum of List Counters EPrints count Browse List is a special case Add up the numbers in brackets
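"Add up the numbers in brackets" can be sketched in a few lines. The HTML fragment below is a made-up browse list, not taken from a real EPrints page, and the function name is hypothetical.

```python
import re


def sum_bracketed_counts(html):
    """Add up every integer that appears in round brackets, e.g. '2013 (120)'."""
    return sum(int(n) for n in re.findall(r"\((\d+)\)", html))


# Hypothetical browse-by-year list.
browse_list = (
    '<li><a href="2013.html">2013</a> (120)</li>'
    '<li><a href="2012.html">2012</a> (95)</li>'
)
total = sum_bracketed_counts(browse_list)
```

Any other bracketed number on the page would be added too, so in practice the input would probably need to be narrowed to the list itself first.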

11 Number of items EPrints V.3 Counter http://eprints.nonesuch.ac.uk/cgi/counter

12 Generic Sum of Numbers Add up the numbers

13 Generic HTML tag counting Count item tags in HTML source code
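Counting item tags in the HTML source might look like the sketch below, which uses Python's standard `html.parser`. The tag name, the `item` class and the sample markup are assumptions; each repository would need its own settings.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name and, optionally, a given CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.count += 1


def count_tags(html, tag, css_class=None):
    parser = TagCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical listing page: two items plus one navigation entry.
page = '<ul><li class="item">A</li><li class="item">B</li><li class="nav">Next</li></ul>'
items = count_tags(page, "li", "item")
```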

14 http://www.opendoar.org/ Counting multiple pages Separate pages per letter, document type, etc Issues with Greenstone – lack of predictability

15 OAI-PMH ListIdentifiers: Simple http://... /oai?verb=ListIdentifiers&metadataPrefix=oai_dc Count these No resumptionToken
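For the simple case, where the whole list arrives in one response, counting comes down to counting `header` elements. The sketch below also separates out headers marked `status="deleted"`, since deleted records have to be accounted for. The sample response is invented, and fetching the URL is left out.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def count_identifiers(xml_text):
    """Return (live, deleted) header counts from one ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    live = deleted = 0
    for header in root.iter(OAI_NS + "header"):
        if header.get("status") == "deleted":
            deleted += 1
        else:
            live += 1
    return live, deleted


# Invented response with two live records and one deleted record.
response = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:example.org:1</identifier></header>"
    "<header><identifier>oai:example.org:2</identifier></header>"
    '<header status="deleted"><identifier>oai:example.org:3</identifier></header>'
    "</ListIdentifiers></OAI-PMH>"
)
live, deleted = count_identifiers(response)
```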

16 OAI-PMH ListIdentifiers: Iterative resumptionToken for blocks of identifiers 193114FUS
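The iterative case can be sketched as a loop that keeps requesting blocks until the response carries no resumptionToken or an empty one. To keep the example self-contained, the HTTP request is replaced by a `fetch` callable, which is an assumption of this sketch, as are the fake pages and the `max_requests` safety limit.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def count_all_identifiers(fetch, max_requests=10000):
    """Count non-deleted headers across all blocks.

    `fetch(token)` must return the XML text of one response; `token` is None
    for the first request and the previous resumptionToken afterwards.
    """
    total = 0
    token = None
    for _ in range(max_requests):
        root = ET.fromstring(fetch(token))
        for header in root.iter(OAI_NS + "header"):
            if header.get("status") != "deleted":
                total += 1
        token_el = root.find(".//" + OAI_NS + "resumptionToken")
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:  # a missing or empty token ends the list
            return total
    raise RuntimeError("too many requests; giving up")


def _page(ids, token):
    # Build a fake response for the demonstration below.
    headers = "".join(
        f"<header><identifier>{i}</identifier></header>" for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


pages = {None: _page(["a", "b"], "t1"), "t1": _page(["c"], "")}
total = count_all_identifiers(pages.get)
```

Every block costs one round trip, so a large repository needs many requests before a total is known.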

17 OAI-PMH completeListSize
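When a repository includes the optional `completeListSize` attribute on its resumptionToken, the total can be read from the first response without iterating. The sketch below assumes an invented response; note that the OAI-PMH specification allows this value to be an estimate.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def complete_list_size(xml_text):
    """Return completeListSize from the resumptionToken, or None if absent."""
    root = ET.fromstring(xml_text)
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    if token_el is None:
        return None
    size = token_el.get("completeListSize")
    return int(size) if size is not None and size.isdigit() else None


# Invented first response of a chunked ListIdentifiers request.
first_response = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:example.org:1</identifier></header>"
    '<resumptionToken completeListSize="6727" cursor="0">abc</resumptionToken>'
    "</ListIdentifiers></OAI-PMH>"
)
size = complete_list_size(first_response)
```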

18 http://www.opendoar.org/ Twelve count harvesting methods Generic – Generic n records – Generic x to y of z counters – Generic Sum of List Counters – Generic HTML tag counting – Generic Sum of Numbers DSpace – DSpace Browse Counter – DSpace totalCnt Add-on EPrints – EPrints count Browse List – EPrints V.3 Counter OAI-PMH ListIdentifiers – Simple – Iterative – completeListSize Manual counting

19 Efficiency of the methods Iterative OAI-PMH so much slower

20 Relative Frequency of Methods

21 http://www.opendoar.org/ Ugent Numbers galore DSpace and EPrints Easily scrapeable counts

22 http://www.opendoar.org/ Count harvesting issues No counts visible or harvestable Static counts – often approx. – e.g. over 2m items Connectivity issues – Infrastructure limitations – e.g. heavy internet traffic – HTTP 401 (unauthorised) & 403 (forbidden) errors Data hidden in include files (e.g. JavaScript) – Not visible in View Source code No direct URL known for the pages with counts – Only accessible to human navigators Remodelled websites – requiring updated settings

23 http://www.opendoar.org/ Help OpenDOAR count your repository Display record counts on your home page – Using distinctive wording & highlighting – Ideally in or tags Ensure numbers can be seen in View Source code Ensure pages & files are not blocked to robots – Grant read-only access if necessary Implement OAI-PMH properly – Return ListIdentifiers in chunks – not one big file – Include completeListSize in the resumptionToken Tell us about any changes, so we can update settings

24 http://www.opendoar.org/ Ideas for the Future Comparing counts from OpenDOAR & ROAR – E.g. Nottm ePrints: 1,239 < 1,277 – E.g. HAL-Inserm: 7,498 > 2,773 OpenDOAR – Growth charts – Full text counts Extending OAI-PMH – Statistical features – Trial PSH

