Repository Statistics Peter Millington Technical Development Officer SHERPA, University of Nottingham.


1 Repository Statistics Peter Millington Technical Development Officer SHERPA, University of Nottingham

2 Overview Introduction Global statistics The what & why of repository statistics Benchmarks & data sources Compilation methods Web usage logging tools Google Analytics demo Problems and solutions Group session – Key issues

3 Global Repository Statistics Data Sources – Global lists of repositories OpenDOAR - http://www.opendoar.org/ ROAR - http://roar.eprints.org/ Repository66 - http://www.repository66.org/ May be useful for advocacy work Examples of types of chart & presentation

4

5

6

7

8

9 ROAR – Individual Growth Charts

10 ROAR – Individual Source Data
Month   Records  Archives    Month   Records  Archives
200407       12              200607     1347
200408       34              200608     1405
200409       77              200609     1469
200410      106              200610     1530
200411      149              200611     1610
200412      164              200612     1705
200501      187              200701     1768
200502      212              200702     1853
200503      272              200703     1934
200504      324              200704     2042
200505      389              200705     2169
200506      426              200706     2239
200507      446              200707     2264
200508      492              200708     2352
200509      547              200709     2374
200510      607              200710     2400
200511      631              200711     2438
200512      750              200712     2484
200601      794              200801     2540
200602      860              200802     2573
200603     1019              200803     2611
200604     1090              200804     2643
200605     1128              200805     2681
200606     1307              200806     2689
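
Reading each entry in the slide above as a six-digit month (YYYYMM) followed by a cumulative record count, a monthly deposit rate can be derived by differencing consecutive totals. The following is a minimal Python sketch under that reading; it uses only the first six months of the data, and it assumes the repository was empty before the first month listed.

```python
# First six months from the ROAR source data, read as
# (YYYYMM, cumulative number of records).
cumulative = [
    ("200407", 12),
    ("200408", 34),
    ("200409", 77),
    ("200410", 106),
    ("200411", 149),
    ("200412", 164),
]


def monthly_deposits(rows):
    """Turn cumulative totals into (month, records added that month)."""
    deposits = []
    previous = 0  # assumption: nothing was deposited before the first month
    for month, total in rows:
        deposits.append((month, total - previous))
        previous = total
    return deposits


rates = monthly_deposits(cumulative)
for month, added in rates:
    print(month, added)
```

Plotting the differences rather than the running total makes the effect of an advocacy event or a new mandate easier to spot, because a cumulative curve rises regardless.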

11

12

13 Delegates What and Why of Statistics Rate of growth For advocacy Measure of success – for our paymasters Rate of usage Targeting weak areas – departments Measure of success Justifying funding Most downloaded author/paper Promotes interest and engagement from authors

14 Delegates What and Why of Statistics Where are visitors coming from – referrers Curiosity – is it being seen by the right people Citation statistics To demonstrate the beneficial impact of repositories Drilling down for more detail For a sense of reality Steep slopes, animation, etc Glitzy marketing

15 Individual Repositories - Content Growth & Deposition rates Measure of progress Impact of advocacy events Impact of mandatory deposition Types of document or item Trend-watching? Breakdown by department and/or author How much is everyone contributing? Proportion of full text v metadata only Measure of usefulness

16 Item types: Universidade do Minho

17 Individual Repositories - Performance Proportion of publications deposited How comprehensive is the archive? Proportion of authors who are depositing Are they complying with local mandates? Compliance with funders' mandates Are you meeting your obligations? Repository administration Are your turn-round times acceptable?

18 Compliance with the CERN Mandate

19 Compliance Benchmarks Counting publications Institution-wide bibliographies e.g. Maintained by research managers Publication lists on departmental web pages Public/Commercial databases – ISI, Medline, etc Counting authors Who qualifies as an author? Academic staff, Research students, Managers University Calendars & Departmental staff lists

20 Individual Repositories - Usage Rates of usage Measure of usefulness Impact of news-related items Most downloaded items Identifying research(ers) with most impact? Engendering competition between authors? Downloads according to author Performance reviews? Geographical distribution of users Are you reaching your intended audience?

21 Sources of Data Repository's own database OAI-PMH Server's access log Remote logging

22 Compilation Methods Repository's own database Copying from the human interface Interactive SQL commands

23 Copying from the Human Interface

24

25 Interactive SQL Commands
mysql> SELECT type,COUNT(*) FROM eprint GROUP BY type;
+-----------------+----------+
| type            | COUNT(*) |
+-----------------+----------+
| article         |      456 |
| book            |        5 |
| book_section    |       39 |
| conference_item |      173 |
| exhibition      |        1 |
| monograph       |       18 |
| other           |        3 |
| thesis          |        4 |
+-----------------+----------+
8 rows in set (0.00 sec)
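
The same GROUP BY query can be run from a script instead of an interactive session, which makes it easier to repeat on a schedule. The sketch below is illustrative only: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the repository's MySQL database, the `eprintid` column is a hypothetical addition, and the rows are invented.

```python
import sqlite3

# In-memory stand-in for the repository database. The slide queries a
# MySQL table called eprint; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, type TEXT)")
sample_types = ["article", "article", "thesis", "book", "article", "thesis"]
conn.executemany(
    "INSERT INTO eprint (type) VALUES (?)", [(t,) for t in sample_types]
)


def count_by_type(connection):
    """Return a list of (item type, number of items), sorted by type."""
    cursor = connection.execute(
        "SELECT type, COUNT(*) FROM eprint GROUP BY type ORDER BY type"
    )
    return cursor.fetchall()


counts = count_by_type(conn)
for item_type, n in counts:
    print(f"{item_type:<10} {n}")
```

Against a live repository the connection would come from a MySQL driver rather than sqlite3, and the query should be run with a read-only account.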

26 Compilation Methods Repository's own database Copying from the human interface Interactive SQL commands OAI-PMH Harvesting programs – e.g. ROAR's Celestial

27 OAI-PMH ListIdentifiers

28 OAI-PMH ListRecords

29 ROAR - Celestial
date      identifier                 url
20070618  oai:bora.uib.no:1956/2270  Department of Earth Science
20070625  oai:bora.uib.no:1956/2272  Department of History
20070625  oai:bora.uib.no:1956/2273  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2274  Section for Endocrinology
20070626  oai:bora.uib.no:1956/2275  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2276  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2277  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2278  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2279  Department of Oral Sciences
20070626  oai:bora.uib.no:1956/2281  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2282  Department of Sociology
20070626  oai:bora.uib.no:1956/2283  Else Æyen
20070628  oai:bora.uib.no:1956/2284  Section for Art History
20070629  oai:bora.uib.no:1956/2285  Section for Russian
20070629  oai:bora.uib.no:1956/2286  Department of Geography
20070629  oai:bora.uib.no:1956/2287  Department of Greek, Latin and Egyptology
20070702  oai:bora.uib.no:1956/2288  Section for Spanish
20070702  oai:bora.uib.no:1956/2289  Department of Mathematics
20070702  oai:bora.uib.no:1956/2290  Department of Geography
20070702  oai:bora.uib.no:1956/2291  Department of Geography
20070702  oai:bora.uib.no:1956/2292  Department of Biology
20070703  oai:bora.uib.no:1956/2293  Department of Biology
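
A harvester builds a table like the one above from OAI-PMH responses. As a rough sketch of the idea, the Python below parses a ListIdentifiers response and counts records per month from the datestamps. The XML is a hand-written sample that borrows four identifiers from the slide, not a real response from the repository, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample shaped like a ListIdentifiers response.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:bora.uib.no:1956/2270</identifier>
      <datestamp>2007-06-18</datestamp>
    </header>
    <header>
      <identifier>oai:bora.uib.no:1956/2272</identifier>
      <datestamp>2007-06-25</datestamp>
    </header>
    <header>
      <identifier>oai:bora.uib.no:1956/2273</identifier>
      <datestamp>2007-06-25</datestamp>
    </header>
    <header>
      <identifier>oai:bora.uib.no:1956/2288</identifier>
      <datestamp>2007-07-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def records_per_month(xml_text):
    """Count record headers per YYYY-MM, based on their datestamps."""
    root = ET.fromstring(xml_text.strip())
    months = Counter()
    for header in root.iterfind(".//oai:header", OAI_NS):
        stamp = header.findtext("oai:datestamp", default="", namespaces=OAI_NS)
        if stamp:
            months[stamp[:7]] += 1  # keep only the YYYY-MM part
    return months


per_month = records_per_month(SAMPLE_RESPONSE)
for month in sorted(per_month):
    print(month, per_month[month])
```

A real harvest would also need to follow resumption tokens across multiple responses and decide how to treat records flagged as deleted; both are left out here.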

30 Compilation Methods Repository's own database Copying from the human interface Interactive SQL commands OAI-PMH Harvesting programs – e.g. ROAR's Celestial Server's access log Web usage statistics tools

31 Raw Web Access Logs
209.237.238.179 - - [10/Apr/2005:05:34:06 +0100] "GET /portfolio.css HTTP/1.0" 200 816 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:16:27 +0100] "GET /DAWN_Index.htm HTTP/1.0" 200 8392 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:17:44 +0100] "GET /Eric.htm HTTP/1.0" 200 6975 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:21:12 +0100] "GET /Library_Form.htm HTTP/1.0" 200 7709 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:22:48 +0100] "GET /cleansing.htm HTTP/1.0" 200 11016 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:25:02 +0100] "GET /index.htm HTTP/1.0" 200 7613 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:28:19 +0100] "GET /integration.htm HTTP/1.0" 200 8027 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:31:35 +0100] "GET /merging.htm HTTP/1.0" 200 9132 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:07:34:39 +0100] "GET /publication.htm HTTP/1.0" 200 5327 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:08:22:38 +0100] "GET /ABACUS_Index.htm HTTP/1.0" 200 5421 "-" "ia_archiver"
209.237.238.179 - - [10/Apr/2005:08:27:34 +0100] "GET /limitations.htm HTTP/1.0" 200 3781 "-" "ia_archiver"
210.173.179.17 - - [20/Dec/2004:13:22:03 +0000] "GET /robots.txt HTTP/1.1" 404 - "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:23:51 +0000] "GET / HTTP/1.1" 200 7613 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:25:34 +0000] "GET /Logo.gif HTTP/1.1" 200 3838 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:27:17 +0000] "GET /contact.htm HTTP/1.1" 200 4626 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:29:00 +0000] "GET /profile.htm HTTP/1.1" 200 10533 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:37:35 +0000] "GET /index.htm HTTP/1.1" 200 7613 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:47:55 +0000] "GET /publication.htm HTTP/1.1" 200 5327 "-" "gazz/5.0 (gazz@nttr.co.jp)"
210.173.179.17 - - [20/Dec/2004:13:49:39 +0000] "GET /InsideInfo.jpg HTTP/1.1" 200 19372 "-" "gazz/5.0 (gazz@nttr.co.jp)"
Recorded fields include:
IP Address of the computer requesting a file
Date & time transaction completed
Name of file requested
Success code – usually 200 for successfully completed
File size in bytes
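
The fields listed above can be pulled out of each log entry with a regular expression. The following Python sketch assumes the entries follow the layout of the sample (IP address, two unused fields, timestamp, request, status, size, referrer, user agent); the pattern and function names are illustrative rather than taken from any particular tool.

```python
import re
from datetime import datetime

# One named group per field in a log entry shaped like the samples above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Month abbreviations are parsed using the default (English) names.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent, as in the 404 entry above.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


ok_line = (
    '209.237.238.179 - - [10/Apr/2005:05:34:06 +0100] '
    '"GET /portfolio.css HTTP/1.0" 200 816 "-" "ia_archiver"'
)
missing_line = (
    '210.173.179.17 - - [20/Dec/2004:13:22:03 +0000] '
    '"GET /robots.txt HTTP/1.1" 404 - "-" "gazz/5.0 (gazz@nttr.co.jp)"'
)

ok_entry = parse_line(ok_line)
missing_entry = parse_line(missing_line)
print(ok_entry["ip"], ok_entry["path"], ok_entry["status"], ok_entry["size"])
print(missing_entry["path"], missing_entry["status"], missing_entry["size"])
```

Returning None for lines that do not match lets a caller count and inspect malformed entries instead of silently dropping them.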

32 Web Usage Statistics Tools Analog http://www.analog.cx/ Webalizer http://www.mrunix.net/webalizer/ AWStats http://awstats.sourceforge.net/ etc.

33 Sample output from the Analog Statistics Package

34

35

36 Sample output from the Webalizer Statistics Package

37

38

39

40

41

42 Sample output from the AWStats Statistics Package

43

44

45

46

47

48

49 Compilation Methods Repository's own database Copying from the human interface Interactive SQL commands OAI-PMH Harvesting programs – e.g. ROAR's Celestial Server's access log Web usage statistics tools Remote logging Google Analytics

50 http://www.google.com/analytics Sign up to a Google Account Specify the URL to be logged Obtain snippet of JavaScript code Insert snippet into HTML of pages to be logged Ideally into a template file Make sure the modified pages are live! Logging starts automatically Log in to your account to view the analytics

51

52

53

54

55

56 Google Analytics JavaScript snippet
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
var pageTracker = _gat._getTracker("UA-3477654-3");
pageTracker._initData();
pageTracker._trackPageview();
Find URL Containing/Excluding String e.g. pdf
Regular expressions e.g. /[0-9]*/ for EPrints IDs
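
The two filtering ideas on this slide, matching a substring such as pdf and matching a numeric EPrints ID in the path, can be tried out on a list of URLs outside the analytics interface. The Python sketch below uses invented paths and assumes the ID is the first path segment; it uses `[0-9]+` rather than the slide's `[0-9]*` so that an empty segment (`//`) does not count as an ID.

```python
import re

# Assumption: an eprint URL path starts with the numeric ID, e.g. /123/.
EPRINT_PATH = re.compile(r"^/(?P<eprint_id>[0-9]+)/")


def eprint_id(path):
    """Return the eprint ID as an int, or None if the path is not an eprint page."""
    match = EPRINT_PATH.match(path)
    return int(match.group("eprint_id")) if match else None


# Invented example paths.
paths = [
    "/123/",
    "/123/1/paper.pdf",
    "/view/year/2008.html",
    "/style/main.css",
    "/45/2/thesis.pdf",
]

pdf_downloads = [p for p in paths if p.lower().endswith(".pdf")]
ids = [eprint_id(p) for p in paths]

print(pdf_downloads)
print(ids)
```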

57 Problems Web bots and crawlers Inflating usage volume Skewing usage time series Auxiliary files & non-eprint pages CSS style sheet files Image files – jpeg, gif, etc. Index pages Linking URLs to bibliographic references What does that eprint number mean?

58 Problems and Solutions Web bots and crawlers Use robots.txt & meta robots tags to prevent crawling Filtering out known bots Still leaves maverick hackers' & students' bots Auxiliary files & non-eprint pages Configuring & tuning the analysis tool Filter using regular expressions Linking URLs to bibliographic references Programmatic concordance e.g. IRStats
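
Filtering out known bots and auxiliary files can be sketched as two small predicates applied to each hit before counting. This Python example is an illustration under stated assumptions: the bot list is a short hand-picked sample (ia_archiver and gazz appear in the log excerpt earlier, Googlebot and Slurp are added as well-known crawlers), the hits are invented, and a real list would need regular maintenance.

```python
import re

# Substrings that mark a user agent as a known crawler (sample list only).
KNOWN_BOT_MARKERS = ("ia_archiver", "gazz", "googlebot", "slurp")

# File extensions treated as auxiliary rather than as eprint content.
AUXILIARY = re.compile(r"\.(css|gif|jpe?g|png|ico|js)$", re.IGNORECASE)


def is_bot(agent):
    """True if the user agent contains one of the known bot markers."""
    agent = agent.lower()
    return any(marker in agent for marker in KNOWN_BOT_MARKERS)


def is_auxiliary(path):
    """True for style sheets, images, scripts and robots.txt."""
    return path == "/robots.txt" or AUXILIARY.search(path) is not None


def countable(hits):
    """Keep only hits that are neither bot traffic nor auxiliary files."""
    return [
        h for h in hits
        if not is_bot(h["agent"]) and not is_auxiliary(h["path"])
    ]


# Invented hits in the shape a log parser might produce.
hits = [
    {"path": "/123/1/paper.pdf", "agent": "Mozilla/5.0 (Windows; U)"},
    {"path": "/portfolio.css", "agent": "Mozilla/5.0 (Windows; U)"},
    {"path": "/index.htm", "agent": "ia_archiver"},
    {"path": "/Logo.gif", "agent": "gazz/5.0 (gazz@nttr.co.jp)"},
]

kept = countable(hits)
print(len(hits), "hits,", len(kept), "counted")
```

As the slide notes, this only removes bots that identify themselves; crawlers with a browser-like user agent still get through and need other heuristics, such as request rate.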

59

60 Over to Chris for DSpace statistics…

61 What are your priorities for statistics?

62 Peter Millington peter.millington@nottingham.ac.uk

