Repository Statistics Peter Millington Technical Development Officer SHERPA, University of Nottingham.

1 Repository Statistics
Peter Millington, Technical Development Officer, SHERPA, University of Nottingham

2 Overview
- Introduction
- Global statistics
- The what & why of repository statistics
- Benchmarks & data sources
- Compilation methods
- Web usage logging tools
- Google Analytics demo
- Problems and solutions
- Group session – Key issues

3 Global Repository Statistics
- Data Sources – Global lists of repositories
  - OpenDOAR
  - ROAR
  - Repository66
- May be useful for advocacy work
- Examples of types of chart & presentation

9 ROAR – Individual Growth Charts

10 ROAR – Individual Source Data
Table columns: Month, Records, Archives

13 Delegates' What and Why of Statistics
- Rate of growth
  - For advocacy
  - Measure of success – for our paymasters
- Rate of usage
  - Targeting weak areas – departments
  - Measure of success
  - Justifying funding
- Most downloaded author/paper
  - Promotes interest and engagement from authors

14 Delegates' What and Why of Statistics
- Where are visitors coming from – referrers
  - Curiosity – is it being seen by the right people?
- Citation statistics
  - To demonstrate the beneficial impact of repositories
- Drilling down for more detail
  - For a sense of reality
- Steep slopes, animation, etc.
  - Glitzy marketing

15 Individual Repositories - Content
- Growth & deposition rates
  - Measure of progress
  - Impact of advocacy events
  - Impact of mandatory deposition
- Types of document or item
  - Trend-watching?
- Breakdown by department and/or author
  - How much is everyone contributing?
- Proportion of full text v metadata only
  - Measure of usefulness

16 Item types: Universidade do Minho

17 Individual Repositories - Performance
- Proportion of publications deposited
  - How comprehensive is the archive?
- Proportion of authors who are depositing
  - Are they complying with local mandates?
- Compliance with funders' mandates
  - Are you meeting your obligations?
- Repository administration
  - Are your turnaround times acceptable?

18 Compliance with the CERN Mandate

19 Compliance Benchmarks
- Counting publications
  - Institution-wide bibliographies, e.g. maintained by research managers
  - Publication lists on departmental web pages
  - Public/commercial databases – ISI, Medline, etc.
- Counting authors
  - Who qualifies as an author? Academic staff, research students, managers
  - University calendars & departmental staff lists
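A compliance figure of this kind can be worked out by matching a benchmark list of publications against what is actually in the repository. The following is a minimal sketch only: the function name `deposit_rate` and the identifiers are hypothetical, and it assumes both lists share a common identifier such as a DOI, which in practice is often the hard part.

```python
def deposit_rate(benchmark_ids, deposited_ids):
    """Fraction of benchmark publications that also appear in the repository."""
    benchmark = set(benchmark_ids)
    if not benchmark:
        raise ValueError("benchmark list must not be empty")
    matched = benchmark & set(deposited_ids)
    return len(matched) / len(benchmark)


# Hypothetical identifiers, for illustration only.
bibliography = ["pub-001", "pub-002", "pub-003", "pub-004"]
repository = ["pub-002", "pub-004", "pub-999"]

rate = deposit_rate(bibliography, repository)
print(f"{rate:.0%} of benchmark publications are deposited")
```

The same calculation applies to authors: swap the publication identifiers for staff identifiers from the departmental lists.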

20 Individual Repositories - Usage
- Rates of usage
  - Measure of usefulness
  - Impact of news-related items
- Most downloaded items
  - Identifying research(ers) with most impact?
  - Engendering competition between authors?
- Downloads according to author
  - Performance reviews?
- Geographical distribution of users
  - Are you reaching your intended audience?

21 Sources of Data
- Repository's own database
- OAI-PMH
- Server's access log
- Remote logging

22 Compilation Methods
- Repository's own database
  - Copying from the human interface
  - Interactive SQL commands

23 Copying from the Human Interface

25 Interactive SQL Commands

    mysql> SELECT type,COUNT(*) FROM eprint GROUP BY type;
    | type            | COUNT(*) |
    | article         |      456 |
    | book            |        5 |
    | book_section    |       39 |
    | conference_item |      173 |
    | exhibition      |        1 |
    | monograph       |       18 |
    | other           |        3 |
    | thesis          |        4 |
    rows in set (0.00 sec)
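The same count can be scripted so that it is repeatable. This is a sketch under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the repository's MySQL database, and the rows and the `eprintid` column are made up for illustration. Only the `eprint` table name, the `type` column and the query itself come from the slide.

```python
import sqlite3

# In-memory stand-in for the repository database (assumption: the real
# system is MySQL; only the query itself is taken from the slide).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, type TEXT)")

# Hypothetical sample rows, not real repository data.
sample_rows = [("article",), ("article",), ("book",), ("thesis",)]
conn.executemany("INSERT INTO eprint (type) VALUES (?)", sample_rows)

# Same GROUP BY query as on the slide, with a stable ordering added.
cursor = conn.execute(
    "SELECT type, COUNT(*) FROM eprint GROUP BY type ORDER BY type"
)
counts = dict(cursor.fetchall())

for item_type, total in counts.items():
    print(f"{item_type:<16}{total:>6}")

conn.close()
```

Running a script like this on a schedule and storing the results gives a time series of item types without anyone having to log in to the database by hand.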

26 Compilation Methods
- Repository's own database
  - Copying from the human interface
  - Interactive SQL commands
- OAI-PMH
  - Harvesting programs – e.g. ROAR's Celestial

27 OAI-PMH ListIdentifiers

28 OAI-PMH ListRecords
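As a rough illustration of what a harvester does with ListIdentifiers, the sketch below builds the request URL and pulls identifier and datestamp pairs out of a response. The base URL, the helper names and the sample datestamp are assumptions for illustration. The `verb` and `metadataPrefix` parameters and the XML namespace are those defined by OAI-PMH 2.0. No network request is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_identifiers_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListIdentifiers request URL."""
    query = urlencode({"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_headers(xml_text):
    """Return (identifier, datestamp) pairs from a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [
        (
            header.findtext("oai:identifier", namespaces=OAI_NS),
            header.findtext("oai:datestamp", namespaces=OAI_NS),
        )
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]


# Trimmed, hypothetical response; the datestamp is invented.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:bora.uib.no:1956/2270</identifier>
      <datestamp>2007-05-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""

url = list_identifiers_url("https://repository.example.org/oai")
headers = extract_headers(SAMPLE_RESPONSE)
print(url)
print(headers)
```

A real harvester also has to follow the `resumptionToken` that the protocol uses to page through long lists, which this sketch leaves out.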

29 ROAR - Celestial
Table columns: date, identifier, url

    oai:bora.uib.no:1956/2270  Department of Earth Science
    oai:bora.uib.no:1956/2272  Department of History
    oai:bora.uib.no:1956/2273  Department of the History of Religions
    oai:bora.uib.no:1956/2274  Section for Endocrinology
    oai:bora.uib.no:1956/2275  Department of the History of Religions
    oai:bora.uib.no:1956/2276  Department of the History of Religions
    oai:bora.uib.no:1956/2277  Department of the History of Religions
    oai:bora.uib.no:1956/2278  Department of the History of Religions
    oai:bora.uib.no:1956/2279  Department of Oral Sciences
    oai:bora.uib.no:1956/2281  Department of the History of Religions
    oai:bora.uib.no:1956/2282  Department of Sociology
    oai:bora.uib.no:1956/2283  Else Æyen
    oai:bora.uib.no:1956/2284  Section for Art History
    oai:bora.uib.no:1956/2285  Section for Russian
    oai:bora.uib.no:1956/2286  Department of Geography
    oai:bora.uib.no:1956/2287  Department of Greek, Latin and Egyptology
    oai:bora.uib.no:1956/2288  Section for Spanish
    oai:bora.uib.no:1956/2289  Department of Mathematics
    oai:bora.uib.no:1956/2290  Department of Geography
    oai:bora.uib.no:1956/2291  Department of Geography
    oai:bora.uib.no:1956/2292  Department of Biology
    oai:bora.uib.no:1956/2293  Department of Biology
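Harvested datestamps like those in the date column are what growth charts are built from. A minimal sketch, assuming ISO-style `YYYY-MM-DD` datestamps, is shown below. The function name and the sample dates are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def monthly_growth(datestamps):
    """Return (month, new_records, cumulative_total) rows, oldest first."""
    per_month = Counter(stamp[:7] for stamp in datestamps)  # "YYYY-MM"
    months = sorted(per_month)
    totals = accumulate(per_month[month] for month in months)
    return [(month, per_month[month], total) for month, total in zip(months, totals)]


# Invented datestamps, for illustration only.
stamps = ["2007-05-02", "2007-05-20", "2007-06-01", "2007-08-15"]

for month, new, total in monthly_growth(stamps):
    print(month, new, total)
```

Months with no deposits are simply absent from the result, so a charting step would need to fill those gaps if a continuous time axis is wanted.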

30 Compilation Methods
- Repository's own database
  - Copying from the human interface
  - Interactive SQL commands
- OAI-PMH
  - Harvesting programs – e.g. ROAR's Celestial
- Server's access log
  - Web usage statistics tools

31 Raw Web Access Logs

    [10/Apr/2005:05:34: ] "GET /portfolio.css HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:16: ] "GET /DAWN_Index.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:17: ] "GET /Eric.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:21: ] "GET /Library_Form.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:22: ] "GET /cleansing.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:25: ] "GET /index.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:28: ] "GET /integration.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:31: ] "GET /merging.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:07:34: ] "GET /publication.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:08:22: ] "GET /ABACUS_Index.htm HTTP/1.0" "-" "ia_archiver"
    [10/Apr/2005:08:27: ] "GET /limitations.htm HTTP/1.0" "-" "ia_archiver"
    [20/Dec/2004:13:22: ] "GET /robots.txt HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:23: ] "GET / HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:25: ] "GET /Logo.gif HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:27: ] "GET /contact.htm HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:29: ] "GET /profile.htm HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:37: ] "GET /index.htm HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:47: ] "GET /publication.htm HTTP/1.1" "-" "gazz/
    [20/Dec/2004:13:49: ] "GET /InsideInfo.jpg HTTP/1.1" "-" "gazz/5.0

Recorded fields include:
- IP address of the computer requesting a file
- Date & time transaction completed
- Name of file requested
- Success code – usually 200 for successfully completed
- File size in bytes
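The recorded fields can be pulled out of each line with a regular expression. The sketch below assumes the widely used Apache "combined" log layout. The log lines on the slide are truncated, so the sample line is a reconstruction in which the IP address, seconds, timezone, status code and size are all invented. The names `LOG_PATTERN` and `parse_line` are illustrative.

```python
import re

# Pattern for the Apache "combined" log layout (an assumption about
# the server configuration, not something the slide confirms).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Servers write "-" when no body was sent; treat that as zero bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical complete line; only the path and user agent come from the slide.
SAMPLE = (
    '192.0.2.10 - - [10/Apr/2005:05:34:12 +0000] '
    '"GET /portfolio.css HTTP/1.0" 200 1523 "-" "ia_archiver"'
)

record = parse_line(SAMPLE)
print(record["ip"], record["path"], record["status"], record["size"], record["agent"])
```

Tools such as Analog, Webalizer and AWStats do this parsing internally. A hand-rolled parser is mainly useful when the questions being asked are repository-specific.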

32 Web Usage Statistics Tools
- Analog
- Webalizer
- AWStats
- etc.

33 Sample output from the Analog Statistics Package

36 Sample output from the Webalizer Statistics Package

42 Sample output from the AWStats Statistics Package

49 Compilation Methods
- Repository's own database
  - Copying from the human interface
  - Interactive SQL commands
- OAI-PMH
  - Harvesting programs – e.g. ROAR's Celestial
- Server's access log
  - Web usage statistics tools
- Remote logging
  - Google Analytics

50
- Sign up to a Google Account
- Specify the URL to be logged
- Obtain snippet of JavaScript code
- Insert snippet into HTML of pages to be logged
  - Ideally into a template file
  - Make sure the modified pages are live!
- Logging starts automatically
- Log in to your account to view the analytics

56 Google Analytics JavaScript snippet

    // Pick the secure or plain host to match the protocol of the current page
    var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
    // Write a script tag that loads ga.js from that host
    document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));

    // Create a tracker for the account and record this page view
    var pageTracker = _gat._getTracker("UA ");
    pageTracker._initData();
    pageTracker._trackPageview();

- Find URL Containing/Excluding String, e.g. pdf
- Regular expressions, e.g. /[0-9]*/ for EPrints IDs
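The same kind of URL filtering can be applied to any list of requested paths. This sketch assumes EPrints-style URLs in which the first path segment is the numeric eprint ID. The example paths and the helper names are hypothetical. Note that it uses `[0-9]+` and not `[0-9]*`, because the starred form also matches an empty segment such as `//`.

```python
import re
from collections import Counter

# First path segment is taken to be the eprint ID (an assumption about
# the repository's URL layout).
EPRINT_PATH = re.compile(r"^/([0-9]+)(?:/|$)")


def eprint_id(path):
    """Return the numeric eprint ID at the start of a path, or None."""
    match = EPRINT_PATH.match(path)
    return int(match.group(1)) if match else None


def is_pdf(path):
    """True if the path, ignoring any query string, ends in .pdf."""
    return path.split("?", 1)[0].lower().endswith(".pdf")


# Hypothetical request paths.
paths = [
    "/123/",
    "/123/1/paper.pdf",
    "/456/1/thesis.pdf",
    "/123/1/paper.pdf",
    "/style/site.css",
]

downloads = Counter(
    eprint_id(p) for p in paths if is_pdf(p) and eprint_id(p) is not None
)
print(downloads.most_common())
```

Counting only PDF requests that carry an eprint ID separates full-text downloads from abstract-page views and from auxiliary files.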

57 Problems
- Web bots and crawlers
  - Inflating usage volume
  - Skewing usage time series
- Auxiliary files & non-eprint pages
  - CSS style sheet files
  - Image files – jpeg, gif, etc.
  - Index pages
- Linking URLs to bibliographic references
  - What does that eprint number mean?

58 Problems and Solutions
- Web bots and crawlers
  - Use robots.txt & meta robots tags to prevent crawling
  - Filtering out known bots
  - Still leaves maverick hackers & students' bots
- Auxiliary files & non-eprint pages
  - Configuring & tuning the analysis tool
  - Filter using regular expressions
- Linking URLs to bibliographic references
  - Programmatic concordance, e.g. IRStats
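Filtering out known bots and auxiliary files might look like the sketch below. The marker list and extension list are illustrative assumptions, not a complete catalogue. `ia_archiver` and `gazz` are included because they appear in the raw log sample earlier in the deck. Hits are assumed to be dicts with `path` and `agent` keys.

```python
# Substrings that mark a user agent as a crawler (illustrative, not exhaustive).
KNOWN_BOT_MARKERS = ("ia_archiver", "gazz", "googlebot", "bot", "crawler", "spider")

# File types that are page furniture, not repository content.
AUX_EXTENSIONS = (".css", ".gif", ".jpg", ".jpeg", ".png", ".js", ".ico")


def is_bot(agent):
    """True if the user-agent string contains a known crawler marker."""
    lowered = agent.lower()
    return any(marker in lowered for marker in KNOWN_BOT_MARKERS)


def is_auxiliary(path):
    """True for style sheets, images, scripts and robots.txt."""
    bare = path.split("?", 1)[0].lower()
    return bare == "/robots.txt" or bare.endswith(AUX_EXTENSIONS)


def filter_hits(hits):
    """Keep only hits that look like real readers requesting content."""
    return [
        hit for hit in hits
        if not is_bot(hit["agent"]) and not is_auxiliary(hit["path"])
    ]


# Hypothetical hits for illustration.
hits = [
    {"path": "/portfolio.css", "agent": "ia_archiver"},
    {"path": "/123/1/paper.pdf", "agent": "Mozilla/5.0 (Windows NT 5.1)"},
    {"path": "/Logo.gif", "agent": "Mozilla/5.0 (Windows NT 5.1)"},
    {"path": "/123/1/paper.pdf", "agent": "gazz/5.0"},
]

kept = filter_hits(hits)
print(len(kept), "of", len(hits), "hits kept")
```

As the slide notes, a list like this only catches bots that identify themselves. Crawlers that send a browser-like user agent pass straight through.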

60 Over to Chris for DSpace statistics…

61 What are your priorities for statistics?

62 Peter Millington

