Repository Statistics – Peter Millington, Technical Development Officer, SHERPA, University of Nottingham
Overview: Introduction; Global statistics; The what & why of repository statistics; Benchmarks & data sources; Compilation methods; Web usage logging tools; Google Analytics demo; Problems and solutions; Group session – key issues
Global Repository Statistics – Data sources (global lists of repositories): OpenDOAR, http://www.opendoar.org/; ROAR, http://roar.eprints.org/; Repository66, http://www.repository66.org/. May be useful for advocacy work. Examples of types of chart & presentation.
Delegates' What and Why of Statistics: Rate of growth – for advocacy; as a measure of success for our paymasters. Rate of usage – targeting weak areas (departments); measure of success; justifying funding. Most downloaded author/paper – promotes interest and engagement from authors.
Delegates' What and Why of Statistics: Where are visitors coming from (referrers)? – curiosity: is the repository being seen by the right people? Citation statistics – to demonstrate the beneficial impact of repositories. Drilling down for more detail – for a sense of reality. Steep slopes, animation, etc. – glitzy marketing.
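The referrer question can be answered from a server access log that records the Referer header, as Apache's "combined" log format does. A minimal Python sketch, assuming the referrer strings have already been extracted from the log; `referrer_hosts` is a hypothetical helper, not part of any tool named in this talk:

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_hosts(referrers):
    """Count visits per referring host, skipping empty or '-' (no referrer) entries."""
    counts = Counter()
    for ref in referrers:
        if not ref or ref == "-":
            continue  # direct visit, bookmark, or referrer not sent
        host = urlsplit(ref).hostname  # hostname is returned lower-cased
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.google.com/search?q=eprints",
        "-",
        "http://scholar.google.com/scholar?q=repository",
        "http://www.google.com/search?q=open+access",
    ]
    for host, n in referrer_hosts(sample).most_common():
        print(host, n)
```

Related hosts such as www.google.com and scholar.google.com are deliberately kept separate, since they say different things about who is finding the repository.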
Individual Repositories – Content: Growth & deposition rates – measure of progress; impact of advocacy events; impact of mandatory deposition. Types of document or item – trend-watching? Breakdown by department and/or author – how much is everyone contributing? Proportion of full text vs. metadata-only records – measure of usefulness.
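Growth, deposition rates, departmental breakdowns and the full-text proportion are all simple aggregations over deposit records. A sketch in Python, assuming each record has already been exported as a (deposit date, department, has-full-text) tuple; the records and department names below are made-up placeholders, not real repository data:

```python
from collections import Counter
from datetime import date

# Placeholder deposit records: (deposit date, department, has full text?)
records = [
    (date(2007, 6, 18), "Dept A", True),
    (date(2007, 6, 25), "Dept B", False),
    (date(2007, 6, 26), "Dept A", True),
    (date(2007, 7, 2), "Dept C", True),
]


def deposits_per_month(records):
    """Count deposits in each calendar month, keyed 'YYYY-MM' (the deposition rate)."""
    return Counter(d.strftime("%Y-%m") for d, _, _ in records)


def deposits_per_department(records):
    """How much is each department contributing?"""
    return Counter(dept for _, dept, _ in records)


def full_text_proportion(records):
    """Fraction of records that carry a full text rather than metadata only."""
    if not records:
        return 0.0
    return sum(1 for _, _, full_text in records if full_text) / len(records)


if __name__ == "__main__":
    print(sorted(deposits_per_month(records).items()))
    print(deposits_per_department(records).most_common())
    print(f"Full text: {full_text_proportion(records):.0%}")
```

Plotting the monthly counts against the dates of advocacy events or a new mandate is one way to look for their impact.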
Individual Repositories – Performance: Proportion of publications deposited – how comprehensive is the archive? Proportion of authors who are depositing – are they complying with local mandates? Compliance with funders' mandates – are you meeting your obligations? Repository administration – are your turnaround times acceptable?
Compliance Benchmarks: Counting publications – institution-wide bibliographies (e.g. maintained by research managers); publication lists on departmental web pages; public/commercial databases (ISI, Medline, etc.). Counting authors – who qualifies as an author? Academic staff, research students, managers; university calendars & departmental staff lists.
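Once a benchmark list exists, the two performance proportions are straightforward ratios. A sketch in Python, assuming titles are the only key shared by the bibliography and the repository (DOIs or other identifiers, where both sides have them, would match more reliably); all function names are hypothetical:

```python
import re


def normalise_title(title):
    """Lower-case and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deposit_coverage(bibliography, repository_titles):
    """Fraction of benchmark publications that also appear in the repository."""
    if not bibliography:
        return 0.0
    deposited = {normalise_title(t) for t in repository_titles}
    hits = sum(1 for t in bibliography if normalise_title(t) in deposited)
    return hits / len(bibliography)


def author_participation(eligible_authors, depositing_authors):
    """Fraction of eligible authors (e.g. from staff lists) who have deposited."""
    eligible = set(eligible_authors)
    if not eligible:
        return 0.0
    return len(eligible & set(depositing_authors)) / len(eligible)
```

The denominator is where the judgement lies: deciding who qualifies as an author changes `eligible_authors`, and so the reported compliance rate.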
Individual Repositories – Usage: Rates of usage – measure of usefulness; impact of news-related items. Most downloaded items – identifying research(ers) with most impact? Engendering competition between authors? Downloads according to author – performance reviews? Geographical distribution of users – are you reaching your intended audience?
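Most-downloaded lists and per-author totals are counts over download events. A sketch in Python, assuming the events have already been reduced to item ids (for example by filtering the access log) and that a hypothetical `authors_of` mapping from item id to author names has been exported from the repository:

```python
from collections import Counter


def top_items(download_ids, n=10):
    """Return the n most-downloaded item ids with their download counts."""
    return Counter(download_ids).most_common(n)


def downloads_by_author(download_ids, authors_of):
    """Credit each download to every author of the downloaded item.

    authors_of: hypothetical dict mapping item id -> list of author names.
    Items with no known authors are skipped.
    """
    totals = Counter()
    for item, count in Counter(download_ids).items():
        for author in authors_of.get(item, []):
            totals[author] += count
    return totals
```

Here each co-author receives full credit, so per-author totals add up to more than the total downloads; splitting credit fractionally is an alternative if the figures are to feed into anything as sensitive as performance reviews.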
Sources of Data: Repository's own database; OAI-PMH; Server's access log; Remote logging
Compilation Methods: Repository's own database – copying from the human interface; interactive SQL commands
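Interactive SQL turns questions like deposits per department per month into a single query. The sketch below uses Python's built-in sqlite3 module with an in-memory table so that it runs anywhere; the `items` table, its columns and the sample rows are hypothetical, since EPrints, DSpace and other platforms each have their own schema:

```python
import sqlite3

# Hypothetical schema and placeholder rows; adapt names to your platform's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, department TEXT, "
    "deposited TEXT, has_full_text INTEGER)"
)
conn.executemany(
    "INSERT INTO items (department, deposited, has_full_text) VALUES (?, ?, ?)",
    [("Dept A", "2007-06-29", 1), ("Dept A", "2007-07-02", 0), ("Dept B", "2007-07-02", 1)],
)

# Deposits per department per month, with how many of them carry a full text.
query = """
    SELECT department,
           substr(deposited, 1, 7) AS month,
           COUNT(*)                AS deposits,
           SUM(has_full_text)      AS full_text
    FROM items
    GROUP BY department, month
    ORDER BY month, department
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```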
ROAR - Celestial
date      identifier                 url
20070618  oai:bora.uib.no:1956/2270  Department of Earth Science
20070625  oai:bora.uib.no:1956/2272  Department of History
20070625  oai:bora.uib.no:1956/2273  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2274  Section for Endocrinology
20070626  oai:bora.uib.no:1956/2275  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2276  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2277  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2278  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2279  Department of Oral Sciences
20070626  oai:bora.uib.no:1956/2281  Department of the History of Religions
20070626  oai:bora.uib.no:1956/2282  Department of Sociology
20070626  oai:bora.uib.no:1956/2283  Else Æyen
20070628  oai:bora.uib.no:1956/2284  Section for Art History
20070629  oai:bora.uib.no:1956/2285  Section for Russian
20070629  oai:bora.uib.no:1956/2286  Department of Geography
20070629  oai:bora.uib.no:1956/2287  Department of Greek, Latin and Egyptology
20070702  oai:bora.uib.no:1956/2288  Section for Spanish
20070702  oai:bora.uib.no:1956/2289  Department of Mathematics
20070702  oai:bora.uib.no:1956/2290  Department of Geography
20070702  oai:bora.uib.no:1956/2291  Department of Geography
20070702  oai:bora.uib.no:1956/2292  Department of Biology
20070703  oai:bora.uib.no:1956/2293  Department of Biology
Compilation Methods: Repository's own database – copying from the human interface; interactive SQL commands. OAI-PMH – harvesting programs, e.g. ROAR's Celestial. Server's access log – web usage statistics tools.
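An OAI-PMH harvester works by issuing list requests such as `ListIdentifiers` and following each `resumptionToken` until the list is exhausted. A minimal Python sketch using only the standard library; the function names are hypothetical, and note that an OAI-PMH datestamp records the creation, modification or deletion of a record, so it only approximates the deposit date:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace used by OAI-PMH 2.0 responses.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_identifiers(xml_text):
    """Return ([(identifier, datestamp, [setSpec, ...]), ...], resumptionToken or None)."""
    root = ET.fromstring(xml_text)
    headers = []
    for h in root.iter(OAI + "header"):
        ident = h.findtext(OAI + "identifier")
        stamp = h.findtext(OAI + "datestamp")
        sets = [s.text for s in h.findall(OAI + "setSpec")]
        headers.append((ident, stamp, sets))
    # An empty or missing resumptionToken means this is the last page.
    token = root.findtext(".//" + OAI + "resumptionToken")
    return headers, (token or None)


def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield every header from the repository, following resumptionTokens (needs network)."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            headers, token = parse_list_identifiers(resp.read())
        yield from headers
        if not token:
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


def count_by_day(headers):
    """Tally harvested headers by the day part (YYYY-MM-DD) of their datestamp."""
    return Counter(stamp[:10] for _, stamp, _ in headers)
```

For example, `count_by_day(harvest("https://repository.example.org/oai"))` would tally records per day, much like the Celestial listing; the base URL is a placeholder. Counting by `setSpec` instead gives a per-collection or per-department breakdown where the repository exposes one.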
Compilation Methods: Repository's own database – copying from the human interface; interactive SQL commands. OAI-PMH – harvesting programs, e.g. ROAR's Celestial. Server's access log – web usage statistics tools. Remote logging – Google Analytics.
Problems: Web bots and crawlers – inflating usage volume; skewing usage time series. Auxiliary files & non-eprint pages – CSS style sheet files; image files (JPEG, GIF, etc.); index pages. Linking URLs to bibliographic references – what does that eprint number mean?
Problems and Solutions: Web bots and crawlers – use robots.txt & meta robots tags to prevent crawling; filter out known bots; this still leaves bots run by maverick hackers & students. Auxiliary files & non-eprint pages – configure & tune the analysis tool; filter using regular expressions. Linking URLs to bibliographic references – programmatic concordance, e.g. IRStats.
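The bot filter, auxiliary-file filter and URL concordance can be combined in one pass over an Apache "combined" format access log. A sketch in Python, assuming a hypothetical URL scheme in which full texts are served as `/<eprint id>/<n>/<filename>`; the regular expressions are illustrative only, the `titles` mapping is hypothetical, and this is an independent illustration rather than the IRStats implementation:

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Illustrative user-agent patterns; a maintained list of known robots catches far more.
BOT_RE = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)
# Auxiliary files and non-eprint pages: stylesheets, scripts, images, robots.txt, index pages.
AUX_RE = re.compile(r"\.(css|js|gif|jpe?g|png|ico)$|^/robots\.txt$|/$", re.IGNORECASE)
# Hypothetical full-text URL scheme: /<eprint id>/<n>/<filename>
EPRINT_RE = re.compile(r"^/(\d+)/\d+/[^/]+$")


def eprint_downloads(lines):
    """Count successful full-text downloads per eprint id, skipping bots and auxiliary files."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue
        if BOT_RE.search(m["agent"]):
            continue  # known crawler
        path = m["path"].split("?", 1)[0]  # drop any query string
        if AUX_RE.search(path):
            continue  # stylesheet, image, index page, etc.
        e = EPRINT_RE.match(path)
        if e:
            counts[int(e.group(1))] += 1
    return counts


def with_titles(counts, titles):
    """Concordance step: replace eprint ids with titles from a hypothetical id -> title dict."""
    return [(titles.get(eid, f"eprint {eid}"), n) for eid, n in counts.most_common()]
```

User-agent filtering only removes bots that identify themselves; the maverick bots mentioned above typically need extra heuristics, such as unusually high request rates from a single host.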