New COUNTER-based usage metrics for journals and other publications. Peter Shepherd, COUNTER, August 2011
New metrics Journal Usage Factor: a usage-based measure of journal impact and quality (funded by UKSG, RIN and others) PIRUS (Publisher and Institutional Repository Usage Statistics): usage statistics for individual articles (funded by JISC)
Journal Usage Factor: overview ISI's Impact Factor compensates for the fact that larger journals will tend to be cited more than smaller ones Can we do something similar for usage? In other words, should we seek to develop a “Usage Factor” as an additional measure of journal quality/value?
Journal Usage Factor advantages Especially helpful for journals and fields not covered by ISI Especially helpful for journals with high undergraduate or practitioner use Especially helpful for journals publishing relatively few articles Data available potentially sooner than with Impact Factors
Usage Factor Stage 2 Modelling and Analysis Real journal usage data analysed by John Cox Associates, Frontline GMS and CIBER. Participating publishers: American Chemical Society, Emerald, IOP, Nature Publishing, OUP, Sage, Springer
The data 326 journals: 38 Engineering; 32 Physical Sciences; 119 Social Sciences (of which 29 Business and Management); 35 Humanities; 102 Medicine and Life Sciences (of which 57 Clinical Medicine). c.150,000 articles; 1GB of data
Results Content Type In social sciences JUFs were higher for non-article content In medicine and life sciences JUFs were higher for article content In humanities, physical sciences, and business & management, JUF differences between article and non-article content were not significant
Results Article Version In physical sciences the JUF was significantly (sometimes dramatically) lower when calculations were confined to the Version of Record In all other subjects the JUF was significantly higher when calculations were confined to the Version of Record
Results JUF and Impact Factor Little correlation apart from the Nature branded titles Some titles with no or very low impact factors have very high JUFs
Journal Usage Factor: some initial results
Recommendations- the metric The Journal Usage Factor should be calculated using the median rather than the arithmetic mean Usage data are highly skewed; most items attract relatively low use and a few are used many times. As a result, the use of the arithmetic mean is not appropriate
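The effect of skew on the two averages can be seen in a toy calculation (the download counts below are invented for illustration, not taken from the study):

```python
import statistics

# Invented download counts for ten items in one journal: most attract
# modest use, while a single item is used heavily (a skewed distribution).
downloads = [3, 4, 5, 5, 6, 7, 8, 9, 12, 900]

mean_value = statistics.mean(downloads)      # dragged upward by one outlier
median_value = statistics.median(downloads)  # reflects the typical item

print(mean_value)    # 95.9
print(median_value)  # 6.5
```

A usage factor based on the mean would suggest a typical item is downloaded roughly 96 times, when in fact half the items here attract six or fewer downloads; the median avoids that distortion.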
Recommendations- the metric A range of usage factors should ideally be published for each journal: a comprehensive factor (all items, all versions) plus supplementary factors for selected items (e.g. article and final versions). There is considerable variation in the relative use made of different document types and versions. This means that the usage factor will be affected substantially by the particular mix of items included in a given journal, all other things being equal
Recommendations- the metric Journal Usage Factors should be published as integers with no decimal places Monthly patterns of use at the item level are quite volatile and usage factors therefore include a component of statistical noise Journal Usage Factors should be published with appropriate confidence levels around the average to guide their interpretation As a result of this statistical noise, the mean usage factor should be interpreted within intervals of plus or minus 22 per cent
Recommendations- the metric The Journal Usage Factor should be calculated initially on the basis of a maximum time window of 24 months. It might be helpful later on to consider a 12-month window as well (or possibly even a 6-month window) to provide further insights This study shows that relatively short time windows capture a substantial proportion of the average lifetime interest in full journal content. Windows longer than 24 months are not recommended; 24 months should be considered a maximum. There is possibly a case for considering a 12-month window, but there are counter-arguments here: the impact of publishing ahead of print especially.
Recommendations- the metric The Journal Usage Factor is not directly comparable across subject groups and should therefore be published and interpreted only within appropriate subject groupings. Usage, in months 1-12 especially, follows different patterns in different subject areas The Journal Usage Factor should be calculated using a publication window of 2 years Usage factors will tend to inflate across the board year-on-year as a result of many factors, including greater item discoverability through search engines and gateways. Changes to access arrangements (e.g. Google indexing) will have dramatic and lasting effects. The use of a two-year publication window would ameliorate some of these effects by providing a moving average as well as a greater number of data points for calculating the usage factor.
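Putting several of these recommendations together (the median, integer reporting, and a two-year publication window), a usage factor calculation might be sketched as follows; the record layout, numbers, and function name are invented for illustration and are not part of the recommended methodology:

```python
import statistics
from datetime import date

# Invented records: (publication date, downloads within the usage window)
items = [
    (date(2010, 3, 1), 40),
    (date(2010, 9, 15), 55),
    (date(2011, 2, 1), 30),
    (date(2011, 11, 20), 70),
    (date(2008, 6, 1), 500),  # published outside the 2-year window: excluded
]

def journal_usage_factor(items, window_start, window_end):
    """Median downloads over items published within the two-year
    publication window, reported as an integer per the recommendations."""
    in_window = [d for pub, d in items if window_start <= pub <= window_end]
    return round(statistics.median(in_window))

print(journal_usage_factor(items, date(2010, 1, 1), date(2011, 12, 31)))  # 48
```

Note how the heavily used 2008 item is excluded by the publication window, so a single older article cannot dominate the current factor.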
Recommendations- the metric There seems to be no reason why ranked lists of journals by usage factor should not gain acceptance The usage factor delivers journal rankings that are comparable in terms of their year-on-year stability with those generated from citation metrics such as the ISI impact factor and SNIP Small journals and titles with fewer than 100 downloads per item are unsuitable candidates for Journal Usage Factors: these are likely to be inaccurate and easily gamed Usage factors below a certain threshold value (perhaps 100, but research is needed on a larger scale to explore this further) are likely to be inaccurate due to statistical noise. The size of the journal should also be taken into account
Recommendations- the metric The Journal Usage Factor provides very different information from the citation Impact Factor and this fact should be emphasised in public communications. The usage factor does not appear to be statistically associated with measures of citation impact Further work is needed on usage factor gaming and on developing robust forensic techniques for its detection Attempts to game the usage factor are highly likely. CIBER’s view is that the real threat comes from software agents rather than human attack. The first line of defence has to be making sure that COUNTER protocols are robust against machine attack. The analysis in this report suggests that a cheap and expedient second line of defence would be to develop statistical forensics to identify suspicious behaviour, whether it is human or machine in origin.
Recommendations- the metric Further work is needed to broaden the scope of the project over time to include other usage-based metrics Although the scope of this study was to consider the Journal Usage Factor only, future work could look at other indicators that reflect other aspects of online use, such as a ‘journal usage half-life’ or a ‘reading immediacy index’.
Recommendations - infrastructure Development of systems to automate the extraction and collation of data needed for JUF calculation is essential if calculation of this metric is to become routine Development of an agreed standard for content item types, to which journal specific item types would be mapped, is desirable as it would allow for greater sophistication in JUF calculation Development or adoption of a simple subject taxonomy to which journal titles would be assigned by their publishers Publishers should adopt standard “article version” definitions based on ALPSP/NISO recommendations
Next steps: Stage 3 Stage 3: Objectives Preparation of a draft Code of Practice for the Journal Usage Factor Further testing of the recommended methodology for calculating Journal Usage Factor Investigation of an appropriate, resilient subject taxonomy for the classification of journals Exploration of the options for an infrastructure to support the sustainable implementation of JUF Investigate the feasibility of applying the Usage Factor concept to other categories of publication For further information, see the full report of the Journal Usage Factor project on the COUNTER website:
PIRUS : why now? Increasing interest in article-level usage More journal articles hosted by Institutional and other Repositories Authors and funding agencies are increasingly interested in a reliable, global overview of usage of individual articles Online usage becoming an alternative, accepted measure of article and journal value Knowledge Exchange report recommends developing standards for usage reporting at the individual article level Usage-based metrics being considered as a tool for use in measuring the outputs of research.
PIRUS : why now? Article-level usage metrics now more practical Implementation by COUNTER of XML-based usage reports makes more granular reporting of usage a practical proposition Implementation by COUNTER of the SUSHI protocol facilitates the automated consolidation of usage data from different sources.
PIRUS: the challenge An article may be available from: the main journal web site, Ovid, ProQuest, PubMed Central, authors’ local Institutional Repositories. If we want to assess article impact by counting usage, how can we maximise the actual usage that we capture?
PIRUS : mission and project aims Mission To develop a global standard to enable the recording, reporting and consolidation of online usage statistics for individual journal articles hosted by Institutional Repositories, Publishers and other entities Project aims Develop COUNTER-compliant usage reports at the individual article level Create guidelines which, if implemented, would enable any entity that hosts online journal articles to produce these reports Propose ways in which these reports might be consolidated at a global level in a standard way.
PIRUS: benefits Reliable usage data will be available for journal articles, wherever they are held Repositories will have access to new functionality from open source software that will allow them to produce standardised usage reports from their data Digital repository systems will be more integral to research and closely aligned to research workflows and environments The authoritative status of PIRUS usage statistics will enhance the status of repository data and content The standard can be extended to cover other categories of content stored by repositories
PIRUS: results Technical: a workable technical model for the collection, processing and consolidation of individual article usage statistics has been developed. Organizational: an organizational model for a Central Clearing House that would be responsible for the collection, processing and consolidation of usage statistics has been proposed. Economic: the costs for repositories and publishers of generating the required usage reports, as well as the costs of any central clearing house/houses, have been calculated and a model for recovering these costs has been proposed. Political: the broad support of all the major stakeholder groups (repositories, publishers, authors, etc.) is being sought.
PIRUS: organizational issues Specifications for the governance of PIRUS going forward: define the nature and mission of the Central Clearing House(s) (CCH) in more detail, in discussion with publishers and repositories; develop a specification for the technical, organizational and business models for the CCH
PIRUS2: governance going forward Principles Independent, not-for-profit organization International Representation of the main stakeholder groups Repositories Publishers Research Institutions Role Define and implement mission Strategic oversight Set and monitor standards Set fees and manage finances Select and monitor suppliers
PIRUS2: nature and mission of the Central Clearing House(s) One global CCH: cost-effective; the industry is global, with global standards; easier to set and modify standards; simpler interface with publishers and repositories. Can be outsourced: organizations with the required capabilities already exist. Scenarios to be supported: see next slide.
Common steps: Step 1: a full-text article is downloaded. Step 2: tracker code is invoked, generating an OpenURL log entry.
Scenario A: Step A1: OpenURL log entries sent to the CCH responsible for creating and consolidating the usage statistics. Step A2: logs filtered by COUNTER rules. Step A3: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format. Step A4: COUNTER-compliant usage statistics available from the CCH to authorized parties.
Scenario B: Step B1: OpenURL log entry sent to a local server. Step B2: OpenURL log entries harvested by the CCH responsible for creating and consolidating the usage statistics. Step B3: logs filtered by COUNTER rules. Step B4: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format. Step B5: COUNTER-compliant usage statistics available from the CCH to authorized parties.
Scenario C: Step C1: OpenURL log entry sent to a local server. Step C2: logs filtered by COUNTER rules. Step C3: COUNTER-compliant usage statistics collected and collated per article (DOI) in XML format. Step C4: COUNTER-compliant usage statistics available from the repository or publisher to the CCH.
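The "logs filtered by COUNTER rules" steps above include, among other things, removing double-clicks so that repeat requests are not over-counted. A minimal sketch of such a filter follows; the 30-second window, the record layout, and the function name are illustrative assumptions, not the normative COUNTER rules:

```python
from datetime import datetime, timedelta

# Illustrative double-click window; COUNTER's actual thresholds depend on
# content format and the Code of Practice release in force.
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)

def filter_double_clicks(entries):
    """Collapse repeat requests for the same article by the same user that
    fall within the double-click window, keeping the first request.
    Each entry is a tuple (timestamp, user_id, doi)."""
    last_seen = {}  # (user_id, doi) -> timestamp of most recent request
    counted = []
    for ts, user, doi in sorted(entries, key=lambda e: e[0]):
        key = (user, doi)
        prev = last_seen.get(key)
        if prev is None or ts - prev > DOUBLE_CLICK_WINDOW:
            counted.append((ts, user, doi))
        last_seen[key] = ts
    return counted
```

For example, two requests for the same DOI by the same user ten seconds apart would count once, while a request two minutes later, or by a different user, would count separately.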
PIRUS2: CCH operating principles The “bucket” of usage data should be controlled by the participants - they can decide whether to compile the usage reports themselves or to delegate that role to the CCH Access to the CCH should be limited to authorised parties Usage reports must state the sources from which they have been compiled to ensure transparency
PIRUS: outputs from the CCH Usage reports for publishers Usage reports for repositories Usage reports for research institutions Key requirements: Set of core reports Flexibility in outputs
PIRUS: summary A prototype service that meets the following criteria: A workable technical model, refined from that proposed in the original PIRUS project with more extensive tests on a larger and more diverse data set. A practical organizational model based on co-operation between proven, existing suppliers of data processing, data management and auditing services that meets the requirement for an independent, trusted and reliable service. It is clear from a survey carried out at the end of this project that the majority of publishers are not, largely for economic reasons, yet ready to implement or participate in such a service. An economic model that provides a cost-effective service and a logical, transparent basis for allocating costs among the different users of the service. While this economic model is based on costs that vendors of usage statistics services have validated as reasonable, there is strong resistance from publishers to accepting these costs. Further information:
COUNTER Release 4 - objectives A single, unified Code covering all e-resources, including journals, databases, books, reference works, multimedia content, etc. Improve the application of XML and SUSHI in the design of the usage reports Take into account the outcomes of the Journal Usage Factor and PIRUS projects
COUNTER Release 4: timetable and development process April 2011: announcement of objectives, process and timetable for the development of Release 4; open invitation to submit suggestions April-June 2011: evaluation of submitted suggestions by the COUNTER Executive June-September 2011: development of Draft Release 4 October 2011: publication of Draft Release 4 October 2011-January 2012: comments received on Draft Release 4 March 2012: publication of Release 4 December 2013: deadline for implementation of Release 4 by vendors Further information: