Journal Usage Factor: A usage-based alternative to Impact Factor. Richard Gedye, UKSG Annual Conference, April 2011, Harrogate.

UKSG Usage Factor Project
1. Project rationale
2. Issues addressed before data collection and analysis
3. Collecting and analysing the data
   - What data we collected
   - Methodology
   - Issues and challenges
4. Results and Recommendations
5. Next steps

The challenge...
ISI’s Impact Factor compensates for the fact that larger journals will tend to be cited more than smaller ones.
Can we do something similar for usage?
In other words, should we seek to develop a “Usage Factor” as an additional measure of journal quality/value?

For example...
$$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$

Usage factor advantages
   - Especially helpful for journals and fields not covered by ISI
   - Especially helpful for journals with high undergraduate or practitioner use
   - Especially helpful for journals publishing relatively few articles
   - Data available potentially sooner than with Impact Factors

“Authors select journals that will give their articles prestige and reach. Impact Factor is a widely used surrogate for the former, while perceived circulation and readership reflect the latter. But usage is becoming more important as a measure of reach” Carol Tenopir

UKSG Usage Factor Project
1. Brief background
2. Issues addressed before data collection and analysis
3. Collecting and analysing the data
   - What data we have collected
   - Methodology
   - Issues and challenges
4. Results and Recommendations
5. Next steps

Key data issues we have addressed 1. Consistency – numerator/denominator 2. Defining article usage year 3. Defining article publication date 4. Different usage patterns by subject

With key data issues addressed, we developed a specification for a report via which participating publishers would deliver real usage data for analysis

Real journal usage data analysed by John Cox Associates and Frontline GMS
Participating publishers: American Chemical Society, Emerald, IOP, Nature Publishing, OUP, Sage, Springer

UKSG Usage Factor Project
1. Brief background
2. Issues addressed before data collection and analysis
3. Collecting and analysing the data
   - What data we collected
   - Methodology
   - Issues and challenges
4. Results and Recommendations
5. Next steps

The data
326 journals:
   - 38 Engineering
   - 32 Physical Sciences
   - 119 Social Sciences (including 29 Business and Management)
   - 35 Humanities
   - 102 Medicine and Life Sciences (including 57 Clinical Medicine)
c. 250,000 articles
3,350 spreadsheets
1 GB of data

The calculation
$$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$
where $x$ is the usage period and $y$ is the publication period.
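As a minimal sketch of the calculation above, the following Python computes a usage factor for one journal. The `Item` class, its field names, and the download counts are hypothetical illustrations, not part of the project's report specification; the usage period is assumed to have been applied already when the downloads were counted.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One content item; field names are illustrative, not from the project."""
    publication_year: int
    downloads_in_usage_period: int  # downloads counted within usage period x


def journal_usage_factor(items, publication_years):
    """Total usage of items published in period y, divided by the number of those items."""
    in_period = [it for it in items if it.publication_year in publication_years]
    if not in_period:
        raise ValueError("no items published in the requested period")
    total_usage = sum(it.downloads_in_usage_period for it in in_period)
    return total_usage / len(in_period)


# Hypothetical journal: three items published in 2008 and one in 2009.
example_items = [Item(2008, 120), Item(2008, 30), Item(2008, 450), Item(2009, 90)]
example_juf = journal_usage_factor(example_items, {2008})
print(example_juf)
```

Passing a set of years, for example `{2008, 2009}`, gives a two-year publication period with the same function.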

JUF variables to be tested
Journal content type: All content; Articles only
Version: VoR; All versions
Publication year: 2006; 2007; 2008; 2009; 2006-7; 2007-8; 2008-9
Usage period: Months 1-12; Months 1-24; Months 13-24; Months 13-36

JUF variables to be tested
Subject comparisons
Broad subjects: Physical Sciences; Medicine and Life Sciences; Social Sciences; Humanities; Engineering
Narrow subjects: Business and Management; Clinical Medicine

Issues and challenges
Article metadata resides in multiple databases
   - Key article metadata needed for JUF not included in usage log records
   - Need to map and merge different records
Lack of standards for key schemas and practices
   - Different publisher policies on article version labelling and availability
   - Multiple schemas for “article type” – typically journal rather than publisher specific
Some difficulties retrieving detailed historical usage records by article, especially when data straddled transfer between systems
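The "map and merge" step mentioned above can be sketched as follows. This is an illustration only: the identifiers, field names, and record shapes are assumptions and do not reflect any participating publisher's actual systems. Usage events are counted per item and joined to a separate metadata store, and events that cannot be matched are reported rather than silently dropped.

```python
from collections import Counter


def merge_usage_with_metadata(usage_events, metadata):
    """Join per-item download counts to item metadata.

    usage_events: iterable of item identifiers, one per download event.
    metadata: dict mapping item identifier -> dict of metadata fields.
    Returns (merged records, identifiers seen in the log but absent from metadata).
    """
    counts = Counter(usage_events)
    # Every item in the metadata store gets a record, even with zero downloads,
    # because unused items still count towards the JUF denominator.
    merged = [
        {"item_id": item_id, "downloads": counts.get(item_id, 0), **fields}
        for item_id, fields in metadata.items()
    ]
    unmatched = sorted(item_id for item_id in counts if item_id not in metadata)
    return merged, unmatched


# Hypothetical data: "a3" appears in the usage log but has no metadata record.
example_metadata = {
    "a1": {"publication_year": 2008, "item_type": "article", "version": "VoR"},
    "a2": {"publication_year": 2008, "item_type": "editorial", "version": "VoR"},
}
example_events = ["a1", "a1", "a3", "a1"]
example_merged, example_unmatched = merge_usage_with_metadata(example_events, example_metadata)
print(example_merged)
print(example_unmatched)
```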

Results: Content Type
   - In social sciences, JUFs were higher for non-article content
   - In medicine and life sciences, JUFs were higher for article content
   - In humanities, physical sciences, and business & management, JUF differences between article and non-article content were not significant

Results: Article Version
   - In physical sciences, the JUF was significantly (sometimes dramatically) lower when calculations were confined to the Version of Record
   - In all other subjects, the JUF was significantly higher when calculations were confined to the Version of Record

Results: JUF and Impact Factor
   - Little correlation apart from the Nature branded titles
   - Some titles with no or very low impact factors have very high JUFs

Usage Factor Project – some initial results

How stable are journal rankings based on impact factor over time?
ISI impact factors, medical and life science journals (n=36)
How stable are journal rankings based on the classic ISI impact factor? The previously mentioned paper by Amin and Mabe shows that impact factors may fluctuate year on year by as much as plus or minus 40 per cent and that this is in part a function of journal size. As a point of comparison for the properties of the journal usage factor, we start this section by looking at changes in rankings among 36 medical titles over three years.
Key findings
Journal rankings based on ISI impact factor are pretty consistent over the period 2006-2008.
Rank order correlations:
   2008 vs 2007: Spearman’s rho = 0.973, p < 0.01
   2007 vs 2006: Spearman’s rho = 0.971, p < 0.01
   2008 vs 2006: Spearman’s rho = 0.959, p < 0.0
How to read this graphic
We start by putting the journals into ranked order by ISI impact factor for each of the three years 2006-2008. These charts show how these ranked orderings compare across different years. For example, the middle chart in the right hand column compares 2007 and 2008. If there were no changes in journal ranking, all the journals would lie on the diagonal. Journals below the diagonal have fallen down the order.

How stable are journal rankings based on usage factor over time?
UKSG usage factors, medical and life science journals (n=48)
The journal usage factors reported here are based on a single publication year with use being measured in months 1-24.
Key findings
For rankings based on a journal usage factor (one publication year and use in months 1-24), we find reasonable stability, with high and statistically significant correlations.
Rank order correlations:
   2008 vs 2007: Spearman’s rho = 0.886, p < 0.01
   2007 vs 2006: Spearman’s rho = 0.862, p < 0.01
   2008 vs 2006: Spearman’s rho = 0.755, p < 0.01
The correlations are smaller than for the impact factors in the previous slide, but they are still high. This analysis shows that usage factors are more volatile than impact factors and any journal rankings based on them will show greater churn year on year. But, broadly speaking, they do a similar job.
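For readers who want to reproduce this kind of rank-order comparison, here is a small self-contained sketch of Spearman's rho, computed as the Pearson correlation of ranks with tied values given their average rank. The usage factor values are made up for illustration, and this is not necessarily the tooling used for the analysis reported here.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical usage factors for the same five journals in two years.
juf_2007 = [310, 150, 420, 95, 260]
juf_2008 = [330, 140, 390, 120, 350]
example_rho = spearman_rho(juf_2007, juf_2008)
print(round(example_rho, 3))
```

In this made-up example only two journals swap places between the years, so rho stays close to 1. The significance test (the p value) is not computed here.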

Recommendations – the metric
The most promising JUF metric for further testing will be based on:
All content types except standing matter
   - Non-article matter is published for a purpose and its usage forms part of the usage of the journal as a whole
   - Item type control is difficult to manage
All versions published
   - For simplicity and completeness
Publication period: 2 years
   - For a greater “smoothing” effect on occasional unexplained peaks and troughs in usage
   - To reduce the effect of pre-“Version of Record” publication
Usage period: 2 years, contemporaneous with publication period
   - To capture peak post-publication usage
   - To keep the metric as current as possible

Recommendations - infrastructure
   - Development of systems to automate the extraction and collation of data needed for JUF calculation is essential if calculation of this metric is to become routine
   - Development of an agreed standard for content item types, to which journal specific item types would be mapped, is desirable as it would allow for greater sophistication in JUF calculation
   - Development or adoption of a simple subject taxonomy to which journal titles would be assigned by their publishers

Recommendations - infrastructure
   - Publishers should adopt standard “article version” definitions based on NISO recommendations
   - But no specific recommendations for the labelling or making available of these versions

Next steps
   - Progress Report summarising Phase 1 and 2 will be published in Q1 2011
   - Meanwhile further analysis of the usage data collected in Phase 2 is being undertaken by CIBER at UCL

Patterns of use across time
Monthly use of all items published in 2006, Engineering journals (n=21)
About this analysis
In order to have an informed discussion about the optimal length of the time window to use to record downloads for the usage factor, we need to understand how items are used over time. In this and the following analyses, we take all items published in 2006 and look at their monthly pattern of use over the subsequent three years. Ideally, we need a longer time series, but this is all we have. The trend line, which admittedly does not give an excellent fit to the data, suggests that aggregate usage of 2006 engineering items will trickle to near zero (i.e. become ‘asymptotic’) at around 45 months after publication. The life span of original research articles and review papers is likely to be longer, as the ‘all items’ approach used here will contain much relatively ephemeral material such as editorial material and rapid communications.
How to read this graphic
This chart shows the number of downloads each month with a trend line.

Patterns of use across time
Monthly use of all items published in 2006, Humanities journals (n=24)
About this slide
Humanities items follow a generally similar pattern to engineering but with a shorter and more delayed peak. The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 humanities items will trickle to near zero (i.e. become ‘asymptotic’) at around 48 months after publication.

Patterns of use across time
Monthly use of all items published in 2006, Physical sciences journals (n=3)
About this slide
The monthly pattern of use for physical sciences items is very different from the other broad subjects in this study. There is a very sharp initial peak followed by continuing and steady interest in items in the period months 14-36 [caution: we only have three journals]. There is not enough data to justify calculating an end point for physical sciences articles, but the well-fitting trend line suggests it may have been reached at or just after 36 months.

Patterns of use across time
Monthly use of all items published in 2006, Social sciences journals (n=115)
About this slide
The pattern in the social sciences is broadly similar to that for humanities items. The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 social sciences items will trickle to near zero (i.e. become ‘asymptotic’) at around 47 months after publication.

Patterns of use across time
Monthly use of all items published in 2006, Medical and life sciences journals (n=47)
About this slide
Monthly usage in the medical and life sciences shows an interesting double peak: a very immediate one in the first few months and another from month 12, which may well be due to a delayed open access and/or citation effect. The trend line, which offers a good fit to the data, suggests that aggregate usage of 2006 medical and life science items will trickle to near zero (i.e. become ‘asymptotic’) at around 40 months after publication.

Which time window is ‘best’?
Cumulative use of all items published in 2006 by usage time window, comparison by broad subject area (% of lifetime use)
Usage months | Engineering | Humanities | Medical and life sciences | Social sciences
1-12         | 31.6%       | 32.4%      | 27.2%                     | 30.9%
1-18         | 49.0%       | 50.0%      | 43.6%                     | 47.5%
1-24         | 63.8%       | 65.7%      | 60.8%                     | 63.0%
13-24        | 32.2%       | 33.3%      | 33.6%                     | 32.1%
13-36        | 55.6%       | 57.7%      | 61.8%                     | 56.5%

Patterns of use across time
Early conclusions
We ideally need a longer time series to be sure but it appears that a good working estimate for the useful lifetime of all items in all versions is about four years. The longevity of original research articles and review papers is likely to be longer than this, and possibly more highly differentiated between subjects, but if all items are used, then this seems a reasonable position to take.
All the subjects show a peak roughly between months 6 and 12 and a broadly similar (and steady) pattern of cumulative item use in years 1-3.
Tentatively, a time window based on months 1-24 would seem to be the most appropriate: capturing information both about the peak and the subsequent steady state growth.
If the estimations of lifetime use are accurate, roughly four years for all items, then it would appear that a 1-24 month window will capture a substantial proportion of lifetime use (all items, all versions), probably of the order of 60 per cent as a global figure. A 1-12 month window will capture around 30 per cent of lifetime use.
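The share of lifetime use captured by a given window can be computed directly from a monthly download series. The sketch below uses an invented 48-month series and treats the total of that series as "lifetime use"; both are assumptions for illustration and are not the project's data or its trend-line estimate.

```python
def window_share(monthly_downloads, first_month, last_month):
    """Fraction of total use that falls in months first_month..last_month (1-based, inclusive)."""
    if not 1 <= first_month <= last_month <= len(monthly_downloads):
        raise ValueError("window must lie inside the observed series")
    total = sum(monthly_downloads)
    if total == 0:
        raise ValueError("series contains no usage")
    return sum(monthly_downloads[first_month - 1:last_month]) / total


# Hypothetical series: 48 months of downloads, flat within each year for simplicity.
example_series = [30] * 24 + [20] * 24
share_1_12 = window_share(example_series, 1, 12)
share_1_24 = window_share(example_series, 1, 24)
share_13_36 = window_share(example_series, 13, 36)
print(share_1_12, share_1_24, share_13_36)
```

With real data the result depends heavily on how lifetime use is estimated, which is why a longer time series would be preferable.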

Patterns of use across items
A problem with averages: the Bill Gates problem
Medical and life sciences journals
[Figure: histogram of downloads per item. Mean = 335.5. Annotations: many articles used a few times; a few articles used many times.]
About this slide
The histogram shows the frequency with which individual items are downloaded. The publication year is 2008 and this chart shows use in month 2. The pattern is lognormal: many items are used rarely, a few items are used many times. An issue with using the arithmetic mean to summarise this data is that the few heavily used items will exert a major effect: the mean will be a lot higher than other averages such as the mode or median. The International Mathematical Union has criticised ISI’s citation impact factor for this reason.

Provisional conclusions
Exploratory data analysis of UKSG usage factors, medical and life science journals (n=48)
[Figure, two panels, each showing means and 95% confidence intervals for publication year 2008, usage in months 1-24: unadjusted Cox data; log transformed CIBER data. Values shown on the charts: 354, 539, 466, 653, 303, 407, 438, 554.]
Articles, final versions attract significantly higher use than all items, all versions (ANOVA F=6.1, p < 0.01)
Articles, final versions attract significantly higher use than all items, all versions (ANOVA F=6.8, p < 0.01)

Patterns of use across items
Exploratory data analysis of UKSG usage factors, medical and life science journals (n=48), average JUFs (CIBER, 2008, months 1-24)
How to read this graphic
This heat map shows the 2008 JUF broken down by document type and version, with average downloads per category. The colour coding places the numbers in broad bands. Numbers in italics show the % of document types in the test collection. This is indicative only: publishers are not consistent in how they describe document types.

Patterns of use across items
Early conclusions
To be purist about this, the journal usage factor should not be based on an arithmetic average of the raw data. Several approaches are possible to a ‘better’ estimate of the average. The simplest is just to take the median. The approach taken in this report (and for the rest of the CIBER study) is to use a natural log transform. This may well not be necessary in the ‘real world’ but CIBER is pursuing this course of action because it will allow a range of statistical tests to be applied to the data with much greater confidence.
There is very considerable variation in average use both by document type and version. This does not matter in one sense: if the intention of the exercise is to simply report on usage for the journal as a whole package, then it is completely valid to include all items and all versions. This would certainly have practical advantages for data processing. However, and the same stricture applies to the classic impact factor*, a journal usage factor could easily be manipulated by simply changing the balance of item types in the final issue. This is a serious issue for usage data.
*See, for example, Mayur Amin and Michael Mabe, Impact factors: Use and abuse, Perspectives in Publishing 1, Elsevier, 2000.
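To illustrate why the choice of average matters for skewed download counts, the sketch below compares the arithmetic mean, the median, and a mean taken after a natural log transform (back-transformed with the exponential, which gives the geometric mean). The download counts are invented, and the exact transform used in the CIBER analysis is not specified here, so treat this as an assumption-laden illustration only.

```python
import math
import statistics


def summarise_downloads(downloads):
    """Return three candidate 'averages' for a list of positive download counts."""
    if any(d <= 0 for d in downloads):
        # Assumption: zero-use items would need separate handling before a log transform.
        raise ValueError("log transform requires strictly positive counts")
    log_mean = statistics.mean(math.log(d) for d in downloads)
    return {
        "mean": statistics.mean(downloads),
        "median": statistics.median(downloads),
        "log_transformed_mean": math.exp(log_mean),  # back-transformed to downloads
    }


# Hypothetical item-level downloads: many low values and one heavily used item.
example_downloads = [2, 3, 3, 5, 8, 12, 20, 2500]
example_summary = summarise_downloads(example_downloads)
print(example_summary)
```

In this made-up example a single heavily used item pulls the arithmetic mean far above both the median and the log-transformed mean, which is the "Bill Gates problem" in miniature.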

Next steps - Phase 3?
Stakeholders are considering how the next Phase of the JUF project might be structured and funded:
   - Issues relating to subject taxonomies, metadata and automation of publisher processes to be addressed
   - In collaboration with data suppliers, develop agreed taxonomical and metadata standards and templates which will streamline the process of data collection and analysis
   - More detailed practical recommendations for a cost-effective infrastructure to manage the Usage Factor process
   - Scaled up testing of candidate metrics recommended in Cox report

UKSG Usage Factor Project Many thanks to the sponsors of this latest phase:- GOLD SILVER ALPSP American Chemical Society STM Nature Publishing Group Springer

Thank you for your attention! http://www.uksg.org/usagefactors/
