1 Quality criteria for data aggregation used in academic rankings IREG FORUM on University rankings Methodologies under scrutiny 16-17 May 2013, Warsaw, Poland Michaela Saisana firstname.lastname@example.org European Commission, Joint Research Centre, Econometrics and Applied Statistics Unit
2 Outline Global rankings at the forefront of the policy debate Overview of two global university rankings (ARWU, THES) Statistical Coherence Tests Uncertainty analysis Policy Implications Conclusions
4 Definition of the university is broad: A university, as the name suggests, tends to encompass a broad range of purposes and dimensions, focus and missions, which are difficult to condense into a compact measure. Still, for reasons of governance, accountability and transparency, there is an increasing interest among policymakers as well as among practitioners in measuring and benchmarking "excellence" across universities. The growing mobility of students and researchers has also created a market for these measures among prospective students and their families. Global rankings at the forefront of the policy debate
5 Global rankings have raised debates and policy responses (EU, national level): to improve the positioning of a country within the existing measures, to create new measures, to discuss regional performance (e.g. show that USA is well ahead of Europe in terms of cutting-edge university research) Global rankings at the forefront of the policy debate
6 10-fold increase in the last 10 years Guess how many contain the word “THES ranking” or “ARWU ranking”? 20% Global rankings at the forefront of the policy debate
7 1. Academic Ranking of World Universities (ARWU) (Shanghai Jiao Tong University), 2003. 2. Webometrics (Spanish National Research Council), 2003. 3. World University Ranking (Times Higher Education/Quacquarelli Symonds), 2004–09. 4. Performance Ranking of Scientific Papers for Research Universities (HEEACT), 2007. 5. Leiden Ranking (Centre for Science & Technology Studies, University of Leiden), 2008. 6. World's Best Colleges and Universities (US News and World Report), 2008. 7. SCImago Institutional Rankings, 2009. 8. Global University Rankings (RatER) (Rating of Educational Resources, Russia), 2009. 9. Top University Rankings (Quacquarelli Symonds), 2010. 10. World University Ranking (Times Higher Education/Thomson Reuters, THE-TR), 2010. 11. U-Multirank (European Commission), 2011. Over 60 countries have introduced national rankings, and there are numerous regional, specialist and professional rankings. Global rankings at the forefront of the policy debate
8 University rankings are used to judge the performance of university systems … whether intended or not by their proponents. Global rankings at the forefront of the policy debate
9 France: Creation of 10 centres of HE excellence. The Minister of Education set a target to put at least 10 French universities among the top 100 in ARWU by 2012. The President has put French standing in these international rankings at the forefront of the policy debate (Le Monde, 2008). Italy: 0 universities in the top 100 of the ARWU ranking, seen as a failure of the national educational system. Spain: 1 university in the top 200 of the ARWU, hailed as a great national achievement. Global rankings at the forefront of the policy debate
10 An OECD study shows that university leaders worldwide are concerned about ranking systems, with consequences for the strategic and operational decisions they take to improve their research performance (Hazelkorn, 2007). There are over 16,000 HEIs, yet some of the global rankings capture merely the top 100 universities, less than 1% (Hazelkorn, 2013). Global rankings at the forefront of the policy debate
11 An extreme impact of global rankings. What: the 2005 THES created a major controversy in Malaysia, with the country's top two universities slipping by almost 100 places compared to 2004. Why: a change in the ranking methodology (a not well-known fact, and of limited comfort). Impact: a Royal Commission of Inquiry to investigate the matter. A few weeks later, the Vice-Chancellor of the University of Malaysia stepped down. Global rankings at the forefront of the policy debate
12 Global rankings at the forefront of the policy debate Overview of two global university rankings (ARWU, THES) Statistical Coherence Tests Uncertainty analysis Policy Implications Conclusions
13 Overview – 2007 ARWU ranking. METHODOLOGY: 6 indicators; best performing institution = 100, score of other institutions calculated as a percentage; weighting scheme chosen by rankers; linear aggregation of the 6 indicators. PROS and CONS: 6 « objective » indicators; focus on research performance, overlooks other university missions; biased towards hard-science institutions; favours large institutions.
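The methodology bullets on this slide (best performer = 100, others as a percentage, then a weighted linear sum) can be sketched as follows. This is a minimal illustration rather than the rankers' actual procedure: the raw data are made up, and the weight vector is an assumption, with each indicator weighted 0.10 or 0.20 so that the weights sum to 1.

```python
import numpy as np

def arwu_style_scores(raw, weights):
    """Scale each indicator so the best institution scores 100, then take a weighted sum."""
    raw = np.asarray(raw, dtype=float)
    weights = np.asarray(weights, dtype=float)
    scaled = 100.0 * raw / raw.max(axis=0)  # best performer = 100, others as a percentage
    return scaled @ weights                 # linear aggregation of the indicators

# Hypothetical raw data: 4 institutions (rows) x 6 indicators (columns).
raw = np.array([
    [50, 40, 30, 20, 60, 10],
    [25, 40, 15, 20, 30, 10],
    [10,  8,  6,  4, 12,  2],
    [50, 20, 30, 10, 60, 10],
])
# Assumed weights (0.10 or 0.20 per indicator, summing to 1).
weights = [0.10, 0.20, 0.20, 0.20, 0.20, 0.10]

scores = arwu_style_scores(raw, weights)
ranks = (-scores).argsort().argsort() + 1   # rank 1 = highest score
print(scores, ranks)
```

Because every indicator is divided by its own maximum, an institution that leads on all six indicators scores exactly 100 whatever the weights are.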
14 Overview – 2007 THES ranking. METHODOLOGY: 6 indicators; z-score calculated for each indicator; best performing institution = 100, other institutions calculated as a percentage; weighting scheme chosen by rankers; linear aggregation of the 6 indicators. PROS and CONS: attempt to take into account teaching quality; two expert-based indicators make up 50% of the total (subjective indicators, lack of transparency); yearly changes in methodology; measures research quantity.
15 Overview - Comparison (2007). 1 - Common to both top 10s: Harvard, Cambridge, Princeton, Caltech, MIT and Columbia. 2 - Greater variations in the middle to lower end of the rankings. 3 - Europe is lagging behind in both the ARWU (also known as SJTU) and THES rankings. 4 - THES favours UK universities: all UK universities lie below the line (in red).
16 University rankings, published yearly. (+) Very appealing for capturing a university's multiple missions in a single number. (+) Allow one to situate a given university in the worldwide context. (-) Can lead to misleading and/or simplistic policy conclusions.
17 Question: Can we say something about the quality of the university rankings and the reliability of the results?
18 Global rankings at the forefront of the policy debate Overview of two global university rankings (ARWU, THES) Statistical Coherence Tests Uncertainty analysis Policy Implications Conclusions
19 The Stiglitz report (p.65): […] a general criticism that is frequently addressed at composite indicators, i.e. the arbitrary character of the procedures used to weight their various components. […] The problem is not that these weighting procedures are hidden, non-transparent or non-replicable – they are often very explicitly presented by the authors of the indices, and this is one of the strengths of this literature. The problem is rather that their normative implications are seldom made explicit or justified. Statistical coherence
20 Question: Can we say something about the quality of the university rankings and the reliability of the results?
21 Statistical coherence - Dean's example. $y = 0.5\,x_1 + 0.5\,x_2$, with $x_1$: hours of teaching and $x_2$: number of publications. Estimated: $R_1^2 = 0.0759$, $R_2^2 = 0.826$, $\mathrm{corr}(x_1, x_2) = -0.151$, $V(x_1) = 116$, $V(x_2) = 614$, $V(y) = 162$.
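As a hedged worked check, not part of the original deck, the two $R_i^2$ values on this slide can be approximately recovered from the reported moments using the standard variance and covariance formulas for a linear combination. Small differences from the slide's figures are to be expected, since the slide reports estimates and rounded inputs.

```latex
\begin{aligned}
\operatorname{cov}(x_1,x_2) &= \operatorname{corr}(x_1,x_2)\sqrt{V(x_1)\,V(x_2)} = -0.151\sqrt{116 \times 614} \approx -40.3,\\
V(y) &= 0.5^2\,V(x_1) + 0.5^2\,V(x_2) + 2(0.5)(0.5)\operatorname{cov}(x_1,x_2) \approx 29 + 153.5 - 20.1 \approx 162,\\
R_1^2 &= \frac{\left[0.5\,V(x_1) + 0.5\operatorname{cov}(x_1,x_2)\right]^2}{V(x_1)\,V(y)} \approx \frac{37.9^2}{116 \times 162} \approx 0.076,\\
R_2^2 &= \frac{\left[0.5\,V(x_2) + 0.5\operatorname{cov}(x_1,x_2)\right]^2}{V(x_2)\,V(y)} \approx \frac{286.9^2}{614 \times 162} \approx 0.83.
\end{aligned}
```

So with equal nominal weights, publications account for far more of the variance of $y$ than teaching hours, mainly because $V(x_2)$ is much larger than $V(x_1)$.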
22 Statistical coherence - Dean's example ($x_1$: hours of teaching, $x_2$: number of publications). To obviate this (with equal weights, $R_2^2$ is far larger than $R_1^2$), the dean substitutes the model $y = 0.5\,x_1 + 0.5\,x_2$ with $y = 0.7\,x_1 + 0.3\,x_2$. A professor comes by, looks at the last formula, and complains that publishing is disregarded in the department …
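The dean's two weightings can be compared numerically with a small sketch. It assumes the moments quoted for this example ($V(x_1) = 116$, $V(x_2) = 614$, $\mathrm{corr}(x_1, x_2) = -0.151$) and uses the textbook formula for the squared correlation between a linear combination and one of its components. The function name `linear_importance` is hypothetical.

```python
import math

def linear_importance(w1, w2, v1, v2, rho):
    """Return (R1^2, R2^2, V(y)) for y = w1*x1 + w2*x2, given the moments of x1 and x2."""
    cov12 = rho * math.sqrt(v1 * v2)                    # cov(x1, x2)
    vy = w1**2 * v1 + w2**2 * v2 + 2 * w1 * w2 * cov12  # V(y)
    cov_y1 = w1 * v1 + w2 * cov12                       # cov(y, x1)
    cov_y2 = w1 * cov12 + w2 * v2                       # cov(y, x2)
    return cov_y1**2 / (v1 * vy), cov_y2**2 / (v2 * vy), vy

# Moments quoted for the dean's example.
V1, V2, RHO = 116.0, 614.0, -0.151

equal = linear_importance(0.5, 0.5, V1, V2, RHO)    # nominal weights 0.5 / 0.5
revised = linear_importance(0.7, 0.3, V1, V2, RHO)  # nominal weights 0.7 / 0.3
print("equal weights  :", equal)
print("revised weights:", revised)
```

Under these assumptions the 0.7/0.3 weighting gives teaching and publications nearly the same $R_i^2$ (about 0.43 and 0.42). Seen that way, the revised formula balances the two variables rather than disregarding publishing, even though the nominal weights suggest otherwise.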
23 Statistical coherence. Using these points we can compute a statistic, Si, that serves as a ruler for 'importance'. Example: Si = 0.88 means we could reduce the variation of the ARWU scores by 88% by fixing 'Papers in Nature & Science'. [Figure: scatter of points; axis label: ARWU score]
24 Statistical coherence. Our suggestion: to assess the quality of a composite indicator using, instead of $R_i^2$ (the squared Pearson product-moment correlation coefficient of the regression of $y$ on $x_i$), its non-parametric equivalent, the first order sensitivity index (Pearson's correlation ratio): $S_i = V_{x_i}\big(E_{\mathbf{x}_{\sim i}}(y \mid x_i)\big) / V(y)$, where $E(y \mid x_i)$ is the smoothed curve and $V(y)$ is the unconditional variance.
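A rough way to estimate the first order sensitivity index from data is sketched below. Note the swap: instead of a smoothed curve, this sketch approximates $E(y \mid x_i)$ by the mean of $y$ within equal-count bins of $x_i$, which is a cruder estimator chosen only for brevity. The data are synthetic and the function name `first_order_index` is hypothetical.

```python
import numpy as np

def first_order_index(x, y, n_bins=50):
    """Estimate S_i = V(E[y | x_i]) / V(y) using bin means in place of a smoothed curve."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    bins = np.array_split(y[order], n_bins)   # bins of y with near-equal counts, ordered by x
    means = np.array([b.mean() for b in bins])
    sizes = np.array([b.size for b in bins])
    # Variance of the conditional (bin) means around the overall mean.
    between = np.sum(sizes * (means - y.mean()) ** 2) / y.size
    return between / y.var()

# Synthetic example: y depends on x1 non-linearly and on x2 linearly.
rng = np.random.default_rng(0)
x1 = rng.normal(size=5000)
x2 = rng.normal(size=5000)
y = x1**2 + 0.5 * x2

s1 = first_order_index(x1, y)
s2 = first_order_index(x2, y)
r1_sq = np.corrcoef(x1, y)[0, 1] ** 2   # linear R^2 misses the quadratic dependence
print(s1, s2, r1_sq)
```

In this synthetic case the linear $R_1^2$ is close to zero although $x_1$ drives most of the variance of $y$, which is the situation the non-parametric measure is meant to handle.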
25 Statistical coherence. Features: it offers a precise definition of importance, that is 'the expected reduction in variance of the CI that would be obtained if a variable could be fixed'; it can be used regardless of the degree of correlation between variables; it is model-free, in that it can be applied also in non-linear aggregations; it is not invasive, in that no changes are made to the CI or to the correlation structure of the indicators (unlike what we will see next on uncertainty analysis). Also known as: Pearson's correlation ratio, first order effect, top marginal variance, main effect … Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A
26 One can hence compare the importance of an indicator as given by the nominal weight (assigned by developers) with the importance as measured by the first order effect (Si) to test the index for coherence. Statistical coherence
27 Statistical coherence - ARWU. The Si's are more similar to each other than the nominal weights: they range between 0.14 and 0.19 (Si's normalized to unit sum; CV estimates), whereas the nominal weights are either 0.10 or 0.20. Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A
28 Statistical coherence - THES. The combined importance of the peer-review variables (recruiters and academia) appears larger than stipulated by the developers, indirectly supporting the hypothesis of linguistic bias sometimes levelled at THES. The teacher/student ratio, a key variable aimed at capturing the teaching dimension, is much less important than it should be (normalized Si is 0.09, nominal weight is 0.20). Source: Paruolo, Saisana, Saltelli, 2013, J.Royal Stat. Society A
29 Global rankings at the forefront of the policy debate Overview of two global university rankings (ARWU, THES) Statistical Coherence Tests Uncertainty analysis Policy Implications Conclusions
30 Notwithstanding recent attempts to establish good practice in composite indicator construction (OECD, 2008), "there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed" (Cherchye et al., 2007). Booysen (2002, p.131) summarises the debate on composite indicators by noting that "not one single element of the methodology of composite indexing is above criticism". Andrews et al. (2004) argue that "many indices rarely have adequate scientific foundations to support precise rankings: […] typical practice is to acknowledge uncertainty in the text of the report and then to present a table with unambiguous rankings". Uncertainty analysis - Why?
31 Uncertainty analysis - How? Space of alternatives: including/excluding variables, normalisation, missing data, weights, aggregation. Model averaging: whenever a choice in the composite set-up is not strongly supported, or you do not trust one single model, we recommend using more models. [Figure: Country 1, Country 2, Country 3]
32 Uncertainty analysis - How? [Figure, two panels: "How to shake coupled stairs" and "How coupled stairs are shaken in most of the available literature"]
33 Uncertainty analysis - ARWU & THES. Question: Can we say something about the quality of the university rankings and the reliability of the results? Objective of UA: NOT to verify whether the two global university rankings are legitimate models to measure university performance, BUT to test whether the rankings and/or their associated inferences are robust or volatile with respect to changes in the methodological assumptions within a plausible and legitimate range. Source: Saisana, D'Hombres, Saltelli, 2011, Research Policy 40, 165–177
34 Uncertainty analysis - ARWU & THES. Activate simultaneously different sources of uncertainty that cover a wide spectrum of methodological assumptions: imputation, weighting, normalization, number of indicators, aggregation (70 scenarios). Estimate the FREQUENCY of the university ranks obtained in the different simulations.
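The idea of activating several sources of uncertainty at once and tallying rank frequencies can be sketched as below. This is an illustrative Monte Carlo under stated assumptions, not the 70 scenarios of the study: the data are invented, only three choices are varied (normalisation, weights, aggregation), and the ±25% weight perturbation is an arbitrary assumption.

```python
import numpy as np

def simulate_ranks(data, nominal_weights, n_sim=1000, seed=0):
    """Rank institutions under randomly drawn methodological assumptions."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    w0 = np.asarray(nominal_weights, dtype=float)
    ranks = np.empty((n_sim, data.shape[0]), dtype=int)
    for s in range(n_sim):
        # Normalisation: best = 100, or min-max rescaling to [1, 100].
        if rng.random() < 0.5:
            x = 100.0 * data / data.max(axis=0)
        else:
            span = data.max(axis=0) - data.min(axis=0)
            x = 1.0 + 99.0 * (data - data.min(axis=0)) / span
        # Weights: nominal weights perturbed by up to +/-25%, renormalised to sum to 1.
        w = w0 * rng.uniform(0.75, 1.25, size=w0.size)
        w /= w.sum()
        # Aggregation: weighted arithmetic or weighted geometric mean.
        if rng.random() < 0.5:
            score = x @ w
        else:
            score = np.exp(np.log(x) @ w)
        ranks[s] = (-score).argsort().argsort() + 1   # rank 1 = best
    return ranks

def rank_frequencies(ranks):
    """freq[i, r-1] = share of simulations in which institution i obtains rank r."""
    n_sim, n_inst = ranks.shape
    freq = np.zeros((n_inst, n_inst))
    for i in range(n_inst):
        freq[i] = np.bincount(ranks[:, i] - 1, minlength=n_inst) / n_sim
    return freq

# Hypothetical data: 5 institutions x 3 strictly positive indicators.
data = np.array([
    [90, 95, 99],
    [60, 70, 40],
    [55, 40, 80],
    [50, 65, 45],
    [20, 30, 25],
])
ranks = simulate_ranks(data, nominal_weights=[0.4, 0.3, 0.3])
freq = rank_frequencies(ranks)
print(np.round(freq, 2))
```

In this toy data the first institution leads on every indicator and the last trails on every indicator, so their ranks never move. Only the three middle institutions can change places, which mirrors the pattern of greater volatility in the middle of the table.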
35 Harvard, Stanford, Berkeley, Cambridge, MIT: top 5 in more than 75% of our simulations. Univ California: original rank 18th, but could be ranked anywhere between the 6th and 100th position. Impact of assumptions: much stronger for the middle-ranked universities. Uncertainty analysis – ARWU
36 Impact of uncertainties on the university ranks is even more apparent. M.I.T.: ranked 9th, but confirmed only in 13% of simulations (plausible range [4, 35]). Very high volatility also for universities ranked in the 10th-20th positions, e.g., Duke Univ, Johns Hopkins Univ, Cornell Univ. Uncertainty analysis – THES
39 1. HEIs provide an array of services and positive externalities to society (universal education, innovation and growth, active citizens, capable entrepreneurs and administrators, etc.), which call for multi-dimensional measures of effectiveness and/or efficiency. 2. A clear statement of the purpose of any such measure is also needed, as measuring scientific excellence is not the same as measuring e.g. employability or innovation potential, or where to study, or how to reform the university system so as to increase the visibility of national universities. Policy implications
40 3. Indicators and league tables are enough to start a discussion on higher education issues BUT not sufficient to conclude it. 4. The assigned university rank largely depends on the methodological assumptions made in compiling the rankings. 9 in 10 universities shift by more than 10 positions in the 2008 SJTU ranking. Examples: 92 positions (Univ Autonoma Madrid) and 277 positions (Univ Zaragoza) in Spain; 71 positions (Univ Milan) and 321 positions (Polytechnic Inst Milan) in Italy; 22 positions (Univ Paris 06) and 386 positions (Univ Nancy 1) in France. Policy implications
41 5. A multi-modeling approach can offer a representative picture of the classification of universities by ranking institutions in a range bracket, as opposed to assigning a specific rank which is not representative of the plurality of opinions on how to assess university performance. 6. The compilation of university rankings should always be accompanied by coherence tests & robustness analysis. Policy implications
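Reporting a range bracket instead of a single rank can be sketched as follows. The simulated ranks are hard-coded and purely hypothetical, the 5th to 95th percentile bracket is one assumed choice among many, and `rank_brackets` is an illustrative name.

```python
import numpy as np

def rank_brackets(ranks, names, lower=5, upper=95):
    """Summarise simulated ranks as (median, low, high) for each institution."""
    ranks = np.asarray(ranks)
    lo, med, hi = np.percentile(ranks, [lower, 50, upper], axis=0)
    return {
        name: (int(round(m)), int(np.floor(l)), int(np.ceil(h)))
        for name, m, l, h in zip(names, med, lo, hi)
    }

# Hypothetical ranks: 5 simulations (rows) x 3 institutions (columns).
simulated = [
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
]
brackets = rank_brackets(simulated, names=["A", "B", "C"])
for name, (median, low, high) in brackets.items():
    print(f"{name}: median rank {median}, bracket [{low}, {high}]")
```

Here institution A would be reported as rank 1 with bracket [1, 1], while B and C share the bracket [2, 3], which makes explicit that the data do not support ordering them.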
42 ‘rankings are here to stay, and it is therefore worth the time and effort to get them right’ (Alan Gilbert, Nature News, 2007) ‘because they define what “world-class” is to the broadest audience, these measures cannot be ignored by anyone interested in measuring the performance of tertiary education institutions’ (Jamil Salmi, 2009) Conclusions
43 ‘rankings are here to stay’ (Sanoff, 1998) ‘ranking systems are clearly here to stay’ (Merisotis, 2002) ‘tables: they may be flawed but they are here to stay’ (Leach, 2004) ‘they are here to stay’ (Hazelkorn, 2007) ‘like them or not, rankings are here to stay’ (Olds, 2010) ‘whether or not colleagues and universities agree with the various ranking systems and league table findings is insignificant, rankings are here to stay’ (UNESCO, 2010) ‘educationalists are well able to find fault with rankings on numerous grounds and may reject them outright. However, given that they are here to stay…’ (Tofallis, 2012) ‘while many institutions had reservations about the methodologies used by the rankings compilers, there was a growing recognition that rankings and classifications were here to stay’ (Osborne, 2013) Conclusions
44 More at: http://composite-indicators.jrc.ec.europa.eu (or simply Google “composite indicators”, 1st hit)
45 References and Related Reading. 1. Paruolo P., Saisana M., Saltelli A., 2013, Ratings and Rankings: voodoo or science? J Royal Statistical Society A 176(2). 2. Saisana M., D’Hombres B., Saltelli A., 2011, Rickety Numbers: Volatility of university rankings and policy implications. Research Policy 40, 165–177. 3. Saisana M., D’Hombres B., 2008, Higher Education Rankings: Robustness Issues and Critical Assessment, EUR 23487, Joint Research Centre, Publications Office of the European Union, Italy. 4. Saisana M., Saltelli A., Tarantola S., 2005, Uncertainty and sensitivity analysis techniques as tools for the analysis and validation of composite indicators. J Royal Statistical Society A 168(2), 307-323. 5. OECD/JRC, 2008, Handbook on Constructing Composite Indicators. Methodology and User Guide, OECD Publishing, ISBN 978-92-64-04345-9.