Metrics, research award grades, and the REF Harvey Goldstein University of Bristol With support from Mary Day, Ian Diamond and Phil Sooben
The context REF proposal to use metrics –Journal impact factors and citations –Research income –Research students –Research council grant application grades Little discussion so far of the technical measurement issues associated with Research Council awards
The database All ESRC applications Details of applicants, reviewer, assessor and board grades Identification of departments and HEIs Award amounts (not considered) Final analysis of 2698 applications, 1698 departments
A naïve analysis Consider the discipline of Education –Note that we have not been able to assign departments to RAE disciplines so principal discipline used. Similar results for other disciplines Final award grade converted to a numeric score All award types considered – similar results if fellowships excluded PI weighted more than Co-apps: same award score given to each applicant Weighted analysis of these scores in a 3-level model: –Application within Applicant within HEI
Results of 2-level model Insensitive to specific weighting system
Problems Invalid analysis since scores are not independent: –Imagine a situation where we have N applications, each of which has a different pair of applicants drawn from two particular HEIs, A and B, where for each application each applicant is given the application's awarded score. A simple analysis would compare the mean score for HEI A with the mean score for HEI B, but these mean scores are equal by definition. Thus this analysis contains no information about HEI differences, as opposed to the case where for each pair we have a score derived separately for each applicant. Applicants may also come from different departments not associated with the principal discipline.
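The two-HEI thought experiment can be checked with a few lines of Python. This is only an illustrative sketch: the scores and the helper name `hei_mean` are made up for the example and do not come from the ESRC database.

```python
from statistics import mean

# Hypothetical awarded scores for N = 5 applications (illustrative values only).
application_scores = [3.0, 1.0, 4.0, 2.0, 5.0]

# Naive expansion: every application has one applicant from HEI A and one from
# HEI B, and both applicants inherit the application's awarded score.
rows = []
for score in application_scores:
    rows.append(("A", score))
    rows.append(("B", score))


def hei_mean(rows, hei):
    """Mean of the inherited scores for one HEI (hypothetical helper)."""
    return mean(score for h, score in rows if h == hei)


mean_a = hei_mean(rows, "A")
mean_b = hei_mean(rows, "B")

# The two means coincide whatever scores are used, so the comparison
# carries no information about differences between the HEIs.
print(mean_a, mean_b)
```

Changing the values in `application_scores` changes both means together, which is the sense in which the naive comparison is uninformative.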
A more valid analysis We reconceptualise the data as follows: –We assume each applicant contributes a level of quality to the application –The application score is just the average of these (weighted according to whether PI or Co-app) –Some applicants are on more than one application, associated with different combinations of other applicants, and this allows us, in principle, to assign (estimate) a score for each applicant –Known as a multiple membership (MM) model Formally: –i indexes application, j indexes applicant, and y_i is the application score
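The slide's formula is not reproduced in this transcript. A standard multiple membership form that matches the verbal description would look roughly as follows. The notation is assumed, not taken from the slide: $y_i$ is the score of application $i$, $A(i)$ is its set of applicants, $u_j$ is the quality contributed by applicant $j$, $w_{ij}$ is the PI/Co-app weight, and $e_i$ is an application-level residual.

```latex
y_i = \beta_0 + \sum_{j \in A(i)} w_{ij}\, u_j + e_i,
\qquad \sum_{j \in A(i)} w_{ij} = 1,
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_i \sim N(0, \sigma_e^2)
```

Under this form the applicant contributions enter only through their weighted average, which is why repeated appearances of an applicant with different co-applicants are needed to separate individual effects.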
Another serious problem There are, for education, 454 applications and 989 applicants and in general there are more applicants than applications. This means that we cannot use the MM model to score applicants – non-identifiability. However, there are only 98 HEIs so we can fit a model that identifies the HEI only (aggregating all applicants for one HEI within an application – will lead to some overestimation of the separation of HEIs). This provides HEI/department scores.
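The counting argument can be illustrated with a toy weight matrix. The sketch below takes a simplified fixed-effects view: it treats applicant or HEI scores as unknowns in a linear system and compares the rank of the weight matrix with the number of unknowns. All weights, the applicant-to-HEI mapping and the helper `matrix_rank` are hypothetical.

```python
from fractions import Fraction as F


def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination (hypothetical helper)."""
    m = [[F(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Rows are 3 applications, columns are 5 applicants; PI weight 2/3, Co-app 1/3.
applicant_weights = [
    [F(2, 3), 0, F(1, 3), 0, 0],
    [0, F(2, 3), 0, F(1, 3), 0],
    [0, 0, F(2, 3), 0, F(1, 3)],
]

# Assumed mapping of the 5 applicants to 2 HEIs.
applicant_hei = ["X", "Y", "Y", "X", "X"]
heis = sorted(set(applicant_hei))

# Aggregate applicant weights within each application to HEI level.
hei_weights = [
    [sum(w for w, h in zip(row, applicant_hei) if h == hei) for hei in heis]
    for row in applicant_weights
]

applicant_rank = matrix_rank(applicant_weights)  # at most 3, but 5 unknowns
hei_rank = matrix_rank(hei_weights)              # 2 unknowns

print(applicant_rank, len(applicant_weights[0]))
print(hei_rank, len(heis))
```

With more applicants than applications the rank can never reach the number of applicant columns, whereas after aggregation the number of HEI columns is small enough for the rank to match it.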
Results Note that HEI variance now about half what we saw before.
Caterpillar plot Note how all confidence intervals overlap zero, so no separation from the overall mean is possible. Also, of the four highest in the naïve analysis, only one is in the four highest here. Similar result if fellowships are excluded.
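Reading a caterpillar plot comes down to checking, for each unit, whether its interval contains zero. A minimal sketch, assuming approximately normal 95% intervals of the form residual ± 1.96 × SE; the residuals, standard errors and function name below are invented for illustration.

```python
def overlaps_zero(residual, se, z=1.96):
    """True if the interval residual +/- z * se contains zero."""
    lower = residual - z * se
    upper = residual + z * se
    return lower <= 0.0 <= upper


# Hypothetical HEI residuals and standard errors (not from the ESRC analysis).
estimates = {"HEI 1": (0.30, 0.20), "HEI 2": (-0.10, 0.25), "HEI 3": (0.50, 0.20)}

separated = [name for name, (res, se) in estimates.items()
             if not overlaps_zero(res, se)]
print(separated)
```

When every interval contains zero, as on the slide, the `separated` list is empty and no HEI can be distinguished from the overall mean.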
It's even more complicated So far all applicants on an application have been assigned to the principal discipline. We need to assign them to their actual discipline/department, and this implies we should carry out a joint analysis of all applications. Again, there are 2698 applications but only 1698 departments. So we have an MM model and we estimate scores for each department.
Results The between-department variance is now larger (19%). Only 0.5% of departments have CIs overlapping the mean. Including the principal discipline in the model indicates (moderate) discipline differences in award grading (see below).
Table 6. Two-level multiple membership model for all applications with numerical final grade as response. MCMC estimates.
Columns: Parameter, Estimate, Standard error
Intercept
Level 2 variance (HEI/department)
Level 1 variance (Application)
VPC: 18.8%
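The VPC (variance partition coefficient) row can be related to the two variance rows as follows. This sketch assumes the usual two-level definition, level 2 variance divided by total variance. The exact partition in a multiple membership model can also depend on the weights, and the variances used here are invented because the table's estimates are not preserved.

```python
def vpc(level2_var, level1_var):
    """Share of total variance at level 2 (simple two-level definition)."""
    if level2_var < 0 or level1_var < 0:
        raise ValueError("variances must be non-negative")
    total = level2_var + level1_var
    if total == 0:
        raise ValueError("total variance must be positive")
    return level2_var / total


# Hypothetical variances, chosen only to show the arithmetic.
example_vpc = vpc(level2_var=1.0, level1_var=4.0)
print(f"{example_vpc:.1%}")
```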
One hundred lowest and highest ranked residuals for multiple membership model using all departments, with 95% confidence intervals.
MM model with selected principal disciplines (>100 applications)
Columns: Parameter, Estimate, Standard error
Intercept (Econ)
Management
Social Policy
Education
Sociology
Human Geog
Psychology
Level 2 variance (HEI/department)
Level 1 variance (Application)
VPC: 16.0%
Using the results Given the uncertainty, how useful are they? Can they be combined (formally) with citations to provide greater precision? The technical limitations of the analyses are likely to apply to citation analyses also –E.g. analysis of NAS 2001 database shows 2,600 papers with 13,000 unique authors (Borner et al., 2004) What are the side effects? –Perverse incentives
Perverse incentives All high stakes performance monitoring systems encourage gaming – some possibilities: –Large numbers of co-applicants squeezed into applications –Discouraging of cross-disciplinary applications –HEI behaviour would change over time with a destabilising and distorting effect. –Encouragement of many small and short term grants rather than fewer large and long term ones. –Distort behaviour of referees and board members (How?)
Comparisons with RAE 2008 scores Results for Economics and Education: Simple (4,3,2,1,0) RAE scoring system Insensitive to other scorings Dept. results (residuals) from ESRC analysis (weighted) averaged to RAE HEI categories.
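One plausible reading of the simple (4,3,2,1,0) scoring is a weighted average of an RAE 2008 quality profile, that is, the percentages of activity rated 4*, 3*, 2*, 1* and unclassified. The sketch below assumes that reading; the profile values and the function name `rae_score` are hypothetical.

```python
STAR_WEIGHTS = (4, 3, 2, 1, 0)  # 4*, 3*, 2*, 1*, unclassified


def rae_score(profile_percent, weights=STAR_WEIGHTS):
    """Weighted average of a quality profile given as percentages."""
    if len(profile_percent) != len(weights):
        raise ValueError("profile and weights must have the same length")
    total = sum(profile_percent)
    if total <= 0:
        raise ValueError("profile must have a positive total")
    return sum(w * p for w, p in zip(weights, profile_percent)) / total


# Hypothetical profile: 20% at 4*, 40% at 3*, 30% at 2*, 10% at 1*, 0% unclassified.
example_score = rae_score([20, 40, 30, 10, 0])
print(example_score)
```

Passing a different `weights` tuple is one way to check sensitivity to alternative scorings.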
Correlations between RAE and ESRC scores – selected disciplines
Discipline: Correlation
Sociology: 0.25
Economics: 0.50
Education: 0.30
Psychology: 0.19
Management: 0.07
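Correlations of this kind can be computed as Pearson coefficients over HEI-level pairs of scores. The sketch assumes that the Pearson coefficient is the measure used, which the slides do not state, and the paired values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical HEI-level pairs: RAE score and averaged ESRC residual.
rae_scores = [2.9, 2.7, 2.5, 2.3]
esrc_residuals = [0.4, 0.1, 0.2, -0.3]

r = pearson(rae_scores, esrc_residuals)
print(round(r, 2))
```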
Economics 27 HEIs. Correlation = 0.50 (P<0.01). The highest 7 RAE scores (from the top) are: LSE, UCL, Warwick, Oxford, Essex, Nottingham, Bristol
Economics RAE ranks
Education 37 HEIs. Correlation = 0.30 (P=0.07). The top 7 are: IOE = Oxford, Cambridge = King's, Bristol = Leeds, Exeter
What next? Incorporation of other research councils in a combined analysis Include citation data in a combined model: –In the REF it can be argued that an analysis at least as complex as the present is unavoidable for validity –Using citations encounters the same issues of more applicants than papers/books.