Reliability and Comparability of Peer Review Results. Nadine Rons & Eric Spruyt. Conference "New Frontiers in Evaluation", Vienna, April 24th-25th 2006.


Slide 1: Title
Conference "New Frontiers in Evaluation", Vienna, April 24th-25th 2006
Reliability and Comparability of Peer Review Results
Nadine Rons, Coordinator of Research Evaluations & Policy Studies, Research & Development Department, Vrije Universiteit Brussel
Eric Spruyt, Head of the Research Administration Department, Universiteit Antwerpen

Slide 2: "Three cheers for peers"
'Three cheers for peers', Editorial, Nature 439, 118 (12 January 2006):
"Thanks are due to researchers who act as referees, as editors resolve their often contradictory advice."
"Only in a minority of cases does every referee agree..."

Slide 3: Presentation plan
I. Validation of results: reliability & comparability
II. Material investigated: 'ex post' peer review + citation analysis of teams
III. Investigation of results
  - Reliability: inter-peer agreement & different rating habits
  - Comparability: related concepts & intrinsic characteristics
IV. Conclusions: aimed at improved results, a better understanding, and choosing the right method

Slide 4: I. Validation of results
1. Reliability
  - Peer review is the principal method used to evaluate research quality.
  - BUT: various kinds of bias & different rating habits, and it is not always feasible to use measures limiting their influence.
  - → Is it possible to measure reliability?
2. Comparability
  - H. F. Moed (2005), 'Citation Analysis in Research Evaluation', chapter 18: 'Peer Review and the Validity of Citation Analysis', Springer.
  - Do more reliable results give better correlations with other outcomes?
  - Correlations are often relatively weak and depend on the discipline.
  - → Can this be explained? (crucial for further acceptance!)

Slide 5: II. Material investigated (peer review)
1. Peer review
  - Shared principles for the panel evaluations of teams per discipline: expertise-based, international level, uniform treatment, coherence of results, multi-criteria approach, pertinent advice.
  - Exceptions:
    - Different experts for each team (1 discipline at VUB).
    - Specific methodology using different indicators (1 discipline at UA).

Slide 6: II. Material investigated (peer review, VUB)
  - VUB indicators, standard procedure 'VUB-Richtstramien voor de Disciplinegewijze Onderzoeksevaluaties', VUB Research Council (2001):
    - Scientific merit of the research / uniqueness of the research
    - Research approach / plan / focus / coordination
    - Innovation
    - Quality of the research team
    - Probability that the research objectives will be achieved
    - Research productivity
    - Potential impact on further research and on the development of applications
    - Potential impact for transition to or utility for the community
    - Dominant character of the research (fundamental / applied / policy oriented)
    - Overall research evaluation

Slide 7: II. Material investigated (peer review, UA)
  - UA indicators, 'Protocol 1998' for the Assessment of Research Quality, Association of Universities in the Netherlands (VSNU, 1998):
    - Academic quality
    - Academic productivity
    - Scientific relevance
    - Academic perspective
  - Exception (1 discipline, "partial" indicators):
    - Publications
    - Projects
    - Conference participations
    - Other
    - Globally

Slide 8: II. Material investigated (citation analysis)
2. Citation analysis
  - 'New Bibliometric Tools for the Assessment of National Research Performance: Database Description, Overview of Indicators and First Applications', H. F. Moed et al., Scientometrics 33 (1995).
  - Centre for Science and Technology Studies (CWTS), Leiden University.
  - Thomson ISI citation indexes, corresponding period, same teams.
  - Indicators include:
    - CPP/JCSm: citations per publication relative to the expected citation rate of the teams' journals
    - CPP/FCSm: citations per publication relative to the expected citation rate of the field
    - JCSm/FCSm: mean journal citation score relative to the expected citation rate of the field
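The three indicators are ratios that compare a team's average citation rate with journal- and field-based expectations. As an illustration only (a simplified ratio-of-means sketch with invented numbers, not the CWTS implementation), they could be computed as follows in Python:

    # Minimal sketch: normalised citation indicators for one team.
    # All per-publication values below are hypothetical.
    from statistics import mean

    # Each record: citations received, expected citations for its journal (JCS)
    # and for its field (FCS) over the same period.
    publications = [
        {"citations": 12, "jcs": 6.0, "fcs": 4.0},
        {"citations": 3,  "jcs": 5.0, "fcs": 4.5},
        {"citations": 0,  "jcs": 2.0, "fcs": 3.0},
    ]

    cpp = mean(p["citations"] for p in publications)   # citations per publication
    jcsm = mean(p["jcs"] for p in publications)        # mean journal citation score
    fcsm = mean(p["fcs"] for p in publications)        # mean field citation score

    print(f"CPP/JCSm  = {cpp / jcsm:.2f}")    # impact relative to the team's journals
    print(f"CPP/FCSm  = {cpp / fcsm:.2f}")    # impact relative to the field
    print(f"JCSm/FCSm = {jcsm / fcsm:.2f}")   # journal set relative to the field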

Slide 9: III. Investigation of results (overview)
1. Reliability
  a. Inter-peer agreement: three groups of evaluations according to the measured level of agreement.
  b. Rating habits: panel procedures vs. the exception with different experts for each team.
  → Influence on results & on correlations between peer review indicators investigated.
2. Comparability
  a. Related concepts: 'global' vs. 'partial' indicators & variation with discipline.
  b. Intrinsic characteristics of the methods: contributions to ratings counted differently & scale effects.
  → Influence on comparability investigated.

Slide 10: III. Investigation of results (1. Reliability, a. Inter-peer agreement)
1.a. Inter-peer agreement
  - In panels: different opinions → different positions of teams.
  - → Level of inter-peer agreement measured by correlations between the ratings from different peers (see the sketch below).
  - → 3 groups compared: panels with high, intermediate and low inter-peer agreement.
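The slides do not specify which agreement statistic was used; under that caveat, inter-peer agreement could for instance be summarised as the mean pairwise rank correlation between the ratings different peers give to the same set of teams. The peer names and scores below are invented:

    # Minimal sketch: inter-peer agreement as the mean pairwise Spearman
    # correlation between peers rating the same teams (hypothetical data).
    from itertools import combinations
    from scipy.stats import spearmanr

    # ratings[peer] = scores for the same teams, in the same order
    ratings = {
        "peer_1": [7, 8, 5, 9, 6, 4],
        "peer_2": [6, 8, 4, 9, 7, 5],
        "peer_3": [8, 6, 5, 7, 6, 3],
    }

    pairwise = []
    for a, b in combinations(ratings, 2):
        rho, p_value = spearmanr(ratings[a], ratings[b])
        pairwise.append(rho)
        print(f"{a} vs {b}: rho = {rho:.2f} (p = {p_value:.3f})")

    print(f"mean inter-peer agreement: {sum(pairwise) / len(pairwise):.2f}")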

Slide 11: III. Investigation of results (1. Reliability, a. Inter-peer agreement, continued)
  - Influence on results, compared to citation analysis:
    → Better inter-peer agreement means a higher number of significant correlations, BUT only at the higher aggregation level of the 3 groups.
    → Other mechanisms have a stronger impact on the correlations.
  - Influence on correlations between peer review indicators:
    - Significant correlations for each pair of peer review indicators, for each of the 3 groups (also for individual disciplines).
    → Correlations between peer review indicators are relatively robust to variations in inter-peer agreement.

Slide 12: III. Investigation of results (1. Reliability, b. Rating habits)
1.b. Rating habits
  - Opinions → ratings: according to peers' own habits, reference levels in other evaluations, scores given to other files, the known use of the scores, ...
  - Two cases compared:
    - Exception with different experts for each team → scores not necessarily in line with opinions.
    - Standard panel evaluations → uniform reference level.

Slide 13: III. Investigation of results (1. Reliability, b. Rating habits, continued)
  - Influence on results, compared to citation analysis:
    - Panel evaluations: significant correlations for all peer review indicators with some or all citation analysis indicators (and vice versa).
    - Different experts: a significant correlation for only 1 pair of indicators.
    → Rating habits can influence results significantly.
  - Influence on correlations between peer review indicators:
    - Panel evaluations: significant correlations for all pairs of indicators.
    - Different experts: significant correlations for only 8% of the pairs.
    → Low observed correlations between indicators that are expected to correlate can indicate diverging rating habits (see the sketch below).
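A summary such as "significant correlations for only 8% of the pairs" can be reproduced in spirit by correlating every pair of indicators over the evaluated teams and counting the pairs that pass a significance threshold. The indicator values and the 0.05 threshold below are assumptions for illustration, not figures from the study:

    # Minimal sketch: share of significantly correlated indicator pairs.
    from itertools import combinations
    from scipy.stats import spearmanr

    # indicator -> one score per team (same team order for every indicator)
    indicators = {
        "scientific_merit": [7, 8, 5, 9, 6, 4, 8],
        "productivity":     [6, 7, 4, 9, 5, 5, 7],
        "CPP/FCSm":         [1.2, 1.6, 0.7, 2.1, 0.9, 0.8, 1.4],
        "JCSm/FCSm":        [1.1, 1.3, 0.9, 1.5, 1.0, 0.9, 1.2],
    }

    pairs = list(combinations(indicators, 2))
    significant = sum(
        1 for a, b in pairs
        if spearmanr(indicators[a], indicators[b]).pvalue < 0.05
    )
    print(f"significant pairs: {significant}/{len(pairs)} "
          f"({100 * significant / len(pairs):.0f}%)")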

Slide 14: III. Investigation of results (2. Comparability, a. Related concepts)
2.a. Related concepts
  - Partial indicators (publications, projects, conferences, ...): no significant correlations between peer review indicators, in contrast to the global indicators (scientific merit, productivity, relevance, ...).
    → Performance in different activities is not necessarily correlated.
  - Correlations of peer review with citation analysis indicators: the pairs that correlate best vary strongly with the discipline.
    → An indicator may not represent the same concept in all subject areas.
    → Always use more than one indicator!

Slide 15: III. Investigation of results (2. Comparability, b. Intrinsic characteristics)
2.b. Intrinsic characteristics
  - Contributions to ratings: counted differently in the minds of peers (pro & contra) and in citation analysis (positive counts only).
  - Scale effects: minimum & maximum limits & their position with respect to the mean value.

Slide 16: III. Investigation of results (2. Comparability, b. Intrinsic characteristics, continued)
Peer rating frequency distribution:
  - Peer ratings: pro & contra; elements can also be counted 'negatively'.
  - Scale: minimum & maximum limit.
[Figure: relative frequency distribution of peer results (percentage of the 58 teams) over the rating scale LOW (1-2), FAIR (3-4), AVERAGE (5-6), GOOD (7-8), HIGH (9-10), shown separately for each peer review indicator: scientific merit / uniqueness of the research, research approach / plan / focus / co-ordination, innovation, quality of the research team, probability that the research objectives will be achieved, research productivity, potential impact on further research and on the development of applications, potential for transition to or utility for the community, and overall research evaluation.]

Slide 17: III. Investigation of results (2. Comparability, b. Intrinsic characteristics, continued)
Citation impact frequency distribution:
  - Citation impact: only positive counts; strong influence of highly cited articles.
  - Scale: minimum limit closer to the mean & no maximum limit.
[Figure: relative frequency distribution of citation impact (CPP/JCSm and CPP/FCSm) for all 60 teams in the pure ISI analysis, with indicator values ranging from about 0.1 to 3.1.]
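To make the scale effects on the last two slides concrete, the two kinds of results can be binned into relative frequencies: peer ratings live on a bounded 1-10 scale, while the citation-impact ratios have a hard minimum near zero and no upper bound. All values below are invented for illustration:

    # Minimal sketch: relative frequency distributions for a bounded rating
    # scale and an unbounded citation-impact ratio (hypothetical values).
    import numpy as np

    peer_scores = np.array([5, 6, 7, 7, 8, 8, 8, 9, 6, 7, 5, 9])      # 1..10 scale
    citation_impact = np.array([0.4, 0.7, 0.9, 1.0, 1.1, 1.3, 1.6,
                                0.8, 2.5, 3.1, 0.6, 1.0])              # no upper bound

    peer_hist, _ = np.histogram(peer_scores, bins=np.arange(1, 12))    # bins 1..10
    impact_hist, _ = np.histogram(citation_impact, bins=np.arange(0.0, 3.4, 0.3))

    print("peer ratings (relative):   ", np.round(peer_hist / peer_hist.sum(), 2))
    print("citation impact (relative):", np.round(impact_hist / impact_hist.sum(), 2))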

Slide 18: III. Investigation of results (2. Comparability, b. Intrinsic characteristics, continued)
→ Good correlations only appear when the effects of these intrinsic characteristics can be filtered out.
[Figure: scatter plot of the peer review "Scientific relevance" score against field citation impact (CPP/FCSm, roughly 0.0 to 3.0) for the high & intermediate inter-peer agreement group.]

Slide 19: IV. Conclusions
Reliability
  - Peer review results can be influenced considerably by rating habits.
  - It is recommended to create a uniform reference level (e.g. using panel procedures) or to check for signs of low reliability by analysing the outcomes of the peer evaluation itself.
Comparability
  - Besides reliability, the comparability of results depends on the nature of the indicators, on the subject area, on intrinsic characteristics of the methods, ...
  - Different methods describe different aspects. The most suitable method should be carefully chosen or developed, and evaluations should always be based on a series of indicators, never on one single indicator.