
1 Archived File The file below has been archived for historical reference purposes only. The content and links are no longer maintained and may be outdated. See the OER Public Archive Home Page for more details about archived files.

2 Statistical Parameters in Peer Review
David Kaplan, MD, PhD
Professor of Pathology
Case School of Medicine, Case Western Reserve University

3 Primary Assumptions
- The Center for Scientific Review (CSR) is the gatekeeper for NIH
- NIH has difficulty recognizing innovativeness, in part because there is no good measure for it
- To promote innovativeness, it is most reasonable to make changes in the procedures of CSR

4 Secondary Assumptions
- Peer review at CSR should utilize statistics in the most robust and powerful way possible
- Peer review should reflect the peer group as broadly as possible

5 Sampling for NIH Peer Review
- Peer review, as currently practiced, involves inappropriately small sample sizes (2 or 3 reviews per application); see the sampling sketch after this slide
- Sample size for peer review is constrained by the size of grant applications
- Peer review does not use random sampling; it uses quota sampling, which is subject to significant bias
- Peer review at NIH involves discussions among reviewers that alter scores, so the scores are not independently derived
- Scoring of proposals is low-precision, yet an unrealistic degree of precision is required to differentiate among many applications
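As a rough illustration of the sample-size concern above, the following sketch (not part of the original slides) simulates how much a panel's mean score fluctuates when only 3 reviewers score an application versus 30. The underlying score distribution and its weights are invented for illustration.

```python
# A minimal sketch of the sampling concern: how much the mean score of one
# application varies across review panels of different sizes.
# The "population" of reviewer opinions is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool of reviewer opinions for a single application (scores 1-5).
population = rng.choice([1, 2, 3, 4, 5], size=10_000,
                        p=[0.05, 0.10, 0.25, 0.40, 0.20])

for n_reviewers in (3, 30):
    # Draw many independent panels of this size and record each panel's mean.
    panel_means = np.array([
        rng.choice(population, size=n_reviewers, replace=False).mean()
        for _ in range(5_000)
    ])
    print(f"n = {n_reviewers:2d}: mean of panel means = {panel_means.mean():.2f}, "
          f"spread (SD) across panels = {panel_means.std():.2f}")
```

With a panel of 3 the mean bounces around far more from panel to panel than with 30, which is the instability the slide attributes to the current sample sizes.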

6 Statistical Parameters Other than the Mean
- Peer review, as currently practiced, uses the arithmetic mean alone in judging an application
- Scores given in peer review are nonparametric, producing ordinal (rank-order) evaluations rather than interval-scale information
- Other statistical calculations known to be important for nonparametric analysis should be considered (a computation sketch follows this slide):
  - variance: the scatter of the distribution
  - skew: the symmetry of the distribution
  - kurtosis: the peakedness of the distribution
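For concreteness, here is a minimal sketch of how the parameters listed above could be computed from a set of reviewer scores using numpy and scipy.stats; the scores themselves are hypothetical.

```python
# Compute the slide's four parameters for one hypothetical score sample.
import numpy as np
from scipy import stats

scores = np.array([2, 3, 3, 4, 4, 4, 5, 5, 1, 5, 4, 3])  # hypothetical reviewer scores, 1-5

print("mean     :", np.mean(scores))           # central tendency (current sole criterion)
print("variance :", np.var(scores, ddof=1))    # scatter of the distribution
print("skew     :", stats.skew(scores))        # symmetry of the distribution
print("kurtosis :", stats.kurtosis(scores))    # peakedness (excess kurtosis)
```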

7 Proposed Peer Review Sampling
- To use statistical parameters other than the arithmetic mean, a funding agency would have to collect larger samples, in a more random manner, with independence among the opinions proffered
- Larger samples would require smaller applications
- Randomness would require a different selection scheme for reviewers
- Independence among reviewers would obviate the need for meetings
- Scores would be set on a low-precision scale
- Why establish a complementary peer review system?

8 Identifying Innovation
- Current NIH peer review, using the arithmetic mean alone to evaluate applications, has done well at identifying excellent grant proposals
- What it has not done well is identifying innovative grant proposals
- Innovativeness and excellence should not be conflated
- There is no robust measure of innovativeness at the present time
- Perhaps statistical measures other than the arithmetic mean could identify innovative grant proposals

9 Variance/Kurtosis as a Measure of Innovativeness
- Hypothesis: variance and/or kurtosis can be robust indicators of innovativeness
- Innovation elicits controversy instead of consensus
- Consensus is reached when proposals are relatively close to what is generally accepted
- Innovation refers to proposals that are new or unusual enough that consensus cannot be reached
- Variance and kurtosis are statistically valid measures that could indicate the degree of controversy (or, conversely, the degree of consensus) associated with a proposal
- These assertions rely on the assumption that a statistically robust system of sampling and scoring is established

10 Association between Innovation and Controversy
- The essence of innovation is novelty
- New ideas naturally collide with established ideas
- This collision is manifest as controversy
- Controversy can be measured by variance and/or kurtosis in a statistically valid system (see the sketch below)
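A small, hypothetical comparison makes the claim concrete: when reviewers split into opposing camps, variance rises and kurtosis drops (a flatter, bimodal distribution), whereas consensus yields low variance. The two score samples below are invented for illustration, not data from any actual review.

```python
# Contrast a "consensus" score distribution with a "controversial" (bimodal) one.
import numpy as np
from scipy import stats

consensus = np.array([4, 4, 4, 3, 4, 4, 5, 4, 4, 3, 4, 4, 5, 4, 4])    # tight agreement
controversy = np.array([1, 5, 2, 5, 1, 5, 1, 4, 2, 5, 1, 5, 2, 4, 1])  # split opinions

for label, sample in (("consensus", consensus), ("controversy", controversy)):
    print(f"{label:12s} variance = {np.var(sample, ddof=1):.2f}  "
          f"kurtosis = {stats.kurtosis(sample):.2f}")
```

The controversial sample shows the high variance and strongly negative kurtosis that the following slide associates with innovative proposals.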

11 Value of Innovation Addresses Its Controversial Nature
- Innovation is accepted to be particularly valuable because it allows for large advances in our understanding
- Resolving controversy offers the potential for large advances, as opposed to the further development of accepted ideas or technologies, which advance our understanding incrementally
- The association of innovation with controversy makes sense in terms of the potential for large advances in understanding, which also accounts for innovation's most important and valuable characteristic

12 Possible Distributions and Associated Statistical Parameters for the Evaluation of Applications with an n of 30 Reviewers
[Figure: four example distributions of proposal scores (scale 1-5), with the following parameters]
- Distribution 1: mean = 3.83, variance = 0.78, kurtosis = 0.12
- Distribution 2: mean = 3.97, variance = 0.31, kurtosis = 6.89
- Distribution 3: mean = 2.93, variance = 1.39, kurtosis = -1.21
- Distribution 4: mean = 2.97, variance = 1.92, kurtosis = -1.94
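The slide's histograms are not reproduced here, but distributions of roughly these shapes can be simulated and their parameters recomputed. The probability weights below are assumptions chosen only to approximate peaked, broad, flat, and bimodal shapes for n = 30 reviewers; they are not the slide's data.

```python
# Simulate four hypothetical score distributions (n = 30, scores 1-5) and
# report the parameters discussed on this slide.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
shapes = {
    "broad, high-scoring":      [0.02, 0.05, 0.25, 0.45, 0.23],
    "sharply peaked consensus": [0.01, 0.02, 0.04, 0.88, 0.05],
    "flat / spread out":        [0.20, 0.20, 0.20, 0.20, 0.20],
    "bimodal / controversial":  [0.40, 0.08, 0.04, 0.08, 0.40],
}

for label, weights in shapes.items():
    scores = rng.choice([1, 2, 3, 4, 5], size=30, p=weights)
    print(f"{label:26s} mean = {scores.mean():.2f}  "
          f"variance = {scores.var(ddof=1):.2f}  "
          f"kurtosis = {stats.kurtosis(scores):.2f}")
```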

13 [Figure: proposals plotted by Mean against Variance/Kurtosis, with categories labeled obvious, innovative, excellent, and flawed, contrasting traditional selection with selection for innovation]

14 Proposal: Specific Aims
- To establish a test of a peer review system that utilizes various statistical parameters in order to identify innovativeness
- To determine the number and type of reviewers needed to provide stable values for the various statistical parameters

15 Proposal: Experimental Design
- One-page grant applications will be written in sets of 3 with varying degrees of relative innovativeness, as assessed by design and accepted by an independent panel
- The proposals will be sent to 20-100 reviewers, who will be asked:
  1) to score the proposal on a scale of 1-5 in terms of the significance of the proposed work for enhancing our scientific understanding/capabilities;
  2) to report how long they spent on the proposal;
  3) to indicate the number of years since they completed postdoctoral fellowship and their academic rank; and
  4) to score, on a scale of 1-5, how close the proposal is to their area of expertise in both conceptual and technical arenas
- Proposal scores will be analyzed for arithmetic mean, variance, skew, and kurtosis; reviewers will be assessed for the number of independent evaluations needed to obtain stable statistical values, for seniority, and for relative closeness to the area of investigation (one possible approach is sketched after this slide)
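One way to approach the question of how many independent evaluations are needed before the statistical parameters stabilize is a resampling check like the sketch below. The score pool and the stability criterion (spread of the variance estimate across resamples) are assumptions for illustration, not the method prescribed by the proposal.

```python
# Estimate how the stability of the variance estimate improves as the number
# of reviewers grows, by resampling panels from a hypothetical score pool.
import numpy as np

rng = np.random.default_rng(1)
score_pool = rng.choice([1, 2, 3, 4, 5], size=100,
                        p=[0.10, 0.15, 0.25, 0.30, 0.20])  # hypothetical opinions

for n in (5, 10, 20, 40, 80):
    # Resample panels of n reviewers and record the variance each panel reports.
    variances = [np.var(rng.choice(score_pool, size=n, replace=False), ddof=1)
                 for _ in range(2_000)]
    print(f"n = {n:2d}: variance estimate = {np.mean(variances):.2f} "
          f"+/- {np.std(variances):.2f} across resamples")
```

The point at which the spread across resamples stops shrinking meaningfully suggests a workable panel size, under these illustrative assumptions.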

16 Proposal: Significance
- Identification of innovativeness is valuable
- Peer review of 1-page proposals by a survey of many scientists, without a meeting, is more efficient than the current system and more representative of the sensibilities of the entire relevant biomedical research community
- With an independent measure of excellence versus innovativeness, programs whose goal is to develop already accepted ideas and programs whose goal is to develop new ideas can both be accommodated, enriching the portfolio of funded projects
- Establishment of a review paradigm for evaluating the granting program itself

17 Potential Benefits
- Minimized bias
- Greater satisfaction among scientists
- Greater control for administration
- Identification of innovativeness along with excellence
- Solidification of CSR as a flexible and intelligent regulator of NIH granting activities

