We’re always interested in data, and at NIH we use data to examine the impact of new policies. Among the changes NIH implemented under its Enhancing Peer Review initiative was the assignment of scores to each of five individual criteria for research grant applications: significance, investigator(s), innovation, approach, and environment. The purpose of these criterion scores is to provide additional information to the applicant, but today I am going to use them to show you how we can examine reviewer behavior.
This topic will be familiar to those readers following the NIGMS Feedback Loop. In his latest post on this topic, NIGMS Director Dr. Jeremy Berg presented an analysis conducted by my Division of Information Services. I’d like to elaborate on that analysis and share some additional ones.
Note that the data presented represent 54,727 research grant applications submitted for funding in fiscal year 2010. Of these, 32,546 applications (about 60%) were discussed and received final overall impact scores.
Below you see distributions of the criterion scores (averaged across reviewers) and the numerical impact scores. In each box plot, 50% of the scores fall within the colored box, with 25% just below the median in dark blue and 25% just above it in light blue. These distributions reveal differences in how reviewers use the rating scales. Reviewers use the full scoring range for all five criteria and for the impact score (the whiskers below and above each colored box extend from 1 to 9), but notice that reviewers spread their approach scores more widely than their scores for the other criteria.
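For readers who want to explore similar data themselves, here is a minimal sketch of how such box plots might be drawn. It assumes a hypothetical file, criterion_scores.csv, with one row per discussed application and columns for the five averaged criterion scores plus the overall impact score; the file and column names are stand-ins, not our actual data system.

```python
# Minimal sketch: side-by-side box plots of the six score distributions.
# "criterion_scores.csv" and its column names are hypothetical stand-ins.
import matplotlib.pyplot as plt
import pandas as pd

CRITERIA = ["significance", "investigator", "innovation", "approach", "environment"]

scores = pd.read_csv("criterion_scores.csv")  # hypothetical input file
cols = CRITERIA + ["impact"]

fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot([scores[c].dropna() for c in cols],
           whis=(0, 100))        # whiskers span the full observed range
ax.set_xticklabels(cols)
ax.set_ylim(0.5, 9.5)            # NIH scores run from 1 (best) to 9 (worst)
ax.invert_yaxis()                # put better (lower) scores at the top
ax.set_ylabel("average score")
fig.tight_layout()
plt.show()
```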
Also, the criterion scores are moderately correlated with each other and with the overall impact score.
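As a rough illustration, those pairwise correlations could be computed along these lines, again using the hypothetical scores table from the sketch above:

```python
# Pearson correlation matrix of the five criterion scores and the impact score.
# "criterion_scores.csv" and its columns are hypothetical stand-ins.
import pandas as pd

CRITERIA = ["significance", "investigator", "innovation", "approach", "environment"]
scores = pd.read_csv("criterion_scores.csv")  # hypothetical input file

print(scores[CRITERIA + ["impact"]].corr().round(2))
```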
As expected, when the criterion scores are subjected to a factor analysis, their inter-correlations yield two factors. The first, on which significance, innovation, and approach load heavily, accounts for most of the covariance among the criterion scores; the second, involving investigator and environment, accounts for a smaller portion. The loadings are summarized below.
[Table: loadings of each of the five criteria on Factor 1 and Factor 2]
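The post does not say which factor-analysis implementation or rotation NIH used; as one hedged illustration, a two-factor solution could be fit with scikit-learn along these lines.

```python
# Sketch of a two-factor analysis of the criterion scores (the scikit-learn
# implementation and varimax rotation are assumptions, not NIH's method).
import pandas as pd
from sklearn.decomposition import FactorAnalysis

CRITERIA = ["significance", "investigator", "innovation", "approach", "environment"]
scores = pd.read_csv("criterion_scores.csv")  # hypothetical input file

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(scores[CRITERIA].dropna())

loadings = pd.DataFrame(fa.components_.T, index=CRITERIA,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(2))  # expect significance/innovation/approach together,
                          # with investigator/environment on the other factor
```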
For applications receiving numerical impact scores (about 60% of the total), we used multiple regression to build a descriptive model predicting impact scores from the applications’ criterion scores, while attempting to control for ten different “institutional” factors (e.g., whether the application was new, a renewal, or a resubmission). In the model, the approach criterion scores had the largest regression weight, followed by the scores for significance, innovation, investigator, and environment. The same pattern of results was observed across multiple rounds of peer review and institute funding decisions.
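As a sketch of how such a descriptive model might look, the regression below uses statsmodels, with a single hypothetical categorical control (app_type) standing in for the ten institutional factors, which this post does not enumerate.

```python
# Descriptive OLS model: impact score regressed on the five criterion scores,
# with a hypothetical "app_type" control (new / renewal / resubmission).
import pandas as pd
import statsmodels.formula.api as smf

scores = pd.read_csv("criterion_scores.csv")  # hypothetical input file

model = smf.ols(
    "impact ~ significance + investigator + innovation + approach"
    " + environment + C(app_type)",
    data=scores,
).fit()

# Because all five criteria share the same 1-9 scale, their raw coefficients
# are roughly comparable; approach should carry the largest weight.
print(model.params.round(3))
```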
As Jeremy noted in one of his posts, it may be tempting to overinterpret such complex data, so we continue to explore their implications and will watch to see whether and how these patterns evolve over time. I’m sure we’ll be hearing from you with your own interpretations of these data, which will be interesting and informative.