10 Comments
RePORT is your go-to source for NIH data, and I’m excited to let you know about a new addition to the NIH Data Book on RePORT: data on peer review across NIH.
The new “NIH Peer Review” section provides information related to initial peer review across NIH. It includes data on peer review organized by the Center for Scientific Review as well as by NIH institutes and centers.
Which kinds of data, and what do they tell us? For starters, the data truly demonstrate how you, our peer reviewers, are the lifeline of the scientific process. In 2013 alone, nearly 25,000 of you served the biomedical research enterprise by providing over 230,000 critiques of grant applications and by participating in over 2,500 peer review meetings. The numbers of reviewers and review meetings have remained relatively stable over the past 3 years, even during last year’s shutdown, again demonstrating what a valuable and reliable resource you are.
I also want to call your attention in particular to the data on the major activity codes of the applications reviewed, and on the breakdown of “R”-series activity codes. If you look at these two graphs together, you will note the breadth of award programs that undergo peer review, and that R01s, which make up the bulk of “R” applications, comprise only about 40% of the full range of applications that go through the peer review system.
I encourage you to check out this new addition to the NIH Data Book, along with the introductory text and notes that explain the kinds of data included and how we calculate them. It’s truly a humbling experience to look at the landscape of peer review at NIH and see how the enormous commitment of the biomedical research community makes it all possible. As I said in a short video I recorded after the shutdown, I truly thank our peer reviewers for the remarkable benefit you provide to our enterprise.
Dr. Rockey, has NIH ever validated its peer-review process? That is, has it ever cross-checked a meaningful number of applications, *including* unsuccessful/unfunded ones, either (a) in a head-to-head comparison with a new set of reviewers or (b) against subsequent reality, i.e., later progress in biomedical science along the lines proposed in an application? If so, what statistical power characterized this experiment?
I ask because such validation is a way to learn whether a process is built on quicksand or granite, and (when analyzing validation data more closely) to learn how to improve. Cross-checking against reality is precisely what distinguishes modern science from medieval scholasticism.
So validation is not only what we expect of any NIH _applicant_. It is also expected of any organization — scientific, manufacturing, or otherwise — whose culture or leadership honestly believes in the value of knowing about the quicksand and of learning how to improve. Conversely, it is not expected of organizations that see themselves as political: After all, cross-checking might expose inconvenient deficiencies.
So NIH’s willingness to cross-check its peer-review process seems to be an *observable* attribute of belief that it is a scientific (vs. political) organization. Just my opinion, of course.
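To make the statistical-power question concrete, here is a rough back-of-the-envelope simulation (with entirely assumed numbers, not anything NIH has published) of how many applications a head-to-head re-review would need in order to detect even a modest correlation, say rho = 0.3, between two independent panels’ scores:

```python
# Rough simulation: power to detect a Pearson correlation of `true_rho`
# between an original panel's scores and an independent re-review panel's
# scores, as a function of how many applications are re-reviewed.
# All numbers here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def power_for_correlation(true_rho, n_apps, n_sims=2000, z_crit=1.96):
    detections = 0
    cov = [[1.0, true_rho], [true_rho, 1.0]]
    for _ in range(n_sims):
        # Simulated paired scores from the two panels.
        scores = rng.multivariate_normal([0.0, 0.0], cov, size=n_apps)
        r = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
        # Fisher z test of the null hypothesis that the correlation is zero.
        z = np.arctanh(r) * np.sqrt(n_apps - 3)
        if abs(z) > z_crit:
            detections += 1
    return detections / n_sims

for n_apps in (30, 100, 300):
    print(n_apps, power_for_correlation(true_rho=0.3, n_apps=n_apps))
```

Under these assumptions, about 30 re-reviewed applications give well under 50% power, while roughly 100 give on the order of 85%; the point is only that the sample-size question can be answered before any validation experiment is run.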
I agree that the review system is amazing. However, as the recipient of multiple scores on R01 and U01 proposals over the last year, ranging from 11 to 40, and as a collaborator on many more proposals that did well, I feel I can raise an issue not as “sour grapes” but as a legitimate concern. I continue to think that the system would be improved if reviewers felt at least some sense of accountability for the accuracy of their comments and critiques. In the system as presently organized, there is none, other than the very rare challenge of a particularly egregious review. However, reviewer mistakes and mistaken assumptions are rampant. If reviewers thought that evaluations of proposals would be randomly sampled (even 1% or less) and reviewed by an independent source for accuracy, they might be inclined to write more careful and accurate comments. I understand that this would be a lot of extra work, but it would prevent or diminish the casual, careless comments that undermine the review of a good proposal based on reviewer error.
Let me suggest that the older resubmission system at least allowed the applicant to assess the reviewers more than once. Because people rotate off review panels frequently, we would never get a long-term measure of reviewer behavior.
The fundamental issue is a shortage of funds and how those funds are used. If the NIH set a lower cap on PI salaries or eliminated them (it is a federal welfare program for universities), many more investigators could be funded. It would also help if the NIH audited indirect costs, since I know for a fact that they are misused: schools do not deliver to the PI what they charge the NIH.
I agree with the sentiment that more care needs to be applied to evaluating and monitoring the review process. If quantitative metrics of the scientific outcomes of the process can be gathered and evaluated, some improvement of the outcome should be possible. Changing multiple aspects of the review process at essentially the same time might also not have been the best experimental plan!
A couple of years ago I was a reviewer on an IAR panel, which caused me to evaluate the process in a way that I hadn’t tried before. This led me to conclude that there should be some evaluation of both the reviewers and the SRO, with consequences. What I noted about the reviewers was the following.
(1) A few reviewers were impossibly late getting their reviews in, so that by late evening Pacific time the night before the online discussion phase was supposed to start, we still had no comments. Because of time zone differences, some of us were unable to read the late reviews that finally got posted shortly before the actual review discussion time started. The review process cannot occur properly if the reviewers don’t meet their deadlines. Reviewers who submit late reviews could be held accountable *in their own submissions* in the future by adding points to their scores, and particularly good (on-time) reviewers might be given a bit of credit toward lower scores in their own future reviews.
(2) I did a little statistical analysis of the preliminary scores, and there appeared to be significant differences among the reviewers’ scores that weren’t reflected in the subsequent discussions. Such initial-score effects can unfairly affect the decisions about which proposals to discuss, as well as the final scores. Why not correct for reviewer effects, at least in the initial scores and ranking, to reduce this kind of bias? As scientists, we would do this for any experimental or observational data set in which we found day, batch, or other effects, so why not do so for reviewer effects? It would also be possible to develop reviewer-specific calibrations over time to create increasingly accurate measures of reviewer effects.
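For what it is worth, here is a minimal sketch of what such a reviewer-effect adjustment could look like, applied to invented preliminary scores (this is not how NIH currently processes scores, just an illustration of per-reviewer centering and scaling):

```python
# Minimal sketch of per-reviewer score adjustment on invented data.
# NIH-style preliminary scores run from 1 (best) to 9 (worst); the
# reviewer names, applications, and scores below are hypothetical.
import pandas as pd

scores = pd.DataFrame({
    "reviewer":    ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "application": ["P1", "P2", "P3", "P1", "P2", "P3", "P1", "P2", "P3"],
    "score":       [2, 3, 4, 4, 5, 6, 1, 2, 2],
})

# Z-score within each reviewer so a habitually harsh or lenient reviewer
# no longer shifts an application's average (assumes each reviewer's
# scores are not all identical).
scores["adjusted"] = scores.groupby("reviewer")["score"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)

# Rank applications by the mean of the reviewer-adjusted scores (lower is better).
print(scores.groupby("application")["adjusted"].mean().sort_values())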
What I noticed about the management of this review by the SRO was equally problematic: there were problems with communication during the online-discussion window, not enough time was allocated, more time was added at the last minute, and then that added time was unexpectedly removed without sufficient notice. There has to be management of the process that is both sufficient to get the process done and fair to the people who submit their proposals. I have done other reviews where the SRO was spectacular and really facilitated an efficient yet fair process. Evaluation of SROs and particular study sections would help, as long as there is added training (for those who need it) and/or other consequences. Maybe if the reviewers judge the management of a particular study section to be sufficiently bad, the review process should be redone without requiring the authors of the proposals to resubmit their applications. Again, as scientists, if we obtain data showing that a particular data set is contaminated by bad results, we may discard that data set or try to redo the experiment.
I agree with Dr. Hasin’s comments about accountability: “I continue to think that the system would be improved if reviewers felt at least some sense of accountability for the accuracy of their comments and critiques. In the system as presently organized, there is none, other than the very rare challenge of a particularly egregious review. However, reviewer mistakes and mistaken assumptions are rampant.”
I would also say that I am concerned that, in the case of SBIR review panels, very few of the reviewers are small-business scientists with experience in grants. The panels usually consist of academics and a few commercial scientists with little or no experience in grants or the realities of performing research in a small-business environment. Big-business scientists usually have little experience outside of a tiny niche and are very aggressive toward alternative technologies that may impact their company’s products. More effort is needed to place experienced small-business scientists on SBIR/STTR review panels.
Agreed. The quality of the reviews is the area that needs to be improved.
This study was conducted by NHLBI investigators. I wonder whether other ICs have conducted similar studies.
Title: Percentile Ranking and Citation Impact of a Large Cohort of National Heart, Lung, and Blood Institute–Funded Cardiovascular R01 Grants
in Circulation Research, January 9, 2014 (Circulation Research. 2014;114:600-606)
Rationale: Funding decisions for cardiovascular R01 grant applications at the National Heart, Lung, and Blood Institute (NHLBI) largely hinge on percentile rankings. It is not known whether this approach enables the highest impact science.
Objective: Our aim was to conduct an observational analysis of percentile rankings and bibliometric outcomes for a contemporary set of funded NHLBI cardiovascular R01 grants.
Conclusions: In a large cohort of NHLBI-funded cardiovascular R01 grants, we were unable to find a monotonic association between better percentile ranking and higher scientific impact as assessed by citation metrics.
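For readers curious what a monotonic-association test of this kind looks like in practice, here is a minimal sketch using a Spearman rank correlation on randomly generated data (not the NHLBI cohort, and not the paper’s actual methodology):

```python
# Minimal sketch: test for a monotonic association between percentile
# ranking and citation counts with a Spearman rank correlation.
# The data below are randomly generated, not the NHLBI grant cohort.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
percentile = rng.uniform(1, 20, size=1000)              # hypothetical funded-grant percentiles
citations = rng.negative_binomial(5, 0.05, size=1000)   # hypothetical citation counts

rho, p_value = spearmanr(percentile, citations)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3g}")
```

A rho near zero with a large p-value is what “no monotonic association” means in this context.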
Sally,
Can you make the data available in downloadable spreadsheet form so we can do our own analyses?
The data shown are available on the “data” tab for each graph and can be exported into a table in Word.
I agree that the reviewers need to be responsible and accountable. I have been a reviewer for over 30 years and review for quite a few study sections, and I am troubled. Several points come to my mind.
1. Reviewers want the grant they think is the best in their pile to be funded, so they often discredit other grants and push theirs forward by giving it a great score. Imagine several reviewers doing the same: the result is clustering of scores near the top.
2. Young reviewers get carried away by the nitty-gritty of which cell type, model, etc., and miss the overall picture.
3. Anonymity should go both ways. Is there a way the grant could be totally blinded, with a different panel or a subset of reviewers looking only at the PI’s record, environment, etc.?
4. NIH sends mixed messages to investigators by not challenging reviewers. Of course, who would want to be challenged for $200 and days of work!
5. Signaling pathways, microRNAs, stem cells, etc. have taken over hypothesis-driven science. Reviewers prefer these over hypothesis-driven approaches.
The thrill of being a scientist is waning…