More Information for Understanding Impact Scores


As you may know, the recent “Enhancing Peer Review” self-study process at NIH led to, among other things, the introduction of bulleted critiques and overall impact scores. We received a lot of feedback on the changes, and a recurring theme was the desire for more information about the impact scores.

In light of this, we have begun asking reviewers to write a paragraph summarizing the factors that informed their overall impact score to supplement the bulleted critiques. This paragraph is included in the summary statements, and I hope you find it useful.

We also initiated a continuous review of the peer review process. You can see a report on the first round of surveys on the OER website.


  1. I have been reviewing NIH proposals for over 20 years. My interpretation of the survey results is that the format of the review form does not matter. My experience is that the most important factor in a valid review is the quality of the reviewers. NIH needs to find a way to reward study section members and listen to their input. Too many well-funded investigators refuse to review proposals but complain loudly if they are not funded. My impression is that most study section members feel that they are fighting a losing battle against the NIH regarding quality review of proposals. The emphasis is more and more on simple metrics and less and less on the quality of the science. The fundamental criterion needs to be impact on the field.

    1. I fully agree with Mr. Fitzpatrick. Whether NIH's analysis of the peer review data is thorough enough isn't the real issue. Bulleted critiques and impact scores aside, the fact remains that quality proposals need to be acknowledged promptly. Proposals also need to be better triaged so that they reach the most appropriate reviewers.

  2. I have been a continuously funded NIH R01 principal investigator since 1993, and in my opinion the bulleted critiques are not sufficiently informative. When complicated issues are being addressed in a review, it is simply not possible to reduce the conversation to a simplistic level without leaving the applicant largely unable to fathom what the study section has discussed. What is needed is a summary of the give-and-take of the study section discussion. I have found that the only way to really know what transpired during the discussion is to get on the phone and contact the study section SRA to ask for additional information. Amazingly enough, this works quite well. I have been able to find out from the SRA what the real problem might have been with a particular grant. I hope you will consider my comments seriously and appreciate that those of us who spend endless hours preparing a grant proposal deserve a more informative response from the reviewers. George G. Holz, Professor of Medicine and Pharmacology, SUNY Upstate Medical University, Syracuse, NY.

  3. The new scoring system has not been well adopted, and, just as before its adoption, the range of scores in many study sections is very narrow. In many cases a rating of 4 or worse is given to a bulleted critique that lists only positives and states "none" under negatives. That cannot be, and SROs are unable to correct this situation. It is thus no wonder that they and POs are less satisfied with the results. In addition, with the ordinal system, non-reviewing members of the study section are more likely to go along with a single "negative" review, even if it is offered by the least informed and least expert reviewer.

    1. I agree completely with the comments above. I once participated in a review of contracts for NIH, and they used a system in which, unless negative comments were added, you could not deduct points for that particular item. That forced the reviewers to think more carefully when assigning scores. But the perpetual problem continues: study sections, by design, are risk-averse and do not reward innovation!

  4. The bulleted critiques are fine. The main issue is the 1-9 scoring system. I have served on many panels, and I was surprised to see how easily most of the reviewers jump between numbers. A 10-90 scale would do a much better job:
    10-20 outstanding
    20-30 excellent
    30-40 very good

  5. I am somewhat leery of the so-called impact score. It is skewed toward established scientists and institutions: the score rests heavily on the reputation of the PI or the institution, so lesser-known applicants are at a disadvantage.

  6. I have been continuously NIH-funded since 1972 and have served on more than 50 NIH study sections, including two four-year stints as a regular member. I have strong feelings about the new peer review initiatives that derive from these experiences as an NIH grant applicant and as a reviewer. The idea of an impact score is a good one, but the 1 to 9 scale is extremely limiting for the reviewer. An impact scoring range of 10 to 90 would enable the reviewer to give a more precise evaluation of applications and would spread out the scores. These outcomes should fit with the CSR's often-stated desire to reduce the "bunching" of scores by study sections.
    The bulleted critiques are faster to write and are useful for some of the headings in the review format (Candidate and Environment). However, they do not allow the reviewer to formulate a meaningful critique of the important sections, i.e., Approach, Innovation, and Significance, that will be useful to the applicant in a revised application. As a researcher who has been helped by the criticisms raised in grant reviews, I think a major goal of any changes in the review process should be to assist the applicant, i.e., to increase the information flowing from the study section to the applicant. It is a worthy aim to make the reviewer's job easier, but not at the expense of the applicant.

  7. I couldn’t agree more with George Holz. I am a long-term investigator who has served on study sections as both reviewer and chair, and I too find the bullet points to be of little use when considering a resubmission. I concur that the SRO and PO can be of great help, but not all are willing or able to provide such information. Besides, it should not require such a second step; surely a proper review process should directly inform the applicant.

  8. I very much appreciate this insightful commentary from my colleagues, because it accurately reflects the current situation with NIH funding. And it doesn’t look pretty, no matter how much we try to improve the review process. The bottom line is quite simple: there is not enough money to sustain the US leadership role in the sciences, with all the consequences that follow. As an illustration, for some of us successful funding means placing within the top 9%. Now, do we really believe that among 11 researchers, a group that often includes scientists with a proven funding record of at least 10-20 years, only one PI is capable of writing an application that has significant impact in the field and thus deserves funding?

  9. I currently serve on an NIH study section and have served on other study sections and special emphasis panels in the past. I prefer the new impact scoring system to the old one, perhaps because our SRA and our Chair are so diligent in making sure that our final scores are compatible with our critiques.

    However, I do have a concern about the new scoring system. The Scoring System and Procedure document that we reviewers are supposed to follow includes three different sets of anchors for the new rating scale. The first concerns how high or low the impact of the research is expected to be, the second is based on adjectives ranging from poor to exceptional, and the third reflects the application’s balance of strengths and weaknesses.

    Most of us probably rely more heavily on the last set of anchors than on the other two. Most of the time, the three sets of anchors point us toward the same scores, so it doesn’t matter anyway. Occasionally, however, the anchors seem to pull our ratings in different directions. For example, an application may be excellent in most respects (suggesting a rating of 3), yet have at least one moderate weakness (suggesting a rating of 5).

    Furthermore, the new scoring system rules imply that there is a fourth set of anchors in stating that 5 is considered an average score. This suggests that in addition to following the criterion-based anchors discussed above, we are also expected to follow a normative approach. I appreciate that this is intended to constrain grade inflation, and that this is a challenging problem. Nevertheless, it can be difficult to adhere to both kinds of approaches at the same time. Each application deserves to be criterion-scored on its own merits, even if the meeting’s average score rises above or falls below 5.

    In my opinion, these aren’t fatal flaws – not even close. I would score the new scoring system at least a 3 (very strong with only some minor weaknesses). Nevertheless, I would urge NIH to give some thought to how to iron out these minor kinks in the new scoring system.

  10. The SRO is usually willing to provide a “gestalt” of the discussion, and frequently his or her impressions from the discussion are more helpful than the actual written review. In this day and age of technology, communications, and transparency, why doesn’t the NIH just provide each proposal author with a written transcript of the discussion that takes place regarding their proposal (in PDF format, posted to their eCommons account)? The actual content of the critique would be clearly communicated to the author, and the reviewers would be relieved of the burden of writing lengthy critiques, while still getting the crucial points across to the PI. Undoubtedly the bullet-point critiques would be fleshed out nicely by this approach.

  11. I agree with Jennifer. A summary by the SRA, followed by a written transcript of the discussion, would be much more helpful than the bullet-point critiques.
