Perspectives on Peer Review at the NIH

In today’s New England Journal of Medicine, Richard Nakamura, the director of NIH’s Center for Scientific Review (CSR), and I published an essay titled “Reviewing Peer Review at the NIH.”1 As the competition for NIH research grants has become increasingly stiff, review scores are often pointed to as the reason for failure to obtain funding. Indeed, over the past few years, peer review has come under increasing scrutiny. Critics have argued that peer review fails in its primary mission – to help funding agencies make the best decisions about which projects and which investigators to support.2,3 Recent analyses of NIH grants suggest that peer review scores, at best, are weak predictors of bibliometric outcomes (i.e., publications and citations),4,5,6 whereas prior investigator productivity may do a better job of predicting grant productivity.4,7

In our essay, Richard and I consider three issues regarding the ongoing debates about peer review. First, how do we measure scientific impact? It is clear that it is not enough to focus solely on bibliometric outcomes, which have their share of problems. One approach, proposed by Ioannidis and Khoury, goes by the “PQRST” moniker: Productivity (which includes bibliometrics), Quality, Reproducibility, Sharing of data and other resources, and Translational influence.8

Second, we note that imprecise predictions of productivity do not necessarily mean that the current peer review system is failing. As measured by current cutting-edge techniques,9 NIH-funded grants are performing well—producing at least twice as many papers as expected, and these papers garner unusually high numbers of citations in their respective fields.4,7 Furthermore, given the relatively low success rates in securing funding, it may not be surprising that peer review cannot yield precise distinctions among proposals that are all excellent or outstanding.

Third, we focus on the idea that the process of funding science should itself be subject to evaluation “using the most rigorous scientific tools we have at our disposal.” Some thought leaders have called on funding agencies to apply the scientific method to the study of their own work.10 We consider a number of analyses or experiments we could do, including changing the way scores are reported (e.g., using “bins” rather than numeric scores); anonymizing grant applications prior to review; testing differentiated peer review processes (e.g., having one study section only review new applications while another only reviews renewals); and comparing peer review scores to nonbibliometric measures, such as reproducibility of findings, meaningful data sharing, impact on clinical practice guidelines, and so on.

We invite you to read our Perspective essay1 (which the New England Journal of Medicine is making available for free) and to join us in efforts to enhance our most important common interest: to fund the best science and the best scientists who, working with all stakeholders, will advance knowledge and improve our public’s health.

49 Comments

  1. As a junior faculty member in neuroscience still struggling to achieve R01 funding, I appreciate the increased focus on rigor and transparency in grant applications. Nothing frustrates me more than reading papers where correlated outcomes are assumed to be causative, and often causality is implied in the manuscript titles, which leads study section members to believe that certain work has already been done. Unfortunately study section members are universally busy and overbooked, leading to the application of heuristics (institutional reputation; personal relationships; unconscious bias related to gender and names associated with certain ethnicities). The system needs to be double-blind.
    I currently hold a K01 from NIDDK and attended the NIDDK K-Awardees workshop in April 2015. When I emphasized the need for double-blind reviewing, the idea was criticized on the basis that investigators often cite their own work, which would enable reviewers to discern their identity. This argument is logically flawed because, if double-blind is the goal, then moving from certainty in identifying the applicant to ‘probably’ being able to discern the identity of the applicant is a step in the right direction.
    I also brought up the article in Science on significant disparities in R01 funding rates for African-American investigators (Price, Aug 19 2011: Overcoming the R01 Race Gap). When I asked what steps were being taken to address these disparities, the official from CSR indicated that the “data aren’t there yet” to make changes in policy. This official subsequently said that NIH statisticians are conducting additional analysis of these data, which begs the following questions: If the data “aren’t there yet,” then why were the data published in Science? The application of additional statistics a posteriori also gives the impression that NIH is seeking whatever covariates it can find to make the relationship go away.
    In my opinion, the NIH scientific review system is broken and should be replaced with content-mining algorithms. Citation counts and impact scores could be combined with search algorithms (similar to those used by Google) to evaluate the Approach, Environment, Significance, Innovation, and Investigator. As a neuroscientist, I am well aware that the human brain is wired for pattern detection and pattern completion, which leads to the application of stereotypes and cognitive shortcuts that favor speed over objectivity. At least in my field, the increase in interdisciplinary research means that no one individual (or three individuals) is qualified to review five years of planned experiments.

    1. I agree with this comment. I think peer review of grant applications and papers should be handled double-blind. At a later stage, the competency of the researchers could be evaluated by program staff.

    2. One problem with making the peer review process “double blind” is that it would not allow peer reviewers to evaluate the competency of the applicant/team. Still, it may be worth trying on a limited basis to see if outcomes are positive.

      1. As a seasoned reviewer, I agree. We evaluate the probability of success. That includes prior productivity and publications. If I have 3 papers from the PI showing he can make iPSCs, I don’t care about the detailed methods; if I don’t, I need to look carefully at the institution, at what else s/he has done, and at how realistic it is that s/he can make a new approach work.

        No, I don’t see double-blind review working – we aren’t picking research projects out of the sky with just our brains; we have history (or not), colleagues (or not), an infrastructure… it all contributes. I can see that working in pure math or something like that, not in biomedicine.

        1. That is right. We don’t evaluate just the ideas; we evaluate what can realistically be achieved, and that becomes even more difficult with the “bullet-point” format. Besides, I write blind reviews for some journals, and I don’t think it works, for several reasons. One of them: I know my field and am able to figure out pretty well which team submitted the manuscript, despite all the herculean efforts of the editorial office.

    3. It can be double-blind when evaluating the ideas, i.e., significance, technical feasibility, novelty, etc. After reviews of those aspects are written down, review the team and environment, etc.

      Also, when judging scientific impact, we should not put too much weight on publications and citations; judging based only on these should be avoided. When a lab concentrates on solving an important problem, publication sometimes slows down. There are instances where it takes time for good science to be cited. In addition, different fields have different citation rates.

    4. Great comment. A fairer NIH review system would be double-blind, at least during the initial stage, so that each proposal is judged solely on its significance, scientific merit, and rigor. This should be common sense, just as it is in the current peer-reviewed publication process. Really disappointed that NIH has failed to adopt this change!

    5. We rightly insist on rigor and reliability in our science; we should expect no less from the mechanism by which the science is supported. I have read through the comments below, and a number of people have talked about inter-rater reliability (IRR). Suggestions for assessing IRR include comparing scores among the three reviewers within a study section, reviewing scores from program project grants, etc., but none get to the root of the problem since the “n’s” are so small. I would suggest a specifically designed, rigorous attempt to assess IRR. Take 10 grants and give them to 10 study sections, each composed of 10 members. Make the review criteria such that all participants review each grant and that all grants are discussed. Each grant then would get 100 initial review scores and 10 study section scores. With these sample sizes we can get reliable IRR assessments and at least have an idea what levels of rigor and reliability are inherent in the present review process.
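      As a back-of-the-envelope illustration of how the IRR from such a design could be quantified, here is a minimal sketch (not an NIH tool; all scores below are simulated placeholders) that computes a one-way random-effects intraclass correlation, ICC(1), for 10 grants each rated by 100 reviewers (10 study sections x 10 members).

```python
# Minimal sketch: estimating inter-rater reliability from the proposed design
# of 10 grants, each scored by 100 reviewers. Scores are simulated placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_grants, n_raters = 10, 100

# Hypothetical scores on the NIH 1-9 scale: "true" grant quality plus reviewer noise.
true_quality = rng.uniform(2, 5, size=n_grants)
scores = np.clip(
    np.round(true_quality[:, None] + rng.normal(0, 1.5, size=(n_grants, n_raters))),
    1, 9)

# One-way random-effects intraclass correlation, ICC(1): the proportion of
# score variance attributable to real differences between grants.
grand_mean = scores.mean()
ms_between = n_raters * np.sum((scores.mean(axis=1) - grand_mean) ** 2) / (n_grants - 1)
ms_within = np.sum((scores - scores.mean(axis=1, keepdims=True)) ** 2) / (n_grants * (n_raters - 1))
icc1 = (ms_between - ms_within) / (ms_between + (n_raters - 1) * ms_within)
print(f"ICC(1) for a single reviewer's score: {icc1:.2f}")
```

      An ICC near 1 would mean reviewers largely agree on the ordering of grants; an ICC near 0 would mean most of the score variance is reviewer noise.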

  2. I have had a similar experience in the past as a junior faculty member applying for an R01 grant.
    The reviewers’ comments were off-target and focused mostly on older data (which we had in fact used to challenge the current concept and show a way forward) to deny the funding. The funny thing is that the same grant was later funded by a different agency, with most enthusiastic reviews. So, I agree with Dr. Stranehan that the current review system is deeply flawed and, in a larger sense, is there to support a well-oiled “Buddy System” that rewards patronage rather than the idea/nature of science. A non-discriminatory algorithm is the NEED of the hour!

    1. Please, everyone, let’s stop claiming that the grant review process at CSR is broken or an old boy/buddy system, and citing as evidence cases where a grant gets disparate scores in two different reviews. Neither is true. Grant reviewing is a human exercise and subject to all that that entails. Reviewers come with different biases and expertise, as do writers of grants. I have ad hoc’ed on several CSR study sections, been a permanent member, and even chaired one. My respect for the work that CSR employees do only grows with time. Most I have worked with are focused on optimizing the system and recruiting the best and most appropriate reviewers for each grant.

      One thing that would help is if seasoned researchers/reviewers agreed more readily to serve. Another is that universities and other employers could do a better job of recognizing or even rewarding those who do serve. Having more than three reviewers will not help, in my opinion, but will only add to the workload and “noise” in the discussion. Some comments above suggest that reviewers should spend more time reading the literature to fact-check or update themselves on each grant being reviewed. No, absolutely not. Every applicant gets the same amount of space to make their case. The burden is on the writer to make it clearly and effectively.

      Finally, I would argue that there is a rather pervasive attitude I have observed in academics where faculty, particularly under-funded faculty members, are encouraged to submit more grants, even told they should submit one every cycle. To this I say again, nonsense. More junior faculty members in particular should focus on their best stuff and invest the time in writing well, building their logical and persuasive argument, and getting critical feedback from colleagues. This all takes time. A chair or senior faculty member who is willing to invest such time and training in more junior colleagues is invaluable. Just asking people to submit more, without going to the effort required to optimize their end product, is not leadership or mentoring but rather an escape from same.

  3. What is really needed is at least a doubling of the NIH budget. Funding fewer than 10% of incoming proposals not only discourages people contemplating scientific careers, but also wastes considerable effort by people whose proposals are scored in the 10th-20th percentile. I had the good luck to enter the system in the 1960s, when support was much better than now. The current response to the low level of funding includes strict limitations on proposal format, reluctance of study sections to support ‘iffy’ proposals, and a general lowering of morale at the NIH. We apparently need more effective lobbyists to point out that just as we begin to understand the finer points of biology and medicine, the support to utilize this knowledge should not be waning.

    1. David makes a good point. Many articles have been published by scientists on stagnant NIH funding, which is responsible for the exodus of highly trained postdocs and scientists from academia and for the hyper-competitiveness of funding. Some members of Congress have provided support for substantially increasing NIH funding through op-eds, while others have supported the idea vocally.

      Each year, NIH asks for only small incremental increases in its budget. NIH has extremely legitimate reasons to ask Congress to bring funding levels to what they would have been had the budget at least kept pace with inflation. Thanks, Vick Ribardi, for pointing out in this discussion that: “…A basic source of frustration is not NIH’s fault: there are more excellent grants submitted than can be funded, and some of them must be rejected no matter what…”

      NIH could ask for substantial increases initially and then for mandated inflation-linked increases adjusted to ensure that all excellent grants are funded. Then we could see how Congress acts. Maybe somebody from NIH can comment on why NIH is not asking for this.

  4. As a study section member, I can point out at least two major problems with the review process: the high number of grants assigned to each reviewer (usually 9) and the incomplete overlap between a reviewer’s expertise and the topic of the grant. NIH might consider recruiting more reviewers (thus reducing the load) and, perhaps, generating more specialized study sections, so that each grant receives ample attention from a real specialist in the respective field.

    1. I completely agree with the views about the burden on reviewers. Lately, due to the budget shortfall, many special emphasis panels conduct phone reviews, which in my judgment are futile. You may get two reviewers scoring a grant excellent while the third triages it; during the phone discussion we are asked whether we want to discuss the grant, but many times, given the limited call duration, no one brings the dead grant back.

      1. I agree with this comment as well. The number of grants to review is too high. Doing study section by phone is really quite impossible. I have participated in this type of review twice and found it extremely difficult.

        Further, as an investigator trying to get funded, it would be great to get at least one, even better, two of the same reviewers on the second submission of a grant. Often when you address the comments from the first round, you get a different and often disparate set of comments on the second round. Adding fuel to the fire is when you address all of the comments from the first round and the score is worse on the second round. This is very frustrating.

  5. Being on study sections and having received several summary statements, I have noticed two correctable issues with the current study section system:
    1. In many cases, at least one of the three reviewers assigned to a grant is less qualified to review the grant (compared to unassigned reviewers from the same study section) based on their expertise. Since only ~half of the grants are discussed, the three preliminary scores play a critical role in whether a grant proposal will be discussed. So, the study section reviewers should get a chance to at least list a subset of grants that fall within their expertise and then the best qualified reviewers can be assigned to a grant.
    2. Some reviewers assign a score of 3 or 4 (out of the range 1-9) in specific review categories (like significance, innovation, and approach), but then they do not list any weaknesses in those categories. Reviewers should be held accountable for explaining their scores, which will help the applicant understand the scores in terms of specific weaknesses in the proposal.

    1. Having participated on both sides of the process, I think R Abrol raises a good point that has not received much attention – the third reviewer. While smart people can disagree, sometimes the divergent opinion clearly falls out of bounds (statistically? relative to the majority opinion? logically?). With reduced funding, there’s no room to average out this inevitable “noise” in the process. Admittedly, Program can sometimes rescue some of these proposals, but that process is delayed, hardly transparent, and not available in many cases. Among the analyses CSR might consider is looking at the consistency of preliminary scores among reviewers to see if there is a problem. With current funding, no amount of discussion during the review can mitigate the damage done by that third reviewer.

  6. Given a fixed budget, the issue becomes how to distribute those funds. A little math shows that a threshold cutoff is the least efficient approach, since applicants over the cutoff will do OK and those below can do nothing. I suggest that a more efficient distribution system is to fund applicants in proportion to their scores, with a constraint on the available amount of money. The higher the score, the larger the fraction of the requested budget granted; the lower the score, the smaller the fraction funded, with the total held fixed at the existing budget. The fractions are set so that about 25-30% of applicants are funded to some degree.
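    As a rough sketch of this proportional-funding idea (hypothetical, not NIH policy), the allocation below gives each applicant a fraction of the requested budget proportional to a merit weight derived from the impact score, rescaled so the total never exceeds the fixed budget; the cutoff and scores are made up.

```python
# Sketch of the commenter's proportional-funding idea (hypothetical, not NIH policy):
# award each applicant a fraction of the requested budget proportional to a merit
# weight derived from the review score, rescaled to fit the fixed budget.
# NIH impact scores run from 10 (best) to 90 (worst).

def proportional_awards(requests, impact_scores, total_budget, cutoff=50):
    """requests: PI -> requested dollars; impact_scores: PI -> 10..90 (lower is better)."""
    # Better (lower) scores get larger weights; scores at or beyond the cutoff get zero.
    weights = {pi: max(cutoff - impact_scores[pi], 0) for pi in requests}
    weighted_demand = sum(requests[pi] * w for pi, w in weights.items())
    if weighted_demand == 0:
        return {pi: 0.0 for pi in requests}
    c = total_budget / weighted_demand
    # Fraction of each request funded, capped at 100% of the request.
    return {pi: min(1.0, c * weights[pi]) * requests[pi] for pi in requests}

# Toy example: three applicants requesting $500K each, $1.0M total budget.
requests = {"A": 500_000, "B": 500_000, "C": 500_000}
scores = {"A": 20, "B": 30, "C": 45}
print(proportional_awards(requests, scores, 1_000_000))
```

    In the toy example, the top-scored applicant is fully funded while the others receive partial awards; any budget left over because of the 100% cap could be redistributed in a second pass.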

    1. I agree with the general idea that Federick proposes of funding more proposals at a reduced level. The fact is that most researchers applying for NIH funds are well qualified to conduct their proposed NIH-relevant research, regardless of what the study section says. By not funding meritorious proposals, research grinds to a halt. By supplying a more consistent stream of funding, even if it is at a reduced rate, research and productivity continue. If the funds are not sufficient, the researcher can apply for more funding. I propose at least two lines of funding: 1) sustenance funding and 2) bountiful funding to support larger efforts.

      1. I emailed Jeremy Berg some years ago with a very similar idea of funding the best-scored grants at the highest level and decreasing funding for grants that scored well but not as well. To my amazement, he emailed right back and explained that altering funding in such a way could not be done at the NIH level; it would require an act of Congress.

  7. Dr Lauer cites one call for applying the scientific method toward the review of grant funding. Here is another, written by my post-graduate research mentor:
    Heilman KM. National Institutes of Health: the lottery [letter]. Ann Neurol 2007;62:106.

    Nothing surprising here, but it adds to the concern: who is reviewing the reviewers? What is the inter-rater reliability?
    As with Dr Varma, I once had a grant flunked by the American Heart Association, only to be wildly endorsed by an NIH study section a few months later. Go figure.

    In my grant review participation for the American Heart Association, I found misconceptions by fellow reviewers who did not take the trouble to resolve uncertainty in their reviews by tracking down and reading primary publications. I was able to save some good grants that way because I spoke out, because I took the time to read up on what I was voting on. Those grants would not have passed had I not done so or been absent. Such is the process, a spin at the roulette wheel.

    I have just returned from an NIH study section. Two days of ennui, fatigue. Reviewers were charged up at first, then the afternoons wore on, people reading their e-mail or whatnot as we sat around the table, necessary potty breaks while grants were discussed. Reviewers voted based on what they heard others say, less so with their own understanding of the grant at hand. Regression to the mean applied in the scores. The old adage that a grant can succeed or not depending on whether a reviewer is passionate, and has a certain personality type, holds. The SRO kept nudging us on, saying we had a time limit. We were under pressure, as usual.

    One of the scoring categories is “innovation.” This is the worst and should be abandoned. So what if a grant is not innovative (i.e., it uses tried-and-true methods of investigation rather than coming up with new ones), if it has a potential beneficial impact on health care? I hated scoring that category, because it does not affect my overall score. Not sure if that is true for other reviewers.

    So much of grant funding at the NIH supports applications that have little potential impact on health care (for humans anyway) but are rewarded for using buzzwords, advanced technical writing, and theoretical abstract constructs that sound impressive but will not help Joe Taxpayer. I keep reminding myself that we are reviewing on the public’s dime and for the benefit of the public, and yet so much gets funded for gimmickry that really will not advance health care in the lifetime of the applicant – but surely will advance careers. Isn’t that what it’s all about, after all?

    By the way, I still have yet to get that R01 myself, despite years of trying. So much wasted energy and time.

    I do not know if there is a better system, but obviously human flaws in the reviewers can change investigators’ careers.

    1. One clarification: Innovation can be either technical or conceptual. Applying standard, state-of-the-art technologies to an innovative question can be far more innovative than applying a cutting-edge technology to an answered question. This is reflected in the review criteria. It’s up to the SROs to remind reviewers of this.

  8. Bibliometrics is already a part of the evaluation process, but not in the sense HHMI is using it: past performance as an indicator of future performance. Obviously, measuring past performance by citation impact should not rest simply on individual papers in so-called high-impact journals. Overall measures such as the h-index (or the m-index, to correct for academic age) go well beyond simply counting numbers of papers and guessing their impact (a small sketch of these two measures follows this comment). IMHO, all faculty applying for NIH or NSF funding should provide a link to their scholarly impact page (Scopus, Google Scholar, etc.) to allow measuring a PI against a standard of impact in the discipline. One could even provide matching impact numbers for other researchers in the same field at equally sized universities, correcting for private versus public universities, to obtain a simple measurement of individual performance.
    Obviously, questions of timely public dissemination of data are irrelevant for people with an average publication record of 6 or more papers per year, but they are an issue for faculty with only one data paper every other year, no matter the journal that paper will be published in.
    Using modern bibliometric approaches can go a long way toward overcoming the past difficulty in measuring past performance and can help in evaluating future performance as an integral part of grant evaluation.
    I like the “PQRST” moniker, but believe it looks at the same issue multiple times from different angles: productivity without quality, including reproducibility, will not lead to sharing of data that will have profound translational influence.
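    For readers unfamiliar with the two measures mentioned above, here is a minimal sketch with made-up citation counts: the h-index is the largest h such that h of a researcher’s papers have at least h citations each, and Hirsch’s m-quotient divides that by the number of years since the first publication.

```python
# Sketch of the bibliometric measures mentioned above (citation counts are made up).

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def m_quotient(citations, years_since_first_paper):
    """h-index corrected for academic age (Hirsch's m)."""
    return h_index(citations) / years_since_first_paper

citations = [120, 45, 33, 20, 18, 9, 7, 4, 2, 0]    # hypothetical per-paper counts
print(h_index(citations))                   # 7
print(round(m_quotient(citations, 10), 2))  # 0.7 for a 10-year career
```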

  9. I think the NIH review system does a good job overall, in spite of all the heat it gets. A basic source of frustration is not NIH’s fault: there are more excellent grants submitted than can be funded, and some of them must be rejected no matter what. The other source of frustration has to do with the small n: the small number of reviews evaluating the grant. Reviewers are human and have their biases, prejudices, preferences, and quirks. These cannot be eliminated, but they can be reduced by increasing the n. In a hyper-competitive environment, the slightest negative comment by one of the reviewers is enough to, say, send a grant from the 10th percentile to the 15th percentile, which is the difference between funded and not funded. Increase the number of reviewers, so that one odd comment does not derail a grant that is positively reviewed by many. Increase signal-to-noise by reducing noise with a larger n. Of course, this would mean more work for more reviewers, and more effort in finding qualified experts. But maybe it is worth it.
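    A quick numerical illustration of the statistical point being made: if individual reviewer scores scatter with a standard deviation of about one point on the 1-9 scale (an assumed figure), the standard error of a grant’s mean score shrinks roughly as 1/√n, so adding reviewers damps the effect of any single outlier comment.

```python
# Quick illustration of the point about reviewer "n": with reviewer-to-reviewer
# noise of about 1 point on the 1-9 scale (assumed), the standard error of a
# grant's mean score shrinks roughly as 1/sqrt(n).
import math

reviewer_sd = 1.0                    # assumed spread of individual reviewer scores
for n in (3, 6, 12):
    se = reviewer_sd / math.sqrt(n)  # standard error of the mean score
    print(f"{n:2d} reviewers -> standard error of mean score ~ {se:.2f}")
```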

  10. Respected scientists,
    Peer reviewers have their own views, and they always respect the rules and regulations of NIH. Yet many outstanding research benefits never reach the taxpayer’s hands or the people of the world.

  11. The fundamental problem with NIH’s peer review is that, although it is essentially a measurement system, nobody has taken the time to evaluate its noise level (the standard deviation of the scores). The reason the system is broken is likely that the noise is above our ability to detect the signal (a fundable score). All the solutions being proposed are good efforts, but in order to solve the problem we first must know *the signal-to-noise ratio*; there is no way around it. By the way, an easy way to reduce the noise is to reiterate the measurement with a feedback loop: allow applicants to send a rebuttal to the comments *prior* to the completion of the study section! That would eliminate a great many misunderstandings!

    1. I agree with the last suggestion. If possible, allow applicants a chance to correct misunderstandings before the study section meets.

  12. Why doesn’t NIH simply select a random 5 or 10% of the applications for review by a 2nd IRG? (In my experience, most applications that I have reviewed, and most that my colleagues or I have submitted, could easily have been reviewed by a different IRG than the one chosen.) Whether NIH would choose to make any use of an application’s 2nd score is not the point. Reproducibility of the scores, at least in a rank-ordering sense, is a requirement of reliability. This is a quintessential application of “apply[ing] the scientific method to the study of [NIH’s] own work.”

    In fact, this begs the question, “Where has NIH been all these years, in managing a reliable review process?” If we were reviewing an application that solicited expert opinions, we would roundly criticize the research plan if it didn’t include an evaluation of reliability!

    1. After the change in NIH policy regarding resubmissions, I resubmitted a previously triaged R01 proposal to another study section and had it funded at the 5th percentile. My recent experience with NIH is that the scores received are essentially random, and do not correlate with the amount of time I spend on a proposal or what I perceive its intellectual quality to be. It is also common (in my experience) to have at least one reviewer who makes no substantive scientific comments on a proposal, suggesting that he/she is not qualified to review it or spent no more than a cursory amount of time looking at it.

      A decade ago, though, the process seemed fairer, in that the reviewers generally were well qualified to review my proposals and had comments and criticisms that were mostly fair.

      1. While I don’t agree that the reviews are “essentially random”, I do believe they contain a substantial random component, one that is certainly large enough to make or break any submission. And that is extremely discouraging.

        I have not had the same experience with a rejected grant submitted to a new study section, but I have submitted grants to special study sections where they were reviewed like the best thing since sliced bread (and funded), when the same grant in a general study section was panned for what seemed like trivial reasons (and triaged).

        Thus it is very unclear where the signal (funding) lies within the noise (all the random comments received).

  13. “Increase the number of reviewers, so one odd comment by someone does not derail the grant if it is positively reviewed by many.”

    This is a great idea.

    NIH staffers will complain that it’s too tough to find reviewers, but that’s only because NIH accepts excuses too easily. Panel review should be a mandatory obligation for every funded investigator. If they don’t do it (and do it well), then they are not eligible for future funding. You’ve got to contribute to the system if you want to benefit from the system.

    Others might complain that this will increase the percentage of people reviewing grants without appropriate expertise. It’s the applicant’s job to explain how and why. In my experience, people who are thinking clearly have no trouble explaining how and why to almost anyone. And this is the taxpayer’s money! If you can’t get a scientific colleague to appreciate what you’re doing… Also, more ‘expertise’ inevitably means more conflict of interest, especially in this hypercompetitive environment. If I really wanted to send my best ideas and detailed plans to competitors, I could do it myself. No thanks, NIH.

    1. I agree with the comment about increasing the number of reviewers. Participating as a reviewer should be an obligation that comes with funding. Furthermore, one learns a lot from participating in a study section. Perhaps eliminate the distinction between Reviewer 1, Reviewer 2, and the Discussant, so that all three reviews are complete reviews, thus discouraging cursory review.

  14. NIH has deliberately set up a series of barriers to funding grants, a system they call “peer review”. The original point of the process was to discriminate sound research proposals from ones that are not well thought out or are impractical. With time, the system has gotten more and more complicated, so that its current purpose seems to be to make sure researchers spend more time writing grants than doing actual research. Statistically, your chance of getting an R01 funded is around 10%, which means you need to write 10 grants or grant revisions to get one funded. Remember also that the percentile scores are only calculated for grants that are actually discussed. Add in all the ones that are scored below that cutoff, and the odds are far worse. However, a select group of CFRs (consistently funded researchers) seems to get much better odds, and much more direct funding.
    To put this in perspective, odds ratios for the general research community are worse than if you buy a ticket in many public lotteries, where as many as one in eight tickets may be winners. And there it only costs you some pocket money, not months of your life excluding your children from your study so you can polish your proposal or rebut your critics. Because after all, some research work must be done, most of which can only be done in daylight hours. With this situation, is it any wonder women decide to be lab techs and support personnel, rather than trying to become independent researchers? The system also seems designed to make sure women stay in those positions, as without their support the reigning professors could not maintain their CFR status.
    Solution: 1) Recognize the problem of big labs getting bigger and smaller ones shrinking to nothing. 2) Do not keep defending a grading system built up over nearly a century of tradition (as a recent paper in Science tried to do, ignoring the fact that many of the higher (worse) scored grants had better citations per article than those with the lowest (best) scores). 3) Find a way to fund researchers with small groups who have expertise in areas that should not be lost, without making them become research factories. 4) And finally, know that funding one big group in a given area, while turning away related proposals from less well known or less publicized researchers, is an invitation to undetectable fraud or overhyped results. The only way science can function is to have others who can check a result, even if funding them seems redundant to program officers.

  15. many good thoughts and comments. my 2 cents as investigator and reviewer….
    Success evaluation should take into account degree of risk.
    NIH is encouraging high risk projects with its “innovation” category and translation emphasis.
    The higher the risk the less likely the project is to be successful. There is a tension between producing/reviewing a “sure thing” and a “game changer”.

    1. There is an easy solution to this: special high-risk grants. NIH has several categories of them – reviewing these was one of the most pleasant review experiences I have had. The kiss of death was if someone stated: “This has enough preliminary data, or is solidly based in existing science, and could be an R01.”

  16. I agree with most of the comments above, and have concluded that this unreliable system is fundamentally broken. I would especially like to emphasize the comments about innovation listed above. I think that it is easier for non-expert reviewers to agree on technical innovation than on conceptual innovation, so innovation scores are biased toward easily recognizable technical innovations. Also, the way the current system is applied, it typically creates the impression that scientific advancement occurs only through transient, flashy technical breakthroughs, and not through systematic work that pieces together a complex puzzle, replicates complex findings, and develops a comprehensive database from which one can draw accurate conclusions. At this point, the system rewards only the former kind of proposal, not the latter. Organisms interact with their environment by exploitation as well as exploration, but the current system does not have the right balance at all.

  17. After service on many, many study sections, I believe most reviewers are reasonable and are trying to do a good job. NIH could do a much better job of identifying and removing reviewers who are negatively or positively biased. The only review score that matters is the Impact score, and the assignment of the Impact score is inherently subjective, since we cannot predict the impact of any science in advance of conducting it.
    Many reviewers rank the applications by whatever criteria they choose (mostly flaws in the approach, coupled with whether they are personally interested in the research or the researcher) in order to assign a range of Impact scores — since one has to spread out scores. Then they find reasons to justify the Impact score given, and too many times these comments are capricious. The problem is that there are too many good proposals in each study section, and by requiring score spread for practical reasons, the process appears to be random. I don’t have a solution, but I think more objective measures of an application’s merits should be developed, rather than relying on perceived Impact. When my grants do well, the system is working; when my grants do poorly, the system is flawed. My happiness with the outcome of the review process will not change regardless of the system used or the number or quality of the reviewers.

  18. I agree with two proposed changes in the NIH peer-review and funding processes: (1) Increasing the number of reviewers for each application, to dilute the impact of personal bias. (2) Providing funding to more investigators at reduced levels, by adjusting the amount of award according to the score received up to a cut-off.

  19. 1. The selection of members of review panels should be done by voting within the professional associations that represent the discipline in question. For example, the NIDCD panel members representing olfaction and taste should be elected by votes in the professional group representing these areas in the US – in the case of olfaction and taste, AChems.
    2. Diminish the $ in each grant so that more proposals can be funded.
    3. Simplify the administrative process. When I started in the ’80s, it was possible for the PI to put a grant together alone. This is not possible today.
    4. Increase emphases on past productivity and originality. Use published original papers in refereed journals with a long history of publishing good science and no goal of making money.
    5. Institute a large number of young-investigator grants of $50,000/year for 5 years. The application could be based on the applicant’s doctoral thesis and a short proposal. Funding for another 5-year grant would be based on the applicant’s productivity.
    6. Decrease micromanagement of funded projects.

  20. We know that you have to be a good writer to get a grant funded. You must be able to communicate an entire story to the reader. The same is said for getting an article published. Therefore, is measuring the number of publications as a performance indicator not more a measure of writing skill than of science?

  21. The essay by Nakamura and Lauer is a primer in one thing only: bureaucracy. If their notion of success is a high PQRST score as described by Ioannidis and Khoury, then it reduces the grant review process to bean-counting. Higher numbers of papers (the “P” in the algorithm) that are cited at greater frequencies describe a self-fulfilling, looped system, namely one in which grants go to those who most often serve on the editorial boards of high-impact journals. Where is the “R” in your algorithm when the number of retractions in high-impact journals is going through the roof? And retractions would increase drastically if there were a real mechanism for researchers to report when they could not reproduce major findings in high-impact papers (funded, often, by multiple R01s). What is NOT needed is more top-down, metric-based re-engineering. Rather, Nakamura and Lauer should actually read the many outstanding suggestions made on “Open Mike” pages such as these; I would be happy to read their point-by-point response to these issues in their NEJM follow-up paper.

  22. I am a seasoned reviewer but also was screwed. I had three reviewers giving me 1-2 scores and a person from the floor convincing enough of the panel, with a completely flawed argument (you need deeper sequencing for complex disorders than for Mendelian ones – complete BS), to lower their scores. My colleagues asked if I had a personal enemy on the panel after reading the comments. There are too many incompetent scientists, and too many of them are on NIH panels – the most seasoned ones “aren’t making themselves available” – no, they really aren’t; they are busy. Don’t misunderstand, the majority are fine and trying hard, but when funding rates are this tight, one bad apple can spoil it – hence the impression of chance, and of the same grant doing better in another study section.

    Reviewers are not rewarded enough – I love being in the “in”; serving helps my science, but it is too much work. This is true for editorial boards and paper reviews too. Nobody asks me how many reviews I did when it comes to merit increases.

    Solutions:

    Pay reviewers more. I recently dealt with eLife – it was a mind-boggling difference compared to other papers, dealing with a journal where the editors and reviewers acted as professionals. I understand they get paid. They are responsible. If you are paid well, you want to do a good enough job to be asked back. As long as it is a service that you need to be arm-twisted into, not rewarded at any point, incentive is lacking.

    NSF has mail-in reviews. Get (paid) mail-in reviews in specific areas to reduce the load on reviewers and make the discussion more substantive.

    Previous good moves were the triage system (I have reviewed pre-triage…) and the limit on resubmissions – although that is now off too, in part because of the chance factor.

    Productivity: Do the math correctly! In every business except our science, this includes calculating input as well as output. Most PIs with 4 big grants list all 4 on all papers. It’s easy to show 15 papers of productivity after 5 years; the rich get more grants, but a lab with 1 R01 plus some private funding is disadvantaged, even though data have shown that in terms of cost-effectiveness, small labs are more productive. Calling 1 paper/year, 4-5 papers in a 4-year grant, low productivity just isn’t right. Calculate papers per $100,000 of the PI’s total funding, or something like that. And ignore commercial journals that publish as long as you pay (there are dozens of them around now!).

    Discriminate by age: Not only positive for “early career” but also negatively, as >65 year old PIs not only have the highest funding rates, they also have higher salaries and hence contribute more than their share to the cost of the research

    Decrease the salary cap to $100,000. An institution can pay what they want to an MD bringing in money, but his research is not bringing in the money.

    Demand at least 50% faculty effort for each $250K grant, with at least 25% from one lead PI. That will limit the number of R01s one PI can have to 4.

    Alternatively, as many have said before, limit total paid NIH effort to 50% of salary, as in reality, we all have other jobs as faculty too – clinical, teaching… nobody can work 100% on NIH grants, and Universities should not expect NIH to fully pay their faculty!

    I guess I am also a hopeless socialist, I want pennies for the poor (reviewers), limit the rich from getting even richer, increase taxes for the rich old boys, and accountability.

  23. As I see it there are multiple problems: (a) SROs and POs who have (generally) never had independent careers running programs, so they are out of touch with the ground realities (as with the FDA, many are hired directly out of postdocs); (b) too-small increases in the annual NIH budget; (c) burgeoning numbers of applications; (d) reviewer workloads; (e) inadequate reviewer expertise, and reviewer biases about research areas/PIs; (f) the old boy network; (g) SROs’ connections with reviewers (i.e., “friends of the SRO”); etc., etc.
    One possible solution: look at the NSF. On an experimental basis, have specialized review groups with rotating program personnel serving 2-3 year slots. Program personnel must be established investigators who have performed research successfully in these difficult times. The NIH can pay the rotators’ salaries via the home institution, just as the NSF does.
    I have had continuous funding from NSF and NIH for over 19 years, but NOTHING major from the NIH!!! I have had multiple NSF grants, so as I see it my research is good enough for the NSF but not for the NIH. I must be missing something. In both serving on study sections and having had applications reviewed at the NIH, I must say that reviewer selection and workloads are both significant problems, at the least.

  24. Four easy steps to fix NIH/CSR, addressing fundamental problems with grant submission, the review process, and the funding of research projects:
    1) Grant submissions: NIH/CSR should pre-screen grants and NOT REVIEW a certain percentage – return them to the applicant for revision. Eliminate grants at the bottom: poorly developed/written, insignificant, or not within the scope of NIH. Eliminate grants at the top that are essentially duplicative of current funding, e.g., Gene X in Tissues A, B, C (multi-R01-funded PIs), or that are technically sound but lack innovation/significance with respect to the PI’s research program. Reducing applications by ~30% would increase the payline significantly at the same level of NIH funding, e.g., from 10% to 15%. Pre-screening of applications may also discourage PIs from submitting multiple applications in a lottery approach to funding. Many agencies use this approach.
    2) Grant review: The use of individual scores for components was the right idea, but in my experience it has been a failure, since the grant is still given only a single score. So instead give two scores: 1) the reviewer’s overall score and 2) the composite score from adding up the four criteria (eliminate the Environment score; lump it with Approach). This would give NIH some wiggle room to fund applications that are strong in, e.g., Innovation and Significance but have concerns in Approach. In my experience most reviewers/reviews still get entirely hung up on Approach (“have the mice been made?”).
    3) Study section reviewers: The reviews will only be as good as the reviewers. Make it a requirement, prior to awarding of funds, that any scientist receiving NIH funds will be expected to serve on a study section, with possible loss of funds if they do not serve. Give ample time in study section for discussion of grants which have merit yet draw significant disagreement.
    4) Funds: A) Limit PI effort in total to 50%. This would discourage PIs from pursuing multiple grants for purely financial reasons, require universities to buy in to their faculty, and stabilize the academic faculty by providing more diverse sources of funding, with ~25-50% of salary NIH-dependent. B) Increase modular grant awards to $350,000 (from $250,000) so that one NIH R01 really can support one project, eliminating the absolute need for two R01s at $500,000. C) Form a committee to examine and make recommendations regarding indirect funds.
    I believe that these small and easily implemented changes at CSR/NIH would be a major step in the right direction toward salvaging the scientific community and restoring its faith in the NIH enterprise, both of which are now on the rocks.

  25. Given all the recent concern regarding reproducibility in science, we should be asking to what degree peer review outcomes are reproducible.

    One way to address this question is to analyze the following existing data from the NIH review system. Consider that the individual projects within a program project application can be submitted independently of the program project. When this happens, an individual project P is reviewed twice: once as part of the program project, and once individually. We should expect the two scores to be closely correlated, and to give us information about how reproducible peer review is. Be aware, of course, that P is not reviewed in exactly the same form in both cases, since when P is reviewed as part of the larger program project it is put in a different context.
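    A minimal sketch of the analysis proposed above (the scores are invented, and scipy is assumed to be available): pair each project’s score inside the program project with its score as a standalone application, then compute a rank correlation as a rough reproducibility measure.

```python
# Sketch of the proposed reproducibility check (scores below are invented):
# pair each project's score inside the program project with its score when
# reviewed as a standalone application, then compute a rank correlation.
from scipy.stats import spearmanr

# Hypothetical NIH impact scores (10 = best, 90 = worst) for the same projects.
score_in_program_project = [20, 35, 28, 50, 41, 33, 25, 60]
score_as_standalone      = [24, 30, 40, 55, 38, 45, 22, 52]

rho, p_value = spearmanr(score_in_program_project, score_as_standalone)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```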

    1. Or just run parallel study sections. A simple and effective experiment. I wonder why this did not occur to the heads of the OER and CSR?

      Naturally most of us who are in the trenches of peer review (on both sides) believe it would reveal the impossibility of distinguishing grants in the 8%-28% zone with any reliability.

      But it sure would quiet down a lot of complaining if interrater reliability turned out to be higher than predicted. Seems like a good thing to find out.

  26. I agree that there is a complete lack of accountability for reviewers, and that problem has become more critical as there are fewer and fewer truly qualified reviewers available. As many have said, applications can be condemned by a single reviewer, and more and more reviewers appear to be unfamiliar with the research and methods related to the grant and/or mistaken in their assertions in the reviews. I, too, have gotten enthusiastic responses from a special panel convened for an RFA, and uninformed reviews from CSR panel reviewers.

    Typically, the primary reviewer decides whether an application is discussed or not, and there appears to be no oversight of these judgements. Applicants have no recourse when a reviewer who is not knowledgeable triages an application based on uninformed opinions (rather than valid scientific reasons). Resubmission doesn’t work when reviewers are both uninformed and certain about their opinions! Like others, I’ve had two applications with almost identical methods receive completely inconsistent comments and scores about the rigor of the methods – one was funded and the other triaged. The difference was in the expertise of the reviewers, not the rigor of the science.

    The following steps (some suggested by others above) could go a long way toward addressing these problems:
    1. Eliminate “primary” “secondary” and “tertiary” reviewer status. All reviewers are expected to provide and be ready to support a detailed review. If all three are carefully reading the application (as they are expected to!), this should not be extra work for any.
    2. Focus extra effort only where it is needed: on applications that are not consistently reviewed. Calculate inter-rater reliability of the initial scores that reviewers submit (BEFORE the meeting); a sketch of such a pre-meeting screen appears after this list. If all three do careful reviews that they are ready to support (see #1) and their scores are highly consistent, then additional work probably isn’t warranted. But for applications with poor inter-rater reliability of pre-meeting scores, discussion should be mandatory, and reviewers should know that they will be called upon to justify their ratings to the review panel if they are inconsistent with the other reviewers. That is not a problem for reviews based on knowledge of the area and methods. This alone will improve the efforts of reviewers who give low scores after a cursory reading, knowing that they will not have to present their reasons.
    3. Calculate inter-rater reliability on final scores (after discussion) from all panel members. For applications with poor inter-rater reliability, set the application aside for two weeks to give the applicants a chance to respond. Assign an experienced reviewer to evaluate the reviews and the response and to determine whether the low or the high reviews were the more accurate assessment.
    4. Allow applicants and SROs to anonymously rate the quality of the reviews. Remove reviewers who get poor ratings – even after a single round. Most of us can tell the difference between a fair and well-informed, but critical review and a cursory and ill-informed review. And SROs probably can spot the bad ones easily! Also set aside funds that only highly rated reviewers can compete for. This could motivate many to participate and provide careful reviews.
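    A minimal sketch of the pre-meeting consistency screen suggested in point 2 (the spread threshold and all scores are hypothetical policy choices, not NIH practice):

```python
# Sketch of a pre-meeting consistency screen (all numbers are hypothetical):
# flag any application whose three preliminary scores spread more than a chosen
# threshold for mandatory discussion.
import statistics

preliminary_scores = {          # application -> three reviewers' 1-9 scores
    "R01-A": (2, 3, 2),
    "R01-B": (2, 2, 7),         # one discordant reviewer
    "R01-C": (5, 4, 6),
}
SPREAD_THRESHOLD = 2            # max minus min; the cutoff is a policy choice

for app, scores in preliminary_scores.items():
    spread = max(scores) - min(scores)
    flag = "discuss (inconsistent reviews)" if spread > SPREAD_THRESHOLD else "consistent"
    print(f"{app}: mean {statistics.mean(scores):.1f}, spread {spread} -> {flag}")
```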

  27. NIH peer review does not appear to be based on any reliable metrics. Here is my suggestion for how to evaluate applications from established investigators:
    Take the h-factor (index) and divide it by the sum of federal research dollars awarded to the applicant over time until the present. This would be a reasonable formula that considers both input and output.
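    As a worked example of the proposed ratio, with entirely hypothetical numbers:

```python
# Worked example of the proposed metric (numbers are hypothetical):
# h-index divided by cumulative federal research dollars awarded to date.
h_index = 30
federal_dollars_to_date = 3_000_000           # $3M over the PI's career

impact_per_million = h_index / (federal_dollars_to_date / 1_000_000)
print(f"{impact_per_million:.1f} h-index points per $1M of federal funding")
```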
