In previous blogs, we talked about citation measures as one metric for scientific productivity. Raw citation counts are inherently problematic – different fields cite at different rates, and citation counts rise and fall in the months to years after a publication appears. Therefore, a number of bibliometric scholars have focused on developing methods that measure citation impact while also accounting for field of study and time of publication.
We are pleased to report that on September 6, PLoS Biology published a paper from our NIH colleagues in the Office of Portfolio Analysis on “The Relative Citation Ratio: A New Metric that Uses Citation Rates to Measure Influence at the Article Level.” Before we delve into the details and look at some real data, let me point out that this measure is available to you for free – you can go to our web site (https://icite.od.nih.gov), enter your PubMed Identification Numbers (PMIDs), and receive a detailed citation report on the articles that you’re interested in. And they don’t have to be NIH articles – you can receive citation measures (including the Relative Citation Ratio) on over 99.9% of articles posted in PubMed between 1995 and 2014. That’s well over 13 million articles!
Now, let’s take a look at how the Relative Citation Ratio (RCR) works.
The goal of the Relative Citation Ratio is to quantify the impact and influence of a research article both within the context of its research field and benchmarked against publications resulting from NIH R01 awards. Figure 1 shows a red dot, our “article of interest.” It is cited by a number of articles (blue dots in the upper row), which we call “citing articles.” We can count the number of citations over time to our article of interest, yielding an “actual citation rate.” These citing articles cite not only our article of interest but also other articles, which together make up the “co-citation network.” The papers in the co-citation network can reasonably be taken to reflect the research field (or fields) of our article of interest. We can then estimate a “field citation rate” from the citation rates of the journals that published the co-citation network papers.
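To make the co-citation idea concrete, here is a minimal Python sketch. It assumes we already have, for each citing article, the set of references it lists, plus a lookup from each paper to its journal and from each journal to a journal citation rate. The function names and the plain average over co-cited papers are simplifications for illustration, not the exact procedure described in the PLoS Biology paper.

```python
from statistics import mean


def co_citation_network(article_id, citing_refs):
    """Collect every paper cited alongside article_id.

    citing_refs maps each citing article to the set of references it lists
    (a hypothetical input format). Articles that do not actually cite
    article_id are ignored.
    """
    network = set()
    for refs in citing_refs.values():
        if article_id in refs:
            network |= refs
    network.discard(article_id)  # the article of interest is not its own field
    return network


def field_citation_rate(network, journal_of, journal_citation_rate):
    """Estimate a field citation rate as the plain average of the journal
    citation rates of the co-cited papers (a simplifying assumption)."""
    return mean(journal_citation_rate[journal_of[paper]] for paper in network)
```

For example, if two citing articles list {A, P1, P2} and {A, P2, P3}, the co-citation network of article A is {P1, P2, P3}, and the field citation rate is averaged over the journals those three papers appeared in.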
In Figure 2, we show how to develop a benchmark measure for field-normalized citation rates by plotting the actual and field citation rates for a large cohort of NIH R01-supported articles.
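The benchmarking step amounts to fitting a line through the R01 cohort’s (field citation rate, actual citation rate) pairs and reading the “expected” rate off that line. The sketch below uses ordinary least squares; the straight-line form, the function names, and fitting a single line (rather than, say, a separate line for each publication year) are assumptions made to keep the example short.

```python
def fit_benchmark(fcrs, acrs):
    """Ordinary least-squares fit of actual citation rate (ACR) on field
    citation rate (FCR) for a benchmark cohort, e.g. R01-funded papers.

    Returns (intercept, slope) of the line ACR = intercept + slope * FCR.
    """
    if len(fcrs) != len(acrs) or len(fcrs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(fcrs)
    mean_x = sum(fcrs) / n
    mean_y = sum(acrs) / n
    sxx = sum((x - mean_x) ** 2 for x in fcrs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fcrs, acrs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def expected_citation_rate(fcr, intercept, slope):
    """Citation rate a benchmark-cohort paper with this FCR would typically have."""
    return intercept + slope * fcr
```

With toy benchmark data lying exactly on ACR = 1 + 2 × FCR, the fit recovers an intercept of 1 and a slope of 2, so a paper with a field citation rate of 5 would have an expected citation rate of 11.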
With this in hand, we can divide the actual citation rate by the expected citation rate (a rate that depends on field and time of publication and that is benchmarked to an “NIH norm”), yielding the relative citation ratio, or RCR.
Figure 3 summarizes the process and what the results mean. If a paper is cited exactly as often as would be expected based on the NIH norm, the RCR is 1. If a paper is never cited, the RCR is 0. If a paper is cited a great deal, the RCR can exceed 2, or 20, or even more.
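Putting the pieces together, the ratio itself is a one-line calculation. This hypothetical sketch assumes the actual citation rate is measured as citations per year since publication; the numbers in the usage note are made up for illustration.

```python
def actual_citation_rate(citations, years_since_publication):
    """Citations per year since publication (an assumed definition)."""
    if years_since_publication <= 0:
        raise ValueError("years_since_publication must be positive")
    return citations / years_since_publication


def relative_citation_ratio(acr, ecr):
    """RCR = actual citation rate / expected citation rate.

    1 means cited exactly at the benchmark norm, 0 means never cited,
    and values above 1 mean cited more often than the norm.
    """
    return acr / ecr
```

A paper cited 30 times in 5 years has an actual citation rate of 6 per year; if its expected citation rate is 3, its RCR is 2, meaning it is cited twice as often as the NIH norm for its field. A paper with no citations gets an RCR of 0 regardless of its field.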
So, now let’s look at some real data. We used NIH’s Scientific Publication Information Retrieval & Evaluation System (SPIRES) to identify over 1.4 million NIH-supported papers published between 1995 and 2014. SPIRES, as described in a 2011 Rock Talk blog, is an NIH system that maps PubMed publications to NIH grants. (More about its capabilities is described in a manuscript published around the same time.)
Figure 4 is a histogram showing the spread of RCR values. Note the highly skewed (very, very skewed!) distribution: most papers have low RCR values, but a few have high, even extremely high, values. This is consistent with patterns described in previous literature on scientific citations.
To make these data more informative, let’s look at Figure 5, in which we log-transform the X-axis. Now we see a (not quite) log-normal distribution with a median value of 1 (as would be expected for NIH papers). The few papers with RCR values of zero are now easy to see, while most papers have RCR values between 0.1 and 10.
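One practical wrinkle with a log-scaled X-axis is that log(0) is undefined, so papers with an RCR of zero have to be handled separately, for instance as their own bar. Here is a minimal sketch of log-scale binning that sets zeros aside; the function name and the default of one bin per decade are illustrative choices, not how Figure 5 was actually produced.

```python
import math
from collections import Counter


def log10_histogram(rcrs, bins_per_decade=1):
    """Bin positive RCR values on a log10 scale.

    Returns (number_of_zeros, [(lower_edge, count), ...]). Zeros are
    counted separately because log10(0) is undefined; RCR values are
    never negative, so only zeros and positive values are expected.
    """
    zeros = sum(1 for r in rcrs if r == 0)
    bins = Counter(
        math.floor(math.log10(r) * bins_per_decade) for r in rcrs if r > 0
    )
    # Convert each integer bin index back to its lower edge on the RCR scale.
    return zeros, [(10 ** (k / bins_per_decade), n) for k, n in sorted(bins.items())]
```

With one bin per decade, the bins run 0.1 to 1, 1 to 10, 10 to 100, and so on, which is why a "mass between 0.1 and 10" corresponds to just two bars on the log-scaled plot.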
Figure 6 shows another way to present these data: a box plot. The median (middle black line) is 1, while the 25th and 75th percentiles (bottom and top of the box) are 0.46 and 1.98. It turns out that NIH papers make up about 11% of all PubMed entries. Figure 7 adds a box plot for nearly 12 million papers that were not supported by NIH: the median is much lower at 0.36, while the 25th and 75th percentiles are 0.06 and 0.96.
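The three numbers that define each box are the quartiles of the RCR values. A minimal sketch using Python’s standard library follows; note that `statistics.quantiles` defaults to its "exclusive" interpolation method, and plotting tools may use a slightly different rule, so small discrepancies on real data would not be surprising.

```python
from statistics import quantiles


def box_plot_summary(rcrs):
    """Return (25th percentile, median, 75th percentile) of RCR values.

    Uses the standard library's default 'exclusive' quantile method;
    the input does not need to be sorted.
    """
    q1, median, q3 = quantiles(rcrs, n=4)
    return q1, median, q3
```

On the toy list 1 through 7 (in any order), this returns 2.0, 4.0, and 6.0 for the bottom of the box, the median line, and the top of the box.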
Just by looking at the two box plots side by side, we can see that on the whole, NIH papers have higher RCRs. Still, both NIH and non-NIH papers include some with extremely high values (greater than, or much greater than, 20).
However, this is not the whole story. We can show these data better by superimposing density distribution plots over the box plots (aptly called “violin plots”). The violin plots in Figure 8 reveal a much larger group of papers with RCR values of zero among the non-NIH papers.
In fact, we found that 3% of NIH papers were never cited, while over 20% of the non-NIH papers were never cited.
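The never-cited shares are simply the fraction of zero RCR values in each group. Here is a small sketch; the group names and the toy numbers in the example are for illustration only, not the actual NIH data.

```python
def share_never_cited(rcrs):
    """Fraction of papers with an RCR of exactly zero (i.e., never cited)."""
    if not rcrs:
        raise ValueError("need at least one paper")
    return sum(1 for r in rcrs if r == 0) / len(rcrs)


def share_never_cited_by_group(rcrs_by_group):
    """Apply share_never_cited to each group, e.g. NIH vs. non-NIH."""
    return {group: share_never_cited(values) for group, values in rcrs_by_group.items()}


# Toy data for illustration only.
example = share_never_cited_by_group(
    {"NIH": [0, 1.1, 0.7, 2.4], "non-NIH": [0, 0, 0.3, 1.0]}
)
# example == {"NIH": 0.25, "non-NIH": 0.5}
```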
One obvious question is whether these values really mean anything in terms of scientific quality. Yes, we can count citations and make some fancy calculations, but do these have anything to do with high-quality science, the kind of science that we at NIH most want to support? The authors of the PLoS Biology paper present data (Figure 9) from 3 separate studies in which expert peer reviewers rated a group of papers; in all 3 cases, there was a strong association between expert opinion scores and RCR.
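The specific statistics used in those studies aren’t described here, but one common way to quantify whether expert ratings and RCR move together is a Spearman rank correlation. It only asks whether higher-rated papers tend to have higher RCRs, so it is not thrown off by the heavy skew we saw in the RCR distribution. This standard-library sketch is an illustration of that general technique, not the analysis reported in the paper.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

A value near 1 means papers that reviewers rate highly also tend to have high RCRs; because only ranks matter, a paper with an RCR of 50 pulls the result no harder than one with an RCR of 5 in the same rank position.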
In future blogs, we will revisit some prior questions and use the RCR to look at others. For example, we can look at the association of grant budgets and grant durations with productivity as assessed by RCR. We can, as some of you suggested before, compare outcomes for different scientists, different organizations, and different types of grants. We can also consider other outcomes, like submission or publication of patents. We’ll also look at other tools that enable us to trace the development of discoveries over time and to follow a field from basic to translational to applied work. And please feel free to send us your questions about analyses we or others should do, so that we can work together to make the field of grant funding one that is increasingly grounded in evidence.
Congratulations to the authors of the PLoS Biology paper. I would like to thank the Office of Portfolio Analysis for helping me learn about the Relative Citation Ratio. I would also like to thank my colleagues in OER (including the Statistical Analysis and Reporting Branch and the Office of Data Systems) for helping me with these analyses.