RCDeCade: 10 Years and Still Counting


Remember hearing those stories about how your grand-PIs had to walk five miles, in the snow, uphill, with no shoes just to learn how NIH spent its research budget? Well, believe it or not, that was just ten years ago. Today, we have the Research, Condition, and Disease Categorization (RCDC) webtool to do this in the blink of an eye. Now, following the official release of Fiscal Year (FY) 2017 data and updated estimates for FYs 2018 and 2019 last month, we wanted to celebrate a successful decade of service.

With origins stemming from the NIH Reauthorization Act of 2006, and now available via NIH RePORT, RCDC is a helpful resource for investigators, advocacy groups, Congress, and the public to easily see how much NIH spends on certain research areas year by year. Since 2008, we have made regular enhancements to make the system more beneficial to all. Project listings published. Data cleansed. Quality enhanced. Disease burden information added. And, today, 285 categories reported online.

Consistency in RCDC is critical for accurate NIH budget reporting. To achieve this, NIH experts defined boundaries for each RCDC category (akin to a biomedical thesaurus) using an automated text mining tool. Based on the importance of a scientific concept to a category topic, the experts agree on a definition, a match score is calculated, and a list of projects is produced. Once the experts agree, a final budget report is developed accounting for all obligated dollars in each category and the NIH Institute or Center that obligated those dollars. It’s also important to acknowledge that the process is not perfect – there is some misclassification. Just like with diagnostic tests, a tension exists between sensitivity (i.e., are we finding all the projects that fit into a category) and specificity (i.e., are the projects linked to a certain category truly linked to that category). Want more on the categorization process? Go here.
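For readers who like to see the diagnostic-test analogy in numbers, here is a minimal sketch. It is not RCDC's actual scoring code: the function name, the project IDs, and the "expert truth" set are all hypothetical. Alongside sensitivity and specificity it also reports precision, the share of assigned projects that truly belong, since that is the quantity the second question above is asking about.

```python
def classification_rates(assigned, truly_in_category, all_projects):
    """Compare an automated category assignment with a (hypothetical) expert truth set."""
    assigned = set(assigned)
    truth = set(truly_in_category)
    universe = set(all_projects)

    true_pos = len(assigned & truth)              # correctly assigned
    false_neg = len(truth - assigned)             # belong, but were missed
    false_pos = len(assigned - truth)             # assigned, but do not belong
    true_neg = len(universe - assigned - truth)   # correctly left out

    # Toy example only: assumes none of the denominators is zero.
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "precision": true_pos / (true_pos + false_pos),
    }


# Made-up project IDs, purely for illustration.
all_projects = {f"P{i}" for i in range(1, 11)}
truly_in_category = {"P1", "P2", "P3", "P4"}
assigned = {"P1", "P2", "P3", "P5"}

rates = classification_rates(assigned, truly_in_category, all_projects)
print(rates)
```

Loosening a category definition tends to pick up more of the projects that belong (higher sensitivity) at the cost of more stray assignments (lower specificity and precision), which is the tension described above.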

Many astute RCDC users know that, as additional research areas are added, the overall total reported by RCDC will also increase. If you add up every category now, the total is a whopping $186 billion. That number is obviously much higher than NIH’s budget. This occurs because the same project can be reported in multiple categories in one year. RCDC categories are by their nature overlapping (for example, Brain Disorders, Neuroscience, and Mental Health). Further, we cannot apportion project funding across categories in a meaningful and consistent way, for example, by deciding that 40% of the monies went to one category, 30% went to another, and 30% went to a third.

Categories are also limited to those requested by Congress and Executive Branch leadership. This means that RCDC categories do not encompass all types of biomedical research. Thus, some NIH-funded projects might fall into several of the reported categories, whereas others may not be captured at all.
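A toy example makes both caveats concrete. The three projects, their dollar amounts, and the `category_totals` helper below are invented for illustration and are not RCDC data or code. Each project's full cost is counted in every category it matches, so the category totals add up to more than what was actually spent, while a project that matches no category never shows up at all.

```python
from collections import defaultdict

# Hypothetical projects: costs and category assignments are made up.
projects = [
    {"id": "A", "cost": 100, "categories": ["Brain Disorders", "Neuroscience", "Mental Health"]},
    {"id": "B", "cost": 50, "categories": ["Neuroscience"]},
    {"id": "C", "cost": 30, "categories": []},  # fits none of the reported categories
]


def category_totals(projects):
    """Add each project's full cost to every category it is assigned to."""
    totals = defaultdict(int)
    for project in projects:
        for category in project["categories"]:
            totals[category] += project["cost"]
    return dict(totals)


totals = category_totals(projects)
sum_over_categories = sum(totals.values())           # counts project A three times
actual_spending = sum(p["cost"] for p in projects)   # counts every project once

print(totals)
print(sum_over_categories, actual_spending)
```

Here the categories sum to 350 while actual spending is only 180, which is why adding up all RCDC categories gives a figure far above the NIH budget.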

Even with those caveats, the beauty of using the RCDC method is that it gives us consistent annual reporting and the ability to see any potential trends in spending. But, how can we truly know if the RCDC coding process is consistent over time and not artificially inflating the reported amounts?

To answer this, we started with the publicly available RCDC topics in fiscal year (FY) 2008 that used an automated coding process. In other words, we focused on 207 of the original 215 categories available in FY 2008; the remaining eight were excluded because they were manually coded (e.g., Women’s Health and Health Disparities). We followed what happened with these topics through FY 2017.

Let’s start with actual (nominal) appropriations to NIH. As a point of reference, since each fiscal year is compared back to it, the 2008 figures will always begin at the zero percent change mark (Figure 1). NIH’s nominal appropriations, as we know, steadily increased most years over the past decade (landing at about 12 percent higher in 2017), with the notable exception of 2013, the year of sequestration (green line).

We can next see that the percent change in the fraction of NIH projects assigned to the 207 categories stays within 2 percent between FY 2008 and 2016 (gray line). Note that we introduced a better categorization engine in FY 2017, one that still allows for comparable observations to be made. By contrast, the percent change in the sum of spending in the 207 categories trends upward more strongly when compared back to FY 2008 (orange line). By 2017, that sum was about 35 percent higher than in 2008, a result of the increases in average project costs.
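Every line in the figure uses the same calculation: each year's value is compared back to FY 2008. A minimal sketch of that calculation follows; the function name and the numbers are hypothetical placeholders, not NIH figures.

```python
def percent_change_from_baseline(series, baseline_year):
    """Return the percent change of each year's value relative to the baseline year."""
    base = series[baseline_year]
    return {year: 100.0 * (value - base) / base for year, value in series.items()}


# Made-up values in arbitrary units, for illustration only.
hypothetical_series = {2008: 200.0, 2012: 230.0, 2017: 250.0}

changes = percent_change_from_baseline(hypothetical_series, baseline_year=2008)
print(changes)
```

By construction the baseline year always comes out at zero percent change, which is why every line in the figure starts from the same point.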


Figure 1 compares the original 207 automated RCDC categories on a variety of metrics over time. The X axis represents fiscal years 2008 to 2017, while the Y axis is percent change from -20 to 40 percent. There are five lines displayed on the graph. The green line with “X” markers represents NIH appropriations each fiscal year versus FY 2008. The orange line with diamond markers represents the percent change in the sum of spending in the 207 categories. The gray line with triangle markers represents the percent change in the fraction of NIH projects assigned to the 207 categories. The red dashed line with circle markers represents the percent change in average project cost in the 207 categories. The light blue line with square markers represents the sum of the percent changes in both the average project cost and average number of projects in the 207 categories.

The average cost per project drove the reported increase. When looking at the percent change in average project cost (red line) as well as the sum of the percent changes in both the average project cost and average number of projects (light blue line) in the original 207 categories, we see they both trend upward over time. Many factors may lead to this outcome, among them the rising cost of doing biomedical and behavioral research in general.
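One way to read the light blue line: total spending in a category is, by definition, the number of projects times the average cost per project, so its percent change is the sum of the two individual percent changes plus a small cross term. The sketch below illustrates that identity with made-up numbers; the function and its inputs are hypothetical and are not taken from the RCDC analysis itself.

```python
def decompose_spending_change(count_start, avg_cost_start, count_end, avg_cost_end):
    """Split the fractional change in total spending into count, cost, and cross terms."""
    count_change = (count_end - count_start) / count_start
    cost_change = (avg_cost_end - avg_cost_start) / avg_cost_start

    total_start = count_start * avg_cost_start
    total_end = count_end * avg_cost_end
    total_change = (total_end - total_start) / total_start

    # (1 + total) = (1 + count) * (1 + cost), so total = count + cost + count * cost
    cross_term = count_change * cost_change
    return count_change, cost_change, total_change, cross_term


# Hypothetical inputs: project count up 2 percent, average cost up 30 percent.
count_chg, cost_chg, total_chg, cross = decompose_spending_change(
    count_start=1000, avg_cost_start=400_000, count_end=1020, avg_cost_end=520_000
)
print(count_chg, cost_chg, total_chg, cross)
```

When the project count barely moves, the cross term is tiny, so nearly all of the change in total spending comes from the change in average cost.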

What about the changes we made to the RCDC algorithms themselves? Could they also contribute to the noticeable changes in categorized project counts and reported funding amounts we see here? Well, yes, but the change was negligible. As mentioned earlier, the percent change attributed to refining the categorization process was at most 2 percent each year, with the notable exception of FY 2017.

These findings suggest that the automated RCDC categorization has contributed to a consistent reporting process over the past decade. Most of the changes seen in project funding are due primarily to increases in the average cost of research projects, not the algorithms.

When combined with other tools, these resources are quite a powerful means to assess the research enterprise. Our staff and stakeholders can trust that RCDC can help them get a quick, accurate, and consistent look into how NIH spends its funds in many different research areas. We’re excited to see where this tool will take us going forward, especially as its consistency and transparency will help us better forecast future trends in biomedical and behavioral research spending.

I would like to thank Rick Ikeda, Judy Riggie, and Andrei Manoli within the NIH Office of Extramural Research for their dedicated and hard work on this project.
