NIH has long been committed to transparency into who and what we fund. We have previously discussed the value of freely available web tools that allow you to gain insight into NIH funding decisions. Award data available via RePORT and RePORTER, for instance, include non-sensitive information such as awardee institution, principal investigator, funding levels, and research abstracts, as well as associated publications, patents, and other project outcomes. Better yet, if you want to see all of these data at once, ExPORTER allows you to download over 25 years’ worth of such non-sensitive NIH grant award data.
Researchers have used this grant information in creative and thought-provoking ways to explore NIH funding decisions. For example, Fang, Bowen, and Casadevall, as well as Li and Agha, analyzed post-award research productivity according to pre-award peer review scores. Li, Azoulay, and Sampat linked publications resulting from NIH awards to patents. Boris et al. used RePORTER data to verify self-reported awards in the dermatology field. Cleary et al. used RePORTER data to show that all recent new drug approvals were in some meaningful way linked to NIH funding. And as I wrote in this 2017 post, Katz and Matter looked at some NIH data and described what they saw as inequality and stasis in the biomedical enterprise.
The data available through RePORT are quite powerful in their own right. However, compelling arguments exist for why researchers outside NIH should have access to even more information associated with the grants process. In addition to the non-sensitive data, NIH maintains sensitive information collected via the grants process in its internal research administration systems. Such data include information on peer review outcomes, progress reports, and demographics of individuals listed in NIH grant applications.
As part of a Request for Information (RFI) described here and other future engagements, NIH is considering which categories of this sensitive data may be shared with researchers in compliance with applicable laws, while safeguarding sensitive, personally-identifiable, and confidential information (NOT-OD-19-085). NIH is beginning to explore the costs and benefits of providing approved research organizations with controlled access to such structured, de-identifiable NIH administrative and scientific information in a formal and controlled way, through a secure data enclave.
Over the years, institutions, professional societies, advocates, and researchers interested in the science of science and innovation policy have requested access to more of these sensitive NIH data. Under certain circumstances, NIH allows researchers to enter into special data use agreements or other contractual arrangements to access these data for specific research purposes. As an example, NIH issued a contract that allowed Ginther et al. to look at demographic information of researchers identified on applications, which they later compared with receipt of a major NIH award. It is important to remember, though, that even when we permit such access, we remain dedicated to safeguarding your sensitive, personally-identifiable, and confidential information (please listen to the related NIH All About Grants podcast on this topic for more: MP3 / Transcript, 7 minutes).
Last December, the Advisory Committee to the NIH Director’s Next Generation Working Group recommended increased access to NIH administrative data. These administrative data have the capacity for “empowering career decision making through the availability of NIH data” and “increasing accessibility of internal data for researchers studying the biomedical workforce” (see Theme 5.1 of the working group recommendations to NIH here).
Some federal agencies host unique environments allowing researchers access to agency information. The Centers for Medicare and Medicaid Services allows users to obtain research, data, and statistics on topics like actuarial studies, compliance monitoring, and claims. Moreover, the Census Bureau makes certain administrative data available to reduce respondent burden and enhance analyses of changes in the U.S. population, demographics, economy, and social conditions. Though such avenues exist, concerns remain around data security, personal privacy, the associated costs of managing the platforms, how physical or virtual environments are controlled, and the overall need to know.
As noted earlier, we recently issued an RFI that seeks community input on considerations for securing these data, where they may be accessed, requirements for a research plan, and procedures for exporting information outside the enclave. Moreover, we hope to better understand what biomedical and/or behavioral research questions may be answered, whether the enclave should be physical or virtual, how many seats an organization may be interested in using, what policies are needed to secure the information, and what steps should be taken in case of a data breach.
All RFI responses must be submitted electronically here by Wednesday, May 30, 2019.
This RFI is our attempt to gauge initial interest in a data enclave and identify issues that we will need to think through. This is your opportunity to tell us whether this is something you would use, to share any concerns, and to suggest ways to improve the idea. If the response is positive, we fully expect to continue to engage the community to help us refine the idea.