Requesting Your Input on the Draft NIH Strategic Plan for Data Science


To capitalize on the opportunities presented by advances in data science, the National Institutes of Health (NIH) is developing a Strategic Plan for Data Science. This plan describes NIH’s overarching goals, strategic objectives, and implementation tactics for promoting the modernization of the NIH-funded biomedical data science ecosystem. As part of the planning process, NIH has published a draft of the strategic plan today, along with a Request for Information (RFI) to seek input from stakeholders, including members of the scientific community, academic institutions, the private sector, health professionals, professional societies, advocacy groups, patient communities, as well as other interested members of the public.

On behalf of Dr. Jon Lorsch, Director of the National Institute of General Medical Sciences and co-chair of the NIH Scientific Data Council, which is overseeing development of the Strategic Plan for Data Science, I encourage your comments and suggestions. Responses should be submitted via an online form by April 2, 2018.


  1. I like the initial NIH draft that outlines the background, motivational needs, and forward-looking goals. Some specific directions that can be strengthened in the final NIH strategic plan for data science include:

    • Focus on research to develop novel statistical techniques for data obfuscation (data-sifting) that can be used to de-identify sensitive data (e.g., PHI, EHR/EMR). The FAIR data sharing principles are terrific, but many investigators and organizations hoard data under the pretext of privacy and personal protection. The rate of data collection increases exponentially; however, the value of data also decreases exponentially from the point it is collected! To enable rapid, effective and secure sharing of data, we need to design, implement and broadly adopt statistical obfuscation methods.
    • Support the development of powerful compressive big data analytic methods that facilitate on-the-fly handling, processing, visualization, inference and analytics of messy data – (1) large in size, (2) multi-scale resolution, (3) incomplete, (4) multisource, (5) incongruent formats, and (6) complex (longitudinal, heterogeneous, high-dimensional).
    • Enforce (1) collaboration between multiple transdisciplinary investigators, (2) sharing of data, protocols, tools and services, and (3) “continuous development” of algorithms, software, pipeline workflows, and technologies. It’s not practical or cost-effective to wait for dissemination until a product is complete, validated, proven and standardized – we need to embrace “continuous development” strategies. Federally funded activities should be required to embrace “open-science” principles even during their development phase!
    • The NIH CSR process is dated. “Innovation” is required on paper; however, most NIH review panels/study sections stifle novel ideas as risky, uncertain, and too much of a change to the status quo! Proposal review panel formation, logistics, meeting organization and decision-making should involve young scholars, non-scientists, and entrepreneurs! There should be term limits for reviewers to discourage “life-long professional reviewers”, who may be risk-averse, disruption-intolerant, or too entrenched.

  2. I was terribly underwhelmed by the NIH plan for data science. It lacks vision and clearly auditable goals that are coupled with real world outcomes for individual Americans. It seems to be just a plan to store, analyze, and allow retrieval of datasets without regard to how individual Americans may benefit. Obviously the plan is not associated with preventing or curing or even managing illness or lowering costs. And since it has no clearly auditable goals with regard to the aforementioned, there is clearly no need to plan for failure because success is so poorly defined.
    I’d like to think “big data” and soft computing via nature-inspired algorithms should do more than “extract knowledge”. And so NIH’s plan reflects the cobra effect in research and shows that there is a lack of motivational intensity at NIH with regard to lowering the prevalence of illness. So, ultimately, NIH’s plan is consistent with their efforts to learn as much as they desire about illness, as well as academic capitalism, without regard to real world outcomes. As we all know, NIH focuses on “understanding” illness and learning about mechanisms and has resisted efforts at monies being spent for translational research. So much of the research done is wasteful and of questionable reliability. There is no overall method modeling of research at NIH, and consumer groups like MeAction and lupus organizations have shown their dismay at the regulatory failure of NIH to make real progress for certain conditions. The data plan doesn’t address these issues, and this further evidences that NIH is not focused on the public but is rather too focused on the needs of academics and their epistemic communities, who also lack vision with regard to lowering the prevalence of illness.
    So NIH’s strategic data plan is much ado about doing the same old same old and allowing the old boy network to continue to focus on pet interests regardless of their applicability to individuals’ needs. But since NIH is largely deaf to dialogue and has trouble even with a courtier’s response, I don’t need to be Bernoulli or Bayes to know my comments will be ignored. NIH already has made up its mind and is intent on doing what best serves itself and its affiliates in academia and industry.
    NIH has had trouble with the Office of Research Integrity over the last few years, as well. Suboptimized research is the rule at NIH. There is little interest in designing a system that directly and efficiently serves individual need. In fact, as we know, research at NIH commits the fallacy of appeal to probability, and since the focus is on population health, research is stuck in the iron cage of a calculus of probabilities and chances that is neither adaptive nor progressive by design. Why should anyone who truly cares about individuals’ wellbeing promote more of that? So it’s no wonder NIH’s strategic plan doesn’t target individual wellbeing and will continue to promote the anti-patterns, paradigm paralysis, glacial pace of translation, and escalation of commitment that have led to greater costs of health care without any lowering of the prevalence of illness. I will oppose the plan with my Congressional representatives.
    The NIH strategic plan is a visionless plan and is too loosely coupled with the needs of individuals and the larger society. Taxpayers will be paying for a database that will largely serve the academic interests of government and academics and is not meant to be of service to individual taxpayers.

  3. Patient Generated Health Data, generated using mobile apps and IoT/sensors/wearables, along with data from patient-ordered labs/tests (also called quantified-self data in some contexts), is growing very rapidly. This includes new types of data that are currently not available to clinicians and physicians but can help them understand disease causes (e.g., indoor air quality for an asthma patient), and support new health management strategies such as self-monitoring, self-appraisal, self-management, timely intervention and disease progression tracking/prediction. I wish the report recognized this exciting new opportunity for better collecting and utilizing such data.
