CHICAGO—Big data, and the data science used to analyze it, is an exciting, emerging field of research, and the opportunities are expanding, according to panelists of Answering Clinically Relevant Questions Using Large Datasets, a session at the 2018 ACR/ARHP Annual Meeting.
In 2013, the National Institutes of Health (NIH) launched Big Data to Knowledge (BD2K), an initiative to facilitate the use of large biomedical data sets in research, develop new research tools and methodologies, and train researchers. The initiative promotes discovery in this new landscape, said Kenneth J. Ottenbacher, PhD, OTR, Russell Shearn Moody Distinguished Chair in Neurological Rehabilitation at the University of Texas Medical Branch, Galveston. He is also a principal investigator at the Center for Large Data Research and Data Sharing in Rehabilitation (CLDR).
Accessible, Shareable Data
“Data need to be findable, accessible, interoperable and reusable. That means the data are out there and can be used again,” said Dr. Ottenbacher. “This [approach] is a new concept of data science within the NIH. There are so many new, different ways we can use these new, different types of data. We should be trying to take advantage of that.”
Big data resources in biomedical research include the National Patient-Centered Clinical Research Network (PCORnet), a program designed to make clinical research faster, easier and cheaper by using large-scale data sets. PCORnet’s Open Door program provides training to researchers and access to clinical data.
“Another area of research that is gaining interest is data from electronic health records [EHRs],” said Dr. Ottenbacher. “Healthcare systems are expanding and merging, and paying attention to their data through Epic or other medical records systems. There is a lot of interest in how we can use those EHRs to not just plan for care, but also understand care better. We do a lot of work with Medicare data. In five to 10 years, most of the people who do what we do won’t be using Medicare data anymore. They’ll be doing their research with [EHRs].”1
Large data sets available to investigators include:
- All of Us, the cornerstone of the NIH’s Precision Medicine Initiative, which has a goal of recruiting 1 million volunteers to provide health data that will be available for clinical research;
- The National Institute of Child Health and Human Development’s Data and Specimen Hub (DASH), a bank of de-identified clinical data from its funded studies for use in secondary research;
- The Multicenter Osteoarthritis Study (MOST), a collection of patient case report data and knee joint imaging;
- The Osteoarthritis Initiative (OAI), an NIH-sponsored observational osteoarthritis (OA) study that offers investigators a large bank of raw data and more than 20 million images; and
- The national clinical repository of the U.S. Veterans Health Administration, which a 2016 study co-sponsored by the Arthritis Foundation and the Osteoarthritis Research Society International (OARSI) used to demonstrate how to correctly classify OA patients.2
Researchers may find relevant data sets through the NIH-supported bioCADDIE, a searchable index of data repositories available for sharing. To encourage data sharing, the NIH is also funding grants that teach researchers how to archive data, said Dr. Ottenbacher. CLDR, where he is a principal investigator, funds pilot projects and archives data related to physical disability, rehabilitation and recovery.