Datasets
The following summaries provide a brief overview of the Regenstrief Center for Healthcare Engineering’s past and current datasets.
For more information on data resources, contact Mohammad Adibuzzaman at madibuzz@purdue.edu.
The data set contains the electronic health records (EHR) of 69 million patients, spanning 20 years and 750 facilities in the US. It is approximately 6 TB in size. This data set has significant potential for Purdue researchers working at the intersection of life science and data science, and its longitudinal coverage gives Purdue a unique edge over other engineering-focused universities.
MIMIC is an openly available dataset developed by the MIT Lab for Computational Physiology, comprising deidentified health data associated with more than 40,000 critical care patients. It includes demographics, vital signs, laboratory tests, medications, and more.
Indiana Family and Social Services Medicaid Data: This data set comprises deidentified data on all Medicaid enrollees in Indiana, with a focus on healthcare improvements for long-term care (LTC) and substance use disorder (SUD) patients. It includes demographics, provider information, diagnoses, procedures, and medications drawn from the Indiana Medicaid Enterprise Data Warehouse (EDW). It covers more than 5 years of data and will be updated every 6 months.
Purdue Claims Data: A fully deidentified dataset of medical claims, prescription claims, eligibility, biometrics, and Johns Hopkins risk assessment information for Purdue University, starting in 2014. This data set is available to researchers pending IRB approval. See HIPAA for more details on RCHE's data access policy and procedures. Data Dictionary (requires PUCA credentials to view)
Kaggle hosts a wide variety of datasets, including some healthcare-related datasets.
This Kaggle dataset, posted by the CDC, supports analysis of the ongoing spread of this infectious disease.
Health Insurance Marketplace Dataset: This Kaggle dataset, posted by the US Department of Health and Human Services, supports exploration of health and dental plan data in the US Health Insurance Marketplace.
Virtual Pediatric Systems (VPS) Pediatric ICU Data: VPS holds the largest research database in pediatric critical care in the world and encourages research using the data. RCHE has contacts within the association and can help researchers get started on a research request. Further reading on this data set, as well as information on other pediatric critical care research resources, can be found here.