Data Hub Overview

Drawing on Purdue’s strengths in engineering and computer science, Data Hub is a collaborative platform for integrating life science data and advanced analytics. It provides the functional backbone and logistical support for researchers at Purdue and in the broader research community. By bringing diverse data sources (e.g., electronic health records, medical claims, genomics, wearables) onto one platform, Data Hub gives investigators access to more robust and diverse data sets for solving critical life science research problems. Data Hub runs on a cluster computing infrastructure architected by the Regenstrief Center in collaboration with the Department of Computer Science. The Rosen Center for Advanced Computing at Purdue supports the hardware, security, and HIPAA-aligned server maintenance. The framework is built with open source technologies to support scalability and the aggregation of data from heterogeneous sources. The system currently hosts several large EHR and claims data sources, including Cerner Health Facts (69M patients), Indiana Medicaid claims data, and Purdue employee claims data, and has a storage capacity of 300 terabytes.