NEW HUB ADDRESSES CHALLENGES WITH PRESERVING, SHARING RESEARCH DATA

Tablet showing DataCenterHub

JAN.-FEB. 2017

All too familiar with the phrase “publish or perish,” researchers aren’t shy about sharing their results with colleagues, but they often don’t effectively store and share the full set of data that leads to those results.

“This is a large problem, and it needs a solution,” says Santiago Pujol, a professor of civil engineering at Purdue and the academic director for ITaP Research Computing, who worries that challenges in preserving research data can result in lost knowledge and unnecessary duplication of work. Pujol and ITaP research scientists are developing a platform, called DataCenterHub, to help address the problem.

Databases for storing research data do exist, but they are typically specific to a single field of study and ill-suited to the needs of researchers in other disciplines. Pujol and his then-graduate student Lucas Laughery, now a postdoctoral researcher at Purdue, recognized the need for a more general approach.

Pujol and Laughery turned to ITaP Senior Research Scientist Ann Christine Catlin and her team for help building it. The result of that collaboration is DataCenterHub, a repository that preserves data from all kinds of experiments and presents it to researchers in a clear, easily accessible way.

Rather than requiring a user to click through multiple screens to learn additional details about an experiment, DataCenterHub presents its uploaded datasets in a table, with each experiment in its own row and columns for attributes such as experiment title, source and date. A user can sort the datasets by any of these attributes or perform a keyword search. This makes finding a particular dataset much easier and also enables comparisons between datasets.

DataCenterHub also organizes different kinds of data into separate groups and provides formatting guidelines for certain types of files. The result is a standardized format for uploaded data that eliminates inconsistencies between datasets contributed by different users.

“We’re trying to show researchers that instead of putting their data in a safe and setting that safe off to the side, they can put it into a (digital) safe that has windows into it and has shelves that help you organize it,” says Laughery.

DataCenterHub currently hosts almost 30 terabytes of research data, with more being added every day. Faculty and graduate students interested in attending a presentation on DataCenterHub and how it might meet their research data needs should contact Lucas Laughery at llaugher@purdue.edu.

– Adrienne Miller, ITaP technology writer (mill2027@purdue.edu)