RESEARCH COMPUTING RESOURCE PROFILE: DATA SCIENCE

SEPTEMBER-OCTOBER 2019

ITaP Research Computing offers campus researchers hardware for computationally demanding and data-driven research, as well as expert staff to assist them.

One way ITaP Research Computing helps Purdue researchers decrease their time to science is by supporting the data science work of researchers, students and Purdue’s larger Integrative Data Science Initiative.

The toolkit

Purdue’s newest community cluster supercomputer, Gilbreth, has GPUs on every node and is optimized for machine learning and deep learning. Gilbreth plays an important role in the success of Purdue’s Integrative Data Science Initiative (IDSI), a university-wide effort to support and coordinate research, education and broader engagement in data science, with a goal of bringing “data science to all.” Gilbreth is used by some of the research projects funded by the IDSI, as well as by the Data Science Consulting Service, a component of the IDSI that offers Purdue researchers hands-on support with data analysis.

Community clustering makes more computing power available to Purdue researchers than faculty and campus units could individually afford, with the added benefit that ITaP Research Computing installs and maintains the clusters and provides expert staff support. All community cluster supercomputers provide web browser-based tools for interactive data science, in addition to more traditional batch computing.

For researchers who don’t need the full power of Purdue’s supercomputers but are doing computationally intensive data analysis that may be too much for a laptop to handle, Data Workbench provides a solution. Data Workbench is an interactive computing environment that provides access from anywhere to web-based data analysis tools such as JupyterHub and RStudio Server.

In addition, ITaP Research Computing offers a variety of research data storage solutions, including Data Depot. Purdue also hosts its own instance of GitHub, the platform for hosting Git repositories. Staff from ITaP Research Computing periodically offer Software Carpentry workshops to introduce researchers to modern scientific computing. These workshops teach the basics of Unix, version control with Git, and programming languages that are useful for data science, such as Python and R.

Staff expert

Geoff Lentner, ITaP Research Computing data scientist. Like most of the Research Services and Support team, Geoff combines a science background (in his case, physics and astronomy) with expertise in high-performance computing, and helps faculty members use ITaP Research Computing resources to accomplish their research more efficiently. Geoff has specific experience in working with data and helps research groups better manage the way they collect, store and analyze it. He is also the go-to person for help with research software engineering in Python. Geoff and the other ITaP Research Computing staff experts can be reached at rcac-help@purdue.edu.

Faculty research highlight

Jason Ackerson, assistant professor of agronomy, uses Data Workbench and its web-based data analysis tools to speed up computations related to his research in soil mapping.

“It’s taken something that was a real chore, and it’s now a trivial task,” says Ackerson, who used to leave his personal computer running for days to do what he can now do on Data Workbench in a few hours.

Educational highlight

Statistics professor Mark Daniel Ward has taught big data analysis to Purdue freshmen using the Scholar cluster. The cluster has also been used by The Data Mine, the data science learning community that is part of the IDSI. Beginning in fall 2019, Scholar offers GPU nodes so instructors can teach accelerated applications ranging from machine learning to cryo-electron microscopy (cryo-EM).

For more information about ITaP Research Computing resources and services, email rcac-help@purdue.edu or contact Preston Smith, ITaP’s director of research services and support, psmith@purdue.edu or 49-49729.

Writer: Adrienne Miller, ITaP science and technology writer, mill2027@purdue.edu