Research Foundation News

May 5, 2020

System designed to improve database performance for health care, IoT

Workflow of SOPHIA with offline model building and online operation. It also shows the interactions between the noSQL cluster and a static configuration tuner called Rafiki. (Image provided)

WEST LAFAYETTE, Ind. – Sometimes it is best to work smarter and not harder. The same holds true when it comes to peak performance for databases.

One of the big challenges in using databases – whether for health care, Internet of Things or other data-intensive applications – is that higher speeds bring higher operating costs, which leads to over-provisioning of data centers to ensure high data availability and database performance.

With higher data volumes, databases may queue workloads, such as reads and writes, and not be able to yield stable and predictable performance, which may be a deal-breaker for critical autonomous systems in smart cities or in the military.

A team of computer scientists from Purdue University has created a system, called SOPHIA, designed to help users reconfigure databases for optimal performance with time-varying workloads and for diverse applications ranging from metagenomics to high-performance computing (HPC) to IoT, where high-throughput, resilient databases are critical.

The Purdue team presented the SOPHIA technology at the 2019 USENIX Annual Technical Conference. A snapshot of the technology can be found here: https://bit.ly/ICAN-noSQL-optimization.

“You have to look before you leap when it comes to databases,” said Somali Chaterji, a Purdue assistant professor of agricultural and biological engineering, who directs the Innovatory for Cells and Neural Machines [ICAN] and led the paper. “You don’t want to be a systems administrator who constantly changes the database’s configuration parameters, naïvely, with a parameter space of more than 50 performance-sensitive and often interdependent parameters, because there is a performance cost to the reconfiguration step. That is where SOPHIA’s cost-benefit analyzer comes into play, as it performs reconfiguration of noSQL databases only when the benefit outweighs the cost of the reconfiguration.”

The effect of reconfiguration on the performance of the system. SOPHIA uses the workload duration information to estimate the cost and benefit of each reconfiguration step and generates plans that are globally beneficial. (Image provided)

Purdue’s SOPHIA system has three components: a workload predictor, a cost-benefit analyzer, and a decentralized reconfiguration protocol that is aware of the data availability requirements of the organization.

“Our three components work together to understand the workload for a database and then perform a cost-benefit analysis to achieve optimized performance in the face of dynamic workloads that are changing frequently,” said Saurabh Bagchi, a Purdue professor of electrical and computer engineering and computer science (by courtesy). “The final component then takes all of that information to determine the best times to reconfigure the database parameters to achieve maximum success.”
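The cost-benefit idea the researchers describe can be illustrated with a minimal sketch. This is not SOPHIA's actual implementation. The class, function and parameter names below are hypothetical, and the model assumes a single predicted workload phase with known throughput under each configuration and a fixed reconfiguration downtime.

```python
from dataclasses import dataclass


@dataclass
class PredictedWorkload:
    """One predicted workload phase (all fields are hypothetical)."""
    duration_s: float           # how long the phase is expected to last
    current_ops_per_s: float    # throughput if the current configuration is kept
    candidate_ops_per_s: float  # throughput expected under the candidate configuration


def should_reconfigure(phase, downtime_s, ops_per_s_during_switch=0.0):
    """Return True when the expected gain exceeds the expected loss.

    benefit = extra operations served for the rest of the phase
    cost    = operations lost while the switch is in progress
    """
    remaining_s = max(phase.duration_s - downtime_s, 0.0)
    benefit = (phase.candidate_ops_per_s - phase.current_ops_per_s) * remaining_s
    cost = (phase.current_ops_per_s - ops_per_s_during_switch) * downtime_s
    return benefit > cost


# A long phase amortizes a 30-second switch; a short one does not.
long_phase = PredictedWorkload(600.0, 40_000.0, 50_000.0)
short_phase = PredictedWorkload(60.0, 40_000.0, 50_000.0)
print(should_reconfigure(long_phase, downtime_s=30.0))   # True
print(should_reconfigure(short_phase, downtime_s=30.0))  # False
```

In this toy model the same configuration change is worthwhile for a 10-minute workload phase but not for a 1-minute one, which is why the predicted workload duration matters to the decision.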

The Purdue team benchmarked the technology using Cassandra and Redis, two well-known noSQL databases, a major class of databases that is widely used to support application areas such as social networks and streaming audio-video content.

“Redis is a special class of noSQL databases in that it is an in-memory key-value data structure store, albeit with hard disk persistence for durability,” Chaterji said. “So, with Redis, SOPHIA can serve as a way to bring back the deprecated virtual memory feature of Redis, which will allow for data volumes bigger than the machine’s RAM.”

The lead developer on the project is Ashraf Mahgoub, a Ph.D. student in computer science. This summer he will go back for an internship with Microsoft Research, and when he returns this fall, he will continue to work on more optimization techniques for cloud-hosted databases.

The Purdue team’s testing showed that SOPHIA achieved a significant benefit over both default and static-optimized database configurations. This benefit persists even when there is significant uncertainty in predicting the exact job characteristics.

The work also showed that Cassandra, when overlaid with a dynamic tuner such as SOPHIA, could be used in preference to ScyllaDB, a recently popular auto-tuning drop-in alternative, with higher throughput across the entire range of workload types.

SOPHIA was tested with MG-RAST, a metagenomics platform for microbiome data; high-performance computing workloads; and IoT workloads for digital agriculture and self-driving cars.

The team worked with the Purdue Research Foundation Office of Technology Commercialization to patent the technology.

The National Institutes of Health provided support for some of the research through an NIH-R01 grant. The Purdue team worked with Folker Meyer, a computational biologist from Argonne National Laboratory and a professor of bioinformatics in the Department of Medicine at the University of Chicago, for the metagenomics application MG-RAST.

The creators are looking for partners to commercialize their technology. For more information on licensing this innovation, contact Matt Halladay of OTC at mrhalladay@prf.org and mention track code 2019-BAGC-68646.

About Purdue Research Foundation Office of Technology Commercialization

The Purdue Research Foundation Office of Technology Commercialization operates one of the most comprehensive technology transfer programs among leading research universities in the U.S. Services provided by this office support the economic development initiatives of Purdue University and benefit the university's academic activities through commercializing, licensing and protecting Purdue intellectual property. The office recently moved into the Convergence Center for Innovation and Collaboration in Discovery Park District, adjacent to the Purdue campus. In fiscal year 2019, the office reported 136 deals finalized with 231 technologies signed, 380 disclosures received and 141 issued U.S. patents. The office is managed by the Purdue Research Foundation, which received the 2019 Innovation and Economic Prosperity Universities Award for Place from the Association of Public and Land-grant Universities. In 2020, IPWatchdog Institute ranked Purdue third nationally in startup creation and in the top 20 for patents. The Purdue Research Foundation is a private, nonprofit foundation created to advance the mission of Purdue University. Contact otcip@prf.org for more information.      

About Purdue University

Purdue University is a top public research institution developing practical solutions to today’s toughest challenges. Ranked the No. 6 Most Innovative University in the United States by U.S. News & World Report, Purdue delivers world-changing research and out-of-this-world discovery. Committed to hands-on and online, real-world learning, Purdue offers a transformative education to all. Committed to affordability and accessibility, Purdue has frozen tuition and most fees at 2012-13 levels, enabling more students than ever to graduate debt-free. See how Purdue never stops in the persistent pursuit of the next giant leap at purdue.edu.

Writer: Chris Adam, 765-588-3341, cladam@prf.org 

Sources:
Somali Chaterji, schaterji@purdue.edu

Saurabh Bagchi, sbagchi@purdue.edu

