Data science pairs with cancer research for better diagnostics, therapies

The next generation of treatments for cancer may be found, not by scientists peering through microscopes, but by computer scientists crunching numbers. Thanks to unprecedented amounts of data, Purdue University researchers from the College of Science are using innovative data science techniques to better understand the genetics and cellular biology of cancer cells and tumors.

This work allows them to pioneer new diagnostic tools, generate novel therapeutic treatments and significantly advance the fight against cancer. Some of these advances may even allow oncologists to harness a patient’s own immune system to fight off cancer.

Previously, scientists had to rely on small sample sizes, case studies and, in some lucky cases, genetic or DNA analyses of tumors. Now, they can draw from enormous publicly available databases that include an almost mind-numbing amount of data: information on people with different types of cancer across an enormous spectrum of continents, races, cultures, genders and ages, as well as the genetics of hundreds of thousands of individual cells that make up tumors and other tissues. There is so much data, in fact, that traditional analytical tools fail.

That is where data science comes in.

Data science uses advanced computer modeling and mathematics to analyze complex data sets, including those that are enormous and those that combine different kinds of data. It allows scientists to better understand problems and to find paths through the chaos.

Andrew Mesecar, Purdue’s Walther Professor in Cancer Structural Biology, is deputy director of the Purdue University Center for Cancer Research (PCCR), where he helps lead interdisciplinary teams of computational biologists, biochemists, statisticians, computer scientists and immunologists. An $11 million endowment from the Walther Cancer Foundation helps support the work.

“We have to figure out how to mine all of this data for meaning. Contained within all this genetic information are potential new molecular targets for cancer therapies and new biomarkers to detect and track cancer. We are pioneers. We are making the big leaps,” Mesecar said.

“In the future, when a cancer patient comes in, we are going to be able to monitor the genomics of their cancers in real time, to make predictions about the course a cancer might take and to make real-time decisions about what therapeutics to use. We are not going to have to wait and see if they respond to the drugs first and then change when they are too far along in treatment. What if they die? What if you put them through side effects without reducing their cancer?”

The worry that patients could die without the right treatment, or without the right treatment at the right time, is at the heart of what Purdue’s Center for Cancer Research does. It is what drives the researchers, what inspires the lab work, what keeps scientists at their supercomputers and their lab benches, and what keeps them working together and learning from each other. With the rest of the College of Science, these researchers are committed to the persistent pursuit of the mathematical and scientific knowledge that forms the very foundation of innovation and paves the way for practical solutions to today’s challenges.

A doctor crunching the numbers

Min Zhang began her career as a physician treating cancer patients in a hospital. After 12 years in medical school, she decided the answers to cancer might lie in the numbers. Now a statistics professor and associate director of data science at PCCR, Zhang develops statistical methods and applies them to cancer data, hoping to glean new insights into the early detection and diagnosis of cancer.

“We started to have a lot of data generated by the labs and clinics,” Zhang said. “The data are all mixed, all different kinds. We had to figure out how to extract information from these data and translate it into knowledge that people can use.”

Researchers have studied products of metabolism in the body — sugars, amino acids and other molecules called metabolites — to attempt to predict whether a patient has or will get cancer. But looking at one metabolite at a time did not show any strong patterns. When Zhang and her collaborators began using data science techniques to analyze groups of biologically related metabolites, however, they found a different story.

“The metabolites do not act in isolation; they work together to perform specific functions,” Zhang said. “When we look at groups of metabolites, we gain significantly more statistical power. When we look at the individual metabolite, only one is marginally significant. But when we studied them all together, there were very significant results.”

This method could provide more reliable biomarkers, allowing doctors to screen patients for colorectal cancer, or even polyps, using a blood sample rather than an invasive procedure like a colonoscopy.
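The gain in statistical power from testing related metabolites as a group can be illustrated with a minimal sketch. It uses Fisher's method for combining p-values, a standard textbook technique chosen here only for illustration; it is not necessarily the method Zhang's team developed. The metabolite names and p-values are made up, and Fisher's method assumes the individual tests are independent, which biologically related metabolites generally are not, so a real analysis would need to account for that correlation.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method (assumes independent tests).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper-tail probability has a closed form:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for five metabolites in one pathway (illustrative only).
metabolite_p = {
    "metabolite_A": 0.04,
    "metabolite_B": 0.09,
    "metabolite_C": 0.11,
    "metabolite_D": 0.07,
    "metabolite_E": 0.12,
}

combined = fisher_combined_p(list(metabolite_p.values()))
for name, p in metabolite_p.items():
    print(f"{name}: p = {p:.2f}")
print(f"combined group p-value: {combined:.4f}")
```

In this toy example only one metabolite falls below the conventional 0.05 threshold on its own, yet the group as a whole yields a much smaller combined p-value, which mirrors the pattern Zhang describes.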

Zhang and her collaborators also developed machine-learning methods to study ways that the genes regulate each other on a genomewide scale as cancer progresses. Understanding how individual genes change and interact with others is key for treating cancer.

“When we treat patients with chemotherapy drugs at the very beginning, sometimes they respond but eventually they stop responding,” Zhang said. “If you target one gene, the cancer cells can adapt and take another route that allows them to keep growing. If we can target the whole network of genes and design a combination therapy, there is no way for cancer cells to survive. The genomewide causal gene regulatory networks constructed using newly developed machine-learning tools will provide multiple targets for novel therapeutic approaches development.”

Computer scientists hit the lab bench

Majid Kazemian started working as a computer scientist. But when he began helping to analyze cancer research data, he got curious about how his models played out in real life. Now an assistant professor of biochemistry and computer science, he runs a Purdue lab that is evenly divided between biologists and computer scientists.

“In the past few years, the amount of publicly available cancer data has increased exponentially,” Kazemian said. “We have advanced to being able to study cancer at a cellular level, cell by individual cell. We can now mine this data for patterns to generate novel hypotheses that we never have thought of before. Then we can go back into the lab and test the hypotheses.”

This method allows Kazemian and his lab to give new life to an old idea: What if they could get the patient’s own immune system to fight the cancer, without the need for drugs or therapies that can cause devastating side effects?

Some of the scariest cancers, including some forms of cervical cancer, liver cancer, non-Hodgkin’s lymphoma and stomach cancer, are caused by viruses. But once the virus causes the cancer, the cancer negates the virus so that it is no longer harmful. Kazemian’s lab is trying to use data science approaches to figure out how to reactivate the virus in the living cells, alerting the cancer patient’s immune system and allowing it to better fight the cancer cells. The approach would weaponize both the virus that caused the problem and the patient’s own body to defeat the cancer.

The team also is looking at other ways to harness and boost patients’ immune systems to combat cancer.

“The majority of the success we have seen in cancer biology is a range of concepts related to immunotherapy, drugs that can boost your immune system itself to fight back the cancer,” Kazemian said.

The idea is not new, but the approach would be impossible without the partnership between cancer research and data science.

Saving lives with numbers

Nadia Lanman, another scientist who began working with computers and large data sets and ended up focusing on cancer data, is a research assistant professor of comparative pathobiology. She uses Purdue’s network of supercomputers to enable machine-learning projects that help sort and analyze data. She is helping scientists analyze data in new ways, giving them new insights and better pathways forward.

“When someone comes in with cancer, we don’t know how they’re going to respond to different types of treatment, and we don’t know how sick they’re going to get from potential side effects,” Lanman said. “If we can tease these things out using machine learning and these massive data sets, we could imagine a world where, when a cancer patient comes in, we could collect data and use data science to help oncologists make recommendations.”

Lanman reiterated her mission and the mission of the cancer center: to make discoveries that will build the foundation for innovative cancer solutions. The cure for cancer is not in one field or the other; it is in experts from all fields working together using the most up-to-date data and analytical techniques.

“I love the work I do — that we do — at the cancer center,” Lanman said. “I love that we are really trying, every day, to make the world a better place for patients.”

About Purdue University Center for Cancer Research

At the Purdue University Center for Cancer Research, our mission is basic discovery — discovery that is the foundation for innovative cancer solutions. PCCR leverages Purdue’s strengths in engineering, veterinary medicine, nutrition science, chemistry, medicinal chemistry, pharmacy, structural biology and biological sciences to establish its foundational base. As a National Cancer Institute-designated cancer center, PCCR is making significant contributions to emerging technologies, such as cancer vaccines and combination chemotherapy. We specialize in translational research that saves lives by translating laboratory findings into new and innovative therapies as quickly as possible. Our mission is discovery. Our goal is to cure cancer.

About Purdue University

Purdue University is a top public research institution developing practical solutions to today’s toughest challenges. Ranked the No. 5 Most Innovative University in the United States by U.S. News & World Report, Purdue delivers world-changing research and out-of-this-world discovery. Committed to hands-on and online, real-world learning, Purdue offers a transformative education to all. Committed to affordability and accessibility, Purdue has frozen tuition and most fees at 2012-13 levels, enabling more students than ever to graduate debt-free. See how Purdue never stops in the persistent pursuit of the next giant leap at

Writer, Media contact: Brittany Steff, 765-494-7833, 

Sources: Andrew Mesecar, Min Zhang, Majid Kazemian, Nadia Lanman

Scientists at Purdue University are using data science to help discover how the next generation of treatments for cancer may be found. (Purdue University photo/Rebecca McElhoe)