Title: Patient Equations and the Future of Life Sciences
Abstract: How must the evaluation of safety and efficacy in research change as therapies become more precise and personalized? How can problems of equity and access in medicine be solved? By combining traditional clinical trial designs with more modern mathematical approaches, predictive models for the best therapies for any given patient at a particular time can and will improve.
Title: Predicting drug response and synergy using deep learning models of human cancer
Abstract: Most drugs entering clinical trials fail, often because of an incomplete understanding of the mechanisms governing drug response. Machine learning techniques hold immense promise for better drug response predictions, but most have not reached clinical practice due to their lack of interpretability and their focus on monotherapies. To address these challenges, I will describe the development of DrugCell, an interpretable deep learning model of human cancer cells trained on the responses of thousands of tumor cell lines to thousands of approved or exploratory therapeutic agents. The structure of the model is built from a knowledgebase of molecular pathways important for cancer, which can be drawn from literature or formulated directly from integration of data from genomics, proteomics and imaging. Based on this structure, alterations to the tumor genome induce states on specific pathways, which combine with drug structure to yield a predicted response to therapy. The key pathways in capturing a drug response lead directly to the design of synergistic drug combinations, which we validate systematically by combinatorial CRISPR, drug-drug screening in vitro, and patient-derived xenografts. We also explore a recently developed technique, few-shot machine learning, for training versatile neural network models in cell lines that can be tuned to new contexts using few additional samples. The models quickly adapt when switching among different tissue types and in moving to clinical contexts, including patient-derived xenografts and clinical samples. These results begin to outline a blueprint for constructing interpretable AI systems for predictive medicine.
Title: Embracing ambiguity when characterizing microbes
Abstract: Determining the identity of an organism in a sample and determining the function of a gene within an organism are key steps in analytical pipelines used in both basic research and clinical applications. A range of computational techniques are used to perform these tasks, including database searches, machine learning, and protein structure prediction. The majority of the approaches used today aim to provide a definitive answer, perhaps with an associated confidence estimate. In my talk, I will argue that it is often valuable to report a broader set of plausible answers, an approach that can reveal information about the structure of reference databases and can provide a more nuanced analysis of sequences that have not been previously characterized. My presentation will focus on taxonomic annotation using our software Atlas, which relies on a new approach for searching biological databases.
Title: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions. Explanations for black box models are not reliable, and can be misleading. If we use interpretable machine learning models, they come with their own explanations, which are faithful to what the model actually computes. I will discuss examples related to seizure prediction in ICU patients and digital mammography.
Title: Stable Discovery of Interpretable Subgroups via Calibration in Causal Studies
Abstract: Building on our predictability, computability and stability (PCS) framework (Yu and Kumbier, 2020) and for randomised experiments, we introduce a novel methodology for Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999–2000 VIGOR study, an 8076-patient randomised controlled trial that compared the risk of adverse events from a then newly approved drug, rofecoxib (Vioxx), with that from an older drug, naproxen. Vioxx was found to, on average and in comparison with naproxen, reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, they are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. In fact, StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study size) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the found subgroups using the 2001–2004 APPROVe study, a separate independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.
PCS paper can be found at https://www.pnas.org/content/117/8/3920
StaDISC paper can be found at https://onlinelibrary.wiley.com/doi/abs/10.1111/insr.12427