Bindley Bioscience Center

Computational Life Sciences & Informatics

Overview

What is Computational Life Sciences and Informatics (CLSI)?
Computational Life Sciences and Informatics is one of the most important and exciting areas in science and technology, positioned at the intersection of modern biology, quantitative modeling and high performance computing. It focuses on the development and application of computational tools and techniques to solve complex problems in the biosciences. CLSI helps provide a fundamental understanding of complex biological systems and has the potential to significantly impact a wide variety of technologies, including drug discovery, novel therapies for human, animal and plant diseases, metabolic engineering and efficient production of traditional and high-value foodstuffs. Research in CLSI at Bindley Bioscience Center focuses on understanding (and predicting) life using a "Systems Biology" approach. Systems Biology aims at a system-level understanding of biological systems, that is, of how the "group of parts" that make up "the whole" are connected to one another and work together. The ultimate goal of Systems Biology is to develop in-silico biological systems. As a complex discipline, Systems Biology draws on data from all biological fields, including genetics, biochemistry, structural biology, cell biology, physiology, biophysics, and more.

Computer science research contributes methods and tools for acquiring, storing, organizing, archiving, analyzing and visualizing biological data and phenomena. It must also address the management and mining of large bioinformatics databases. Computational Life Sciences and Informatics can be divided into three major categories:

  • Bioinformatics: The research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
  • Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
  • Systems Biology: The development of quantitative, mechanism-based models of the whole cell, collections of cells or large pieces of the cellular machinery, where the objective is an integrated picture that complements the reductionist viewpoint of molecular biology.

Efforts thus far have focused on the Purdue Omics Discovery Pipeline project (omicsDP.org), described below.



CLSI Software Development and Resources:

The Omics Discovery Pipeline

The Omics Discovery Pipeline (ODP), previously the Proteomics Discovery Pipeline (PDP), is a web-based analysis platform for proteomics and metabolomics data. Initial data analyses require neither specialized hardware nor input from bioinformatics specialists. Functionalities of the ODP include spectrum visualization, deconvolution, alignment, normalization, statistical significance tests, and pattern recognition. The ODP provides proteomic researchers with a user-friendly data analysis package that can handle multiple file formats and facilitates data analysis from multiple proteomics technology platforms.
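
To make two of the steps named above concrete, the following is a minimal sketch of normalization followed by a per-peak significance statistic. It is not the ODP's implementation: the choice of total-intensity normalization and Welch's t statistic, the function names, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def normalize_total_intensity(spectrum):
    """Scale peak intensities so that they sum to 1 for one sample."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [value / total for value in spectrum]


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (unequal variances)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical toy data: one row per sample, one column per peak.
control = [[10.0, 30.0, 60.0], [22.0, 60.0, 118.0], [11.0, 31.0, 58.0]]
treated = [[30.0, 30.0, 40.0], [62.0, 58.0, 80.0], [29.0, 31.0, 40.0]]

control_norm = [normalize_total_intensity(sample) for sample in control]
treated_norm = [normalize_total_intensity(sample) for sample in treated]

# One t statistic per peak, comparing control against treated samples.
t_stats = [
    welch_t([s[i] for s in control_norm], [s[i] for s in treated_norm])
    for i in range(3)
]
```

In this toy data the second sample of each group was acquired at roughly twice the overall intensity. Normalizing first keeps that difference in scale from being mistaken for a biological difference.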

The Omics Discovery Pipeline includes the following tools developed at the BBC:

  • XMass: Spectrum Deconvolution and Peak Picking
  • XAlign: Peak Alignment
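
As a rough illustration of what peak alignment involves, the sketch below matches peaks from a sample to a reference list by m/z within a tolerance. This is a generic greedy matching, not XAlign's algorithm; the function name, the tolerance value, and the peak lists are hypothetical.

```python
def align_peaks(reference, sample, tolerance):
    """Pair each reference m/z with the nearest unused sample m/z.

    A sample peak is only accepted if it lies within `tolerance` of the
    reference peak. Returns a list of (reference_mz, sample_mz or None).
    """
    used = set()
    aligned = []
    for ref_mz in reference:
        best_index = None
        best_delta = tolerance
        for index, mz in enumerate(sample):
            delta = abs(mz - ref_mz)
            if index not in used and delta <= best_delta:
                best_index, best_delta = index, delta
        if best_index is None:
            aligned.append((ref_mz, None))  # no sample peak close enough
        else:
            used.add(best_index)
            aligned.append((ref_mz, sample[best_index]))
    return aligned


# Hypothetical peak lists (m/z values only).
reference_peaks = [100.00, 250.50, 400.25]
sample_peaks = [100.02, 250.90, 400.24, 512.00]
alignment = align_peaks(reference_peaks, sample_peaks, tolerance=0.05)
```

Here the first and third reference peaks find a partner, while the second is left unmatched because the nearest sample peak is 0.4 away, well outside the tolerance.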

More information about the Omics Discovery Pipeline can be found on the project website omicsDP.org and through this journal publication.



BioNet

Current Release: BioNet Version 1.0 - Download

BioNet is an interactive visual data mining application for analyzing intermolecular correlations using various statistical methods. BioNet can perform interactive comparative, correlative, and time-course analysis of molecular expression data. Correlation analysis supports several network layouts, such as Kamada-Kawai, Fruchterman-Reingold, and spring embedding. It also implements essential operations for manipulating the data intuitively during visualization, such as selecting and hiding nodes, highlighting neighbors, and selecting sub-networks. As the elements of the visualized network are manipulated, BioNet continuously updates a rich network information panel that includes molecule/correlation details, topological information, node-degree distribution, and neighborhood connectivity. A search function can also be used to filter molecules out of the selection. For comparative analysis of molecular expression data, BioNet shows the distribution of expression data across samples, groups, and time points as boxplots, and clusters data by molecular concentration using Self Organizing Maps (SOM), K-Means, and other algorithms.
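
The idea of a correlation network and its node degrees can be sketched as follows. This is an illustrative example only, not BioNet's code: the use of Pearson correlation, the 0.9 threshold, the function names, and the molecule names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_network(expression, threshold):
    """Keep an edge between two molecules when |r| reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


def node_degrees(expression, edges):
    """Count how many edges touch each molecule."""
    degrees = {name: 0 for name in expression}
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


# Hypothetical expression levels across four samples.
expression = {
    "mol_A": [1.0, 2.0, 3.0, 4.0],
    "mol_B": [2.0, 4.1, 5.9, 8.0],
    "mol_C": [4.0, 3.0, 2.0, 1.0],
    "mol_D": [1.0, 3.0, 1.0, 3.0],
}

edges = correlation_network(expression, threshold=0.9)
degrees = node_degrees(expression, edges)
```

With this data, mol_A, mol_B and mol_C form a fully connected triangle (mol_C is negatively correlated with the other two), while mol_D stays isolated with degree 0.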


MetAlign

MetAlign Mass Spectrometry Dataset Analysis


MetPP

Metabolomics Profiling Pipeline


Infrastructure

Life sciences research within Purdue and with external collaborators is supported by the Bindley Bioscience Center with the Computational Life Sciences Data System (CLSDS).  This system is designed to:

  • Capture all information from wet experiments to data analysis;
  • Automatically transfer data into a central data archival system;
  • Enable web access for data mining; and
  • Standardize data formats.

The BBC network infrastructure includes two major interconnected areas: an instrument farm and a data farm.  The instrument farm consists of research instruments and computers used for instrument control and data collection.  Data gathered from research instruments is automatically stored on BBC servers and the BBC Storage Area Network (SAN).  This data farm is the back end of the BBC network that handles data storage and backup, as well as data access.  The data farm provides a centralized storage location of research data that can be accessed by BBC scientists as well as collaborators from other universities and institutions.  Each server and device in the data farm is covered by a backup solution that provides multiple copies of research data in the event of data loss or corruption.  The data farm is also accessible via a number of analysis workstations.  These workstations are used to process and analyze the gathered data using the Omics Discovery Pipeline and other statistical tools.
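
The automatic transfer from instrument computers to central storage could look something like the sketch below, which copies a file into an archive directory and verifies the copy with a checksum. This is an assumption-laden illustration, not a description of the BBC's actual transfer mechanism: the directory layout, file name, and function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(source, archive_dir):
    """Copy a file into the archive directory and verify the copy."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    destination = archive_dir / Path(source).name
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(destination):
        raise OSError(f"checksum mismatch for {destination}")
    return destination


# Demonstration in a throwaway directory (hypothetical layout and file name).
with tempfile.TemporaryDirectory() as workspace:
    instrument_dir = Path(workspace) / "instrument"
    instrument_dir.mkdir()
    raw_file = instrument_dir / "run_001.raw"
    raw_file.write_bytes(b"example spectrum bytes")

    archived = archive_file(raw_file, Path(workspace) / "archive")
    archived_ok = archived.read_bytes() == raw_file.read_bytes()
```

Verifying a checksum after each copy is one simple way to detect the kind of data corruption that the backup copies are meant to guard against.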


CLSI Projects


Contact Information


Erik Gough
Informatics
Office: BIND 119