Big Data Training for Cancer Research
Purdue University Center for Cancer Research

Course Topics

The workshop combines short instructional videos, special guest lectures, hands-on practice with real data, and live demonstration sessions. It aims to help participants learn practical bioinformatics skills and build familiarity and basic competency with them. Using established tools and publicly available resources, we will focus on the analysis and interpretation of genomic and genetic data, which makes the workshop well suited to researchers with limited experience in big data analysis.

Unit 1: Transcriptomic Analyses

This session of the course will cover bulk and single-cell RNA-seq analyses, including experimental design, quality control, read mapping, differential expression analysis, and pathway and enrichment analyses. The single-cell RNA-seq portion will also cover methods for unsupervised clustering and detection of cell subpopulations.
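As a rough illustration of the quantities behind a differential expression comparison, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between two samples. The function names, the gene labels, and the pseudocount of 1 are assumptions made for this example. This is not a statistical test: dedicated differential expression tools fit statistical models across replicates, and nothing here replaces them.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million per sample."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}

def log2_fold_change(cpm_reference, cpm_condition, pseudocount=1.0):
    """Log2 ratio of condition over reference; the pseudocount (an assumption
    of this sketch) avoids division by zero for unexpressed genes."""
    return {
        gene: math.log2((cpm_condition[gene] + pseudocount)
                        / (cpm_reference[gene] + pseudocount))
        for gene in cpm_reference
    }

# Hypothetical toy counts for two genes in two samples.
control = {"GENE_A": 100, "GENE_B": 900}
treated = {"GENE_A": 400, "GENE_B": 600}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(treated))
```

In this toy input, GENE_A makes up four times as large a share of the treated library as of the control library, so its log2 fold change is close to 2.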


Unit 2: Epigenomic Analyses

This session of the course will cover ChIP-seq analysis, which is commonly used in epigenomic studies. It will cover experimental design, quality control, read mapping, peak identification, differential binding analysis, and motif identification. Lectures will also cover theoretical aspects of peak identification and survey a number of commonly used programs.
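One common idea behind peak identification is to ask whether a genomic window holds more ChIP reads than a background model would predict. The sketch below is a simplified illustration under stated assumptions: fixed windows, a control sample scaled to the ChIP library size, and a Poisson background. The function names, the floor of 1 read on the expected count, and the threshold `alpha` are choices made for this example and do not describe any particular peak caller.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return max(0.0, 1.0 - below)  # guard against tiny negative rounding error

def call_enriched_windows(chip_counts, control_counts, alpha=1e-3):
    """Return (window index, p-value) for windows enriched over control."""
    # Scale control counts to the ChIP library size.
    scale = sum(chip_counts) / sum(control_counts)
    enriched = []
    for index, (chip, control) in enumerate(zip(chip_counts, control_counts)):
        expected = max(control * scale, 1.0)  # floor is an assumption of this sketch
        p_value = poisson_upper_tail(chip, expected)
        if p_value < alpha:
            enriched.append((index, p_value))
    return enriched

# Hypothetical read counts in five consecutive windows.
chip = [5, 4, 40, 6, 5]
control = [5, 6, 5, 4, 5]
peaks = call_enriched_windows(chip, control)
```

With these toy counts, only the third window (index 2) stands out against the scaled control. A real analysis would also correct for multiple testing across all windows.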


Unit 3: Genome-wide Association Study

This session will focus on single-nucleotide polymorphism (SNP)-based genome-wide association analysis. Topics include sample and SNP quality control, association tests, logistic regression for case-control studies, linear regression for continuous traits, and gene-gene and gene-environment interactions. Lectures will also cover how to visualize the data and analysis results using popular packages.
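To make the SNP quality control and association test steps concrete, the sketch below computes a call rate and minor allele frequency for one SNP and then runs a basic allelic chi-square test (1 degree of freedom) between cases and controls. It swaps in this simple test for the logistic regression mentioned above, because it needs no model fitting. The 0/1/2 genotype coding with `None` for missing calls and all function names are assumptions of this example.

```python
import math

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (0/1/2 coding)."""
    called = [g for g in genotypes if g is not None]
    frequency = sum(called) / (2 * len(called))
    return min(frequency, 1 - frequency)

def allelic_chi_square(case_genotypes, control_genotypes):
    """Chi-square test on the 2x2 table of allele counts; returns (chi2, p)."""
    def allele_counts(genotypes):
        called = [g for g in genotypes if g is not None]
        alt = sum(called)
        return alt, 2 * len(called) - alt

    a, b = allele_counts(case_genotypes)     # cases: alt, ref
    c, d = allele_counts(control_genotypes)  # controls: alt, ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # monomorphic SNP or empty group: no evidence
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotypes for one SNP.
cases = [2, 2, 1, 1, 2, None]
controls = [0, 0, 1, 0, 0, 1]
chi2, p_value = allelic_chi_square(cases, controls)
```

In a genome-wide scan the same test would be repeated for every SNP that passes quality control, with a stringent significance threshold to account for the number of tests.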


Unit 4: Network Analysis

In this session, we will introduce the basic concepts and general ideas behind constructing gene regulatory networks (GRNs), with a focus on the integrative analysis of transcriptomic and genomic data. With a case study in cancer research, we will go through cis-eQTL analysis and 2SPLS, a state-of-the-art parallel algorithm, to construct genome-wide GRNs. In addition, lectures will cover exploring the results with popular bioinformatics tools, including STRING and Ingenuity Pathway Analysis.
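The sketch below illustrates the general two-stage idea of using a cis-eQTL genotype as an instrument to estimate the effect of one gene on another. It is not the 2SPLS algorithm, whose details are not given here. It swaps in ordinary two-stage least squares for a single regulator-target pair, and the function names and toy data are assumptions of this example.

```python
def simple_ols(x, y):
    """Slope and intercept of a one-predictor least squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def two_stage_effect(genotype, regulator_expression, target_expression):
    """Estimate the regulator -> target effect with the genotype as instrument."""
    # Stage 1: predict regulator expression from its cis-eQTL genotype.
    slope, intercept = simple_ols(genotype, regulator_expression)
    predicted = [intercept + slope * g for g in genotype]
    # Stage 2: regress target expression on the predicted regulator expression.
    effect, _ = simple_ols(predicted, target_expression)
    return effect

# Hypothetical data: genotype at a cis-eQTL, and expression of two genes.
genotype = [0, 1, 2, 0, 1, 2]
regulator = [1.0, 2.1, 2.9, 0.9, 1.9, 3.1]
target = [0.5 * value for value in regulator]
effect = two_stage_effect(genotype, regulator, target)
```

Because the toy target is constructed as exactly half the regulator expression, the estimated effect is 0.5. A genome-wide network would require repeating this kind of estimate across many gene pairs, which is where specialized and parallel methods come in.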