CDS Core Courses
To fulfill the Core Course requirement, take two Core Courses from two different Core Areas.
Computational Data Science (CDS) Core Courses are classified into five areas: Data Analytics and Visualization, Data Science Applications, Intelligent Computing, Probability and Statistics, and Programming and Computing.
The two Core Courses MUST BE selected from two different Core Areas. If two Core Courses are taken from the same area, only one of them will be counted.
CGT 57500 – Data Visualization Tools and Applications
This course provides hands-on experience in data visualization tools and applications. The course is designed for students with little or no background in Data Visualization. It introduces students to design principles for creating meaningful displays of quantitative and qualitative data to facilitate insight and decision-making. The goal is to introduce visualization as a tool, explore and identify which visualization tools are better suited to visualize different types of data, and understand the role visualization plays in understanding what the data represent. This course gives students an in-depth view of the various branches of visualization and the visualization tools in each area. After taking the course students will be able to evaluate data visualization tools and determine which tool to use for different types of data. This course is targeted towards students interested in using visualization in their own work and future academic courses.
CGT 58100 – Medical Image Processing and Visualization
Advanced study of technical and professional topics. Emphasis is on new developments relating to technical, operational, and training aspects of industry and technology education.
CGT 67000 – Applications in Visual Analytics
Visual Analytics (VA) provides a fast way for people to make sense of large amounts of data, and has applications in many sectors. This course will introduce Visual Analytics through foundational theories and a broad range of techniques and tools, focusing on using visualization methods to reason about and solve complex problems in a wide variety of applications. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces that synthesize human and computational ability to attack large complex problems. It is concerned with analytical reasoning, interaction, data transformations, data visualization, analytic reporting, and technology transition. While the different visual analytics applications share common theories and strategies, each of them has its unique data composition, visual representations, and analytical needs and strategies. By surveying and studying a broad range of visual analytics applications, students will be able to apply visual analytics to their own applications, analyze and break down a complex analytical problem into proper components and steps, evaluate different visual analytic techniques and strategies, and finally design and develop an effective visual analytics solution to the problem.
CNIT 58100 – Data Analysis
Advanced study of technical and professional topics. Emphasis is on new developments relating to technical, operational, and training aspects of industry and technology education.
CS 53000 – Introduction to Scientific Visualization
Teaches the fundamentals of scientific visualization and prepares students to apply these techniques in fields such as astronomy, biology, chemistry, engineering, and physics. Emphasis is on the representation of scalar, vector, and tensor fields; data sampling and resampling; and reconstruction using multivariate finite elements (surfaces, volumes, and surfaces on surfaces).
CS 57300 – Data Mining
Data Mining has emerged at the confluence of artificial intelligence, statistics, and databases as a technique for automatically discovering summary knowledge in large datasets. This course introduces students to the process and main techniques in data mining, including classification, clustering, and pattern mining approaches. Data mining systems and applications are also covered, along with selected topics in current research.
CS 59300 – Topological Data Analysis
A variable title course for topics not currently covered in the CS graduate curriculum. Each offering follows a traditional course structure with textbook(s), assignments, exams, and week-by-week content synopsis described in a syllabus.
ECE 66100 – Computer Vision
This course deals with how an autonomous or a semi-autonomous system can be endowed with visual perception. The issues discussed include: sampling from a topological standpoint; grouping processes; data structures, especially hierarchical types such as pyramids, quadtrees, octrees, etc.; graph-theoretic methods for structural description and consistent labeling; issues in 3-D vision such as object representation by Gaussian spheres, generalized cylinders, etc.
ECET 54900 – Advanced Applied Computer Vision for Sensing and Automation
This course focuses on advanced issues related to an integrated computer vision system for sensing, quality control and automation applications.
STAT 65600 – Bayesian Data Analysis
Bayesian data analysis refers to practical inferential methods that use probability models for both observable and unobservable quantities. The flexibility and generality of these methods allow them to address complex real-life problems that are not amenable to other techniques. This course will provide a pragmatic introduction to Bayesian data analysis and its powerful applications. Topics include: the fundamentals of Bayesian inference for single and multiparameter models, regression, hierarchical models, model checking, approximation of a posterior distribution by iterative and non-iterative sampling methods, and Bayesian nonparametrics. Specific topics and the course outline are subject to change as the semester progresses. All topics will be motivated by problems from the physical, life, social, and management sciences. Conceptual understanding and inference via computer simulation will be emphasized throughout the course.
AAE 59000 – Data Science Applications in Mechanics of Materials
Topics vary – projects in Aeronautical Engineering. Permission of instructor required.
ABE 53100 – Instrumentation and Data Acquisition
This course educates students in the use, selection, and design of instrumentation and data acquisition for agricultural, food, environmental, and biological systems. Emphasis is on measurement of position (GPS), force, pressure, power, torque, flow, and temperature along with environmental sensors. Labs focus on building and using measurement systems and programming PCs for data acquisition and analysis.
AT 60700 – Aviation Applications of Bayesian Inference
This course provides a critical foundation necessary for understanding Bayesian theory and employing that theory in the analysis of typical data generated in industry. The course will focus specifically on solutions of aviation-related problems through data analysis. Material includes overview and implementation of relevant statistical software applications. Practical skills in presenting advanced analyses to both professional and scientific audiences are a key component of the course.
BCHM 61200 – Bioinformatic Analysis of Genome Scale Data
This course provides a hands-on experience for life science researchers in the bioinformatic analysis of genome-scale data. The various disciplines in the life sciences are generating a wealth of experimental and annotation data. Today’s graduate students need experience with modern tools that can help them to access, explore, analyze, interpret and manage the data that they generate in the lab. Students will use the R programming language and packages from Bioconductor, the R bioinformatics project, as their principal tools for this course. Students will develop workflows in R that bridge established algorithms for bioinformatics such as limma, edgeR or DESeq2, incorporating methods to import, QC, transform and visualize genome-scale datasets derived from next generation sequencing experiments. A critical aspect of bioinformatics that is often inadequate is workflow documentation. This course will use Rmarkdown to integrate computer code, data and results to manage complex bioinformatics projects. The class has lecture, lab and distance components. Lectures will focus on the theoretical and biological aspects of bioinformatics analysis using recent examples from the literature. In lab, students will work on programming exercises or projects using published datasets. Advanced students will also have the opportunity to work with their own data. Distance instruction will include R tutorials and videos that students can work through at their own pace (subject to completion deadlines). Particular emphasis will be placed on the theoretical and practical limitations of next generation sequencing data. No prior computer programming experience is required, but it is assumed that students have a firm grasp of the fundamental principles of molecular biology and how they relate to complex processes such as gene expression and genome organization.
CE 50701 – Geospatial Data Analytics
The course will introduce fundamental theories, analytical methods and programming skills that are needed to work with geospatial data. Students will learn the theories, methods, and techniques to visualize, analyze and model various geospatial data through hands-on computer programming practice based on various open source geospatial libraries. To be specific, the course will use R and its related packages as the basic tool for implementation. The goal is to enable the learners to develop their own geospatial analytical applications.
CE 59700 – Image-based Sensing
Hours and credits to be arranged. Permission of instructor required.
EAPS 50700 – Introduction to Analysis and Computing with Geoscience Data
Course teaches computing techniques including error analysis, line and surface fitting, interpolation, map projections, geospatial and temporal correlations, signal processing, and visualization with discussions on specific and practical geoscience applications. Lectures with computer exercises and team project reporting using open-source computer software.
EAPS 51500 – Geodata Science
Course covers a range of topics with applications of mathematical, statistical, numerical, and distributed parallel computing methods for modeling and understanding complex and large spatio-temporal geoscience datasets in the formats common to in-situ observations, asynoptic remote sensing data, volumetric gridded analysis, etc.
FNR 57400 – Big Data, AI, and Forests
This course is focused on introductory big data analysis, artificial intelligence, and associated applications in large-scale forest research. The lecture will cover the challenges we encounter in big data ecological research, and the approaches to overcome these challenges. Real-time forest inventory and wildlife survey data at national and continental levels will be utilized in this course, and actual high-impact research projects will be introduced as case studies to inform students of the state-of-the-art in this subject area. High-performance computing clusters will be utilized for big data analysis. This course is also open to non-forestry majors. We will introduce basic machine learning techniques that are applicable to other subject areas. Guest lectures may cover big data analyses in different fields, internet-of-things, and/or data management and optimization/decimation for collaborative Virtual Reality experiences. The class will be evaluated through a final project, for which students will work independently or in a group setting to develop a ‘mini’ research manuscript with a title of their own selection. All the groups are encouraged to submit their manuscript for publication in peer-reviewed journals, and those whose manuscripts have passed the initial journal screening will get bonus points.
HSCI 52500 – Statistics and Computational Approaches for Health Sciences
Statistical methods are important for data analysis and understanding the trends in a dataset. This course will provide an introduction to the analysis of biological data in a statistical framework using standard computational methods. The course will have a lecture and hands-on component to introduce students to topics and then utilize them to solve typical problems in Health Sciences. In addition to learning the statistical concepts, the students will also be introduced to computational approaches for data analysis. The combination of statistical and computational concepts along with the hands-on experience will help students in their research projects. The topics covered include data representation, sample statistics, probability, common discrete and continuous distributions, confidence interval estimation, experimental design, analysis of variance, statistical methods for hypothesis testing, linear and logistic regression, correlation, power analysis, graph theory, network analysis, omics-based analysis, and data visualization. The course is ideal for Health Sciences students who perform data analysis and are interested in implementing these approaches in their research.
TDM 51100 – Corporate Partners
Students in The Data Mine Corporate Partners Learning Community will work in groups with Corporate Partner mentors on a variety of projects. They will analyze real data related to questions that the Corporate Partner proposes. Most projects will last for a full academic year (late August through late April), with multiple reports and presentations given more frequently. The mentor is expected to meet with the students weekly by Microsoft Teams, or (more rarely) in person. Students are expected to actively participate in these meetings and in all individual and group work. The goal of the course is to help students build impactful industry-related skills in data science, visualization, and data engineering. The Data Mine staff also has data scientists who can assist students with technical questions focused on the skills being built and the research conducted. Students can work on real-world industry facing issues that have a high value add for the corporate partner.
BME 64600 – Deep Learning
This course teaches the theory and practice of deep neural networks from basic principles through state-of-the-art methods. The class blends hands-on programming, using a variety of state-of-the-art programming frameworks, with theoretical treatment based on current literature. Implementation will emphasize the use of the PyTorch framework and the use of dynamic computational graphs. Some previous experience with optimization techniques is important for success in the course.
CS 57100 – Artificial Intelligence
Artificial Intelligence (AI) systems are increasingly being deployed in many real-world tasks. This course provides an introduction to the fundamental principles and applications of AI. The course covers classic material including search-based methods, probabilistic reasoning, game playing, decision making, exact and approximate inference, causal learning, and reinforcement learning as well as selected advanced topics. The focus of the course is on foundational methods and current techniques for building AI systems that exhibit ‘intelligent’ behavior and can ‘learn’ from experience. The course assumes students are familiar with basic concepts in analysis, linear algebra, optimization, discrete mathematics, elementary probability, statistics, data structures, and algorithms. Students are expected to have good programming and software development skills and have a working knowledge of Python and Java.
CS 57700 – Natural Language Processing
This course will cover the key concepts and methods used in modern Natural Language Processing (NLP). Throughout the course, several core NLP tasks, such as sentiment analysis, information extraction, and syntactic and semantic analysis, will be discussed. The course will emphasize machine-learning and data-driven algorithms and techniques, and will compare several different approaches to these problems in terms of their performance, supervision effort, and computational complexity.
CS 57800 – Statistical Machine Learning
This introductory course will cover many concepts, models, and algorithms in machine learning. Topics include classical supervised learning (e.g., regression and classification), unsupervised learning (e.g., principal component analysis and K-means), and recent developments in the machine learning field such as variational Bayes, expectation propagation, and Gaussian processes. While this course will give students the basic ideas and intuition behind modern machine learning methods, the underlying theme in the course is probabilistic inference.
CS 58700 – Foundations of Deep Learning
This course provides an integrated view of the key concepts of deep learning (representation learning) methods. This course focuses on teaching principles and methods needed to design and deploy novel deep learning models, emphasizing the relationship between traditional statistical models, causality, invariant theory, and the algorithmic challenges of designing and deploying deep learning models in real-world applications. This course has both a theoretical and coding component. The course assumes familiarity with coding in the language used for state-of-the-art deep learning libraries, linear algebra, probability theory, and statistical machine learning.
CS 59300 – Computer Vision with Deep Learning
A variable title course for topics not currently covered in the CS graduate curriculum. Each offering follows a traditional course structure with textbook(s), assignments, exams, and week-by-week content synopsis described in a syllabus.
CS 59300 – Machine Learning Theory
A variable title course for topics not currently covered in the CS graduate curriculum. Each offering follows a traditional course structure with textbook(s), assignments, exams, and week-by-week content synopsis described in a syllabus.
ECE 57000 – Artificial Intelligence
Introduction to the basic concepts and various approaches of artificial intelligence. The first part of the course deals with heuristic search and shows how problems involving search can be solved more efficiently by the use of heuristics and how, in some cases, it is possible to discover heuristics automatically. The next part of the course presents ways to represent knowledge about the world and how to reason logically with that knowledge. The third part of the course introduces the student to advanced topics of AI drawn from machine learning, natural language understanding, computer vision, and reasoning under uncertainty. The emphasis of this part is to illustrate that representation and search are fundamental issues in all aspects of artificial intelligence.
ME 53900 – Introduction to Scientific Machine Learning
Introduction to the fundamentals of predictive modeling for advanced undergraduate and graduate science and engineering students who work at the intersection of data and theory.
CNIT 60100 – Applied Statistics in Information Technology
This course will survey the field of applied statistics in information technology. Students will gain hands-on experience running statistical analyses on samples from a variety of populations. Students will learn the process of data cleaning, data transforming/coding, identifying the appropriate statistical analyses (descriptive and inferential), as well as writing and interpreting the results. Specifically, this course will survey the following statistical approaches: correlations, t-test, analysis of variance, analysis of covariance, factorial ANOVA, simple regression, multiple regression, logistic regression, chi-square analysis, factor analysis, and nonparametric tests. In this course, students will learn to differentiate statistical analyses and identify appropriate statistical analyses depending on the research question and variables of interest.
MA 51900 – Introduction to Probability
Algebra of sets, sample spaces, combinatorial problems, independence, random variables, distribution functions, moment generating functions, special continuous and discrete distributions, distribution of a function of a random variable, limit theorems.
MA 53200 – Elements of Stochastic Processes
(STAT 53200) A basic course in stochastic models, including discrete and continuous time Markov chains and Brownian motion, as well as an introduction to topics such as Gaussian processes, queues, epidemic models, branching processes, renewal processes, replacement, and reliability problems.
STAT 51200 – Applied Regression Analysis
Inference in simple and multiple linear regression, residual analysis, transformations, polynomial regression, model building with real data, nonlinear regression. One-way and two-way analysis of variance, multiple comparisons, fixed and random factors, analysis of covariance. Use of existing statistical computer programs.
STAT 52500 – Intermediate Statistical Methods
Statistical methods for analyzing data based on general/generalized linear models, including linear regression, analysis of variance (ANOVA), analysis of covariance (ANCOVA), random and mixed effects models, and logistic/loglinear regression models. Application of these methods to real-world problems using SAS statistical software.
STAT 52600 – Advanced Statistical Methods
As a sequel to STAT 52500, this course introduces statistical modeling tools for situations where standard least-squares techniques may not apply. This includes an extensive coverage of generalized linear models (GLM) for non-Gaussian responses, mixed effects models to describe correlated data, nonparametric regression, and lastly, parametric and nonparametric survival models for the analysis of (possibly censored) time-to-event data. Among issues to be discussed are the estimation of the models, the testing of hypotheses, and the checking of model adequacy. Data examples will be used throughout the course to illustrate the methodologies and the related software tools in R.
STAT 52900 – Applied Decision Theory and Bayesian Statistics
Bayesian and decision-theoretic formulation of problems; construction of utility functions and quantifications of prior information; methods of Bayesian decision and inference, with applications; empirical Bayes; combination of evidence; Bayesian design and sequential analysis; comparisons of statistical paradigms.
STAT 54500 – Introduction to Computational Statistics
This introductory course covers the fundamentals of computing for statistics and data analysis. It starts with a brief overview of programming using a general purpose compiled language (C) and a statistics-oriented interpreted language (R). The course proceeds to cover data structures and algorithms that are directly relevant to statistics and data analysis and concludes with a computing-oriented introduction to selected statistical methods. A significant part of the course involves programming and hands-on experimentation demonstrating the covered techniques.
STAT 54600 – Computational Statistics
The course focuses on two fundamental aspects in computational statistics: (1) what to compute and (2) how to compute. The first is covered with a brief review of advanced topics in statistical inference, including Fisher’s fiducial inference, Bayesian and frequentist methods, and the Dempster-Shafer (DS) Theory. The second is discussed in detail by examining exact, approximation, and interactive simulation methods for statistical inference with a variety of commonly used statistical models. The emphasis is on the EM-type and quasi-Newton algorithms, numerical differentiation and integration, and Markov chain Monte Carlo methods.
BIOL 59500 – Practical Biocomputing
Special work, such as directed reading, independent study or research, supervised library, laboratory, or field work, or presentation of material not available in the formal courses of the department. The field in which work is offered will be indicated in the student’s record.
CS 50100 – Computing for Science and Engineering
Computational concepts, tools, and skills for computational science and engineering: scripting for numerical computing, scripting for file processing, high performance computing, and software development.
CS 51400 – Numerical Analysis
(MA 51400) Alternative methods for solving nonlinear equations; linear difference equations, applications to solution of polynomial equations; differentiation and integration formulas; numerical solution of ordinary differential equations; roundoff error bounds.
CS 51500 – Numerical Linear Algebra
Direct and iterative solvers of dense and sparse linear systems of equations, numerical schemes for handling symmetric algebraic eigenvalue problems, and the singular-value decomposition and its applications in linear least squares problems.
CS 52500 – Parallel Computing
Parallel computing for science and engineering applications: parallel programming and performance evaluation, parallel libraries and problem-solving environments, models of parallel computing and run-time support systems, and selected applications.
IE 54100 – Nature-Inspired Computing
This course is about algorithms that are inspired by naturally occurring phenomena and applying them to optimization, design and learning problems. The focus is on the process of abstracting algorithms from the observed phenomenon, their outcome analysis and comparison as well as their “science”. This will be done primarily through the lens of evolutionary computation, swarm intelligence (ant colony and particle-based methods) and neural networks.
MA 57400 – Numerical Optimization
Modern large-scale algorithms for convex optimization, with a heavy emphasis on analysis, including monotone operators, fixed-point iteration, and duality in splitting methods. The course covers three parts: smooth optimization algorithms, nonsmooth convex optimization algorithms, and stochastic and randomized algorithms.
STAT 50600 – Statistical Programming and Data Management
Use of the SAS software system for managing statistical data. How to write programs to access, explore, prepare, and analyze data. Using the DATA step and procedures to access, transform, and summarize data. Introduction to the SAS macro language. Prepares students for the base SAS certification exam.
STAT 52700 – Introduction to Computing for Statistics
This course provides a thorough introduction to the R programming language, and its use for statistical computing and data science. The course will first look at the fundamentals of R, including different data-structures, control-flow, and the basic vocabulary. An emphasis will be placed on learning idiomatic and efficient R, covering ideas such as recycling, vectorization and functional programming. The course will then look at principles and tools for tasks like organizing data (‘tidy data’), manipulating data (‘data carpentry’), querying data (through topics like regular expressions) as well as visualizing data (including interactive visualizations). The material and the homework will encourage development of modular reusable code and reproducible research through ideas such as object-oriented programming and dynamic documents in R Markdown. The last part of the course will study statistical procedures such as least-squares regression, LASSO, Monte Carlo sampling and Markov chain Monte Carlo.