About the MCAP Summer Institute (MCAP-SI)
Our week-long Summer Institute on Longitudinal Data Analysis is designed to meet the needs of 50 participants each year. We welcome individuals from all career stages and backgrounds (e.g., graduate students, post-docs, faculty, and industry researchers) who are eager to enhance their knowledge of longitudinal data analysis. The Summer Institute will be held at Purdue University in West Lafayette, IN, and we will provide meals and lodging in Purdue dorms for those selected to receive travel funding. The Summer Institute is ideal for individuals with a foundational understanding of statistics who seek to learn and apply longitudinal methods in their work. We specifically encourage applicants who are not already experts in longitudinal data analysis but who see the potential for these skills to enhance their research or professional contributions.
Contemporary large-scale NIH initiatives have led to the emergence of many high-quality publicly available longitudinal datasets that include complex data of various types, sources, and domains (e.g., biological, social, individual, family, neighborhood, etc.). However, use of these datasets without training can lead to scientific setbacks, including work that is imperfect, misleading, or even incorrect. There is an urgent need for educational programming to train researchers both within and outside of academic careers on the innovative and responsible use of publicly available, large, and complex longitudinal datasets. This R25 grant develops and offers an “Interdisciplinary Summer Institute on the Analysis of Complex, Large-Scale Longitudinal Data”, refining it each year based on evaluation data (aim 1). We will also leverage this program to train graduate students to teach advanced longitudinal methods to participants from multiple disciplines (aim 2). Thus, we will serve two groups: program participants (aim 1), and Purdue graduate student teaching assistants (TAs, aim 2). During an immersive week-long summer institute each year, we will train 50 interdisciplinary participants including students, postdocs and faculty across academic institutions (Y1-Y3), expanding to also include professionals in non-profits, governmental agencies, and industries (Y2, Y3). The course is organized into 10 topics: publicly available longitudinal data sources, introduction to longitudinal data analytic methods, data visualization, missing data, longitudinal categorical data analysis, sampling weights and clustering/stratification, time-varying and time-invariant covariate inclusion, combining multiple data sources, embedded family-based designs, and an intro to sociogenomics. Cross-cutting themes include data management, visualization and communication, causal inference, measurement and modeling decisions, meaningful effect sizes, and representativeness.
Lecture examples and assignments will focus on substance use and associated factors and will use data from the Adolescent Brain Cognitive Development (ABCD) Study, although participants will be encouraged to use whatever dataset is most relevant to their own research interests. The summer institute will also feature TAs and additional faculty instructors circulating the room in each session to support students in need of extra assistance in real time, as well as review and office hour sessions, experience in interdisciplinary environments, networking, and joint practice opportunities to help establish collaborations. We will also train 6 graduate student TAs each year, who will gain supervised experience in content development, instruction (via review sessions), consulting, course evaluation, and leadership within interdisciplinary environments. We have carefully designed recruitment strategies to train a diverse (e.g., under-represented groups, discipline, and career stage and path) workforce, and a multi-pronged evaluation plan. Our program faculty includes 8 experts in longitudinal data analysis and instruction, representing different fields, genders, and career stages.
For Participants
- When is the course?
- July 13th-July 18th, 2025
- Where?
- Purdue University
- How much?
- $2,000 for the Registration Fee. Registration is now closed. Applications will reopen Spring 2026.
- In-person support
- With generous support from the National Institute on Drug Abuse (R25 DA061822), we are able to provide 30 participants with full support for living expenses, registration fees, and travel costs. We aim for these funds to cover all costs of attending the summer institute.
- With generous support from the National Institute on Drug Abuse (R25 DA061822), we are also able to provide support for registration fees for an additional 20 participants. These are typically Purdue-affiliated individuals and/or local participants who do not require lodging or travel expenses.
- Learning Outcome Goals
- 1) Expand interest and comfort applying longitudinal data to health and social science questions
- 2) Increase understanding and mastery of longitudinal data and models, including developing skills to make justified measurement and modeling decisions
- 3) Provide tools for data visualization for broad application of longitudinal analysis and dissemination of findings to interdisciplinary audiences
Schedule: Summer Institute 2025
| Day | Topic | Instructor | Session Goals / Learning Objectives |
| Sunday | Arrival | ||
| 12:00-4:00 2:00PM-2:15 break WALC 1132 | 0: Intro to R for longitudinal data management (optional) | Dr. Katie Thompson | Introduction to the tools for efficiently managing data, unique data structures for repeated measures, tutorial for using various identifiers, transposing between long and wide formats |
| 5:00-7:00 Purdue Memorial Union (PMU), West Faculty Lounge | Welcome Reception | ||
| Monday | Introduction | ||
| 8:00-8:30 WALC 1132 | Optional: Optimize your course experience | TA: Yi Zhu | How to find an appropriate dataset, help downloading data for this course |
| 8:30-9:00 WALC 1132 | Welcome | Dr. Kristine Marceau | Overview of course structure, events, and opportunities |
| 9:00-11:45 10:15-10:30 break WALC 1132 | 1: Longitudinal Data: Data Sources, Resources, and Structure | Dr. Sharon Christ | Overview of key publicly available data sources from current NIH initiatives, how to understand critical design components of an existing longitudinal study, overview of specific considerations to search for when learning a new dataset |
| 11:45-1:15 | Lunch Break | ||
| 1:15-4:00 2:30-2:45 break WALC 1132 | 2: Introduction to longitudinal data analytic methods | Dr. Rob Duncan | Broad goals and types of research questions and hypotheses applicable to large, complex longitudinal data, temporal ordering and causal inference, understand the classes of common longitudinal data analysis techniques |
| 4:00-5:00 WALC 1132 | Office hours/Open Consulting | Faculty instructors and TA | Practice: identify considerations for your data, research question, and analytic plan |
| Tuesday | Understanding the data | ||
| 8:30-9:00 WALC 1132 | Optional review | TA: Mallory Bell | Review key considerations for longitudinal data, mapping theory to analysis |
| 9:00-11:45 10:15-10:30 break WALC 1132 | 3: Visualization Part 1: getting to know your data | Dr. Trent Mize | Overview of best practices, visualizing raw data (e.g., distributions, missing data, etc.), resources for model presentation techniques; specific applications to longitudinal analyses and developmental processes |
| 11:45-1:15 | Lunch Break | ||
| 1:15-4:00 2:30-2:45 break WALC 1132 | 4: Missing data, including specific considerations for longitudinal data | Dr. James McCann | Sources of missing data in longitudinal studies (e.g., attrition, measure-specific), how to assess missing data patterns, introductory overview of missing data techniques available for longitudinal data |
| 4:00-5:00 WALC 1132 | Office hours/Open Consulting | Faculty instructors and TA | Practice: visualize and communicate missing data patterns and plan for your data |
| Wednesday | Measurement and Modeling Decisions | ||
| 8:30-9:00 WALC 1132 | Optional review | TA: Bing Han | Review best practices for data visualization; sources of missing data |
| 9:00-11:45 10:15-11 break WALC 1132 | 5: Longitudinal categorical data analysis (binary, nominal, ordinal, and count dependent variables) | Dr. Trent Mize | Problems with using linear modeling techniques for categorical outcomes, methodological solutions and current best practices (e.g., marginal effects and sample-averaged predictions), visualizing results from categorical longitudinal models |
| 11:45-1:15 | Lunch Break | ||
| 1:15-4:00 2:30-2:45 break WALC 1132 | 6: Applying survey/sampling weights and clustering/stratification | Dr. Donna Xu | How to identify and apply sampling weights available in a dataset, how to identify and apply corrections for clustering/stratification, visualization of data structure |
| 4:00-5:00 WALC 1132 | Office hours/Open Consulting | Faculty instructors and TA | Practice: analyze categorical outcomes, apply sampling weights |
| Thursday | Socioenvironmental and Demographic Factors | ||
| 8:30-9:00 WALC 1132 | Optional review | TA: Susmita Ghosh | Review issues and solutions regarding categorical outcomes and nested data |
| 9:00-11:45 10:30-10:45 break WALC 1132 | 7: Model specification / causal inference | Dr. Shawn Bauldry | Time varying vs. time-invariant covariates, best practices for when to include vs. exclude covariates for causal inference |
| 11:45-1:15 | Lunch Break | ||
| 1:15-4:00 2:30-2:45 break WALC 1132 | 8: Mapping other data sources to study multi-level determinants of health | Dr. Kristine Marceau | Overview of types of data that can be mapped, strategies for how to convert outside data to a usable dataset and merge it with a dataset of interest. |
| 4:00-5:00 WALC 1132 | Office hours/Open Consulting | Faculty instructors and TA | Practice: explore time-varying covariates and/or identify data to map to your data |
| Friday | Behavioral Genetic and Sociogenomics Resources | ||
| 8:30-9:00 WALC 1132 | Optional review | TA: Catalina Vega Mendez | Review best practices for covariate inclusion |
| 9:00-11:45 10:30-10:45 break WALC 1132 | 9: Embedded family-based designs (what to do with family members and how to leverage those designs) | Dr. Kristine Marceau | Understand that many large-scale longitudinal studies include embedded family-based designs (e.g., twins/siblings; data collected on parents and children), gain the tools to avoid the issue (e.g., by selecting an independent subsample), gain an introductory understanding of options for leveraging these subsets to inform research questions along with resources for more in-depth instruction. |
| 11:45-1:15 | Lunch Break | ||
| 1:15-4:00 2:30-2:45 break WALC 1132 | 10: Sociogenomics Resources | Dr. Robbee Wedow | Understand that many large-scale longitudinal studies now include genomic data, gain an introductory understanding of the responsible use of genomic data, gain a primer on current best practices for generating and using polygenic scores, understand what not to do and what constitutes irresponsible and unethical use of genomic data, gain resources for more in-depth instruction on genomic methods (e.g., other training opportunities; how to find collaborators). |
| 4:00-5:00 WALC 1132 | Office hours/Open Consulting | Faculty instructors and TA | Practice: identify relevant use of behavioral genetic data for your research and/or open practice |
| 5:00-7:00 Purdue Memorial Union (PMU), West Faculty Lounge | Closing Reception | ||
- CITI Training (Human Research Protection Program)
- If you have not taken it before or yours is expired: complete the Biomedical Research for Investigators or Social Behavioral Research group, and then the Human Subjects Research – Initial (Basic) course. If you completed training within the past 4 years, you may take a refresher. If your certificate is current, no action is required.
- Office Hours: Each day from 4:00–5:00pm, we will hold in-person office hours with two dedicated tables:
- questions on assignments / course content
- consulting on individual projects
- These sessions will be staffed by our faculty instructors and teaching assistants. Feel free to drop by whichever table aligns with your needs!
- Slack:
- We will be actively monitoring Slack during the course and in the evenings to answer any questions. We encourage participants to post R-related and assignment questions on Slack so as not to interrupt class sessions. You can join our Slack workspace using the link provided in your registration email.
- GENERAL EVENT INFORMATION
Campus Parking | Purdue Parking Map
Northwestern Parking Garage
This garage has a limited number of Parkmobile spots. Alternatively, attendees can purchase their own A-Permit for the duration of their stay.
Grant Street Garage
This garage is ticketed parking – attendees pay for parking when exiting the garage. *NOTE – For those staying in the Residence Hall: Parking is located on the top floor of the parking garage in First Street Towers. These spots are free of charge.
Residence Hall Dining | Earhart Dining Court
1275 1st Street, West Lafayette, IN 47906
*NOTE – If you are not staying at the dorms during your visit, you may eat at the dining hall. You will pay at the door for your meal every time you go to Earhart Dining Court.
Dining Hours
Breakfast: 7:00 am – 8:30 am
Lunch: 11:00 am – 1:30 pm
Dinner: 5:00 pm – 7:00 pm
Find Information Regarding Dietary Restrictions HERE.
Classroom Space | Wilmeth Active Learning Center (WALC), Room 1132
340 Centennial Mall Dr., West Lafayette, IN 47907
Reception Location | Purdue Memorial Union, West Faculty Lounge
201 Grant St., West Lafayette, IN 47906
WIFI | AT&T WIFI & Eduroam
AT&T WIFI – This is a free connection that does not require any credentials to sign into and can be used by anyone on campus
Eduroam – A secure, worldwide roaming access service developed for education and research communities. The credentials to sign into Eduroam are generally those of your home university account.
Registration | Locations & Times
Sunday, July 13 (11:30-12:30PM) | WALC 1132
Sunday, July 13 (4:30-5:30PM) | Purdue Memorial Union, West Faculty Lounge
Monday, July 14 (7:45-10AM) | WALC 1132
| Lodging | First Street Towers |
| 1250 1st Street, West Lafayette, IN 47906 |
| PARKING | McCutcheon Drive Parking Garage |
| *NOTE – Residence Hall Parking is located on the top floor of the parking garage. Please park in one of these parking spaces. These spots are free of charge but limited. Attendees are able to park in any residence hall parking spots. |
| View First Street Towers Orientation HERE before your check-in date. |
| DORM CHECK-IN | 11AM-5PM EST |
| DORM CHECK-OUT | 12PM EST |
IMPORTANT INFORMATION: The front desk will be staffed for most of the week; however, there is no guarantee that someone will be at the desk 24/7. If you need to check in after hours and no one is at the front desk, please go to Meredith South, where staff will be on site to assist. You may also call (765)496-5150. |
Meredith South Residence Hall, 1225 1st Street, West Lafayette, IN 47906. Main entrance doors to First Street Towers lock at 11PM EST and unlock at 6AM EST every day. To enter the dorm after hours, please use your key card. |
| Be sure to pick up your folder at the front desk when checking in! |
| Explore the Greater Lafayette Area! |
| Events Calendar – Purdue University Events Calendar; Local Eateries and Activities – Home of Purdue; Purdue Memorial Union Dining – There are several options for dining in the PMU. See the campus options HERE. |
Featured Faculty and Teaching Assistants
- Dr. Kristine Marceau (MCAP Co-Director): Dr. Marceau is an Associate Professor of Human Development and Family Science who specializes in longitudinal methods emphasizing both developmental change and variability across multiple time-scales using and integrating SEM and multilevel modeling techniques. She frequently uses family-based designs and large datasets to explore developmental and behavioral trajectories. Dr. Marceau regularly teaches multilevel modeling and inferential statistics, and trains students in longitudinal data analysis.
- Dr. Trenton D. Mize (MCAP Co-Director): Dr. Mize is the Dean’s Associate Professor of Sociology and Statistics (by courtesy) and a quantitative methodologist with expertise in categorical data analysis, latent variable modeling, and data visualization. His research develops and applies innovative methods for analyzing complex social data, and he regularly teaches graduate courses on categorical data and experimental design.
- Dr. James A. McCann (MCAP Co-Director): Dr. McCann is a Professor of Political Science with expertise in longitudinal survey analysis and latent variable modeling. He has led multiple large-N longitudinal studies on political behavior and representation and regularly applies advanced econometric and multilevel techniques in his research. Dr. McCann teaches graduate seminars on research design and quantitative analysis, focusing on panel data and survey methodologies.
- Dr. Sharon Christ (MCAP Co-Director): Dr. Christ is an Associate Professor of Human Development and Family Science specializing in emergent statistical models, particularly structural equation modeling (SEM) and complex sample designs. Her expertise in multilevel modeling, SEM, and growth models has been applied across numerous large-scale cohort studies. She has taught graduate-level courses on sample design, inferential statistics, and SEM.
- Dr. Robert Duncan: Dr. Duncan is an Associate Professor of Human Development and Family Science at Colorado State University with expertise in advanced longitudinal data analysis, including multilevel modeling, structural equation modeling (SEM), and growth curve modeling. His work focuses on children’s development within multilevel contexts like classrooms.
- Dr. Dongjuan Xu: Dr. Xu, an Associate Professor in the School of Nursing, specializes in longitudinal cohort studies that evaluate the quality of care and outcomes for older adults. Her expertise spans applied biostatistics, epidemiological methods, and outcome evaluation. She regularly teaches graduate courses in these areas, incorporating advanced quantitative techniques into her instruction, such as weighting methods and sampling designs.
- Dr. Shawn Bauldry: Dr. Bauldry, a Professor of Sociology at Purdue, specializes in quantitative methods and statistics, primarily focusing on the development of structural equation models, a broad class of statistical models with wide applicability in the social sciences.
- Dr. Robbee Wedow: Dr. Wedow is an Assistant Professor of Sociology and Data Science at Purdue University, with expertise in statistical genetics and sociogenomics. His research applies advanced statistical methods, including gene-environment interaction models, to large-scale genetic datasets to investigate social and health outcomes.
- Dr. Katie Thompson: Dr. Thompson is a postdoctoral researcher in the Department of Sociology. Her work intersects psychiatry, genomics, and sociology, using innovative statistical approaches to integrate large-scale longitudinal data to better understand mental health. She has specialized in complex longitudinal designs using structural equation models, multilevel and matrix-based mixed models, and genetic and family data. Dr. Thompson has taught on MSc statistics courses and led multiple intensive R workshops focused on family data at King’s College London. She has led projects using multiple longitudinal cohort studies across the USA and UK and has focused on creating open and reproducible code and analytical pipelines.
- Mallory Bell (Sociology): Mallory Bell is a dual-title PhD candidate in Sociology and Gerontology at Purdue University. Her research uses longitudinal data analysis to examine how social determinants of health help shape trajectories of well-being in later life.
- Bing Han (Sociology): Bing Han is a dual-title Ph.D. candidate in Sociology and Gerontology at Purdue University, where she has also earned graduate certificates in Applied Statistics and Advanced Methodology. Her research focuses on health behaviors and lifestyles, stigma, and aging, employing a wide range of methodological approaches, including machine learning, categorical data analysis, longitudinal modeling, latent variable analysis, and experimental design.
- Susmita Ghosh (Public Health): Susmita Ghosh is a PhD candidate in the Department of Nutrition Science at Purdue University, specializing in nutritional epidemiology with a focus on maternal and infant nutrition and food environments. Her research integrates advanced statistical techniques—including multilevel modeling, longitudinal analysis, and causal inference methods—to evaluate randomized controlled trials and social and behavior change interventions aimed at improving health and nutritional outcomes in low-resource settings.
- Yi Zhu (Education): Yi Zhu is a fifth-year PhD candidate in Mathematics Education at Purdue University. Her research focuses on early mathematics learning, spatial reasoning, and game-based learning environments, employing both quantitative and qualitative (mixed-methods) approaches to understand how children develop mathematical thinking.
- Amy Loviska (HDFS): Amy Loviska is a PhD candidate in Human Development and Family Science at Purdue University. Their research program applies advanced quantitative longitudinal methods alongside community-engaged qualitative work to understand effects of individual biology (i.e., hormones, genetics), proximal environments (i.e., prenatal, parents, peers), and sociocultural macroenvironments on adolescent substance use progression among youth of diverse gender and racial-ethnic backgrounds.
- Catalina Vega Mendez (Political Science): Catalina Vega Méndez is a Ph.D. candidate in the Department of Political Science at Purdue University. Her research focuses on comparative political behavior and migration policy, with a regional emphasis on Latin America. She studies public attitudes and policy responses to international migration using a range of methodological tools, with expertise in difference-in-differences designs and in the analysis of international longitudinal survey and panel data.