Short Courses are offered as full- or half-day courses. The extended length of these courses allows attendees to obtain an in-depth understanding of the topic. These courses often integrate seminar lectures covering foundational concepts with hands-on lab sessions that allow attendees to put these concepts into practice.
Sunday, March 15 | 8:00 am – 5:00 pm
SC1 | Deep Learning Methods in Advanced Statistical Problems
Instructors:
Hongtu Zhu, University of North Carolina at Chapel Hill
Xiao Wang, Purdue University
Runpeng Dai, University of North Carolina at Chapel Hill
Course Description:
This short course is designed for researchers in statistics and data analysis who are eager to explore the latest trends in deep learning and apply these methods to solve complex statistical problems. The course delves into the intersection of deep learning and statistical analysis, covering topics familiar to statisticians such as time series analysis, survival analysis, and quantile regression. Additionally, it addresses cutting-edge topics in the deep learning community, including transformers, diffusion models, and large language models. In this one-day short course, participants will gain hands-on experience exploring and applying deep learning methodologies to tackle various statistical challenges. Basic knowledge of Python programming will be helpful but is not necessary.
Statistical/Programming Knowledge Required:
Basic knowledge of Python programming
Instructor Biographies:
Hongtu Zhu is Professor of Biostatistics, Statistics, and Computer Science at the University of North Carolina at Chapel Hill. He earned his PhD in Statistics from the Chinese University of Hong Kong in 2000 and previously served as chief scientist of statistics at DiDi Chuxing. Zhu's expertise spans statistics, medical imaging, genetics, artificial intelligence, and computational neuroscience. He is particularly noted for his work in neuroimaging data analysis and causal reinforcement learning in two-sided markets. He is a Fellow of the IMS and the ASA, and he serves as editor of JASA Applications and Case Studies and as coordinating editor of JASA.
Xiao Wang obtained his Ph.D. in Statistics from the University of Michigan. He is Head and J.O. Berger and M.E. Bock Professor of Statistics at Purdue University. His research expertise lies at the intersection of modern statistics and AI. Dr. Wang's work has been featured in leading statistical journals and machine learning conferences, and he is a Fellow of the IMS and the ASA. He also serves as associate editor of the Journal of the American Statistical Association, Technometrics, and Lifetime Data Analysis.
Runpeng Dai obtained his B.S. in Statistics from Shanghai University of Finance and Economics and is now a PhD candidate in the Department of Biostatistics at the University of North Carolina at Chapel Hill. His research interests lie in reinforcement learning and large language models.
Sunday, March 15 | 8:00 am – 5:00 pm
SC2 | Transfer Learning and AI-Assisted Synthetic Data Generation in Survival Analysis
Instructors:
Kevin He, Department of Biostatistics, University of Michigan
Di Wang, Division of Biostatistics, Medical College of Wisconsin
Course Description:
This course presents modern advancements in survival analysis, emphasizing two cutting-edge strategies: transfer learning and AI-assisted synthetic data generation. Designed for graduate students, data scientists, and researchers working with time-to-event data in healthcare, biomedical research, and related fields, the course addresses key challenges such as data heterogeneity, data-sharing limitations, and privacy constraints. The course begins with core concepts in survival analysis, including time-to-event data structures and Cox and discrete-time models. Building on this foundation, students explore machine learning methods for survival prediction, such as penalized regression techniques (Lasso, Ridge, Elastic Net), XGBoost, and deep learning models like DeepSurv and DeepHit. A central focus is the use of transfer learning frameworks—specifically those based on Bregman divergence—to adapt various published models (e.g., Cox models, discrete-time survival models, and machine learning approaches) or risk score calculators (including clinically derived risk groupings) to new target populations, while accounting for data heterogeneity. In parallel, students will learn to generate synthetic survival data using AI-based tools, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and diffusion models, and to evaluate their fidelity, utility, and privacy-preserving properties. The course also explores strategies for knowledge distillation and for integrating synthetic data with transfer learning, enabling robust hybrid models that leverage both real and artificial data. These methods are demonstrated through case studies and applications in R and Python, with examples drawn from established benchmark datasets for survival prediction. Emphasis is placed throughout on practical implementation and reproducibility, particularly when working across institutional health systems or generating artificial patient records.
By the end of the course, participants will have developed a rigorous, forward-looking toolkit for advancing survival modeling in complex, real-world environments.
Statistical/Programming Knowledge Required:
Basic probability, statistics and regression modeling; Experience with R or Python
Instructor Biographies:
Kevin He is an Associate Professor in the Department of Biostatistics and the Associate Director of the Kidney Epidemiology and Cost Center (KECC) at the University of Michigan. His primary research interests include survival analysis, healthcare provider profiling, risk prediction, data integration, machine learning, statistical optimization, causal inference and statistical genetics with application in organ transplantation, kidney dialysis, psoriasis, cancer and stroke. His work is driven by large and complex datasets, including national disease registries, administrative claims data, and high-throughput genomic, epigenomic, and transcriptomic data.
Di Wang is an Assistant Professor in the Division of Biostatistics at the Medical College of Wisconsin. She received her PhD in Biostatistics from the University of Michigan. Her research focuses on developing effective statistical learning methods to address emerging challenges in complex biomedical data and to improve clinical practice across diverse populations. Her methodological interests include data integration, machine learning, survival analysis, statistical genetics, statistical optimization, and high-dimensional data analysis. Her work has applications in risk prediction, precision medicine, health equity, polygenic scores, and clinical trials, with data sources such as electronic health records (EHR), biobanks, and national disease registries. She actively collaborates on studies related to cancer, diabetes, chronic kidney disease (CKD), organ transplantation, acute kidney injury (AKI), and immune checkpoint inhibitors (ICI).
Sunday, March 15 | 8:00 am – 5:00 pm
SC3 | Generative Models for Protein Structures and Biomedical Data
Instructors:
Anru Zhang, Duke University
Alex Tong, Duke University
Zhangzhi (Fred) Peng, Duke University
Course Description:
Generative models have become powerful tools in modern biomedical research, especially in structural biology for modeling and designing protein structures. This short course introduces key concepts and recent advances in generative modeling, with a particular focus on their applications to protein structure prediction and related biomedical data challenges. We begin with an overview of foundational methods, including variational autoencoders and generative adversarial networks, and progress to state-of-the-art frameworks such as diffusion models and flow matching. The course will demonstrate how generative models are used to generate accurate protein backbones and synthesize biomedical time series data with irregular sampling. We will also discuss how these approaches address common limitations of biomedical datasets, such as small sample sizes and sampling bias, while enabling data augmentation and simulation in protein modeling. The course is designed to be accessible yet rigorous, combining technical insights with practical examples and case studies from recent research. Participants will gain a conceptual understanding of how generative modeling techniques apply specifically to protein structures and biomedical data, as well as open research questions in this rapidly developing field.
Statistical/Programming Knowledge Required:
No prior experience with generative models is required, though familiarity with basic machine learning concepts will be helpful.
Instructor Biographies:
Anru Zhang is a primary faculty member jointly appointed in the Department of Biostatistics & Bioinformatics and the Department of Computer Science, and is the Eugene Anson Stead, Jr. M.D. Associate Professor at Duke University. He obtained his Ph.D. from the University of Pennsylvania in 2015. His work focuses on tensor learning, generative models, high-dimensional statistics, and applications in electronic health records and microbiome data analysis. He won the IMS Tweedie Award, the COPSS Emerging Leader Award, and the ASA Gottfried E. Noether Junior Award. His research is currently supported by two NIH R01 grants (as PI and MPI) and an NSF CAREER Award.
Alex Tong is an Assistant Professor (starting July 2025) in Computer Science, Cell Biology, and Biostatistics & Bioinformatics at Duke University. He completed his Ph.D. at Yale University in 2021 and a postdoc with Yoshua Bengio at Mila, where he focused on efficient ML algorithms for cell and molecular biology. His research centers on generative modeling, optimal transport, and causal discovery, with applications to cell development and protein design. He also co-founded Dreamfold to explore generative approaches for protein engineering.
Zhangzhi (Fred) Peng is a PhD student at Duke University working on generative models and protein design.
Sunday, March 15 | 8:00 am – 12:00 pm
SC4 | The Technology of Large Language Models in Biomedical Research
Instructor:
Junwei Lu, Harvard T.H. Chan School of Public Health
Course Description:
This course introduces the application of large language models (LLMs) in biomedical research, with a focus on electronic health records (EHRs) and genomics. Students will gain hands-on experience in coding for LLM-based biomedical data processing, including tokenization strategies tailored to structured and unstructured EHRs as well as genomic sequences. The course covers foundational model architectures, training on domain-specific corpora, and techniques for fine-tuning pretrained models for clinical and genomic tasks. Through projects and practical labs, students will explore how LLMs can extract insights, support prediction tasks, and enable novel discoveries in health and life sciences.
Statistical/Programming Knowledge Required:
Experience with PyTorch and basic knowledge of AI models including neural networks, transformers, and word embeddings are required.
Instructor Biography:
Junwei Lu is an Assistant Professor in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. His work focuses on the intersection of statistical artificial intelligence and translational clinical research by leveraging electronic health records and medical imaging data.
Sunday, March 15 | 1:00 pm – 5:00 pm
SC5 | Measurement Error Models in Action: The Latest Methods and Their Applications in Nutrition and Environmental Health
Instructors:
Donna Spiegelman, Department of Biostatistics, Yale School of Public Health
Molin Wang, Harvard T.H. Chan School of Public Health
Raymond Carroll, Department of Statistics, Texas A&M University
Course Description:
It is well known that measurement error in exposures of great interest in public health, such as diet and environmental exposures including air pollution and neighborhood greenness, leads to bias in estimating associations between exposures and health outcomes, usually towards the null, inducing false negative findings. Many methods have been developed to correct for covariate measurement error bias. This short course will begin with an overview of the impact of measurement error on estimation and inference, and survey the methods available for addressing covariate measurement error. It will discuss options for study designs that permit empirical adjustment for measurement error. We will focus on regression calibration, the most common approach for handling measurement error in continuous covariates, including its statistical framework, its key assumptions, and how to verify those assumptions empirically. Principles and application of methods for the design of main study/validation study and main study/reliability study designs, needed to identify the parameters of interest, will be discussed.
Finally, the course will cover how modern machine learning and AI-driven methods (e.g., double debiased machine learning) can be integrated with measurement error correction methods for variable selection and robust confounding control, along with their practical implementation. The strengths and limitations of machine learning methods for measurement error correction will be discussed. Hands-on experience will be provided using R. Real-world examples from the instructors' collaborations in nutritional and environmental epidemiology will be used to illustrate the concepts and demonstrate the practical utility of the methods.
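The attenuation and correction described above can be illustrated with a minimal simulation. This is a generic sketch of regression calibration under the classical measurement error model, not course material; all variable names and the simulation setup are hypothetical.

```python
# Regression calibration sketch (illustrative, hypothetical setup).
# True exposure X is observed only in a validation study; the main
# analysis observes an error-prone surrogate W = X + U.
import numpy as np

rng = np.random.default_rng(0)
n, n_val, beta = 20_000, 2_000, 1.0

x = rng.normal(0.0, 1.0, n)                 # true exposure
w = x + rng.normal(0.0, 1.0, n)             # surrogate; here lambda ~ 0.5
y = beta * x + rng.normal(0.0, 1.0, n)      # health outcome

# Naive analysis: regress Y on W. The slope is attenuated toward the
# null by lambda = var(X) / var(W).
naive = np.polyfit(w, y, 1)[0]              # roughly 0.5 * beta here

# Calibration model, fit in the validation subset where X and W are
# both observed: E[X | W] = a + lambda * W.
val = slice(0, n_val)
lam, a = np.polyfit(w[val], x[val], 1)

# Replace W by the calibrated exposure E[X | W] and refit.
x_hat = a + lam * w
corrected = np.polyfit(x_hat, y, 1)[0]      # close to the true beta

print(f"naive slope:     {naive:.2f}")
print(f"corrected slope: {corrected:.2f}")
```

In practice the calibration model comes from a dedicated validation study, and standard errors must account for the estimated calibration parameters (for example via the delta method or bootstrap), which the sketch omits.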
Statistical/Programming Knowledge Required:
Basic knowledge of statistics and/or biostatistics, and R programming
Instructor Biographies:
Dr. Donna Spiegelman received a joint doctorate in biostatistics and epidemiology from the Harvard T.H. Chan School of Public Health (HSPH) in 1989. She then served on the faculty at Harvard until 2018, after which she joined the Yale School of Public Health (YSPH) as Professor of Biostatistics and as founding director of its Center on Methods for Implementation and Prevention Science. YSPH’s most highly cited scientist, Dr. Spiegelman collaborates with this course’s co-instructors, developing methods and software for estimation and inference in the presence of exposure and outcome errors.
With joint faculty appointments in Epidemiology and Biostatistics at HSPH, Dr. Molin Wang’s research focuses on statistical challenges encountered in epidemiology, including measurement error problems. She is the lead statistician for Harvard’s Nurses’ Health Study II, Health Professionals Follow-up Study, and the Pooling Project on Diet and Cancer in Women and Men.
Dr. Raymond Carroll is one of the leading statisticians of our time, having received every available award in the statistics profession. He was head of the Department of Statistics at Texas A&M from 1987 to 1990, was the founding director of the Texas A&M Center for Statistical Bioinformatics, and has served as director of the Texas A&M Institute for Applied Mathematics and Computational Science. He has served as editor of Biometrics and JASA, and he has written a leading textbook on measurement error, an ongoing focus of his research.
Sunday, March 15 | 1:00 pm – 5:00 pm
SC6 | Adapting Generative AI for Biomedical Research: A Practical Course on Domain Adaptation and Task Automation
Instructor:
Yanxun Xu, Johns Hopkins University
Course Description:
This half-day short course is designed to equip statisticians with practical knowledge and skills to leverage cutting-edge generative AI effectively in biomedical research. As biomedical data grow in volume and complexity, there is an urgent need to adapt general-purpose AI tools to address domain-specific challenges. The course focuses on two core areas: domain adaptation and task automation, emphasizing both conceptual understanding and real-world applications. The first section demonstrates how to tailor general-purpose AI tools to the specialized context of biomedical studies. Participants will learn to construct private chatbots using open-source LLMs, progressively extending these into sophisticated multi-agent Retrieval-Augmented Generation (RAG) systems integrated with customizable biomedical databases via Model Context Protocol (MCP). Emphasis will be placed on strategies for effective adaptation of AI across diverse biomedical tasks, accounting for varying availability of biomedical research data.
The second part addresses task automation, illustrating how LLMs can streamline complex research workflows. Participants will explore principles for identifying tasks amenable to automation and gain hands-on experience with applications such as extracting individual patient data from Kaplan-Meier plots using multimodal AI, conducting landscape analyses, and implementing biomedical trial matching systems to enhance efficiency and accuracy. Throughout the course, responsible AI deployment will be a cross-cutting theme. Participants will be introduced to methodological frameworks and best practices for data privacy protection, including approaches to reduce exposure of sensitive information, generate synthetic data, and deploy secure, locally hosted AI solutions. By the end of the course, attendees will have a clear understanding of how to adapt and apply generative AI tools to real-world biomedical research, gaining practical experience in AI-driven data extraction and task automation while upholding ethical and privacy standards. This course will empower participants to accelerate research workflows, enhance decision-making, and lead the integration of advanced AI technologies in biomedical settings.
Statistical/Programming Knowledge Required:
Basic knowledge of Python is recommended.
Instructor Biography:
Dr. Yanxun Xu is an Associate Professor and Joseph & Suzanne Jenniches Faculty Scholar in the Department of Applied Mathematics and Statistics and the Data Science and AI Institute at Johns Hopkins University. Her research develops statistical theory and methods for sequential decision making, high-dimensional data analysis, and uncertainty quantification, with applications spanning electronic health records, cancer genomics, clinical trial designs, and precision medicine. A key focus of her recent work is the integration of generative AI and statistical learning to address challenges in biomedical research, including AI-augmented clinical trial design and multimodal data synthesis.
Monday, March 16 | 1:00 pm – 5:00 pm
SC7 | Retro (bio)statistics for a World Where AI Predicts Everything
Instructors:
Tyler McCormick, University of Washington
Kentaro Hoffman, University of Washington
Course Description:
We're experiencing a paradigm shift in modern (bio)statistics, moving from a world where data are collected intentionally and carefully for a specific research purpose to a landscape dominated by data obtained opportunistically and imputed by machine learning and AI models. This short course covers methods for accounting for the uncertainty introduced when data combine direct observations with predictions from AI or machine learning models. We first cover methods for adjusting statistical inference, reviewing an active and growing literature on the topic. Then, we relate these newly developed methods to classical tools from (bio)statistics, including two-phase sampling and inverse probability weighting. Finally, students will implement these methods in the context of two examples. The first, from global health, covers estimating the distribution of deaths by cause for deaths that happen outside of healthcare settings, using narratives coded by natural language processing systems. The second involves adjusting for predicted adiposity to estimate trends in metabolic conditions over time. For both examples, we will use the ipdtools suite, which provides a unified interface for comparing multiple methods for adjusting outcomes predicted by AI/ML models.
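The core idea of this adjustment literature can be sketched in a few lines. The following is a generic toy example of a prediction-powered-style mean estimator, not the ipdtools interface; the model `f`, the data-generating setup, and all names are hypothetical. A small labeled sample is used to estimate and remove the bias of an ML model's predictions on a large unlabeled sample.

```python
# Toy sketch: correcting inference on ML-predicted outcomes
# (hypothetical setup; not the course's ipdtools interface).
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Hypothetical ML predictor that systematically overpredicts by 0.3.
    return x + 0.3

n_lab, n_unlab = 500, 50_000
x_lab = rng.normal(2.0, 1.0, n_lab)           # labeled: true Y observed
y_lab = x_lab + rng.normal(0.0, 0.1, n_lab)
x_unlab = rng.normal(2.0, 1.0, n_unlab)       # unlabeled: predictions only

# Naive estimate: average the predictions. It inherits the model's bias.
naive = f(x_unlab).mean()                     # roughly 2.3, not 2.0

# Adjusted estimate: add a "rectifier" term, the average prediction
# error estimated on the labeled sample.
rectifier = (y_lab - f(x_lab)).mean()
adjusted = f(x_unlab).mean() + rectifier      # close to the true mean 2.0

print(f"naive:    {naive:.2f}")
print(f"adjusted: {adjusted:.2f}")
```

The adjusted estimator is unbiased regardless of how bad the predictor is; the predictor's quality only affects the variance, which is the sense in which these methods let large predicted datasets sharpen, rather than distort, inference.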
Statistical/Programming Knowledge Required:
Students should have familiarity with R and RMarkdown, some experience fitting and interpreting generalized linear models, and high-level exposure to machine learning (e.g., concepts like training, test, and validation sets and out-of-sample performance). The course will be hands-on and is intended to be accessible to a wide range of backgrounds.
Instructor Biographies:
Tyler McCormick's work develops statistical models for inference and prediction in scientific settings where data are sparsely observed or measured with error. His recent projects include estimating features of social networks using data from standard surveys, inferring a likely cause of death (when deaths happen outside of hospitals) using reports from surviving caretakers, and quantifying & communicating uncertainty in predictive models for global health policymakers. He holds a Ph.D. in Statistics (with distinction) from Columbia University and is the recipient of the NIH Director's New Innovator Award, NIH Career Development (K01) Award, Army Research Office Young Investigator Program Award, and a Google Faculty Research Award. Currently, he is a Professor of Statistics and Sociology at the University of Washington. Tyler was previously a Visiting Faculty Researcher at Google People+AI Research (PAIR). Tyler is the former Editor of the Journal of Computational and Graphical Statistics (JCGS) and a Fellow of the American Statistical Association. More information is here: https://thmccormick.github.io
Kentaro Hoffman is a Postdoctoral Researcher in the Department of Statistics at the University of Washington. He also serves as an affiliate Postdoctoral Fellow at the eScience Institute and the Center for Statistics and the Social Sciences. His research focuses on developing robust statistical methodologies for inference on machine learning-generated data and on leveraging Rashomon sets for active learning, with applications spanning sociology, global health, and biostatistics.