Tutorials are roughly two hours in length and focus on a particular topic or software package. The sessions are more interactive than a standard lecture, encouraging active engagement and hands-on participation.
Monday, March 16 | 8:30 am – 10:15 am
T1 | Design and Analysis of Biomarker Studies for the AI-Augmented World
Instructor:
Douglas Landsittel, University at Buffalo, State University of New York
Course Description:
As AI-related methods become increasingly popular, so does the need for fundamentally sound statistical reasoning, study designs, and analysis methods. To address that need, this tutorial frames AI methods in the context of biomarker research, i.e., the study of objectively measured characteristics that are evaluated as indicators of normal or pathogenic processes or of pharmacologic responses to treatment. By defining the output of AI methods in the context of biomarker studies, we can leverage decades of research and an extensive terminology for defining types of biomarkers, the steps in biomarker development, and the optimal study designs for different phases of biomarker research. Optimal use of AI methods should also leverage the strategies for rigor, reproducibility, and transparency that are now standard in biomarker research.
Many fundamental aspects of biomarker research, and the associated statistical topics, overlap directly with current and unavoidable future challenges in AI research. Labeling AI methods and their applications in terms of an associated phase of biomarker research will facilitate clear articulation of research goals and of the strengths and limitations of AI methods. For instance, AI methods are likely far more effective at diagnosis, where the task is essentially deterministic and highly complex data are used to discriminate between disease states. On the other hand, prognosis of a future outcome that depends on biological variability may be predicted more accurately with either a standard regression approach or other machine learning methods, such as classification trees, which are far simpler than the deep-learning neural networks used for AI. By framing the evaluation of AI approaches as a biomarker research problem, we can discuss strengths and weaknesses and evaluate the model characteristics specific to the given phase of biomarker research. Without doing so, we run the risk (or likelihood) that AI methods will benefit from apples-to-oranges comparisons against competitor methods.
We will use a 20-year ongoing observational study of imaging biomarkers for polycystic kidney disease to illustrate the proper use of biomarker terms and concepts and to give statisticians the tools they need to communicate effectively with non-statisticians as methods continue to evolve in this AI-augmented world.
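To make the diagnosis-versus-prognosis contrast above concrete, the following is a minimal R sketch (simulated data with hypothetical variable names; not taken from the polycystic kidney disease study) that fits a standard logistic regression and a classification tree to the same prognostic outcome and compares their discrimination.

    # Minimal sketch with simulated data (hypothetical variable names, not the
    # course materials): a standard regression model versus a classification tree
    # for a prognostic outcome that depends on biological variability.
    library(rpart)

    set.seed(1)
    n <- 500
    biomarker <- rnorm(n)                      # hypothetical imaging biomarker
    age       <- rnorm(n, mean = 55, sd = 10)
    p     <- plogis(-1 + 0.8 * biomarker + 0.02 * (age - 55))
    event <- rbinom(n, 1, p)                   # future outcome with inherent noise
    dat   <- data.frame(event, biomarker, age)

    fit_glm  <- glm(event ~ biomarker + age, family = binomial, data = dat)
    fit_tree <- rpart(factor(event) ~ biomarker + age, data = dat, method = "class")

    # Compare discrimination (a real biomarker study would use a validation phase)
    auc <- function(score, y) {
      r <- rank(score)
      (sum(r[y == 1]) - sum(y == 1) * (sum(y == 1) + 1) / 2) / (sum(y == 1) * sum(y == 0))
    }
    c(glm  = auc(predict(fit_glm, type = "response"), dat$event),
      tree = auc(predict(fit_tree, type = "prob")[, "1"], dat$event))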
Statistical/Programming Knowledge Required:
Basic knowledge of regression and some experience with R
Instructor Biography:
Dr. Douglas Landsittel serves as Professor and Chair of Biostatistics at the University at Buffalo, State University of New York. His past roles include Chair of Epidemiology and Biostatistics at Indiana University-Bloomington, Director of Biostatistics for the Starzl Transplant Institute at the University of Pittsburgh, and Associate Director of both the Center for Research on Healthcare Data Center and the University of Pittsburgh Biostatistics Facility. He has over 25 years of experience as a biostatistician, with research across a wide range of disciplines in clinical research and public health. He has published over 170 peer-reviewed research articles, is a former permanent member of two study sections, chaired the CDC/NIOSH Safety and Occupational Health Study Section for three years, and has participated in more than 100 other ad hoc reviews. Dr. Landsittel is a Fellow of the American Statistical Association and received the University of Pittsburgh Alumni Award for Research in 2024. He also directs the Expanding National Capacity in PCOR through Training & Collaboration Network, which developed online training materials and trained 22 Fellows from Minority- and Hispanic-Serving Institutions through an AHRQ R25 grant.
Monday, March 16 | 10:30 am – 12:15 pm
T2 | A Statistician's Guide to Integrating Generative AI into Scientific Research
Instructor:
Zhenke Wu, University of Michigan
Course Description:
Generative AI (GenAI) has rapidly evolved from the initial curiosity sparked by ChatGPT into a transformative technology with implications for knowledge representation and scientific discovery. For the field of statistics, a foundational language for scientific inquiry, the thoughtful adoption of GenAI tools presents a significant opportunity for innovation, education, and enhanced impact. This tutorial will provide a comprehensive overview of this new landscape, beginning with the inherent complexities of these tools and the ongoing ethical debates surrounding their use.
We will delve into critical issues such as data privacy, algorithmic bias, and quality monitoring, stressing the statistician's vital role in navigating these challenges to ensure the responsible and rigorous application of these powerful tools.
The session will highlight early successes that demonstrate GenAI's potential across key application areas. Examples include its use in medicine to accelerate drug discovery and enhance clinical trial design; its impact on biology in advancing genomic research and predicting protein structures; and its utility in healthcare for optimizing hospital operations and personalizing patient communication. We will outline best practices for statisticians to use GenAI tools effectively to enhance the quality and integrity of statistical work within large scientific teams.
The tutorial will feature a series of practical demonstrations illustrating the integration of GenAI into a statistician's research workflow. These hands-on examples will include leveraging GenAI for automated code generation and debugging, conducting intelligent and rapid literature reviews, and using AI-powered tools for enhanced data exploration and hypothesis generation. The session will culminate in a structured interactive discussion, creating a forum for attendees to share the specific advances they hope to see or make in their respective fields. By the end of this tutorial, attendees will have a deeper understanding of the potential and pitfalls of GenAI, a practical framework for its integration, and a clearer vision for how to contribute to its responsible use and development within the statistical and broader scientific communities.
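As a flavor of the workflow demonstrations described above, here is a minimal, hedged R sketch of one step: asking an LLM to help debug an error message. It assumes access to an OpenAI-style chat-completions endpoint; the URL, model name, and API-key environment variable are placeholders and are not part of the tutorial materials.

    # Minimal sketch (not from the tutorial): querying a chat-completions-style
    # LLM endpoint from R for help debugging an error message. The endpoint URL,
    # model name, and API-key variable below are assumptions; adapt them to
    # whichever service and client you actually use.
    library(httr)

    ask_llm <- function(prompt,
                        url   = "https://api.openai.com/v1/chat/completions",  # assumed endpoint
                        model = "gpt-4o-mini") {                               # assumed model name
      resp <- POST(
        url,
        add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
        body = list(model = model,
                    messages = list(list(role = "user", content = prompt))),
        encode = "json"
      )
      stop_for_status(resp)
      content(resp)$choices[[1]]$message$content
    }

    # Example prompt a statistician might send while debugging:
    # ask_llm("My call glm(y ~ x, family = binomial) fails with
    #          'variable lengths differ'. What are common causes?")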
Statistical/Programming Knowledge Required:
Access to a large language model client interface (e.g., ChatGPT, Gemini, or Claude) and prior experience with applied data analysis projects.
Instructor Biography:
Dr. Zhenke Wu completed a BS in Mathematics at Fudan University and a PhD in Biostatistics at Johns Hopkins University, where he also completed his postdoctoral training. He is currently an Associate Professor of Biostatistics at the University of Michigan, Ann Arbor, and a faculty affiliate of the Michigan Institute for Data and AI in Society (MIDAS).
Dr. Wu's research is motivated by biomedical and public health problems and centers on the design and application of statistical methods that inform health decisions made by individuals (precision medicine). Toward this goal, he focuses on two lines of methodological research: a) structured Bayesian latent variable models for clustering and disease subtyping, and b) study design and causal and reinforcement learning methods for evaluating sequential interventions that adapt to individuals' changing circumstances, such as in interventional mobile health studies. His current projects center on AI for affordable and individualized healthcare and on computational and interventional digital health.
Dr. Wu serves as an Associate Editor for the Annals of Applied Statistics, Biostatistics, and the Journal of the Royal Statistical Society: Series A (JRSS-A), and as a Statistical Consultant and Reviewer for the New England Journal of Medicine - Artificial Intelligence (NEJM-AI).
Monday, March 16 | 1:45 pm – 3:30 pm
T3 | Crafting Presentations That Connect: A Tutorial for Statisticians
Instructors:
Sarah Lotspeich, Wake Forest University
Ana M. Ortega-Villa, Biostatistics Research Branch
Course Description:
Communicating statistical research effectively is a crucial skill for engaging diverse audiences, from academic peers and clinical collaborators to industry partners and policymakers. This tutorial provides practical guidance on designing and delivering compelling presentations about methodological and applied statistics projects. We will cover best practices and tools for slide design, strategies for conveying complex concepts clearly, key principles for making talks accessible and engaging, and how to tailor presentations to different audiences.
Statistical/Programming Knowledge Required:
None
Instructor Biographies:
Dr. Sarah Lotspeich is an Assistant Professor in Statistical Sciences at Wake Forest University. Sarah completed a postdoctoral fellowship in Biostatistics at UNC Chapel Hill and earned her Ph.D. in Biostatistics from Vanderbilt University. She is enthusiastic about mentoring student research and co-leads various collaborative labs. Her research tackles challenges in analyzing error-prone observational data, focusing on international HIV cohorts, electronic health records, and neighborhood food environments.
Dr. Ana M. Ortega-Villa joined the Biostatistics Research Branch (BRB) in 2018 and serves as a mathematical statistician. Prior to joining the BRB, Dr. Ortega-Villa obtained her Ph.D. in Statistics from Virginia Tech and completed post-doctoral fellowships at both the Eunice Kennedy Shriver National Institute of Child Health and Human Development and the National Cancer Institute. Her interests include longitudinal data, mixed models, vaccines, immunology, research capacity building in developing countries, statistics education, and initiatives that foster a culture of belonging.
Monday, March 16 | 3:45 pm – 5:30 pm
T4 | Introduction to Explainable Machine Learning
Instructor:
Aramayis Dallakyan, StataCorp
Course Description:
Machine learning (ML) has become a powerful tool for modeling complex data and providing accurate predictions. However, the "black box" nature of many ML models often raises concerns about their explainability and trustworthiness. Explainable Machine Learning (XML) seeks to address these concerns by enhancing the transparency and understanding of ML predictions. This tutorial aims to provide a practical guide to XML techniques. The learning objectives of this tutorial are:
a. Understand the difference between interpretable and explainable machine learning methods
b. Develop a practical understanding of a range of global and local XML techniques
c. Learn the advantages and limitations of XML methods and how to interpret their results
d. Gain hands-on experience implementing XML techniques on real data using Stata
The tutorial begins with an overview of ensemble decision tree models, such as random forests and gradient boosting, which are widely used but often difficult to interpret. I then introduce methods for explaining predictions using both global and local XML techniques. These include state-of-the-art approaches such as SHAP values, individual conditional expectation (ICE) plots, variable importance measures, partial dependence plots, and global surrogate models. Participants will gain hands-on experience through practical examples and case studies using Stata's h2oml suite of commands. No prior knowledge of Stata is required, although a basic familiarity with ML will be beneficial.
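The session's hands-on work uses Stata's h2oml commands, but for readers who want a preview of two of the global techniques named above, here is a minimal R sketch (simulated data; not the Stata workflow used in the tutorial) of a hand-computed partial dependence curve and a permutation-based variable importance measure for a random forest.

    # Minimal sketch with simulated data (not the Stata/h2oml workflow used in
    # the tutorial): a hand-computed partial dependence curve and permutation
    # variable importance for a random forest regression model.
    library(randomForest)

    set.seed(1)
    n  <- 400
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 * sin(x1) + x2^2 + rnorm(n, sd = 0.5)     # x3 is pure noise
    dat <- data.frame(y, x1, x2, x3)
    fit <- randomForest(y ~ ., data = dat)

    # Partial dependence on x1: sweep x1 over a grid and average the predictions
    grid <- seq(min(dat$x1), max(dat$x1), length.out = 25)
    pd   <- sapply(grid, function(v) { d <- dat; d$x1 <- v; mean(predict(fit, newdata = d)) })
    plot(grid, pd, type = "l", xlab = "x1", ylab = "average prediction")

    # Permutation importance: increase in MSE after shuffling one predictor
    base_mse <- mean((dat$y - predict(fit, newdata = dat))^2)
    sapply(c("x1", "x2", "x3"), function(v) {
      d <- dat; d[[v]] <- sample(d[[v]])
      mean((dat$y - predict(fit, newdata = d))^2) - base_mse
    })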
Statistical/Programming Knowledge Required:
No prior knowledge of Stata is required, although a basic familiarity with ML will be beneficial.
Instructor Biography:
Aramayis Dallakyan is a Senior Statistician and Software Developer at StataCorp LLC. His research interests lie at the intersection of high-dimensional time series, graphical models, and statistical/machine learning. Prior to joining StataCorp, he earned a PhD in Statistics from Texas A&M University.
Tuesday, March 17 | 1:45 pm – 3:30 pm
T5 | Introduction to Prediction-based Inference: Methods & Applications
Instructors:
Jesse Gronsbell, University of Toronto
Stephen Salerno, University of Washington
Course Description:
Artificial intelligence and machine learning (AI/ML) have become essential tools in biomedical research, enabling large-scale analyses across diverse domains such as genomics, structural biology, and electronic health records-based research. Increasingly, researchers rely on model-generated predictions, rather than directly measured variables, as inputs for downstream statistical analyses. For example, predicted gene expression values or polygenic risk scores are often used in place of experimental assays, allowing researchers to expand cohort sizes and explore hypotheses when traditional data collection is infeasible, costly, or time-consuming.
While this practice of “using predictions as data” holds promise for accelerating scientific discovery, it presents significant challenges for statistical inference. When predicted values are used in place of true variables, the resulting estimates of association can be biased and misleading if uncertainty in the prediction step is not properly accounted for.
In this tutorial, we explore the consequences of inference on predicted data across several biomedical applications.
Drawing from classical approaches for measurement error and more recent developments in bias correction, we will present a suite of prediction-based inference methods that adjust for prediction-related uncertainty and improve inference validity and efficiency. We will also introduce {ipd}, a user-friendly Bioconductor R package that implements several of these correction methods through a unified interface. The package supports modular integration into existing workflows and includes tidy methods for model inspection and diagnostics.
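A minimal base-R sketch of the core idea, using simulated numbers rather than the {ipd} interface: a naive analysis that treats predictions as observed outcomes versus a simple correction that uses a small labeled subset to estimate and remove the average prediction error.

    # Minimal conceptual sketch (simulated data, base R; not the {ipd} interface):
    # estimating a mean outcome when most observations carry only an AI/ML
    # prediction, with a small labeled subset available for bias correction.
    set.seed(1)
    N_unlab <- 5000                     # observations with predictions only
    n_lab   <- 300                      # observations with outcome and prediction
    true_mean <- 2

    y_lab      <- rnorm(n_lab, mean = true_mean)
    yhat_lab   <- y_lab + 0.5 + rnorm(n_lab, sd = 0.3)              # biased predictions
    yhat_unlab <- rnorm(N_unlab, mean = true_mean) + 0.5 + rnorm(N_unlab, sd = 0.3)

    naive     <- mean(yhat_unlab)                                   # treats predictions as data
    corrected <- mean(yhat_unlab) + mean(y_lab - yhat_lab)          # rectifies with labeled subset

    c(naive = naive, corrected = corrected, truth = true_mean)

In this toy example the naive estimate is biased upward by the systematic prediction error, while the corrected estimate recovers the truth on average.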
Statistical/Programming Knowledge Required:
Basic R programming knowledge and statistics/biostatistics knowledge at the master's level.
Instructor Biographies:
Stephen Salerno is a postdoctoral researcher in Biostatistics at the Fred Hutchinson Cancer Center, working with Professor Jeff Leek on methods for drawing inference on AI/ML-generated outcomes and on high-dimensional survival analysis. He is also interested in methods for addressing selection bias in biomedical data. Steve is passionate about data science education and about data-for-good initiatives such as Statistics in the Community (STATCOM).
Jesse Gronsbell is an Assistant Professor in Statistical Sciences at the University of Toronto. Her research focuses on developing statistical learning and inference methods to address key challenges in analyzing modern observational health data, including extreme missing data, complex measurement error, and issues of bias and fairness. She also works in critical data studies, with a focus on the challenges of collecting sex and gender data in electronic health records.
Tuesday, March 17 | 3:45 pm – 5:30 pm
T6 | Evidence Synthesis Approaches to Accelerate Rare Disease Drug Development
Instructors:
Satrajit Roychoudhury, Pfizer Inc.
Wei Wei, Yale University
Course Description:
Scientists have identified more than 7,000 rare diseases, ranging from rare cancers to neurodegenerative diseases, and many of these conditions are life-threatening. Collectively, rare diseases affect more than 30 million people in the United States, more than half of whom are children. The Orphan Drug Act was enacted in 1983 to support the development of rare disease treatments by providing various incentives to qualified sponsors. Since then, the FDA has approved hundreds of drugs for rare diseases; however, most rare diseases still lack an FDA-approved treatment. Drug development in the rare disease area faces many challenges, including inherently small population sizes, heterogeneity of treatment effects, and lack of reproducibility between studies conducted in different regions and different populations (e.g., pediatric vs. adult). As a result, clinical trials studying rare diseases are often underpowered and cannot generate the statistical evidence required for regulatory approval.
Recent developments in Bayesian methodology provide a robust framework for synthesizing existing evidence for an investigational new drug (IND) by harnessing data from both internal and external sources (past and concurrent trials, real-world data). These Bayesian approaches can reduce the number of patients that need to be enrolled, evaluate the totality of evidence regarding an IND, and accelerate the drug development and marketing approval process.
This course will focus on Bayesian approaches for evidence synthesis in rare disease drug development. We will cover Bayesian techniques for borrowing external evidence, including robust meta-analytic-predictive (rMAP) priors, the exchangeability-nonexchangeability (EXNEX) model, and multisource exchangeability models (MEMs). We will introduce novel strategies that incorporate causal inference techniques into Bayesian evidence synthesis, and we will discuss the use of Bayesian machine learning methods to transfer evidence learned from external sources to the current trial. The course will introduce the general methodological framework for Bayesian evidence synthesis, provide step-by-step instructions for implementing these approaches, and show how to evaluate their frequentist operating characteristics. We will illustrate the methods in detail with real-life examples and provide the necessary R scripts.
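To give a concrete flavor of prior robustification (the numbers are illustrative only; the course covers the full rMAP, EXNEX, and MEM machinery rather than this toy version), here is a minimal base-R sketch of a robust mixture prior for a response rate: an informative Beta component summarizing historical trials mixed with a vague component, updated in closed form with binomial data from the new trial.

    # Minimal illustrative sketch (made-up numbers, base R; not the full rMAP /
    # EXNEX / MEM methodology covered in the course): a robust two-component
    # mixture prior for a response rate, updated with new-trial binomial data.

    w <- c(0.8, 0.2)        # weights: informative (historical) vs vague component
    a <- c(8, 1)            # Beta(8, 24) summarizes historical trials (~25% response)
    b <- c(24, 1)           # Beta(1, 1) guards against prior-data conflict

    r <- 10; n <- 20        # new trial: r responses among n patients

    a_post <- a + r         # conjugate update of each component
    b_post <- b + n - r

    # Posterior mixture weights are proportional to each component's marginal likelihood
    marg   <- choose(n, r) * beta(a_post, b_post) / beta(a, b)
    w_post <- w * marg / sum(w * marg)

    post_mean <- sum(w_post * a_post / (a_post + b_post))   # posterior mean response rate
    round(c(w_post = w_post, post_mean = post_mean), 3)

When the new-trial data conflict with the historical estimate, the posterior weight shifts toward the vague component, which is the behavior robust borrowing methods are designed to achieve.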
Statistical/Programming Knowledge Required:
This tutorial is for statisticians with MS and/or PhD degrees.
Instructor Biographies:
Dr. Satrajit Roychoudhury is an Executive Director and Head of the Statistical Research and Innovation group at Pfizer Inc. He has extensive experience working with different phases of clinical trials for drugs and vaccines. His research interests include survival analysis, model-based approaches, and Bayesian methods in clinical trials. Satrajit is an elected Fellow of the American Statistical Association, recipient of the Royal Statistical Society (RSS)/Statisticians in the Pharmaceutical Industry (PSI) Statistical Excellence in the Pharmaceutical Industry Award in 2023, and recipient of the Young Statistical Scientist Award from the International Indian Statistical Association in 2019.
Wei Wei is an Assistant Professor in the Department of Biostatistics at the Yale School of Public Health. He received his Ph.D. in Biostatistics from the Medical University of South Carolina and joined the Department of Biostatistics at YSPH in 2017 as an associate research scientist. Dr. Wei's research focuses on the development of early-phase clinical trial designs, with particular interest in cancer-targeted and immunotherapeutic agents. In addition to his research in cancer clinical trials, his expertise includes statistical genomics, biomarker discovery, and neurophysiology.