Tutorials are roughly 2 hours in length and focus on a particular topic or software package. The sessions are more interactive than a standard lecture, often encouraging active engagement and hands-on participation.
Monday, March 24 | 8:30 am – 10:15 am
T1 | Statistical Methods for Biomarker Discovery Using Metabolomics
Instructors:
Ali Rahnavard, The George Washington University
Himel Mallick, Weill Cornell Medicine
Course Description: The affordability of metabolomics profiling has enabled extensive surveys of metabolites in human health, other hosts, and the environment on an unprecedented scale. Consequently, this surge in data has driven the development of new statistical and computational approaches to analyze and integrate diverse metabolomics data. However, despite sharing many similarities with conventional omics, routine analysis methods from the literature cannot be directly applied to metabolomics studies to achieve complete mechanistic insight without risking false positive or false negative results.
This challenge is amplified by the technical nature of metabolomics data, which are typically noisy, heterogeneous, and high-dimensional, often affected by platform-specific effects requiring specialized tools and methods for accurate analysis. From a practical standpoint, using generic downstream analysis software without understanding the inherent statistical properties of metabolomics data can lead to inconclusive and potentially misleading biological conclusions. Moreover, the abundance of available downstream analysis methods complicates the selection process for non-specialists and inexperienced researchers. Finally, identifying reproducible signals for clinical actionability necessitates well-powered experimental designs and meta-analysis across studies, both of which present significant challenges within current metabolomics analysis paradigms.
Our workshop will begin with a high-level introduction to computational multi-omics, focusing on the current state-of-the-art and addressing key challenges, particularly in downstream analysis methods for metabolomics data compared to other omics. Activities will include formulating biological hypotheses and exploring contemporary statistical methods to address them. The workshop will be project-focused and hands-on, encouraging participants to bring specific studies or projects for immediate application of the workshop content using real data.
Drawing upon our extensive experience in both industry and academia, we aim to provide a diverse perspective on the topic. This includes insights from drug discovery and basic science, enabling attendees to gain a holistic understanding of multi-omics and clinical data integration through advanced tools applied to relevant examples and case studies.
The workshop will begin with an overview of the statistical challenges inherent in analyzing the high-dimensional data typical of multi-omics studies, introduced through a series of short lectures.
Statistical/Programming Knowledge Required: None
Instructor Biographies:
Dr. Ali Rahnavard is an Associate Professor of Biostatistics and Bioinformatics at George Washington University, specializing in applying advanced AI and machine learning techniques to derive biological insights from complex omics data. His research team develops robust statistical and machine learning methods, including large language models, to address complex data structures and limited sample sizes. Dr. Rahnavard uses metagenomics and metabolomics for biological and health inference, and he has designed and delivered several workshops introducing techniques and tools for analyzing omics data.
Dr. Himel Mallick, an Assistant Professor at Cornell University, is a data scientist and computational biologist with over two decades of experience in statistics, biostatistics, and AI/ML across academia and industry. He has contributed to large multi-center studies, including the NIH Human Microbiome Project, and has developed widely used tools such as MelonnPan and MaAsLin2. With over 40 highly cited publications, his research focuses on Bayesian statistics, machine learning, and multi-omics integration. Dr. Mallick has successfully coordinated several short courses together with Dr. Rahnavard, which has prepared them well for this course; they are excited to present at ENAR 2025 for the first time.
Monday, March 24 | 10:30 am – 12:15 pm
T2 | Modern Approaches for Identifying Heterogeneous Treatment Effects from Experimental and Observational Data
Instructor:
Ilya Lipkovich, Eli Lilly and Company
Course Description:
In this tutorial (based on our recent tutorial in Statistics in Medicine), we review recent advances in statistical methods for the identification and evaluation of Heterogeneous Treatment Effects (HTE), including subgroup identification, estimation of conditional average treatment effects (CATE), and individualized treatment regimens (ITR), using data from randomized clinical trials and observational studies. We will present several classes of approaches, including indirect approaches based on modeling the response surface as a function of treatment and covariates, and various direct approaches targeting the causal estimands of interest. We will illustrate selected approaches using available R packages with data sets mimicking randomized clinical trials and observational studies with non-random treatment assignment.
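As a conceptual illustration of the indirect (response-surface) approach, the minimal sketch below fits a separate simple outcome model in each treatment arm and takes the difference of predictions as the CATE estimate (a "T-learner"-style construction). The toy data and function names are hypothetical, and the tutorial itself works with R packages rather than this Python sketch:

```python
# Minimal T-learner sketch: estimate CATE(x) = E[Y|T=1,x] - E[Y|T=0,x]
# by fitting a separate simple linear outcome model in each treatment arm.
# The toy data below are hypothetical; real analyses use richer models.

def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical trial data: covariate x, treatment t (0/1), outcome y
x = [1, 2, 3, 4, 5, 6, 7, 8]
t = [0, 1, 0, 1, 0, 1, 0, 1]
y = [1.0, 2.5, 2.1, 4.4, 2.9, 6.6, 4.2, 8.3]

arm0 = [(xi, yi) for xi, ti, yi in zip(x, t, y) if ti == 0]
arm1 = [(xi, yi) for xi, ti, yi in zip(x, t, y) if ti == 1]
a0, b0 = fit_simple_ols(*zip(*arm0))
a1, b1 = fit_simple_ols(*zip(*arm1))

def cate(xi):
    """Estimated conditional average treatment effect at covariate value xi."""
    return (a1 + b1 * xi) - (a0 + b0 * xi)

print(round(cate(5.0), 2))  # the estimated effect grows with x in this toy example
```

Direct approaches instead target the treatment-effect contrast itself; the indirect route shown here is simpler but inherits any misspecification of the two outcome models.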
Statistical/Programming Knowledge Required:
Basic R programming and understanding of statistical concepts at master’s level.
Instructor Biography:
Dr. Ilya Lipkovich is an Executive Director at Eli Lilly and Company. He received his Ph.D. in Statistics from Virginia Tech in 2002 and has more than 20 years of statistical consulting experience in the pharmaceutical industry. He is an ASA Fellow and published on subgroup identification in clinical data, analysis with missing data, and causal inference.
Monday, March 24 | 1:45 pm – 3:30 pm
T3 | Statistical Methods for Phenotyping and Causal Inference on Longitudinal Administrative Data and Electronic Health Records: A Workshop with Implementation in R
Instructors:
Zihang Lu, Queen's University
Kuan Liu, University of Toronto
Course Description:
The increasing availability of longitudinal data in medical studies has enabled researchers to identify dynamic patient phenotypes and evaluate time-varying intervention effects. Results generated from these studies often have major health and policy implications and can inform decision-making that benefits both the overall population and targeted study subpopulations. However, it is challenging to make inferences from longitudinal data due to its complex nature (e.g., missingness, informative censoring, correlated observations, and sparsity), making it difficult to identify meaningful patterns and causal relationships in the data. Statistical methods to overcome these data complexities have been developed; however, the uptake of these approaches, especially the newer ones, is low in applications due to the limited availability of open-access coding examples, case studies, and tutorials.
The learning objectives of this tutorial are (a) gaining a practical understanding of various statistical techniques for phenotyping and causal analysis using longitudinal data, and (b) being able to apply these methods to real data using R.
This tutorial includes two sessions. The first session (~50 mins) will be led by Dr. Zihang Lu and will focus on statistical learning methods for clustering longitudinal data. In this session, we will introduce the types of longitudinal data, clustering methods (both model-based and algorithm-based), and R software packages for clustering longitudinal features. We will also discuss model selection, evaluation, and interpretation. Case studies using health datasets will be provided and discussed.
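The algorithm-based clustering idea can be sketched as a toy k-means over each subject's vector of repeated measures. The trajectories and settings below are hypothetical; real longitudinal data require handling of missingness and irregular visit times, which the dedicated R packages covered in the session provide:

```python
# Sketch of algorithm-based clustering of longitudinal trajectories: k-means
# with squared Euclidean distance over each subject's repeated measures.
# Hypothetical toy data; not a substitute for dedicated longitudinal tools.

def kmeans_traj(trajs, k=2, iters=20):
    centers = [list(t) for t in trajs[:k]]  # naive initialization
    assign = [0] * len(trajs)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        assign = [min(range(k), key=lambda c: sum(
            (x - y) ** 2 for x, y in zip(t, centers[c]))) for t in trajs]
        # update step: each center becomes the mean trajectory of its cluster
        for c in range(k):
            members = [t for t, a in zip(trajs, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical 4-visit outcomes: two rising and two flat trajectories
trajs = [[1, 2, 3, 4], [1.2, 2.1, 3.3, 4.2], [2, 2, 2, 2], [2.1, 1.9, 2.0, 2.2]]
print(kmeans_traj(trajs))  # the rising pair and the flat pair separate
```

Model-based alternatives (e.g., growth mixture models) replace the distance criterion with a likelihood, which is one of the trade-offs the session discusses.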
The second session (~50 mins) will be led by Dr. Kuan Liu and will discuss the causal framework and various causal estimation techniques for longitudinal observational studies. We will cover a selection of frequentist and Bayesian methods for causal inference with time-dependent treatment/exposure and time-varying confounding, including frequentist and Bayesian marginal structural models, the g-formula, and targeted maximum likelihood estimation, and we will introduce R code and software packages for conducting these approaches. Case studies will be provided to demonstrate the implementation of these methods in R.
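The weighting at the heart of a marginal structural model can be sketched for a single time point: each subject is weighted by the inverse probability of the treatment they actually received. The propensity scores below are made-up numbers rather than the output of a fitted treatment model, and time-varying analyses multiply such weights across visits:

```python
# Sketch of inverse-probability-of-treatment weighting (IPTW), the core of a
# marginal structural model: weight each subject by
# 1 / P(received own treatment | confounders).
# Propensity scores here are hypothetical; in practice they come from a
# fitted treatment model.

def iptw_weight(treated, propensity, p_treated_marginal=None):
    """Unstabilized weight, or stabilized if a marginal treatment
    probability is supplied (stabilization reduces weight variability)."""
    denom = propensity if treated else (1.0 - propensity)
    numer = 1.0
    if p_treated_marginal is not None:
        numer = p_treated_marginal if treated else (1.0 - p_treated_marginal)
    return numer / denom

# Hypothetical subjects: (treated?, estimated propensity score)
subjects = [(1, 0.8), (1, 0.4), (0, 0.3), (0, 0.6)]
p_marginal = sum(t for t, _ in subjects) / len(subjects)  # 0.5

weights = [iptw_weight(t, p, p_marginal) for t, p in subjects]
print([round(w, 3) for w in weights])
```

A weighted outcome regression on these weights then estimates the marginal structural model's causal parameters; the g-formula and TMLE covered in the session reach the same estimands by different routes.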
Statistical/Programming Knowledge Required: An introductory level of statistics and programming skills using R will be required to understand the content of this tutorial.
Instructor Biographies:
Dr. Zihang Lu is a Biostatistician and Assistant Professor in the Department of Public Health Sciences at Queen’s University. Dr. Lu holds a PhD and an MSc in Biostatistics, both from the University of Toronto. Prior to his faculty appointment, Dr. Lu had six years of experience as a Biostatistician at The Hospital for Sick Children. Dr. Lu’s research interests are in statistical learning methods for health data with complex structure, data integration, and Bayesian modeling.
Dr. Kuan Liu is an Assistant Professor of Health Services Research and Biostatistics at the Dalla Lana School of Public Health (DLSPH), University of Toronto. Dr. Liu completed her PhD in biostatistics at the University of Toronto and her MMath in statistics with emphasis in biostatistics at the University of Waterloo. Motivated by applications in clinical and public health research, her areas of methodological interest include causal inference, Bayesian statistics, longitudinal data analysis, and sensitivity analysis. She has collaborated with an extensive network of clinical and public health researchers and worked professionally as a Biostatistician at several research institutions in Canada.
Tuesday, March 25 | 10:30 am – 12:15 pm
T4 | Novel and Easy-to-implement Power and Sample Size Calculations for Generalized Linear Models
Instructors:
Paul Rathouz, University of Texas at Austin
Amy Cochran, University of Wisconsin-Madison
Shijie Yuan, University of Texas at Austin
Course Description:
In this interactive tutorial, participants will study the framing and interpretation of power and sample size (PSS) calculations within the context of generalized linear models (GLMs) using new and more easily interpretable methods developed by our group. Our goal is to simplify the PSS calculation process for GLMs by introducing new general measures of effect (similar in spirit to R-squared and partial R-squared for linear models), which offer a unified interpretation across different types and classes of predictor, adjustor, and response variables. These measures accommodate single- or multiple-degree of freedom (df) tests, allow for general covariate adjustment via notions such as partial R-squared, and eliminate the need for distributional assumptions.
The problem of multiple-df tests has received relatively little attention in the domain of GLMs. Two leading examples are k-level categorical predictors, which yield (k-1)-df tests, and tests of a continuous predictor allowing for an interaction with another predictor, which yield 2-df tests. A third example arises when one wishes to avoid the multiple-comparisons problem by conducting an omnibus multiple-df test of several predictors.
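For orientation, the classical linear-model quantities that the new measures generalize, R-squared and partial R-squared, can be computed from the error sums of squares of null, reduced, and full model fits. The outcomes and fitted values below are hypothetical toy numbers, not output of any particular model:

```python
# Sketch of R-squared and partial R-squared for a linear model, the
# quantities the tutorial's new GLM effect measures are "similar in
# spirit" to. Toy data; fitted values are hypothetical.

def sse(ys, preds):
    """Error sum of squares."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

y = [1.0, 2.0, 2.5, 4.0, 5.5]
ybar = sum(y) / len(y)

# Hypothetical fitted values from a reduced model (without the predictor
# of interest) and a full model (with it):
pred_reduced = [1.5, 2.0, 2.5, 3.5, 5.5]
pred_full = [1.1, 1.9, 2.6, 4.1, 5.3]

sse_null = sse(y, [ybar] * len(y))       # intercept-only model
sse_reduced = sse(y, pred_reduced)
sse_full = sse(y, pred_full)

r2_full = 1 - sse_full / sse_null
# Partial R^2: proportion of the reduced model's residual variation
# explained by adding the predictor of interest.
partial_r2 = (sse_reduced - sse_full) / sse_reduced
print(round(r2_full, 3), round(partial_r2, 3))
```

For a (k-1)-df or 2-df test, the "predictor of interest" in this construction is simply the whole block of columns being tested, which is what makes the partial-R-squared framing natural for multiple-df problems.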
As such, the tutorial will cover three main topics: (1) Eliciting meaningful parameters from non-statistical investigators as inputs to PSS calculations; (2) An overview of the mathematical theory of the new methods from our group; and (3) Specification and interpretation of statistical software implementing these new methods. We will base the tutorial on one or two case studies.
By participating, you will gain a deeper understanding of how to translate study concepts into the new general effect measures. Additionally, you will have the opportunity to gain hands-on experience using newly developed R-based software to perform PSS calculations for robust study design with GLMs. Finally, your feedback on the software and methods will be invaluable in further enhancing their usability. This workshop is tailored for statistical practitioners (at either the Master’s or Doctoral level) who are involved in PSS calculations for study design and who have some background in generalized linear models.
Statistical/Programming Knowledge Required: Generally, Master's level training in Statistics or Biostatistics. A person with an undergraduate degree in (Bio)Statistics with a strong course in generalized linear models may also benefit.
Instructor Biographies:
Paul Rathouz is Professor of Population Health and Director of the Biomedical Data Science Hub at the Dell Medical School at the University of Texas at Austin. Current areas of methodological interest include multivariate, longitudinal or clustered data, outcome-dependent sampling, and cluster randomized trials. GLMs are a central through-line in much of this work, and the novel PSS methods to be presented in this tutorial were directly motivated through Rathouz’ collaborative activity in clinical and population health biostatistics.
Amy Cochran is an Assistant Professor at the University of Wisconsin-Madison, with joint appointments in Mathematics and Population Health Sciences. Her research focuses on mental health; she is an NIMH K01 Career Development Awardee and a statistical editor for the American Journal of Psychiatry. Amy holds a PhD in Applied Math from Cornell and was a Hildebrandt postdoctoral fellow at the University of Michigan.
Shijie Yuan is a PhD student in Statistics and Data Sciences at the University of Texas at Austin. He has served as a statistical consultant and software developer, focusing on the design, implementation, and analysis of early-phase clinical trials, including innovative Bayesian Phase I-II designs, supporting successful submissions to the FDA and NMPA across multiple therapeutic areas. Currently, he is focusing on PSS methods for GLMs, continuing his commitment to advancing statistical methodologies in clinical and public health research.
Tuesday, March 25 | 1:45 pm – 3:30 pm
T5 | Introduction to Multi-Omics Data Integration: Concepts and Approaches
Instructors:
Joseph McElroy, The Ohio State University
Lianbo Yu, The Ohio State University
Course Description:
This tutorial offers an introduction to the rapidly evolving field of multi-omics data integration, focusing on key concepts, methods, and tools used to combine and analyze various types of omics data. As technological advances in genomics, transcriptomics, proteomics, and metabolomics continue to accelerate, researchers are increasingly faced with the challenge of integrating these diverse datasets from the same set of biological samples to gain a more complete understanding of complex biological systems. This session will provide participants with an overview of the principles and strategies necessary to meet this challenge.
Participants will be introduced to a range of approaches commonly employed in multi-omics integration. We will explore general data-driven methods, such as factor analysis and dimensionality reduction, which can help simplify the complexity of omics data by identifying key underlying patterns. Additionally, we will discuss various clustering techniques and integrative methods that allow researchers to group and compare data from different omics layers, uncovering shared biological insights across different data types.
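As a minimal illustration of the dimensionality-reduction idea, the sketch below extracts the leading principal component of a tiny two-feature dataset via power iteration on its covariance matrix. The data are hypothetical, and real multi-omics integration relies on the R-based packages discussed in the session:

```python
# Minimal dimensionality-reduction sketch: the leading principal component
# of a tiny two-feature dataset, found by power iteration on the 2x2
# covariance matrix. Illustrative only; toy hypothetical data.

def leading_pc(data, iters=200):
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # 2x2 sample covariance matrix
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(2)]
           for i in range(2)]
    v = [1.0, 1.0]
    for _ in range(iters):
        # multiply by the covariance matrix and renormalize
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = [w[0] / norm, w[1] / norm]
    return v

# Hypothetical paired measurements from two omics layers (second roughly
# twice the first):
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1]]
pc1 = leading_pc(data)
print([round(c, 2) for c in pc1])  # close to the direction (1, 2)/sqrt(5)
```

Factor-analysis-style integration methods extend this same idea to several omics blocks at once, seeking latent directions shared across layers.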
This tutorial will also cover some of the most widely used tools and software for multi-omics integration, with a particular focus on R-based packages. These tools provide researchers with practical ways to implement the techniques discussed, enabling them to apply state-of-the-art methods to their own datasets. Whether participants are new to the field or looking to expand their existing knowledge, the session will offer insights into the capabilities and limitations of different multi-omics integration approaches, from simple statistical techniques to more advanced machine learning models.
Throughout the tutorial, we will emphasize the importance of proper data preprocessing, which is a critical step in ensuring the reliability of results when integrating multiple data types. By the end of the session, participants will have a solid foundation in multi-omics data integration, providing them with the tools and knowledge to explore and analyze complex biological systems in their own research.
Statistical/Programming Knowledge Required: Participants will benefit most if they have at least a basic understanding of statistics, ‘omics data analysis, and R programming.
Instructor Biographies:
Dr. Joseph McElroy, Assistant Professor, and Dr. Lianbo Yu, Associate Professor, both in the Department of Biomedical Informatics and members of the Center for Biostatistics at The Ohio State University, have a strong research focus on multi-omics analysis and cancer genomics. Over the course of their careers, they have applied advanced statistical techniques to analyze genomic, transcriptomic, epigenomic, and proteomic datasets, contributing to major insights into cancer biology, metabolic disorders, and complex diseases.
With expertise in biostatistics, Dr. McElroy and Dr. Yu have been involved in leading large-scale, collaborative projects that have uncovered novel biomarkers and contributed to precision medicine initiatives. Their work emphasizes the use of R-based tools and machine learning models to identify actionable biological insights from multi-dimensional datasets.
Tuesday, March 25 | 3:45 pm – 5:30 pm
T6 | Transforming High-Resolution Accelerometry Data for Analysis
Instructors:
John Muschelli, Johns Hopkins Bloomberg School of Public Health
Ciprian Crainiceanu, Johns Hopkins Bloomberg School of Public Health
Course Description:
In this tutorial, we will walk through standard processing steps to turn raw accelerometer data into an analyzable endpoint. All of the analysis will be done in R, along with some R packages that call Python using reticulate. Basic or intermediate knowledge of R will be necessary, as most of the tutorial involves coding and applying functions.
We will focus on a single subject with an ActiGraph gt3x file. The pipeline will cover reading the data in, visualizing the data at a day level, performing gravity correction, resampling data, estimating non-wear time using established methods, and calculating activity measures such as Activity Counts and step counts, and summarizing them at a participant level. We will briefly go over analytic methods for these participant summary level data.
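Two of these steps, resampling and non-wear detection, can be caricatured in a few lines. The window length and standard-deviation threshold below are hypothetical placeholders, not the settings of the established methods the tutorial uses:

```python
# Conceptual sketch of two pipeline steps: resampling a raw accelerometer
# stream to a lower rate (nearest-sample) and flagging candidate non-wear
# windows as those with near-zero variability. Thresholds and window
# lengths are hypothetical; the tutorial uses established R-based tools.

def resample_nearest(samples, src_hz, dst_hz):
    """Pick, for each target-rate tick, the nearest source sample."""
    n_out = int(len(samples) * dst_hz / src_hz)
    return [samples[min(round(i * src_hz / dst_hz), len(samples) - 1)]
            for i in range(n_out)]

def nonwear_flags(samples, window, sd_threshold=0.004):
    """Flag each whole window whose standard deviation falls below threshold."""
    flags = []
    for start in range(0, len(samples) - window + 1, window):
        w = samples[start:start + window]
        m = sum(w) / window
        sd = (sum((x - m) ** 2 for x in w) / window) ** 0.5
        flags.append(sd < sd_threshold)
    return flags

stream = [0.0] * 10 + [0.1, -0.2, 0.3, -0.1, 0.2] * 2  # 10 still + 10 moving samples
print(resample_nearest(stream, src_hz=80, dst_hz=40))   # half as many samples
print(nonwear_flags(stream, window=10))                 # still window flagged
```

Real pipelines apply these operations per axis over days of data and combine the flags with established non-wear algorithms before summarizing activity at the participant level.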
Materials will be provided and will remain available after the tutorial.
Statistical/Programming Knowledge Required: Basic/Intermediate R experience
Instructor Biographies:
Dr. John Muschelli is an expert R programmer and applied data analyst. He has authored a number of packages related to complex data, such as neuroimaging and accelerometry, and has co-led the development of Neuroconductor, a repository for imaging-related R packages. He has done other tutorials on data processing and analysis, such as Imaging in R (https://johnmuschelli.com/imaging_in_r/).
Dr. Ciprian Crainiceanu is a biostatistical expert in functional and neuroimaging data, with methodological expertise in longitudinal, nonparametric, measurement-error, and Bayesian analysis. His focus is on two major areas of research: wearable biosensors and neuroimaging. He is one of the leaders in the design of experiments and biostatistical analysis for very large wearable biosensor data, including electroencephalograms (EEG), accelerometry, and heart monitors. He is a lead author of the book "Functional Data Analysis with R" (2024).