March 10-13, 2024

ENAR 2024 Educational Program | TUTORIALS

Tutorials are roughly 2 hours in length and focus on a particular topic or software package. The sessions are more interactive than a standard lecture, often encouraging participants’ active engagement and hands-on participation.


Monday, March 11 | 8:30 am – 10:15 am
T1 | Teaching RNA-velocity for Single-cell Trajectory Analysis and Discussing its Future Research Directions

Kevin Lin, PhD, Department of Biostatistics, University of Washington

Course Description:

Over the past decade, single-cell analyses in developmental biology have garnered tremendous interest from the bioinformatics, mathematics, and statistics communities. RNA velocity is a concept at the forefront of this area and has led to novel scientific findings regarding how cells develop and commit to various fates. These findings are made possible by marrying statistical ideas with core bioinformatics and cell biology concepts. However, this combination of disciplines can deter many new researchers from entering the area.

In this tutorial, we will introduce participants to the area of RNA velocity in a way that distills its complexity into its key ingredients and discusses its branching directions of ongoing research. The tutorial begins with a gentle introduction to single-cell data and the underlying bioinformatics principles of RNA velocity. Then, participants learn the statistical backbone of RNA velocity, follow coding demos of the method, and see what biological insights it unlocks. Finally, the tutorial discusses how current work has extended RNA velocity in bioinformatics, computational, statistical, or laboratory-technology-driven directions.

The course teaches participants about RNA velocity, a topic combining ideas in bioinformatics, dynamical systems, and statistics. Participants interested in biological research will conclude the tutorial with a concrete roadmap to assess if RNA velocity applies to their own research and, if so, how to deploy it on their data. Participants interested in bioinformatics research will conclude the tutorial by seeing how recent work has successfully extended RNA velocity by leveraging other omics, or new laboratory technology and computational tools. Participants interested in statistics research will conclude the tutorial by seeing which aspects of RNA velocity could benefit from statistical inference tools that currently do not exist.

Statistical/Programming Knowledge Required:
Basic familiarity with R and Python and a basic understanding of ordinary differential equations, the EM algorithm, manifold learning, and deep learning are desired.

Instructor Biography:

Dr. Kevin Lin’s work aims to advance single-cell biology by developing new statistical methods (such as network and dimension-reduction methods). He has previously studied the developing brain to investigate autism spectrum disorder and cancer cells to investigate the mechanisms of therapy resistance. Lin received his PhD from the Department of Statistics and Data Science at Carnegie Mellon University and completed his postdoc at the University of Pennsylvania.

Monday, March 11 | 10:30 am – 12:15 pm
T2 | Statistical Methods for Neighborhood and Community Focused Impacts in Health (In)Equity Research

Loni Philip Tabb, PhD, Dornsife School of Public Health, Drexel University

Course Description:

Understanding the impact of neighborhoods and communities on health and social inequities continues to present a public health challenge for researchers, policy makers, and (bio)statisticians alike. There is a myriad of statistical approaches that aim to truly understand the intersection of health, place, and time – with an emphasis on statistical implications. This tutorial will focus on highlighting spatial and spatio-temporal methods for multi-level settings, where both frequentist and Bayesian statistical frameworks will be discussed. Empirical evidence of these various methods will be presented as motivating examples – many of which will focus on health and social inequities across the US. The intended audience for this tutorial includes graduate students, post-docs, and researchers interested in better understanding the statistical implications of incorporating a neighborhood- and community-based focus, and the impact of neighborhoods and communities on health outcomes of interest, in addition to traditional individual-level risk factors.

Statistical/Programming Knowledge Required:
Completion of graduate-level linear and generalized linear modeling courses.

Instructor Biography:

Dr. Loni Philip Tabb is an Associate Professor of Biostatistics at Drexel University within the Department of Epidemiology and Biostatistics, Graduate Biostatistics Program Director, and Co-Lead of the Research and Data Core at the Urban Health Collaborative. Dr. Tabb has been on the faculty at Drexel since 2009, obtained tenure in 2017, and was appointed Associate Dean of Faculty Affairs in 2023. Her research efforts focus on methods development and applications around the intersection of health and place, specifically as it applies to cardiovascular health. In particular, she uses novel spatial and spatio-temporal statistical methods to examine the local and national geographic patterns of Black-White inequities in this country, with an eye towards solutions for eliminating these inequities. Dr. Tabb has taught and developed a number of courses including Bayesian Data Analysis, Advanced Statistical Computing, and Linear Statistical Models. Her contributions in the field more broadly include leading efforts around the Fostering Diversity in Biostatistics Workshop (held annually at ENAR) and teaching short courses at conferences like the Joint Statistical Meetings. She was the Inaugural Recipient of the Annie T. Randall Innovator Award through the ASA in 2021.

Monday, March 11 | 1:45 pm – 3:30 pm
T3 | An Introductory Tutorial on Structural Equation Modeling in R with Lavaan

Douglas D. Gunzler, PhD, Center for Health Care Research & Policy, Case Western Reserve University at MetroHealth Medical Center

Course Description:

Structural equation modeling (SEM) is a multivariate technique that allows relationships among variables to be examined. SEM is often used in practice to model and test hypothesized causal relationships among observed and latent (unobserved) variables, including in analysis across groups. It can be viewed as the merging of a conceptual model, path diagram, confirmatory factor analysis and path analysis. SEM emphasizes model evaluation.

While specialized commercial software exists for SEM, the R statistical programming language is widely accessible and offers the lavaan package. Participants will experience a mixture of lecture, discussion and hands-on participation. We will introduce basic concepts, theory and SEM vocabulary, give real-world examples, and conduct sample analyses using lavaan. While no knowledge of SEM is required, a fundamental understanding of regression analysis and R is recommended for participants taking this tutorial.

Since SEM is an extremely broad set of methods, only foundational topics will be introduced in this tutorial. The tutorial is divided into three parts: the first introduces the basic vocabulary, concepts and usages of SEM, the second presents the underlying statistical theory, and the third provides applications in R.

Statistical/Programming Knowledge Required:
While no knowledge of structural equation modeling is required, a fundamental understanding of regression analysis and R is recommended for participants taking this tutorial.

Instructor Biography:

Dr. Douglas Gunzler is a tenured Associate Professor of Medicine and Population and Quantitative Health Sciences in the Population Health and Equity Research Institute at the Center for Health Care Research and Policy, MetroHealth at Case Western Reserve University. He is the lead author of “Structural Equation Modeling for Health and Medicine” (Chapman & Hall/CRC Biostatistics Series, published March 2021). He is a biostatistician with specialties in structural equation modeling (SEM) and longitudinal data analysis. His research interests lie in the areas of psychometrics, factor analysis, mixture modeling, mediation analysis, age-period-cohort analysis and their application to both clinical trials and observational studies in health and medicine. In his research, he is advancing the use and interpretation of patient-reported outcomes. Dr. Gunzler received his PhD from the Department of Biostatistics & Computational Biology at the University of Rochester in 2011. He is the 2024 Chair of the Mental Health Statistics Section.

Monday, March 11 | 3:45 pm – 5:30 pm
CANCELLED T4 | Improving Data-Preprocessing and Machine Learning Workflows with Scikit-Learn’s ColumnTransformer and Pipeline Classes

Gul Inan, Department of Mathematics, Istanbul Technical University

Course Description:

Python’s scikit-learn is one of the most commonly used machine learning libraries, providing a comprehensive set of tools for building and deploying supervised and unsupervised machine learning algorithms. Among its many features, scikit-learn offers the ColumnTransformer and Pipeline classes, which are powerful tools for data pre-processing and for organizing the multiple steps of a machine learning workflow.

The ColumnTransformer class in scikit-learn provides a flexible mechanism for applying different pre-processing approaches to specific features of a dataset. This flexibility is particularly useful when a dataset contains a mixture of numerical and categorical features and feature engineering tasks such as scaling for numerical features and one-hot encoding for categorical features are required.

Complementing the ColumnTransformer, the Pipeline class in scikit-learn allows users to chain multiple pre-processing approaches, such as imputing missing values before scaling the numerical features or encoding the categorical features. The Pipeline class further allows users to combine these pre-processing steps and a machine learning algorithm into a single object. By encapsulating the entire workflow within a pipeline, users can apply the same data pre-processing steps to both training and test data without data leakage and can simplify model training and deployment.
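As a rough illustration of the workflow described above, the sketch below combines a ColumnTransformer with a Pipeline. It is not taken from the tutorial materials: the toy data, the column names (`age`, `bmi`, `smoker`), and the choice of logistic regression are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NAN = float("nan")

# Toy data, made up for illustration: two numerical features and one categorical.
X = pd.DataFrame({
    "age": [34.0, 51.0, NAN, 45.0, 29.0, 62.0],
    "bmi": [22.1, 30.4, 27.8, NAN, 24.9, 31.2],
    "smoker": ["no", "yes", "no", "yes", "no", "yes"],
})
y = [0, 1, 0, 1, 0, 1]

# Numerical columns: impute missing values first, then scale.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each group of columns to its own pre-processing.
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age", "bmi"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
])

# Chain pre-processing and the estimator into a single object.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])

# Imputation values, scaling parameters, and categories are learned
# from the training data only, inside fit().
model.fit(X, y)

# New data passes through the same fitted steps at prediction time.
X_new = pd.DataFrame({"age": [40.0], "bmi": [NAN], "smoker": ["maybe"]})
predictions = model.predict(X_new)
print(predictions)
```

Because the imputer, scaler, and encoder are fitted inside `model.fit`, nothing is estimated from the new rows, which is how the single-object design helps avoid data leakage.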

In this tutorial, we will first provide an overview of scikit-learn’s ColumnTransformer and Pipeline classes. We will then discuss how the ColumnTransformer class enables flexible pre-processing of different types of features, including scaling numerical features, encoding categorical features, and handling missing values. Lastly, we will explore how the Pipeline class allows for the efficient chaining of pre-processing steps with several machine learning algorithms such as regression, classification, principal component analysis, and K-means clustering.

Statistical/Programming Knowledge Required:
Familiarity with feature pre-processing approaches such as scaling, normalization, one-hot encoding, and imputation. Familiarity with lasso regression, ridge regression, their logistic regression counterparts, principal component analysis, and k-means clustering algorithms. Prior experience with Python’s scikit-learn API and a computational notebook environment such as Jupyter Notebook or JupyterLab is required.

Instructor Biography:

Dr. Gul Inan is currently an Assistant Professor in the Department of Mathematics at Istanbul Technical University, Turkey. She obtained her B.Sc., M.Sc., and Ph.D., all in Statistics, from the Department of Statistics, Middle East Technical University, Turkey. She did post-doctoral research at the School of Statistics, University of Minnesota and the Department of Biostatistics, University of North Carolina-Chapel Hill. She is interested in teaching “Statistical Learning with Python” at the undergraduate and graduate levels and developing statistical learning algorithms and/or computational tools to analyze high-dimensional data.

Tuesday, March 12 | 1:45 pm – 3:30 pm
T5 | Mentoring in Statistical Writing: Tools for Teaching and Energizing the Statistical Writing of your Mentees

Nicole Dalzell, PhD, Department of Statistical Sciences, Wake Forest University
Tanya Garcia, PhD, Department of Biostatistics, UNC Chapel Hill

Course Description:

There are many different mentoring roles that involve teaching mentees to write or refine scientific writing skills, but individuals with training in statistics may not have training in teaching others to write. The goal of the tutorial is to provide guidance, practical tips, and teaching tools for mentoring in statistical writing. We will also discuss ways to incorporate mentoring and teaching about scientific writing in statistics into your work with mentees in a sustainable way.

In this tutorial, we will:

Statistical/Programming Knowledge Required:

Instructor Biographies:

Dr. Nicole Dalzell is an Associate Teaching Professor in the Department of Statistical Sciences at Wake Forest University. She earned her Ph.D. in Statistics from Duke University, where she developed methods for record linkage with error prone linking variables. Currently, her work focuses on developing new pedagogical tools and techniques to help undergraduate and graduate students hone their statistical communication skills, as well as creating tools to help educators teach statistical writing. She has published in peer-reviewed statistical and data science education journals, and given presentations on teaching statistical writing, as well as creating inclusive classroom environments with specific focus on supporting students with disabilities and learning differences.

Dr. Tanya Garcia is an Associate Professor in the Department of Biostatistics and a Provost Distinguished Leader at UNC Chapel Hill. Dr. Garcia is passionate about training the next generation of (bio)statisticians to confidently develop statistical methods and communicate those methods in a clear and simple way. How she mentors this next generation is largely motivated by 500+ hours of grantsmanship and leadership training. She teaches her mentees to embrace a growth mindset and tackle obstacles without judgment or fear. Her desire for every mentee to achieve success and fulfillment drives her every leadership decision. These decisions have led Dr. Garcia to not only earn multiple grants as Principal Investigator from the National Institutes of Health, but also help other trainees and faculty members win their own grants from the National Institutes of Health, the National Science Foundation, and non-profit organizations. Dr. Garcia was awarded the 2022 Carolina Women’s Leadership Council Faculty Mentoring Award. Outside of UNC, she leads initiatives for national statistics organizations that foster the success of underrepresented and early career (bio)statisticians.

Tuesday, March 12 | 3:45 pm – 5:30 pm
T6 | Introduction to MSM with Applications in Drug Development

David A. James, Novartis
Sophie Sun, Novartis
Meng Cao, Novartis
Alex Ocampo, Novartis Pharma AG

Course Description:

Multistate models (MSMs) are statistical models used to analyze complex and dynamic processes in which individuals or objects transition between multiple states over time. They can be viewed as an extension of traditional survival analysis, where only one event is of interest (e.g. death or progression). The models allow for the estimation of transition probabilities between states and can be used to evaluate the effects of interventions or covariates on these probabilities. With an appropriate setup of states and transitions, the models allow a more comprehensive and in-depth exploration of how patients’ and products’ characteristics jointly shape patients’ trajectories.
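To make the idea of states and transition probabilities concrete, here is a deliberately simplified sketch. It assumes a discrete-time, time-homogeneous Markov chain on a hypothetical three-state illness-death model, with made-up transition probabilities; the models covered in the tutorial are more general, and nothing here is taken from the tutorial materials.

```python
# Hypothetical illness-death model; states and probabilities are made up.
STATES = ["healthy", "ill", "dead"]

# One-step transition matrix: P[i][j] = Pr(next state is j | current state is i).
# "dead" is absorbing, so its row keeps all probability in place.
P = [
    [0.90, 0.07, 0.03],
    [0.00, 0.80, 0.20],
    [0.00, 0.00, 1.00],
]


def step(dist, matrix):
    """Advance a state-occupancy distribution by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def occupancy(initial, matrix, n_steps):
    """Return the state-occupancy probabilities at times 0..n_steps."""
    history = [list(initial)]
    for _ in range(n_steps):
        history.append(step(history[-1], matrix))
    return history


# Everyone starts in the "healthy" state.
history = occupancy([1.0, 0.0, 0.0], P, n_steps=5)
for t, dist in enumerate(history):
    print(t, dict(zip(STATES, (round(p, 4) for p in dist))))
```

In this toy setting, the probability of being in each state at each time point plays the role that a single survival curve plays in a traditional time-to-event analysis.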

In this tutorial we introduce MSMs and describe a workflow using MSMs for analyzing patients’ disease states collected from clinical studies. In the first part, we introduce basic MSM concepts, such as what an MSM is and what states are. In addition, we illustrate key points about the analyses and design decisions in the estimand framework via several examples from drug development, which highlight the flexibility of MSMs to be applied in a variety of settings. In the second part, we describe analytic examples for applying these techniques to data collected from clinical trials. These include (1) descriptive techniques that extend Kaplan-Meier survival estimation to multiple state probabilities, (2) semi-parametric and parametric transition models for fitting patients’ trajectories along the state space, and (3) interpretation of the results from multistate models, as well as their relationship to traditional time-to-event models (e.g., how to obtain restricted mean time and the probabilities of being in each state at a specific time point). Finally, time permitting, we will show R code fitting the MSMs in the case studies.

Statistical/Programming Knowledge Required:
Basic knowledge of survival analysis, including, e.g., KM, log-rank test, and Cox model.

Instructor Biographies:

David A. James is a Fellow of the American Statistical Association and Senior Director in the Advanced Methodology and Data Science group at Novartis. His current interests include time to event modeling, pharmacometrics modeling, exploratory data analysis, statistical computing and visualization.

Sophie Sun received her PhD in Statistics from Purdue University, and she is currently a Data Scientist in Advanced Methodology and Data Science at Novartis Pharmaceuticals Corporation. Her group focuses on consulting for clinical trial analysis, development of new methodology, education/training on different statistical/ML topics, and promoting the Good Data Scientist Practice to ensure the integration of various projects. Her topics of interest include treatment effect heterogeneity, multistate survival analysis, causal inference and external controls.

Meng Cao holds a PhD in Statistics from Colorado State University, USA. She joined Novartis in 2020 after graduation. She supports hematology drug development at Novartis, including regulatory approvals, trial design, and post-market activities.

Alex Ocampo is a Senior Principal Statistician with Novartis based in Basel, Switzerland. Ocampo obtained a Bachelor’s degree in Statistics from the University of Michigan and a Ph.D. in Biostatistics from Harvard University in 2020, with a dissertation focused on statistical methods for dealing with missing data when the “Missing at Random” assumption does not hold. Ocampo’s current work at Novartis focuses on causal inference, multistate modeling, and Bayesian inference in pharmaceutical drug development.