Short Courses are offered as full- or half-day courses. The extended length of these courses allows attendees to obtain an in-depth understanding of the topic. These courses often integrate seminar lectures covering foundational concepts with hands-on lab sessions that allow attendees to put these concepts into practice.
Sunday, March 19 | 8:00 am – 5:00 pm
SC1 | Introduction to Bayesian Nonparametric Methods for Causal Inference
Instructors:
Michael Daniels, University of Florida
Antonio Linero, University of Texas at Austin
Jason Roy, Rutgers University
Course Description:
Bayesian nonparametric (BNP) methods can be used to flexibly model joint or conditional distributions, as well as functional relationships. These methods, along with causal assumptions, can be used with the g-formula for inference about causal effects. This general approach to causal inference has several possible advantages over popular semiparametric methods, including efficiency gains, the ease of causal inference on any functionals of the distribution of potential outcomes, the use of prior information, and the ability to capture uncertainty about causal assumptions via informative prior distributions. In this workshop we review BNP methods and illustrate their use for causal inference in the setting of point treatments, mediation, and semi-competing risks. We present several data examples and discuss software implementation using R. The R code and/or packages used to run the data examples will be provided to the attendees at a specific GitHub site.
Statistical/Programming Knowledge Required:
Statistical knowledge required: graduate-level probability, inference, and regression. We will have some examples in R, but knowing R isn’t necessary.
Instructor Biographies:
Michael Daniels, ScD is Professor, Andrew Banks Family Endowed Chair, and Chair in the Department of Statistics at the University of Florida. His research over the past 20 years has focused on Bayesian approaches for missing data and causal inference; in terms of the former, he co-authored a research monograph on Bayesian approaches for missing data in 2008.
Antonio Linero, PhD is Assistant Professor in the Department of Statistics and Data Sciences at the University of Texas at Austin. Dr. Linero’s work has broadly focused on developing flexible Bayesian methods for complex longitudinal data, as well as developing Bayesian nonparametric tools for model selection and variable selection.
Jason Roy, PhD, is Professor of Biostatistics and Chair of the Department of Biostatistics and Epidemiology at Rutgers School of Public Health, and Co-Director of the Center for Causal Inference. Dr. Roy’s work has focused on Bayesian nonparametric approaches to causal inference, especially in the area of pharmacoepidemiology. Drs. Daniels, Linero, and Roy are co-authors of the forthcoming book “Bayesian Nonparametric Methods for Causal Inference and Missing Data” (Chapman & Hall).
Sunday, March 19 | 8:00 am – 5:00 pm
SC2 | Informative Prior Elicitation Using Historical Data, Expert Opinion, and Other Sources
Instructors:
Joseph G. Ibrahim, University of North Carolina at Chapel Hill
Ethan M. Alt, Harvard Medical School
Course Description:
This full-day short course is designed to give biostatisticians and data scientists a comprehensive overview of informative prior elicitation from historical data, expert opinion, and other data sources, such as real-world data, prior predictions, estimates, and summary statistics. We focus on both Bayesian design and analysis, and examples will be presented for several types of applications, such as clinical trials, observational studies, and environmental studies, as well as other areas in biomedical research.
The first part of the course gives a brief but broad overview of Bayesian inference, examining concepts of Bayesian design and analysis such as i) Bayesian type 1 error and power, ii) calculation of posterior and predictive distributions, iii) MCMC sampling methods, iv) fundamental concepts in informative and non-informative prior elicitation, v) Bayesian point and interval estimation, and vi) Bayesian hypothesis testing. These topics will be presented in a general context as well as in several regression settings.
The second part of the course will focus broadly on advanced methods for informative prior elicitation, including i) informative prior elicitation from historical data using the power prior (PP) and its variations including the normalized power prior, the partial borrowing power prior, the asymptotic power prior, and the scale transformed power prior (STRAPP).
In addition, ii) the Bayesian hierarchical model (BHM), the commensurate prior, and the robust meta-analytic-predictive (MAP) prior will be examined, and the properties and performance of the four priors (BHM, PP, commensurate, robust MAP) will be compared analytically and studied via simulations and real data analyses of case studies. We will also examine iii) informative prior elicitation from predictions, including the hierarchical prediction prior (HPP) and the Information Matrix (IM) prior, and iv) strategies for informative prior elicitation from expert opinion.
Finally, we discuss (v) synthesis of randomized controlled trial and real-world data using Bayesian nonparametric methods. For (i) – (iv), we will present examples both in the context of Bayesian design and analysis and demonstrate the performance of these priors through several simulation studies and case studies involving real data in the context of linear and generalized linear models, longitudinal data, and survival data. We will also demonstrate the implementation of these priors through the hdbayes and BayesPPD R packages, SAS, NIMBLE, and Stan.
Statistical/Programming Knowledge Required:
SAS and R
Instructor Biographies:
Dr. Joseph G. Ibrahim, Alumni Distinguished Professor of Biostatistics at the University of North Carolina at Chapel Hill, is principal investigator of two National Institutes of Health (NIH) grants for developing statistical methodology related to cancer, imaging, and genomics research. Dr. Ibrahim is the Director of the Biostatistics Core at UNC Lineberger Comprehensive Cancer Center. He is the biostatistical core leader of a Specialized Program of Research Excellence in breast cancer from NIH. Dr. Ibrahim's areas of research focus are Bayesian inference, missing data problems, cancer, and genomics. He received his PhD in statistics from the University of Minnesota in 1988.
With over 30 years of experience working in cancer clinical trials, Dr. Ibrahim directs the UNC Laboratory for Innovative Clinical Trials (LICT). He is also the Director of Graduate Studies in UNC’s Department of Biostatistics, as well as the Program Director of the cancer genomics training grant in the department. Dr. Ibrahim has published over 350 research papers, most in top statistical journals. He has published graduate-level books on Bayesian survival analysis and Bayesian computation. He teaches courses in Bayesian Statistics, Advanced Statistical Inference, Theory and Applications of Linear and Generalized Linear Models, and Statistical Analysis with Missing Data.
Dr. Ibrahim is a Fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), the International Society for Bayesian Analysis (ISBA), the Royal Statistical Society (RSS), and the International Statistical Institute (ISI). He has given many full-day and two-day short courses at ENAR, JSM, WNAR, and at pharmaceutical companies on several topics, including introduction to Bayesian methods, missing data, joint models for longitudinal and survival data, Bayesian clinical trial design, longitudinal data, Bayesian survival analysis, and cure rate models.
Dr. Ethan Alt is an Instructor of Medicine in Biostatistics at Harvard Medical School and an Investigator in the Division of Pharmacoepidemiology and Pharmacoeconomics at Brigham and Women’s Hospital. His research interests include informative prior elicitation, incorporation of historical data, and Bayesian methods for the design and analysis of clinical trials. He received his PhD from the University of North Carolina at Chapel Hill. He is the author of several R packages including bayescopulareg, bmabasket, and hdbayes.
Sunday, March 19 | 8:00 am – 5:00 pm
SC3 | Targeted Learning: Advanced Methods for Causal Machine Learning
Instructors:
Mark van der Laan, Division of Biostatistics, University of California at Berkeley
Alan Hubbard, Division of Biostatistics, University of California at Berkeley
Nima Hejazi, Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University
Rachael V. Phillips, Division of Biostatistics, University of California at Berkeley
Ivana Malenica, Department of Statistics, Harvard University
Course Description:
This full-day workshop provides a comprehensive introduction to the field of Targeted Learning, at the intersection of causal inference and machine learning, and its accompanying tlverse software ecosystem (https://github.com/tlverse). Focus will be on targeted minimum loss estimators of causal effects, in particular of sophisticated intervention regimes (dynamic, optimal dynamic, stochastic), heterogeneous treatment effects, and ensemble machine learning of complex functionals (multinomial probabilities, conditional densities). The robust, efficient plug-in estimators that will be introduced leverage state-of-the-art, ensemble machine learning tools in order to flexibly adjust for confounding while yielding valid statistical inference. This course will be of interest to both statistical and applied scientists engaged in biomedical/health studies, whether experimental or observational, who wish to apply cutting-edge statistical and causal inference methodology to rigorously formalize and answer substantive questions. This workshop incorporates interactive discussions and hands-on, guided R programming exercises, allowing participants to familiarize themselves with methodology and tools that translate to real-world data analysis.
Statistical/Programming Knowledge Required:
It is highly recommended that participants have prior training in basic statistical concepts (e.g., confounding, probability distributions, hypothesis testing and confidence intervals, regression). Advanced knowledge of mathematical statistics is useful but not necessary. Familiarity with the R programming language is essential.
Instructor Biographies:
Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He has made contributions to survival analysis, semiparametric statistics, multiple testing, and causal inference. He also developed the targeted maximum likelihood methodology and general theory for super-learning. He is a founding editor of the Journal of Causal Inference and International Journal of Biostatistics. He has authored 4 books on targeted learning, censored data and multiple testing, authored over 300 publications, and graduated over 50 PhD students. He received the COPSS Presidents' Award in 2005, the Mortimer Spiegelman Award in 2004, and the van Dantzig Award in 2005.
Alan Hubbard is Professor and Head of Biostatistics at the University of California, Berkeley, co-director of the Center of Targeted Learning, and head of the computational biology core of the SuperFund Center at UC Berkeley (NIH/EPA), as well as a consulting statistician on several federally funded and foundation projects. He has worked on projects ranging from the molecular biology of aging to epidemiology and infectious disease modeling, but most of his work has focused on semi-parametric estimation in high-dimensional data. His current methods research focuses on precision medicine, variable importance, statistical inference for data-adaptive parameters, and statistical software implementing targeted learning methods. He is currently working in several areas of applied research, including early childhood development in developing countries, environmental genomics, and comparative effectiveness research. He has most recently concentrated on using complex patient data for better prediction for acute trauma patients.
Nima Hejazi, PhD, is an Assistant Professor of Biostatistics at the Harvard T.H. Chan School of Public Health. He recently completed an NSF Mathematical Sciences Postdoctoral Research Fellowship, and, prior to this, obtained his PhD in Biostatistics from UC Berkeley. He has been on the founding core development team of the tlverse project (https://github.com/tlverse), an extensible software ecosystem for targeted learning, and, since 2020, has collaborated very closely with the Vaccine and Infectious Disease Division of the Fred Hutchinson Cancer Center as a core member of the US Government Immune Correlates Biostatistical Analysis Team of the NIAID-funded COVID-19 Prevention Network. Nima's research interests combine causal inference and machine learning, driven by the aim of developing assumption-lean statistical procedures tailored for efficient and robust inference about scientifically informative parameters. He is particularly motivated by methodological issues stemming from robust non/semi-parametric inference, high-dimensional inference, targeted loss-based estimation, and biased sampling designs, usually tied to applications from clinical trials or computational biology and especially as related to scientific issues concerning vaccine efficacy evaluation, infectious disease epidemiology, and immunology.
Rachael Phillips is a PhD candidate in biostatistics, advised by Alan Hubbard and Mark van der Laan. She has an MA in Biostatistics, BS in Biology, and BA in Mathematics. As a student of targeted learning, Rachael integrates causal inference, machine learning, and statistical theory to answer causal questions with statistical confidence. She is motivated by issues arising in healthcare and is especially interested in clinical algorithm frameworks and guidelines. Related to this, she is also interested in experimental design; human-computer interaction; statistical analysis pre-specification, automation, and reproducibility; and open-source software. Rachael is a researcher for the Center for Targeted Machine Learning and Causal Inference (https://ctml.berkeley.edu); actively collaborates with oncologists and anesthesiologists at UC San Francisco; and throughout her PhD studies, has worked closely with Dr. Susan Gruber and the U.S. Food and Drug Administration (FDA) in the Center for Drug Evaluation and Research (CDER).
Ivana Malenica (https://imalenica.github.io/) is a Postdoctoral Researcher in the Department of Statistics (https://statistics.fas.harvard.edu/) at Harvard and a Wojcicki and Troper Data Science Fellow at the Harvard Data Science Initiative (https://datascience.harvard.edu/). She obtained her PhD in Biostatistics at UC Berkeley working with Mark van der Laan, where she was a Berkeley Institute for Data Science Fellow and an NIH Biomedical Big Data Fellow. Her research interests span non/semi-parametric theory, causal inference, and machine learning, with emphasis on personalized health and dependent settings. Most of her current work involves causal inference with time and network dependence, online learning, optimal individualized treatment, reinforcement learning, and adaptive sequential designs.
Sunday, March 19 | 8:00 am – 12:00 pm
SC4 | Applied Causal Inference for Real-World Observational Data
Instructor:
Michele Santacatterina, Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine
Course Description:
Evidence-based medicine requires investigators to incorporate the best available evidence into their decision-making process. The best evidence regarding the causal effect of an intervention or a treatment can be provided by properly conducted randomized clinical trials. Randomized trials, however, can be costly, infeasible, or unethical. In addition, results from trials may suffer from selection bias, because participants may not be representative of the real-world population. Data from, for example, electronic health records are more representative of real-world practice. However, despite their potential, they are observational, and confounding bias arises due to the presence of factors related to both the intervention and the outcome under study. In the last few decades, many techniques have been developed to estimate causal effects from observational real-world data. In this short course, I will introduce techniques to identify and estimate causal effects from observational real-world data. Specifically, I will explain how to identify common causal parameters, such as the average treatment effect (ATE), using directed acyclic graphs, the potential outcome framework, and identification assumptions. I will then introduce statistical methods to estimate such parameters, such as regression adjustment, inverse probability weighting, matching, covariate balancing methods, and doubly robust estimators. In addition to the ATE, I will describe how to estimate other causal estimands of interest, such as the risk ratio, the quantile treatment effect, and the restricted mean survival. R and RStudio will be used together with synthetic and real-world data.
Causal inference is an essential research topic in the statistical, medical, epidemiological and social sciences. After this short course you will be able to identify, estimate and compute causal effects using observational data and open-source statistical software and code, thus improving your research and decision-making skills.
Statistical/Programming Knowledge Required:
Basic knowledge of statistical inference and R/RStudio
Instructor Biography:
Michele Santacatterina is an assistant professor of biostatistics in the Division of Biostatistics at New York University Grossman School of Medicine. Before joining NYU, he was an assistant research professor of biostatistics and bioinformatics at George Washington University. He received a Ph.D. in biostatistics from Karolinska Institutet, Sweden, in April 2018 and completed a postdoctoral fellowship at the Cornell TRIPODS Center for Data Science and Cornell Tech in August 2020. His research focuses on the development and applications of statistical and data science methods for optimal decision making and causal inference using real-world and experimental data. Michele’s personal website: https://michelesantacatterina.github.io/
Sunday, March 19 | 8:00 am – 12:00 pm
SC5 | Improving Precision and Power in Randomized Trials by Leveraging Baseline Variables
Instructors:
Michael Rosenblum, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Kelly Van Lancker, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Department of Applied Mathematics, Computer Science and Statistics, Ghent University
Josh Betz, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Course Description:
In May 2021, the U.S. Food and Drug Administration (FDA) released a revised draft guidance for industry on “Adjustment for Covariates in Randomized Clinical Trials for Drugs and Biological Products”. Covariate adjustment is a statistical analysis method for improving precision and power in clinical trials by adjusting for pre-specified, prognostic baseline variables. Here, the term “covariates” refers to baseline variables, that is, variables that are measured before randomization such as age, gender, BMI, comorbidities. The resulting sample size reductions can lead to substantial cost savings, and also can lead to more ethical trials since they avoid exposing more participants than necessary to experimental treatments. Though covariate adjustment is recommended by the FDA and the European Medicines Agency (EMA), many trials do not exploit the available information in baseline variables or only make use of the baseline measurement of the outcome.
In Part 1, we introduce the concept of covariate adjustment. In particular, we explain what covariate adjustment is, how it works, when it may be useful to apply, and how to implement it (in a preplanned way that is robust to model misspecification) for a variety of scenarios.
In Part 2, we present a new statistical method that enables us to easily combine covariate adjustment with group sequential designs. The result will be faster, more efficient trials for many disease areas, without sacrificing validity or power. This approach can lead to faster trials even when the experimental treatment is ineffective; this may be more ethical in settings where it is desirable to stop as early as possible to avoid unnecessary exposure to side effects.
In Part 3, we demonstrate the impact of covariate adjustment using completed trial data sets in multiple disease areas. We provide step-by-step, clear documentation of how to apply the software in each setting. Participants will have the time to apply the software tools on the different datasets in small groups.
Statistical/Programming Knowledge Required:
Participants should be familiar with the following concepts: Type I error, power, bias and variance. Knowledge of R would be beneficial.
Instructor Biographies:
Michael Rosenblum is a Professor of Biostatistics at Johns Hopkins Bloomberg School of Public Health. His research is in causal inference with a focus on developing new statistical methods and software for the design and analysis of randomized trials, with clinical applications in HIV, Alzheimer’s disease, stroke, and cardiac resynchronization devices. He is funded by the Johns Hopkins Center for Excellence in Regulatory Science and Innovation for the project: “Statistical methods to improve precision and reduce the required sample size in many phase 2 and 3 clinical trials, including COVID-19 trials, by covariate adjustment.”
Kelly Van Lancker is a postdoctoral researcher in the Biostatistics Department of the Johns Hopkins Bloomberg School of Public Health and at Ghent University (Belgium). She obtained a PhD in statistics from Ghent University. Her primary research interests are the use of causal inference methods, and in particular covariate adjustment, in clinical trials.
Josh Betz is an Assistant Scientist in the Biostatistics department of the Johns Hopkins Bloomberg School of Public Health, and part of the Johns Hopkins Biostatistics Center. His research includes the design, monitoring, and analysis of randomized trials in practice and developing software to assist with randomized trial design.
Sunday, March 19 | 1:00 pm – 5:00 pm
SC6 | An Introduction to Julia for Biostatistics
Instructors:
Saunak Sen, Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center
Gregory Farage, Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center
Course Description:
Julia is an open-source programming language for scientific computing that offers several attractive features for data science. It offers the prototyping simplicity of an interpreted language such as R or Python with the speed of compiled languages such as C/C++. It has strong support for visualization, interactive graphics, machine learning, and parallel computing.
The short course will begin with the basics of getting started with Julia using the terminal and an IDE (integrated development environment). We will present Julia's language design and features comparing it with other languages. We will demonstrate how to install/uninstall packages and use commonly-used packages. We will then show basic data science tasks such as manipulating tabular data, statistical tests, regression, graphics, report generation, and connecting to R/Python/C libraries. Depending on student interest, we will cover more advanced tasks such as machine learning, parallel computing, or package development. Students will have the opportunity to get hands-on experience in Julia programming via examples and small exercises related to data science and scientific computing.
Statistical/Programming Knowledge Required:
No prior experience with Julia is required. Prior programming experience in a language such as R, SAS, Stata, MATLAB, or Python is required. We will assume that participants have statistical knowledge equivalent to a master's degree in statistics or biostatistics.
Instructor Biographies:
Gregory Farage is a postdoctoral scholar in the Division of Biostatistics, Department of Preventive Medicine at the University of Tennessee Health Science Center (UTHSC). He completed his Ph.D. in Remote Sensing, focusing on image processing, at the University of Sherbrooke in Canada. His doctoral work focused on polarimetric RADAR noise reduction based on multiscale approximation analysis. During his postdoc, Dr. Farage has specialized in physical activity data analysis and other structured high-dimensional measures; he has also collaborated on several multivariate statistical tools in the Julia programming language. At UTHSC, Gregory has served as a teaching assistant for graduate-level courses teaching R in biostatistics. He has also presented introductory webinars on R and Julia. His research interests include high-dimensional data processing, sensor data analysis, causal inference, image processing, and information fusion.
Saunak Sen is Professor and Chief of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, in Memphis, TN. He develops statistical methods for understanding biological systems using genetic variation and high-dimensional data. Current interests include developing statistical approaches for matrix-valued high-throughput data, computational methods for large-scale linear mixed models, and statistical computing using the Julia programming language. His group uses several complementary approaches including bilinear models, penalized regression, multivariate kernel regression, matrix factorization, and gradient-based optimization techniques. The group has developed a number of Julia software packages: MatrixLM/MatrixLMnet (penalized matrix linear models), FlxQTL (multivariate linear mixed models for genetic mapping), LiteQTL (real-time eQTL mapping), and GeneNetworkAPI (interface to GeneNetwork database and computational tools). Dr. Sen obtained his PhD in statistics from the University of Chicago, and did postdoctoral work at Stanford University and the Jackson Laboratory. After 12 years on the faculty at the University of California San Francisco, he joined UTHSC in his current position.
Sunday, March 19 | 1:00 pm – 5:00 pm
SC7 | Semi-competing Risks: Accounting for Death as a Competing Risk in Public Health Research When the Outcome of Interest is Non-terminal
Instructors:
Sebastien Haneuse, Professor and Director of Graduate Studies, Department of Biostatistics, Harvard T.H. Chan School of Public Health
Harrison Reeder, Instructor of Investigation, Massachusetts General Hospital Biostatistics Center; Instructor in Medicine, Harvard Medical School
Course Description:
The short course will provide an overview of semi-competing risks data analysis. Briefly, semi-competing risks refers to the setting where primary interest lies in some non-terminal event, the occurrence of which is subject to a terminal event. Although not as well-known as standard competing risks, semi-competing risks arise in any study of an event that is not mortality but where the force of mortality is strong. Examples include: Alzheimer’s disease in the elderly; quality of end-of-life care among patients with a terminal cancer diagnosis; graft-versus-host disease among bone marrow transplant recipients; and developmental outcomes among infants admitted to a NICU. Semi-competing risks also arise in some settings where the terminal event is not mortality, including those of preeclampsia where “delivery” is a competing risk but not vice versa. In this workshop, we will cover basic concepts of semi-competing risks, various modeling strategies, methods for prediction, and software. In addition, we will illustrate the methods by applying them to a study of preeclampsia using data from the Beth Israel Deaconess Medical Center, in Boston, MA, specifically with the goals of quantifying risk factor associations and the joint prediction of preeclampsia and delivery.
Statistical/Programming Knowledge Required:
At a minimum, familiarity with standard notions/methods in time-to-event data analysis (e.g., censoring and hazard functions) is required. Additionally, familiarity with concepts related to the analysis of longitudinal or cluster-correlated data (e.g., dependence and, say, mixed effects models) would be helpful. All programming will be in R.
Instructor Biographies:
Sebastien Haneuse, PhD, is Professor and Director of Graduate Studies in the Department of Biostatistics at the Harvard T.H. Chan School of Public Health. His research agenda has three major threads: (i) design-based approaches to mitigate data limitations in resource-limited settings; (ii) methods for complex missing data in EHR-based studies; and (iii) novel models and statistical methods for the analysis of semi-competing risks data. From a substantive perspective, he has collaborated in a broad range of scientific domains, including breast cancer screening, Alzheimer’s disease and dementia, HIV/AIDS, bariatric surgery, diabetes, childhood and adult obesity, LGBTQ health, and hospital profiling.
Harrison Reeder, PhD, is Instructor of Investigation at the Massachusetts General Hospital Biostatistics Center and Instructor in Medicine at Harvard Medical School. His research focuses on outcome prediction and tools for decision-making in the survival analysis setting of semi-competing risks. In particular, his recent work focuses on predicting individual patients’ prospective joint risk over time of experiencing one or both outcomes. His other research areas of interest include Bayesian survival analysis methods, longitudinal data analysis, and a wide range of applications including preeclampsia during pregnancy, cognitive outcomes among the elderly, and outcomes among heart failure patients receiving implantable cardioverter defibrillators.