March 15-18, 2026

Presidential Invited Debate

Motion: AI Alone Is Not Enough: Advancing EHR Research Demands Statistical Rigor

Empowering Electronic Health Records Research with AI Tools Requires More Statistical Thinking

Tianxi Cai
Harvard T.H. Chan School of Public Health

Harnessing the power of artificial intelligence (AI) in electronic health records (EHR) research unlocks transformative opportunities for translational science, most notably in precision medicine and the optimization of healthcare delivery. AI tools—especially large language models (LLMs)—have demonstrated remarkable capabilities in extracting, summarizing, and predicting clinical information from the vast and complex data within EHRs, particularly from unstructured clinical notes. However, to fully realize the promise of AI in this context, a foundational integration of statistical thinking is indispensable.

Despite their utility, AI models often produce results that lack robustness and transparency. Issues such as hallucinated outputs, limited generalizability across healthcare systems, and insufficient quantification of uncertainty undermine their reliability. These limitations are exacerbated by real-world data challenges: missing or misclassified information, underrepresentation of rare diseases and small subgroups, unmeasured confounding in observational analyses, difficulties in transporting models across different patient populations, and degradation of model performance over time.

Statistical methodologies are critical for identifying and mitigating these challenges. Techniques from causal inference can help distinguish correlation from causation in non-randomized settings, while methods for uncertainty quantification can attach calibrated measures of confidence to model outputs. Robust transfer learning algorithms can improve transportability across health systems and over time. Hybrid frameworks that combine AI algorithms with rigorous statistical validation, sensitivity analyses, and bias correction strategies can improve both the interpretability and trustworthiness of AI-driven findings. Another practical barrier is the substantial computational cost of training and deploying sophisticated AI models, which may limit scalability or accessibility. Therefore, efficient model design and evaluation—guided by statisticians—are essential to balance performance with feasibility.

As AI becomes increasingly embedded in EHR-based research and clinical workflows, statisticians must take an active leadership role. Their expertise is crucial not only in developing robust analytical pipelines but also in ensuring that AI-driven insights are clinically valid, ethically sound, and ultimately actionable in real-world healthcare settings.

Biography
Tianxi Cai is the John Rock Professor of Population and Translational Data Science at the Harvard T.H. Chan School of Public Health and a Professor of Biomedical Informatics at Harvard Medical School (HMS). She co-directs the VERITY Bioinformatics Core at Brigham and Women’s Hospital and the Applied Bioinformatics Core at the Veterans Health Administration. As the founding director of the Translational Data Science Center for a Learning Health System and director of the Big Data Analytics Core at HMS, she leads efforts to provide statistical and biomedical informatics support to both the Harvard research community and external partners, including the VA and industry. Dr. Cai’s research focuses on developing innovative statistical and machine learning methods—such as semi-supervised learning, high-dimensional inference, robust transfer learning, graphical models, and federated learning—to address challenges in analyzing large-scale, multi-institutional biomedical data. Her team has built scalable tools for extracting real-world evidence, performing text mining, and enabling predictive analytics using diverse sources including electronic health records, genomics, cohort studies, and clinical trials.

Leveraging AI Tools in EHR Research Needs a Village - Not Just a Statistician

Marylyn Ritchie
University of Pennsylvania

As AI tools become increasingly integral to Electronic Health Records (EHR) research, the assertion that success hinges primarily on more statistical thinking oversimplifies the challenge. While statistical rigor is essential, it is only one piece of a much broader puzzle. Real-world EHR data are complex, heterogeneous, and often messy—posing challenges that go beyond what traditional statistical frameworks alone can address.

Effective use of AI in this domain requires a multidisciplinary lens. Ethical considerations must be front and center, especially around bias, fairness, and the responsible use of sensitive health data. Feasibility and implementation science are also critical: building deployable, scalable, and sustainable AI systems in real-world clinical settings involves socio-technical factors that statistics alone do not capture. Additionally, expertise in medical informatics and bioinformatics plays a crucial role in understanding how to design, structure, and integrate data and algorithms within health systems.

By framing the conversation solely around statistical thinking, we risk neglecting the importance of interpretability, clinical relevance, interoperability, and usability. AI in EHR research should be seen as a collaborative ecosystem—one that requires not just quantitative insight, but also contextual understanding, ethical foresight, and translational pathways to impact. In this talk, I argue that advancing AI in EHR research depends not on more statistical thinking alone, but on a deliberate and inclusive integration of multiple domains. Only by embracing this holistic approach can we ensure that AI truly empowers meaningful, equitable, and actionable insights in healthcare.

Biography
Dr. Marylyn D. Ritchie is the Vice Dean of Artificial Intelligence and Computing at the University of Pennsylvania Perelman School of Medicine. She is also the Edward Rose, MD and Elizabeth Kirk Rose, MD Professor of Genetics, Director of the Institute for Biomedical Informatics, Director of the Division of Informatics in DBEI, Co-Director of the Penn Medicine BioBank, and Vice President of Research Informatics in the University of Pennsylvania Health System. Dr. Ritchie is an expert in translational bioinformatics, with a focus on developing, applying, and disseminating algorithms, methods, and tools that integrate electronic health records (EHR) with genomics. She has over 20 years of experience in translational bioinformatics and has authored over 500 publications. She was appointed a Fellow of the American College of Medical Informatics (ACMI) in 2020 and elected to the National Academy of Medicine in 2021.