As part of ENAR's education initiative, our webinars promote continuing education for professional and student statisticians by disseminating cutting-edge knowledge to our membership. An ENAR webinar (or "webENAR") can strengthen your background in methodology and software, provide an opportunity to learn about a topic outside of your primary area of specialization, or deepen your understanding of an area in which you already work. We invite you to participate and benefit from the expertise of some of North America's leading statisticians and biostatisticians.
The Webinar Committee of the ENAR Regional Advisory Board (RAB) coordinates this ongoing series of 1- to 2-hour webinars given by renowned experts. Registration fees are set by membership category, with a reduced fee for student members. The webinars are intended to be broadly available, and we encourage groups at your institution or workplace to participate together. WebENARs provide excellent learning opportunities for students and professionals alike.
(Almost) All of Entity Resolution
October 2, 2020
10 a.m. to 12 p.m. Eastern
Rebecca C. Steorts
Assistant Professor, Department of Statistical Science
Rebecca C. Steorts received her B.S. in Mathematics in 2005 from Davidson College, her M.S. in Mathematical Sciences in 2007 from Clemson University, and her Ph.D. in 2012 from the Department of Statistics at the University of Florida under the supervision of Malay Ghosh. At Florida she was a U.S. Census Dissertation Fellow and received Honorable Mention (second place) for the 2012 Leonard J. Savage Thesis Award in Applied Methodology. Rebecca was a Visiting Assistant Professor from 2012 to 2015, during which time she worked closely with Stephen E. Fienberg.
Rebecca is currently an Assistant Professor in the Department of Statistical Science at Duke University. She is an affiliated faculty member in the Departments of Computer Science and Biostatistics and Bioinformatics, the Information Initiative at Duke (iiD), and the Social Science Research Institute.
Rebecca was named to MIT Technology Review's 35 Innovators Under 35 for 2015 as a humanitarian in the field of software. Her work was profiled in the September/October issue of MIT Technology Review, and she was recognized with an invited talk at EmTech in November 2015. In addition, Rebecca is a recipient of an NSF CAREER award, a collaborative NSF award, a collaborative grant with the Laboratory of Analytic Sciences (LAS) at NC State University, a Metaknowledge Network Templeton Foundation Grant, the University of Florida (UF) Graduate Alumni Fellowship Award, the U.S. Census Bureau Dissertation Fellowship Award, and support from the UF Innovation through Institutional Integration Program (I-Cubed) and NSF for the development of an introductory Bayesian course for undergraduates. Her research interests are in large-scale clustering, record linkage (entity resolution or de-duplication), privacy, network analysis, and machine learning for computational social science applications.
Whether the goal is to estimate the number of people who live in a congressional district, to estimate the number of individuals who have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications share a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a process commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work, begun in the 1940s and 1950s, that has led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.