The application of synthetic data generation (SDG) has been increasing in recent years. With SDG, a machine learning model is trained on existing real data and learns the patterns in that data. The model is then used to generate new data that preserve the learned patterns. There are multiple use cases for SDG: privacy, de-biasing, and augmentation.
In this presentation to the DALIAS community of practice, Dr. Khaled El Emam provides an overview of SDG, how to evaluate the privacy vulnerabilities in synthetic data, and methods for assessing the utility of synthetic data, as well as applications for de-biasing real-world data (RWD) and for augmentation in clinical trials and RWD. Specific examples and a summary of evidence are reviewed.
Presenter Biography
Dr. Khaled El Emam is the Canada Research Chair (Tier 1) in Medical AI at the University of Ottawa, where he is a Professor in the School of Epidemiology and Public Health. He is also a Senior Scientist at the Children’s Hospital of Eastern Ontario Research Institute, and Scholar-in-Residence at the Office of the Information and Privacy Commissioner of Ontario (IPC). Khaled has founded or co-founded six product and services companies involved with data management and data analytics, with some having successful exits. Prior to his academic roles, he was a Senior Research Officer at the National Research Council of Canada. He also served as the head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany. He has a PhD from the Department of Electrical and Electronics Engineering, King’s College, at the University of London, England.