The Electronic Health Information Laboratory (EHIL) was established in 2005 by the CHEO Research Institute, in collaboration with the University of Ottawa, under founding director Dr. Khaled El Emam. The objective of EHIL is to facilitate the sharing of health information for secondary purposes. These secondary purposes include research, public health, data science, comparative effectiveness evaluation, and pharmacovigilance.
The digitization of health information has meant that large amounts of data are readily available for secondary purposes. There is increasing demand for that data from different types of users, including researchers and public health professionals. Privacy legislation and regulations in most jurisdictions allow the disclosure of health information for secondary purposes under specific conditions, for example, if it is non-personal or if patient consent is sought. In addition to the difficulty of obtaining patient consent in practice, there is compelling evidence that consenters and non-consenters differ systematically on important demographic and socio-economic characteristics, so data limited to consenting patients may be biased. Therefore, sharing non-personal information is the most practical solution to allow such disclosures.
EHIL develops technology to facilitate health data sharing, including data synthesis methods, de-identification methods, and secure computation methods that allow public health surveillance or analysis without compromising privacy. The different methods are suitable under different circumstances and constraints, ranging from individual-level data release to ongoing surveillance and interactive remote analysis. EHIL’s research spans:
- Theoretical work (developing mathematical models and metrics of re-identification risk),
- Empirical work (evaluations of our models and metrics through simulations and controlled studies),
- Applied work (evaluations on large data sets), and
- Knowledge translation (building software tools, instruments, and education).
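To give a concrete sense of what a re-identification risk metric can look like, the sketch below uses one formulation that is common in the de-identification literature: records are grouped by their quasi-identifier values, and each record's risk is taken as one over the size of its group. This is an illustrative assumption, not necessarily the models developed at EHIL; the function name `reidentification_risk`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) from equivalence-class sizes.

    Assumption: a record's risk is 1 / (number of records sharing
    the same quasi-identifier values). Illustrative only.
    """
    # Build the quasi-identifier key for every record.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    # Count how many records fall into each equivalence class.
    class_sizes = Counter(keys)
    # Per-record risk is the reciprocal of its class size.
    per_record = [1.0 / class_sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


# Hypothetical, made-up records for illustration.
records = [
    {"age_band": "30-39", "sex": "F", "region": "K1A"},
    {"age_band": "30-39", "sex": "F", "region": "K1A"},
    {"age_band": "40-49", "sex": "M", "region": "K1B"},
    {"age_band": "40-49", "sex": "M", "region": "K1B"},
    {"age_band": "40-49", "sex": "M", "region": "K1B"},
    {"age_band": "50-59", "sex": "F", "region": "K2C"},
]

max_risk, avg_risk = reidentification_risk(
    records, ["age_band", "sex", "region"]
)
print(max_risk, avg_risk)
```

In this toy data set the single record that is unique on all three fields drives the maximum risk, which is why generalizing or suppressing such values is a typical de-identification step.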
Currently, EHIL is focused on developing methods for the generation and evaluation of synthetic health data. We are working on structured datasets, both cross-sectional and longitudinal. The technologies that we develop are being commercialized by the spin-off company Replica Analytics. The work on data synthesis includes:
- Developing synthetic data generation algorithms that can scale up and down, and that can work with heterogeneous data types.
- Developing a unified privacy model for evaluating the different privacy risks in synthetic data.
- Developing a framework for the evaluation of the utility of synthetic data.
- Investigating the application of data synthesis for the simulation of virtual patients to augment real patients recruited in clinical studies.
Historically, our work in this area has consisted of:
- Performing empirical risk assessments on health information “leaks”,
- Developing metrics to evaluate re-identification risk for clinical and geospatial data,
- Developing methods and models for the generation of synthetic data,
- Developing algorithms to de-identify large data sets, including longitudinal, cross-sectional, and free-form text data,
- Developing secure computation methods and tools that allow sophisticated analysis, disease surveillance and linking of registries without sharing personal information.