The Promise & Pitfalls of Public Data in Private ML
Date & Time
Friday, November 10, 2023, 11:00 AM - 12:00 PM

Vector Institute Distinguished Lecture Series
Friday, December 8, 2023, 11:00 AM - Friday, June 14, 2024, 12:00 PM


Talk abstract: Machine learning models are frequently trained on large-scale datasets, which may contain sensitive or personal data. Worryingly, without special care, these models are prone to revealing information about datapoints in their training set, leading to violations of individual privacy. To protect against such privacy risks, we can train models with differential privacy (DP), a rigorous notion of individual data privacy. While training models with DP has previously been observed to cause unacceptable losses in utility, I will discuss recent advances that incorporate public data into the training pipeline, allowing models to provide rigorous privacy guarantees while retaining high utility. I will also discuss potential pitfalls of this approach, and directions forward for the community.
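For readers new to DP training, the sketch below shows the core update of DP-SGD, a standard way to train with differential privacy: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the noisy average is used for the step. This is a minimal illustration in plain Python, not the speaker's method. The names `clip` and `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and privacy accounting (turning the noise level into an epsilon) is omitted. One common way to bring in public data, assumed here only for illustration, is to start `params` from a model pretrained on public data, which needs no privacy protection, and then run these noisy steps on the private data only.

```python
import math
import random
from typing import List, Optional


def clip(grad: List[float], clip_norm: float) -> List[float]:
    """Scale a per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: List[float],
    per_example_grads: List[List[float]],
    lr: float,
    clip_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One DP-SGD update (hypothetical helper): clip, sum, add noise, average, step."""
    rng = rng or random.Random()
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    # Clipping bounds how much any single example can move the sum.
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            summed[i] += g
    # Gaussian noise with std = noise_multiplier * clip_norm masks each example's contribution.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Usage: params would typically come from public-data pretraining (assumed here).
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]  # the first gradient is clipped, the second is not
new_params = dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0, noise_multiplier=1.1,
                         rng=random.Random(0))
```

Larger `noise_multiplier` values give stronger privacy but noisier updates, which is the utility cost the abstract refers to.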



* This event is open to the public with emphasis on graduate students in machine learning, computer science, ECE, statistics, mathematics, linguistics, medicine, as well as PhD-level data scientists in the GTA.