Data Minimization for GDPR Compliance in Machine Learning Models

Bibliographic Details
Title: Data Minimization for GDPR Compliance in Machine Learning Models
Authors: Ariel Farkash, Ron Shmelkin, Gilad Ezov, Abigail Goldsteen, Micha Moffie
Publication Information: arXiv, 2020.
Publication Year: 2020
Subject Terms: FOS: Computer and information sciences; Computer Science - Machine Learning; Computer Science - Cryptography and Security; Computer science; Generalization; Energy Engineering and Power Technology; 02 engineering and technology; Management Science and Operations Research; Machine learning; computer.software_genre; Machine Learning (cs.LG); Set (abstract data type); K.6.5; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; I.2.6; business.industry; Mechanical Engineering; Privacy rights; General Data Protection Regulation; Deep neural networks; 020201 artificial intelligence & image processing; Minification; Artificial intelligence; business; computer; Cryptography and Security (cs.CR)
Description: The EU General Data Protection Regulation (GDPR) and the California Privacy Rights Act (CPRA) mandate the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as deep neural networks. We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features of the runtime data. Our method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy, based on knowledge distillation approaches. We show that, in some cases, less data may be collected while preserving the exact same level of model accuracy as before, and if a small deviation in accuracy is allowed, even more generalizations of the input features may be performed. We also demonstrate that when collecting the features dynamically, the generalizations may be even further improved. This method enables organizations to truly minimize the amount of data collected, thus fulfilling the data minimization requirement set out in the regulations.
DOI: 10.48550/arxiv.2008.04113
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::acb46aa7365db31d14eac20a896f3615
Rights: OPEN
Accession Number: edsair.doi.dedup.....acb46aa7365db31d14eac20a896f3615
Database: OpenAIRE
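
Illustrative sketch (not part of the catalog record): the abstract in this record describes generalizing input features so that a model's predictions change little or not at all. The Python below is a minimal sketch of that general idea only, not the authors' method, which the abstract says is based on knowledge distillation. Everything in it is an assumption made for illustration: the toy_model function stands in for a trained classifier, the records are invented, and fixed-width binning is a simple substitute for a learned generalization.

```python
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, float]
Model = Callable[[Record], int]


def toy_model(record: Record) -> int:
    """Hypothetical stand-in for a trained classifier (not from the paper)."""
    return 1 if record["age"] >= 40 and record["income"] > 30_000 else 0


def generalize(value: float, width: float) -> float:
    """Replace a value with the midpoint of the fixed-width bin containing it."""
    lower = (value // width) * width
    return lower + width / 2


def agreement(model: Model, records: Sequence[Record],
              feature: str, width: float) -> float:
    """Fraction of records whose prediction is unchanged after generalization."""
    unchanged = 0
    for record in records:
        coarse = dict(record)  # copy so the original record is untouched
        coarse[feature] = generalize(record[feature], width)
        if model(coarse) == model(record):
            unchanged += 1
    return unchanged / len(records)


def coarsest_width(model: Model, records: Sequence[Record], feature: str,
                   widths: Sequence[float],
                   min_agreement: float = 1.0) -> Optional[float]:
    """Largest candidate width that keeps agreement at or above the threshold."""
    passing = [w for w in widths
               if agreement(model, records, feature, w) >= min_agreement]
    return max(passing) if passing else None


# Invented sample data, used only to exercise the functions above.
RECORDS: List[Record] = [
    {"age": 23, "income": 52_000},
    {"age": 37, "income": 41_000},
    {"age": 41, "income": 38_000},
    {"age": 58, "income": 27_000},
    {"age": 64, "income": 90_000},
    {"age": 35, "income": 60_000},
]
WIDTHS = [5, 10, 15, 25]

# No change in predictions allowed: 10-year age bins are the coarsest option.
print(coarsest_width(toy_model, RECORDS, "age", WIDTHS))  # 10
# Allowing some disagreement permits coarser 25-year bins.
print(coarsest_width(toy_model, RECORDS, "age", WIDTHS, min_agreement=0.8))  # 25
```

In this toy setting, collecting age only as a 10-year range leaves every prediction unchanged, and tolerating one changed prediction out of six allows 25-year ranges. That mirrors the trade-off stated in the abstract: a small allowed deviation in accuracy permits more generalization.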