Mitigating Biases in Toxic Language Detection through Invariant Rationalization

Bibliographic Details
Title: Mitigating Biases in Toxic Language Detection through Invariant Rationalization
Authors: Yung-Sung Chuang, Mingye Gao, Hongyin Luo, James Glass, Yun-Nung Chen, Hung-yi Lee, Shang-Wen Li
Source: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021).
Publication Information: Association for Computational Linguistics, 2021.
Publication Year: 2021
Subject Terms: FOS: Computer and information sciences, Computer Science - Computation and Language, Language identification, Computer science, Natural language understanding, Spurious correlation, Identity (social science), Debiasing, Verbal abuse, computer.software_genre, Rationalization (economics), Computation and Language (cs.CL), computer, Invariant (computer science), Cognitive psychology
Description: Automatic detection of toxic language plays an essential role in protecting social media users, especially minority groups, from verbal abuse. However, biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection. These biases make the learned models unfair and can even exacerbate the marginalization of people. Considering that current debiasing methods for general natural language understanding tasks cannot effectively mitigate the biases in toxicity detectors, we propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns (e.g., identity mentions, dialect) to toxicity labels. We empirically show that our method yields a lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
The 5th Workshop on Online Abuse and Harms at ACL 2021
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a1b6b9cbc5d967ba1f22f25c3a72ea8c
https://doi.org/10.18653/v1/2021.woah-1.12
Rights: OPEN
Accession Number: edsair.doi.dedup.....a1b6b9cbc5d967ba1f22f25c3a72ea8c
Database: OpenAIRE
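
Illustrative note: the description reports results as a false positive rate on lexical and dialectal attributes. The sketch below shows one way such a group-restricted false positive rate could be computed. It is not taken from the paper. The function name `false_positive_rate`, the label convention (1 = toxic, 0 = non-toxic), and the toy data are assumptions made here for illustration only.

```python
from typing import Sequence


def false_positive_rate(
    labels: Sequence[int],
    preds: Sequence[int],
    has_attr: Sequence[bool],
) -> float:
    """False positive rate restricted to examples flagged by has_attr.

    Assumed convention: 1 means toxic, 0 means non-toxic.
    has_attr marks examples that contain the attribute of interest,
    for instance an identity mention or a dialect marker.
    """
    # Predictions for non-toxic examples that carry the attribute.
    negatives = [p for y, p, a in zip(labels, preds, has_attr) if a and y == 0]
    if not negatives:
        raise ValueError("no non-toxic examples with this attribute")
    # Fraction of those non-toxic examples wrongly predicted as toxic.
    return sum(negatives) / len(negatives)


# Toy, made-up data (not from the paper).
labels = [0, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 1, 1, 0]
has_attr = [True, True, True, False, False, True]

fpr = false_positive_rate(labels, preds, has_attr)
print(f"FPR on attribute-bearing non-toxic examples: {fpr:.3f}")
```

Under this reading, a debiasing method would be judged better when this value is lower for each attribute group, provided overall detection quality is held roughly constant.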