Academic Journal

Synthesizing class labels for highly imbalanced credit card fraud detection data

Bibliographic Details
Title: Synthesizing class labels for highly imbalanced credit card fraud detection data
Authors: Robert K. L. Kennedy, Flavio Villanustre, Taghi M. Khoshgoftaar, Zahra Salekshahrezaee
Source: Journal of Big Data, Vol 11, Iss 1, Pp 1-22 (2024)
Publication Information: SpringerOpen, 2024.
Publication Year: 2024
Collection: LCC:Computer engineering. Computer hardware
LCC:Information technology
LCC:Electronic computers. Computer science
Subject Terms: Label synthesis, Label generation, Unsupervised learning, Credit card fraud detection, Class imbalance, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Description: Abstract: Acquiring labeled datasets often incurs substantial costs primarily due to the requirement of expert human intervention to produce accurate and reliable class labels. In the modern data landscape, an overwhelming proportion of newly generated data is unlabeled. This paradigm is especially evident in domains such as fraud detection and datasets for credit card fraud detection. These types of data have their own difficulties associated with being highly class imbalanced, which poses its own challenges to machine learning and classification. Our research addresses these challenges by extensively evaluating a novel methodology for synthesizing class labels for highly imbalanced credit card fraud data. The methodology uses an autoencoder as its underlying learner to effectively learn from dataset features to produce an error metric for use in creating new binary class labels. The methodology aims to automatically produce new labels with minimal expert input. These class labels are then used to train supervised classifiers for fraud detection. Our empirical results show that the synthesized labels are of high enough quality to produce classifiers that significantly outperform a baseline learner comparison when using area under the precision-recall curve (AUPRC). We also present results of varying levels of positive-labeled instances and their effect on classifier performance. Results show that AUPRC performance improves as more instances are labeled positive and belong to the minority class. Our methodology thereby effectively addresses the concerns of high class imbalance in machine learning by creating new and effective class labels.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2196-1115
Relation: https://doaj.org/toc/2196-1115
DOI: 10.1186/s40537-024-00897-7
Access URL: https://doaj.org/article/977d85c21b024c07afbcf3f61c5af59b
Accession Number: edsdoj.977d85c21b024c07afbcf3f61c5af59b
Database: Directory of Open Access Journals
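
The abstract describes turning a per-instance error metric from an autoencoder into binary class labels, and evaluating classifiers with AUPRC while varying how many instances are labeled positive. The record does not give the exact labeling rule, so the following is only a minimal sketch under stated assumptions: the errors are taken as already computed, the instances with the largest errors are labeled positive, and average precision stands in as a step-wise estimate of AUPRC. The names `synthesize_labels`, `average_precision`, and `n_positive`, and the error values, are hypothetical and not taken from the article.

```python
def synthesize_labels(errors, n_positive):
    """Label the n_positive instances with the largest error as 1, the rest as 0.

    Assumption: a high reconstruction error marks a likely minority-class
    (fraud) instance. The article's actual rule may differ.
    """
    if not 0 <= n_positive <= len(errors):
        raise ValueError("n_positive must be between 0 and len(errors)")
    # Indices ordered by descending error; ties keep their original order.
    ranked = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    labels = [0] * len(errors)
    for i in ranked[:n_positive]:
        labels[i] = 1
    return labels


def average_precision(y_true, scores):
    """Average precision: mean of the precision at each true positive's rank.

    Used here as a simple estimate of AUPRC; tied scores are not given any
    special treatment.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive instances")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, ap = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos


# Hypothetical reconstruction errors for six transactions.
errors = [0.02, 0.91, 0.05, 0.03, 0.40, 0.01]
labels = synthesize_labels(errors, n_positive=2)
print(labels)  # the two highest-error instances (indices 1 and 4) become 1
```

Raising `n_positive` labels more instances as the minority class, which is the quantity the abstract reports varying. In practice the synthesized labels would then be used to train a supervised classifier, whose scores could be passed to `average_precision` against held-out ground truth.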