Academic Journal

Over- and Under-sampling Approach for Extremely Imbalanced and Small Minority Data Problem in Health Record Analysis

Bibliographic Details
Title: Over- and Under-sampling Approach for Extremely Imbalanced and Small Minority Data Problem in Health Record Analysis
Authors: Koichi Fujiwara, Yukun Huang, Kentaro Hori, Kenichi Nishioji, Masao Kobayashi, Mai Kamaguchi, Manabu Kano
Source: Frontiers in Public Health, Vol 8 (2020)
Publication details: Frontiers Media S.A., 2020.
Publication year: 2020
Collection: LCC:Public aspects of medicine
Subject terms: health record analysis, imbalanced data problem, boosting, over- and under-sampling, stomach cancer detection, Public aspects of medicine, RA1-1270
Description: A considerable amount of health record (HR) data has been stored due to recent advances in the digitalization of medical systems. However, it is not always easy to analyze HR data, particularly when the number of persons with a target disease is too small in comparison with the population. This situation is called the imbalanced data problem. Over-sampling and under-sampling are two approaches for redressing an imbalance between minority and majority examples, which can be combined into ensemble algorithms. However, these approaches do not function when the absolute number of minority examples is small, which is called the extremely imbalanced and small minority (EISM) data problem. The present work proposes a new algorithm called boosting combined with heuristic under-sampling and distribution-based sampling (HUSDOS-Boost) to solve the EISM data problem. To make an artificially balanced dataset from the original imbalanced datasets, HUSDOS-Boost uses both under-sampling and over-sampling to eliminate redundant majority examples based on prior boosting results and to generate artificial minority examples by following the minority class distribution. The performance and characteristics of HUSDOS-Boost were evaluated through application to eight imbalanced datasets. In addition, the algorithm was applied to original clinical HR data to detect patients with stomach cancer. These results showed that HUSDOS-Boost outperformed current imbalanced data handling methods, particularly when the data are EISM. Thus, the proposed HUSDOS-Boost is a useful methodology for HR data analysis.
Document type: article
File description: electronic resource
Language: English
ISSN: 2296-2565
Relation: https://www.frontiersin.org/article/10.3389/fpubh.2020.00178/full; https://doaj.org/toc/2296-2565
DOI: 10.3389/fpubh.2020.00178
Access URL: https://doaj.org/article/b6b58a053965411693d6a05cc408cd44
Accession number: edsdoj.b6b58a053965411693d6a05cc408cd44
Database: Directory of Open Access Journals
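The description names two sampling steps: eliminating redundant majority examples based on prior boosting results, and generating artificial minority examples that follow the minority class distribution. The following is a minimal Python sketch of one such balancing round, under stated assumptions: NumPy is available, "redundant" majority examples are taken to be those with the smallest weights from a previous boosting round, and the minority class distribution is modeled as a multivariate normal. The function name balance_round and its parameters are hypothetical. This is not the authors' HUSDOS-Boost implementation, whose exact heuristics are not given in this record.

```python
import numpy as np


def balance_round(X_maj, X_min, maj_weights, n_per_class, rng):
    """Build one artificially balanced training set (hypothetical sketch).

    Assumption: majority examples with low boosting weights from the previous
    round are already well classified and therefore treated as redundant.
    Assumption: the minority class is modeled as a multivariate normal.
    Requires n_per_class <= len(X_maj).
    """
    # Heuristic under-sampling: keep the n_per_class majority examples
    # with the largest boosting weights (indices sorted in descending order).
    keep = np.argsort(maj_weights)[::-1][:n_per_class]
    X_maj_sel = X_maj[keep]

    # Distribution-based over-sampling: fit a Gaussian to the real minority
    # examples and draw enough synthetic ones to reach n_per_class.
    n_new = max(n_per_class - len(X_min), 0)
    mean = X_min.mean(axis=0)
    # A small ridge keeps the covariance usable when minority data are scarce.
    cov = np.cov(X_min, rowvar=False) + 1e-6 * np.eye(X_min.shape[1])
    X_syn = rng.multivariate_normal(mean, cov, size=n_new)
    X_min_bal = np.vstack([X_min, X_syn])

    # Label majority as 0 and minority (real + synthetic) as 1.
    X = np.vstack([X_maj_sel, X_min_bal])
    y = np.concatenate([np.zeros(len(X_maj_sel), dtype=int),
                        np.ones(len(X_min_bal), dtype=int)])
    return X, y


# Illustrative usage on synthetic data (not data from the article).
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(1000, 3))
X_min = rng.normal(2.0, 0.5, size=(8, 3))  # extremely small minority class
w = rng.random(1000)                        # stand-in for boosting weights
X, y = balance_round(X_maj, X_min, w, n_per_class=50, rng=rng)
print(X.shape, y.sum())  # (100, 3) 50
```

In a full boosting loop, a weak learner would be trained on each balanced set and the majority weights updated from its errors before the next round; that loop is omitted here to keep the sampling logic visible.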