Academic Journal

An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE.

Bibliographic Details
Title: An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE.
Authors: Yang W; Intelligent Network and Information System, School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China., Pan C; Intelligent Network and Information System, School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China. 003150@nuist.edu.cn., Zhang Y; Intelligent Network and Information System, School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China.
Source: Scientific reports [Sci Rep] 2022 Oct 07; Vol. 12 (1), pp. 16820. Date of Electronic Publication: 2022 Oct 07.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE
Imprint Name(s): Original Publication: London : Nature Publishing Group, copyright 2011-
MeSH Terms: Data Accuracy* , Sampling Studies*, Humans
Abstract: With the rapid expansion of data, the problem of data imbalance has become increasingly prominent in fields such as medical treatment, finance, and networking, and it is typically addressed by oversampling. However, most existing oversampling methods either sample randomly or sample only within a particular area, which affects the classification results. To address these limitations, this study proposes SD-KMSMOTE, an oversampling method for imbalanced data based on the spatial distribution of minority samples. A noise-filtering pre-treatment step is added: the category information of the nearest-neighbour samples is considered, and noise among the existing minority class samples is removed. On this basis, a new sample synthesis method is designed and rules for calculating the weight values are constructed. The spatial distribution of minority class samples is considered comprehensively: the samples are clustered, and the sub-clusters that contain useful information are assigned larger weight values and more synthetic samples. The experimental results show that the proposed method outperforms existing methods in terms of precision, recall, F1 score, G-mean, and area under the curve (AUC) when it is used to expand imbalanced datasets in medicine and other fields.
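The pipeline described in the abstract (noise filtering, clustering of minority samples, weighted allocation of synthetic samples, interpolation) can be illustrated with a minimal pure-Python sketch. This is not the authors' SD-KMSMOTE implementation: the abstract does not give the weighting rule or the synthesis formula, so the sketch assumes a sparsity-based weight (mean pairwise distance within a sub-cluster) and plain SMOTE-style linear interpolation. All function names and parameters (`filter_noise`, `cluster_weights`, `oversample`, `k`, `n_clusters`) are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_noise(minority, majority, k=3):
    """Drop minority points whose k nearest neighbours are all majority."""
    kept = []
    for p in minority:
        neigh = [(dist(p, q), 0) for q in minority if q is not p]
        neigh += [(dist(p, q), 1) for q in majority]
        nearest = sorted(neigh)[:k]
        if any(label == 0 for _, label in nearest):
            kept.append(p)
    return kept

def kmeans(points, n_clusters, rng, iters=20):
    """Plain Lloyd's k-means; returns the non-empty clusters."""
    centers = rng.sample(points, n_clusters)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[idx].append(p)
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]

def cluster_weights(clusters):
    """ASSUMED rule: weight is proportional to mean pairwise distance,
    so sparser sub-clusters receive more synthetic samples."""
    raw = []
    for cl in clusters:
        pairs = [(a, b) for i, a in enumerate(cl) for b in cl[i + 1:]]
        raw.append(sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0)
    total = sum(raw)
    if total == 0:
        return [1.0 / len(clusters)] * len(clusters)
    return [r / total for r in raw]

def oversample(minority, majority, n_new, n_clusters=2, k=3, seed=0):
    """Return (cleaned minority points, synthetic minority points)."""
    rng = random.Random(seed)
    clean = filter_noise(minority, majority, k)
    if not clean:
        raise ValueError("all minority samples were filtered out as noise")
    clusters = kmeans(clean, min(n_clusters, len(clean)), rng)
    weights = cluster_weights(clusters)
    counts = [int(n_new * w) for w in weights]
    # Give any rounding remainder to the highest-weight cluster.
    counts[weights.index(max(weights))] += n_new - sum(counts)
    synthetic = []
    for cl, n in zip(clusters, counts):
        for _ in range(n):
            a, b = rng.choice(cl), rng.choice(cl)
            t = rng.random()
            # SMOTE-style interpolation between two points of one cluster.
            synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return clean, synthetic
```

Calling `oversample(minority, majority, n_new=10)` with lists of coordinate tuples returns the noise-filtered minority set and ten synthetic points, each lying on a segment between two members of the same sub-cluster. Interpolating only within a sub-cluster keeps synthetic samples away from regions dominated by the majority class, which is the motivation the abstract gives for clustering first.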
(© 2022. The Author(s).)
References: BMC Bioinformatics. 2013 Mar 22;14:106. (PMID: 23522326)
Sci Rep. 2021 Dec 15;11(1):24039. (PMID: 34912009)
Entry Date(s): Date Created: 20221007 Date Completed: 20221011 Latest Revision: 20221206
Update Code: 20221206
PubMed Central ID: PMC9546831
DOI: 10.1038/s41598-022-21046-1
PMID: 36207460
Database: MEDLINE