Academic Journal

Thresholding Gini variable importance with a single-trained random forest: An empirical Bayes approach

Bibliographic Details
Title: Thresholding Gini variable importance with a single-trained random forest: An empirical Bayes approach
Authors: Robert Dunne, Roc Reguant, Priya Ramarao-Milne, Piotr Szul, Letitia M.F. Sng, Mischa Lundberg, Natalie A. Twine, Denis C. Bauer
Source: Computational and Structural Biotechnology Journal, Vol 21, Pp 4354-4360 (2023)
Publication Information: Elsevier, 2023.
Publication Year: 2023
Collection: LCC:Biotechnology
Subject Terms: Random forest, Feature selection, Empirical Bayes, Genetic analysis, Machine learning significance, Local FDR, Biotechnology, TP248.13-248.65
Description: Random forests (RFs) are a widely used modelling tool capable of feature selection via a variable importance measure (VIM); however, a threshold is needed to control for false positives. In the absence of a good understanding of the characteristics of VIMs, many current approaches attempt to select features associated with the response by training multiple RFs to generate statistical power via a permutation null, by employing recursive feature elimination, or through a combination of both. However, for high-dimensional datasets these approaches become computationally infeasible. In this paper, we present RFlocalfdr, a statistical approach, built on the empirical Bayes argument of Efron, for thresholding mean decrease in impurity (MDI) importances. It identifies features significantly associated with the response while controlling the false positive rate. Using synthetic data and real-world health data, we demonstrate that RFlocalfdr has equivalent accuracy to currently published approaches, while being orders of magnitude faster. We show that RFlocalfdr can successfully threshold a dataset of 10^6 datapoints, establishing its usability for large-scale datasets such as those in genomics. Furthermore, RFlocalfdr is compatible with any RF implementation that returns a VIM and counts, making it a versatile feature selection tool that reduces false discoveries.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2001-0370
Relation: http://www.sciencedirect.com/science/article/pii/S2001037023003082; https://doaj.org/toc/2001-0370
DOI: 10.1016/j.csbj.2023.08.033
Access URL: https://doaj.org/article/6ec28ccb8ba54d329a8d116c6276dab5
Accession Number: edsdoj.6ec28ccb8ba54d329a8d116c6276dab5
Database: Directory of Open Access Journals
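
Illustrative note: the description above refers to an empirical Bayes (local FDR) argument for thresholding importance scores. The sketch below is not the RFlocalfdr implementation and does not reproduce the article's procedure. It is a generic Efron-style local fdr computation, fdr(x) = pi0 * f0(x) / f(x), run on synthetic scores. The function name `local_fdr`, the normal null fitted from the median and MAD, the Gaussian kernel density estimate, the synthetic data, and the 0.2 cutoff are all assumptions made for illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def local_fdr(scores, pi0=1.0):
    """Generic local fdr sketch (hypothetical helper, not RFlocalfdr).

    Returns, for each score x, min(1, pi0 * f0(x) / f(x)).
    pi0 = 1.0 is a conservative upper bound on the null proportion.
    """
    n = len(scores)
    # Empirical null f0: normal centred at the median with a MAD-based scale,
    # so a small fraction of large non-null scores has little influence.
    mu0 = statistics.median(scores)
    mad = statistics.median([abs(s - mu0) for s in scores])
    sigma0 = 1.4826 * mad
    # Mixture density f: Gaussian kernel density estimate with a
    # rule-of-thumb bandwidth (assumed here for simplicity).
    h = 1.06 * statistics.stdev(scores) * n ** (-0.2)

    def f(x):
        return sum(normal_pdf(x, s, h) for s in scores) / n

    return [min(1.0, pi0 * normal_pdf(x, mu0, sigma0) / f(x)) for x in scores]


# Synthetic stand-in for importance scores: 500 null features plus
# 20 features with clearly elevated scores (made-up data).
random.seed(0)
n_null, n_signal = 500, 20
scores = [random.gauss(0.0, 1.0) for _ in range(n_null)]
scores += [random.gauss(7.0, 0.5) for _ in range(n_signal)]

fdr = local_fdr(scores)
# Keep features whose local fdr falls below an illustrative cutoff of 0.2.
selected = [i for i, v in enumerate(fdr) if v < 0.2]
print(len(selected), "features selected")
```

In this toy setting a single pass over one set of scores is enough to pick a threshold, which is the general appeal of a local fdr approach compared with retraining many forests under permutation. The quadratic-time density estimate here is only suitable for small examples.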