Academic Journal

Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.

Bibliographic Details
Title: Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
Authors: Xu S; Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA., Ackerman ME; Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA. margaret.e.ackerman@dartmouth.edu.; Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH, USA. margaret.e.ackerman@dartmouth.edu.; Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, NH, 03755, USA. margaret.e.ackerman@dartmouth.edu.
Source: BMC bioinformatics [BMC Bioinformatics] 2024 Jun 19; Vol. 25 (1), pp. 218. Date of Electronic Publication: 2024 Jun 19.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: BioMed Central Country of Publication: England NLM ID: 100965194 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2105 (Electronic) Linking ISSN: 14712105 NLM ISO Abbreviation: BMC Bioinformatics Subsets: MEDLINE
Imprint Name(s): Original Publication: [London] : BioMed Central, 2000-
MeSH Terms: Machine Learning*, Supervised Machine Learning ; Humans ; Computational Biology/methods ; Algorithms
Abstract: Background: Compared to traditional supervised machine learning approaches employing fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, posing direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate to indicate model quality.
Results: In this study, we report a novel methodology combining multiple established PU learning-based strategies with permutation testing to evaluate the potential of KP samples to accurately classify unlabeled samples without using "ground truth" positive and negative labels for validation. Multivariate synthetic and real-world high-dimensional benchmark datasets were employed to demonstrate the suitability of the proposed pipeline to provide evidence of model robustness across varied underlying ground truth class label compositions among the unlabeled set and with different proportions of KP examples. Comparisons between model performance with actual and permuted labels could be used to distinguish reliable from unreliable models.
Conclusions: As in fully supervised machine learning, permutation testing offers a means to set a baseline "no-information rate" benchmark in the context of semi-supervised PU learning inference tasks, providing a standard against which model performance can be compared.
(© 2024. The Author(s).)
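Illustrative note: the abstract's central idea, comparing how well held-out KP samples are recovered under the actual labels versus under permuted labels, can be sketched in a few lines. The following is a minimal sketch under stated assumptions and is not the authors' pipeline: the synthetic Gaussian data, the centroid-difference scorer (a stand-in for an established PU learner), the rank-based recovery statistic, and all function names (`make_data`, `kp_recovery_auc`, `permutation_test`) are hypothetical choices made for illustration only.

```python
import random

def make_data(n_pos=60, n_neg=60, dim=20, shift=1.5, n_kp=24, seed=0):
    """Synthetic data (assumption): true positives are shifted Gaussians.
    Only the first n_kp positives are labeled (s=1); all others are unlabeled (s=0)."""
    rng = random.Random(seed)
    X, s = [], []
    for i in range(n_pos + n_neg):
        mean = shift if i < n_pos else 0.0
        X.append([rng.gauss(mean, 1.0) for _ in range(dim)])
        s.append(1 if i < n_kp else 0)
    return X, s

def centroid(rows):
    # Column-wise mean of a list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

def auc(kp_scores, un_scores):
    # Fraction of (KP, unlabeled) pairs where the KP sample scores higher; ties count 0.5.
    wins = sum((p > u) + 0.5 * (p == u) for p in kp_scores for u in un_scores)
    return wins / (len(kp_scores) * len(un_scores))

def kp_recovery_auc(X, s, rng):
    """Train a stand-in scorer on half of the KP and unlabeled samples,
    then measure how highly the held-out KP rank among held-out unlabeled samples."""
    kp = [x for x, lab in zip(X, s) if lab == 1]
    un = [x for x, lab in zip(X, s) if lab == 0]
    rng.shuffle(kp)
    rng.shuffle(un)
    kp_train, kp_test = kp[:len(kp) // 2], kp[len(kp) // 2:]
    un_train, un_test = un[:len(un) // 2], un[len(un) // 2:]
    # Direction pointing from the unlabeled centroid toward the KP centroid.
    w = [a - b for a, b in zip(centroid(kp_train), centroid(un_train))]
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return auc([score(x) for x in kp_test], [score(x) for x in un_test])

def permutation_test(X, s, n_perm=200, seed=1):
    """Compare the observed statistic with its distribution under shuffled KP labels."""
    rng = random.Random(seed)
    observed = kp_recovery_auc(X, s, rng)
    null = []
    for _ in range(n_perm):
        permuted = s[:]
        rng.shuffle(permuted)  # breaks any link between features and KP status
        null.append(kp_recovery_auc(X, permuted, rng))
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return observed, null, p_value

X, s = make_data()
observed, null, p_value = permutation_test(X, s)
print(f"observed KP recovery: {observed:.3f}")
print(f"mean under permuted labels: {sum(null) / len(null):.3f}")
print(f"empirical p-value: {p_value:.4f}")
```

In this sketch the mean of the permuted-label statistics plays the role of the "no-information rate" mentioned in the Conclusions: a model whose observed KP recovery is not clearly above that baseline would be flagged as unreliable, and no ground-truth negative labels are consulted at any point.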
References: Front Immunol. 2022 Feb 22;13:788619. (PMID: 35273592)
J Bioinform Comput Biol. 2015 Jun;13(3):1541005. (PMID: 25790785)
Nature. 2020 Sep;585(7825):357-362. (PMID: 32939066)
Brief Bioinform. 2022 Jan 17;23(1):. (PMID: 34729589)
Biometrics. 2009 Jun;65(2):554-63. (PMID: 18759851)
IEEE Trans Med Imaging. 2022 Feb;41(2):320-331. (PMID: 34748484)
Nat Genet. 2013 Oct;45(10):1113-20. (PMID: 24071849)
BMC Bioinformatics. 2010 May 05;11:228. (PMID: 20444264)
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1:S6. (PMID: 20122235)
Grant Information: R56AI165448 United States NH NIH HHS
Contributed Indexing: Keywords: High-dimensional biological data; Permutation testing; Positive-unlabeled learning; Semi-supervised machine learning
Entry Date(s): Date Created: 20240619 Date Completed: 20240620 Latest Revision: 20240622
Update Code: 20240622
PubMed Central ID: PMC11186207
DOI: 10.1186/s12859-024-05834-2
PMID: 38898392
Database: MEDLINE