Academic Journal
Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
Title: | Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets. |
---|---|
Authors: | Xu S; Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA., Ackerman ME; Quantitative Biomedical Sciences Program, Dartmouth College, Hanover, NH, USA. margaret.e.ackerman@dartmouth.edu.; Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Dartmouth College, Hanover, NH, USA. margaret.e.ackerman@dartmouth.edu.; Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, NH, 03755, USA. margaret.e.ackerman@dartmouth.edu. |
Source: | BMC bioinformatics [BMC Bioinformatics] 2024 Jun 19; Vol. 25 (1), pp. 218. Date of Electronic Publication: 2024 Jun 19. |
Publication Type: | Journal Article |
Language: | English |
Journal Info: | Publisher: BioMed Central Country of Publication: England NLM ID: 100965194 Publication Model: Electronic Cited Medium: Internet ISSN: 1471-2105 (Electronic) Linking ISSN: 14712105 NLM ISO Abbreviation: BMC Bioinformatics Subsets: MEDLINE |
Imprint Name(s): | Original Publication: [London] : BioMed Central, 2000- |
MeSH Terms: | Machine Learning*, Supervised Machine Learning ; Humans ; Computational Biology/methods ; Algorithms |
Abstract: | Background: Compared to traditional supervised machine learning approaches employing fully labeled samples, positive-unlabeled (PU) learning techniques aim to classify "unlabeled" samples based on a smaller proportion of known positive examples. This more challenging modeling goal reflects many real-world scenarios in which negative examples are not available, posing direct challenges to defining prediction accuracy and robustness. While several studies have evaluated predictions learned from only definitive positive examples, few have investigated whether correct classification of a high proportion of known positive (KP) samples from among unlabeled samples can act as a surrogate to indicate model quality. Results: In this study, we report a novel methodology combining multiple established PU learning-based strategies with permutation testing to evaluate the potential of KP samples to accurately classify unlabeled samples without using "ground truth" positive and negative labels for validation. Multivariate synthetic and real-world high-dimensional benchmark datasets were employed to demonstrate the suitability of the proposed pipeline to provide evidence of model robustness across varied underlying ground truth class label compositions among the unlabeled set and with different proportions of KP examples. Comparisons between model performance with actual and permuted labels could be used to distinguish reliable from unreliable models. Conclusions: As in fully supervised machine learning, permutation testing offers a means to set a baseline "no-information rate" benchmark in the context of semi-supervised PU learning inference tasks, providing a standard against which model performance can be compared. (© 2024. The Author(s).) |
References: | Front Immunol. 2022 Feb 22;13:788619. (PMID: 35273592) J Bioinform Comput Biol. 2015 Jun;13(3):1541005. (PMID: 25790785) Nature. 2020 Sep;585(7825):357-362. (PMID: 32939066) Brief Bioinform. 2022 Jan 17;23(1):. (PMID: 34729589) Biometrics. 2009 Jun;65(2):554-63. (PMID: 18759851) IEEE Trans Med Imaging. 2022 Feb;41(2):320-331. (PMID: 34748484) Nat Genet. 2013 Oct;45(10):1113-20. (PMID: 24071849) BMC Bioinformatics. 2010 May 05;11:228. (PMID: 20444264) BMC Bioinformatics. 2010 Jan 18;11 Suppl 1:S6. (PMID: 20122235) |
Grant Information: | R56AI165448 United States NH NIH HHS |
Contributed Indexing: | Keywords: High-dimensional biological data; Permutation testing; Positive-unlabeled learning; Semi-supervised machine learning |
Entry Date(s): | Date Created: 20240619 Date Completed: 20240620 Latest Revision: 20240622 |
Update Code: | 20240622 |
PubMed Central ID: | PMC11186207 |
DOI: | 10.1186/s12859-024-05834-2 |
PMID: | 38898392 |
Database: | MEDLINE |
ISSN: | 1471-2105 |
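
The abstract's central idea, comparing how well held-out KP samples are recovered under the actual labels versus permuted labels, can be illustrated with a minimal sketch. This is not the authors' pipeline: the nearest-centroid scorer, the AUC-based KP-recovery metric, the synthetic data, and every function and parameter name below (`make_pu_data`, `kp_recovery_auc`, `permutation_test`, `run_demo`) are assumptions chosen only to keep the example short and dependency-free.

```python
import random


def make_pu_data(rng, n_pos=60, n_neg=140, n_kp=30, n_feat=20, n_signal=5, shift=1.0):
    """Hypothetical synthetic data: true positives are shifted in the first
    n_signal features; only n_kp of them carry a known-positive (KP) label."""
    X = []
    for i in range(n_pos + n_neg):
        is_pos = i < n_pos
        X.append([rng.gauss(shift if (is_pos and j < n_signal) else 0.0, 1.0)
                  for j in range(n_feat)])
    kp = [1 if i < n_kp else 0 for i in range(n_pos + n_neg)]  # 1 = KP, 0 = unlabeled
    return X, kp


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def auc(pos_scores, neg_scores):
    """Chance that a random pos score outranks a random neg score (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def kp_recovery_auc(X, kp, rng, n_folds=5):
    """Hide each fold of KP samples among the unlabeled pool, fit a
    nearest-centroid scorer (an assumed stand-in for a PU learner) on the
    remaining KP vs. the pool, and measure how highly the hidden KP rank."""
    kp_idx = [i for i, y in enumerate(kp) if y == 1]
    unl_idx = [i for i, y in enumerate(kp) if y == 0]
    rng.shuffle(kp_idx)
    fold_aucs = []
    for f in range(n_folds):
        hidden = kp_idx[f::n_folds]
        hidden_set = set(hidden)
        train_kp = [i for i in kp_idx if i not in hidden_set]
        pool = unl_idx + hidden  # hidden KP are treated as unlabeled in training
        mu_kp = centroid([X[i] for i in train_kp])
        mu_pool = centroid([X[i] for i in pool])
        w = [a - b for a, b in zip(mu_kp, mu_pool)]  # direction from pool to KP
        score = {i: sum(wj * xj for wj, xj in zip(w, X[i])) for i in pool}
        fold_aucs.append(auc([score[i] for i in hidden],
                             [score[i] for i in unl_idx]))
    return sum(fold_aucs) / len(fold_aucs)


def permutation_test(X, kp, rng, n_perm=100):
    """Compare the observed metric with its distribution under shuffled PU labels."""
    observed = kp_recovery_auc(X, kp, rng)
    null = []
    for _ in range(n_perm):
        shuffled = kp[:]
        rng.shuffle(shuffled)  # breaks any link between features and KP status
        null.append(kp_recovery_auc(X, shuffled, rng))
    # Empirical p-value with the usual +1 correction.
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, null, p_value


def run_demo(seed=0):
    rng = random.Random(seed)
    X, kp = make_pu_data(rng)
    return permutation_test(X, kp, rng)


if __name__ == "__main__":
    observed, null, p_value = run_demo()
    print(f"observed KP-recovery AUC: {observed:.3f}")
    print(f"mean permuted AUC:        {sum(null) / len(null):.3f}")
    print(f"empirical p-value:        {p_value:.3f}")
```

In this sketch the permuted-label runs play the role of the "no-information rate" baseline: a model whose KP-recovery score is not clearly above the permuted distribution would be flagged as unreliable, without ever consulting ground-truth negatives.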