Academic Journal

Performance evaluate of different chemometrics formalisms used for prostate cancer diagnosis by NMR-based metabolomics.

Bibliographic Details
Title: Performance evaluate of different chemometrics formalisms used for prostate cancer diagnosis by NMR-based metabolomics.
Authors: Oliveira MF; Metabonomics and Chemometrics Laboratory, Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil. marcio.felipe@ufpe.br.; Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil. marcio.felipe@ufpe.br., de Albuquerque Neto MC; Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil., Leite TS; Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil., Alves PAA; Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil., Lima SVC; Surgery Department, Clinics Hospital, Urology Clinic, Universidade Federal de Pernambuco, Av. Professor Luis Freire, s/n. Cidade Universitária, Recife, Pernambuco, Brazil., Silva RO; Metabonomics and Chemometrics Laboratory, Fundamental Chemistry Department, Universidade Federal de Pernambuco, Av. Jornalista Anibal Fernandes, s/n, Cidade Universitária, Recife, Pernambuco, Brazil.
Source: Metabolomics: Official journal of the Metabolomic Society [Metabolomics] 2023 Dec 21; Vol. 20 (1), pp. 8. Date of Electronic Publication: 2023 Dec 21.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Springer Country of Publication: United States NLM ID: 101274889 Publication Model: Electronic Cited Medium: Internet ISSN: 1573-3890 (Electronic) Linking ISSN: 15733882 NLM ISO Abbreviation: Metabolomics Subsets: MEDLINE
Imprint Name(s): Original Publication: New York : Springer, c2006-
MeSH Terms: Chemometrics*; Prostatic Neoplasms*/diagnosis; Male; Humans; Metabolomics; Magnetic Resonance Imaging; Algorithms
Abstract: Introduction: In general, two characteristics are always present in NMR-based metabolomics studies: (1) they are assays that aim to classify samples into different groups, and (2) the number of samples is smaller than the number of features (chemical shifts). Imbalanced datasets are also common, owing to the sampling method and/or inclusion criteria. These conditions can cause overfitting. However, appropriate feature selection and classification methods can help mitigate this issue.
Objectives: To investigate the performance of metabolomics models built by combining feature selectors (or no feature selection) with classification algorithms, and to use the best-performing model as an NMR-based metabolomic method for prostate cancer diagnosis.
Methods: We evaluated the performance of NMR-based metabolomics models for prostate cancer diagnosis using seven feature selectors and five classification formalisms. We also obtained metabolomics models without feature selection. In this study, thirty-eight volunteers with a positive diagnosis of prostate cancer and twenty-three healthy volunteers were enrolled.
Results: The thirty-eight models obtained were evaluated using AUROC, accuracy, sensitivity, specificity, and kappa index values. The best result was obtained when a Genetic Algorithm was combined with Linear Discriminant Analysis, giving 0.92 sensitivity, 0.83 specificity, and 0.88 accuracy.
Conclusion: The results show that choosing a proper feature selection method and classification model, together with a resampling method, can avoid overfitting in a small metabolomic dataset. Furthermore, this approach would decrease the number of biopsies and optimize patient follow-up. ¹H NMR-based metabolomics promises to be a non-invasive tool in prostate cancer diagnosis.
(© 2023. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.)
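Illustrative note: the evaluation metrics named in the Results (AUROC, accuracy, sensitivity, specificity, and the kappa index) can all be derived from a binary classifier's outputs. The following is a minimal sketch, not the authors' code. It assumes the kappa index refers to Cohen's kappa, and every count and score in it is hypothetical and unrelated to the study data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (positive = cancer, by assumption)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives


def sensitivity(c: Confusion) -> float:
    # Fraction of actual positives that were predicted positive.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of actual negatives that were predicted negative.
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Fraction of all samples that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def cohen_kappa(c: Confusion) -> float:
    # Agreement between prediction and truth, corrected for chance agreement.
    n = c.tp + c.fn + c.tn + c.fp
    observed = (c.tp + c.tn) / n
    expected = ((c.tp + c.fn) * (c.tp + c.fp)
                + (c.tn + c.fp) * (c.tn + c.fn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    # Probability that a random positive sample receives a higher score
    # than a random negative sample; ties count as one half.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos_scores
        for q in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
example = Confusion(tp=9, fn=1, tn=8, fp=2)
example_auroc = auroc([0.9, 0.8, 0.4], [0.7, 0.3])
```

On small, imbalanced datasets such as the one described, reporting sensitivity and specificity alongside accuracy matters because accuracy alone can look favorable when a model simply prefers the majority class.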
Contributed Indexing: Keywords: Biomarkers; Feature selection; Metabonomics; Overfitting; Prostatic neoplasms; Proton magnetic resonance spectroscopy
Entry Date(s): Date Created: 20231221 Date Completed: 20231222 Latest Revision: 20240229
Update Code: 20240301
DOI: 10.1007/s11306-023-02067-x
PMID: 38127222
Database: MEDLINE