Academic Journal

Deep learning uncertainty quantification for clinical text classification.

Bibliographic Details
Title: Deep learning uncertainty quantification for clinical text classification.
Authors: Peluso A; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States. Electronic address: pelusoa@ornl.gov., Danciu I; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States., Yoon HJ; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States., Yusof JM; Los Alamos National Laboratory, Los Alamos, NM 87545, United States., Bhattacharya T; Los Alamos National Laboratory, Los Alamos, NM 87545, United States., Spannaus A; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States., Schaefferkoetter N; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States., Durbin EB; University of Kentucky, Lexington, KY 40536, United States., Wu XC; Louisiana State University, New Orleans, LA 70112, United States., Stroup A; Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, United States., Doherty J; University of Utah, Salt Lake City, UT 84132, United States., Schwartz S; Fred Hutchinson Cancer Research Center, Seattle, WA 98109, United States., Wiggins C; University of New Mexico, Albuquerque, NM 87131, United States., Coyle L; Information Management Services Inc., Calverton, MD 20705, United States., Penberthy L; National Cancer Institute, Bethesda, MD 20814, United States., Tourassi GD; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States., Gao S; Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States.
Source: Journal of biomedical informatics [J Biomed Inform] 2024 Jan; Vol. 149, pp. 104576. Date of Electronic Publication: 2023 Dec 13.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Elsevier Country of Publication: United States NLM ID: 100970413 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1532-0480 (Electronic) Linking ISSN: 15320464 NLM ISO Abbreviation: J Biomed Inform Subsets: MEDLINE
Imprint Name(s): Publication: Orlando : Elsevier
Original Publication: San Diego, CA : Academic Press, c2001-
MeSH Terms: Deep Learning*, Humans ; Uncertainty ; Neural Networks, Computer ; Algorithms ; Machine Learning
Abstract: Introduction: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models to address real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated.
Method: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating information extraction of disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification to achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier.
Results: Overall, all the proposed selective classification methods effectively allow for achieving the targeted level of accuracy or higher in a trade-off analysis aimed to minimize the rejection rate. On in-distribution validation and holdout test data, with all the proposed methods, we achieve on all tasks the required target level of accuracy with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate from the best among the proposed methods achieving 97% accuracy or higher is lower than the rejection rate based on the DAC.
Conclusions: We show that although both approaches can flag those samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Published by Elsevier Inc.)
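The selective classification idea in the abstract (reach a target accuracy while rejecting as few reports as possible) can be illustrated with a generic confidence-threshold sketch. This is not the authors' method, which the abstract does not specify; it assumes each prediction comes with a single confidence score (for example, the maximum softmax probability) and that a labeled validation set is available. The function name `pick_threshold` and its parameters are hypothetical.

```python
import numpy as np


def pick_threshold(confidence, correct, target_accuracy):
    """Pick the lowest confidence threshold whose retained set meets the target.

    confidence: per-sample confidence scores (assumed: max softmax probability).
    correct: per-sample flags, True where the prediction matched the label.
    Returns (threshold, rejection_rate); (None, 1.0) if no threshold works.
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)

    # Rank samples from most to least confident.
    order = np.argsort(-confidence, kind="stable")
    sorted_conf = confidence[order]

    # Accuracy of the k most confident samples, for every k.
    cum_acc = np.cumsum(correct[order]) / np.arange(1, len(order) + 1)

    # Only cut between distinct confidence values, so that the rule
    # "retain if confidence >= threshold" reproduces exactly this prefix.
    is_boundary = np.append(sorted_conf[1:] != sorted_conf[:-1], True)

    ok = np.flatnonzero((cum_acc >= target_accuracy) & is_boundary)
    if ok.size == 0:
        return None, 1.0

    k = ok[-1]  # largest retained prefix that still meets the target
    return float(sorted_conf[k]), 1.0 - (k + 1) / len(order)


# Toy validation data (made up for illustration only).
conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
hit = [True, True, True, False, True, False]
threshold, rejection_rate = pick_threshold(conf, hit, target_accuracy=0.97)
retained = np.asarray(conf) >= threshold  # reports the model keeps
```

Samples below the chosen threshold would be routed to human annotators. Because the threshold is fitted on validation data after training, a rule of this kind needs no retraining, although the achieved accuracy on new data, especially out-of-distribution data, is not guaranteed to match the target.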
Contributed Indexing: Keywords: Abstaining classifier; Accuracy; CNN; DNN; Deep learning; HiSAN; NCI SEER; Pathology reports; Selective classification; Text classification; Uncertainty quantification
Entry Date(s): Date Created: 20231215 Date Completed: 20240122 Latest Revision: 20240426
Update Code: 20240426
DOI: 10.1016/j.jbi.2023.104576
PMID: 38101690
Database: MEDLINE
Description
ISSN: 1532-0480
DOI: 10.1016/j.jbi.2023.104576