Academic Journal

PERFORMANCE ASSESSMENT OF AN ARTIFICIAL INTELLIGENCE CHATBOT IN CLINICAL VITREORETINAL SCENARIOS.

Bibliographic Details
Title: PERFORMANCE ASSESSMENT OF AN ARTIFICIAL INTELLIGENCE CHATBOT IN CLINICAL VITREORETINAL SCENARIOS.
Authors: Maywood MJ; Department of Ophthalmology, Corewell Health William Beaumont University Hospital, Royal Oak, Michigan., Parikh R; Manhattan Retina and Eye Consultants, New York, New York.; Department of Ophthalmology, New York University School of Medicine, New York, New York., Deobhakta A; Icahn School of Medicine at Mount Sinai, New York, New York., Begaj T; Department of Ophthalmology, Corewell Health William Beaumont University Hospital, Royal Oak, Michigan.; Associated Retinal Consultants, Royal Oak, Michigan.
Source: Retina (Philadelphia, Pa.) [Retina] 2024 Jun 01; Vol. 44 (6), pp. 954-964.
Publication Type: Journal Article
Language: English
Journal Information: Publisher: Lippincott Williams & Wilkins Country of Publication: United States NLM ID: 8309919 Publication Model: Print Cited Medium: Internet ISSN: 1539-2864 (Electronic) Linking ISSN: 0275-004X NLM ISO Abbreviation: Retina Subsets: MEDLINE
Imprint Name(s): Publication: Hagerstown, MD : Lippincott Williams & Wilkins
Original Publication: Philadelphia : Lippincott, [1981?-
MeSH Terms: Artificial Intelligence* ; Retinal Diseases*/surgery ; Vitreoretinal Surgery* ; Humans ; Cross-Sectional Studies ; Retrospective Studies
Abstract: Purpose: To determine how often ChatGPT is able to provide accurate and comprehensive information regarding clinical vitreoretinal scenarios, to assess the types of sources ChatGPT primarily uses, and to determine whether those sources are hallucinated.
Methods: This was a retrospective cross-sectional study. The authors designed 40 open-ended clinical scenarios across four main topics in vitreoretinal disease. Responses were graded on correctness and comprehensiveness by three blinded retina specialists. The primary outcome was the number of clinical scenarios that ChatGPT answered correctly and comprehensively. Secondary outcomes included theoretical harm to patients, the distribution of the type of references used by the chatbot, and the frequency of hallucinated references.
Results: In June 2023, ChatGPT answered 83% of clinical scenarios (33/40) correctly but provided a comprehensive answer in only 52.5% of cases (21/40). Subgroup analysis demonstrated an average correct score of 86.7% in neovascular age-related macular degeneration, 100% in diabetic retinopathy, 76.7% in retinal vascular disease, and 70% in the surgical domain. There were six incorrect responses with one case (16.7%) of no harm, three cases (50%) of possible harm, and two cases (33.3%) of definitive harm.
Conclusion: ChatGPT correctly answered more than 80% of complex open-ended vitreoretinal clinical scenarios, with a reduced capability to provide a comprehensive response.
References: Yeo YH, Samaan JS, Ng WH, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023;29:721–732.
Mihalache A, Huang RS, Popovic MM, Muni RH. Performance of an upgraded artificial intelligence chatbot for ophthalmic knowledge assessment. JAMA Ophthalmol 2023;141:798–800.
Antaki F, Touma S, Milad D, et al. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci 2023;3:100324.
Cai LZ, Shaheen A, Jin A, et al. Performance of generative large language models on ophthalmology board-style questions. Am J Ophthalmol 2023;254:141–149.
Bogost I. ChatGPT Is Dumber Than You Think. The Atlantic; 2022. Available at: https://www.theatlantic.com/technology/archive/2022/12/chatgpt-openai-artificial-intelligence-writingethics/672386/.
Marion S. How to Use OpenAI Model Temperature? GPT for Work; 2023. Available at: https://gptforwork.com/guides/openai-gpt3-temperature. Accessed June 8, 2023.
2023 PAT Survey. American Society of Retina Specialists; 2023. Available at: https://www.asrs.org/content/documents/_asrs-2023-pat-survey-for-website.pdf. Accessed June 30, 2023.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
Sarraju A, Bruemmer D, Van Iterson E, et al. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023;329:842–844.
Flaxel CJ, Adelman RA, Bailey ST, et al. Diabetic retinopathy Preferred Practice Pattern®. Ophthalmology 2020;127:P66–P145.
Flaxel CJ, Adelman RA, Bailey ST, et al. Age-related macular degeneration Preferred Practice Pattern®. Ophthalmology 2020;127:P1–P65.
Momenaei B, Wakabayashi T, Shahlaee A, et al. Appropriateness and readability of ChatGPT-4-generated responses for surgical treatment of retinal diseases. Ophthalmol Retina 2023;7:862–868.
Caranfa JT, Bommakanti NK, Young BK, Zhao PY. Accuracy of vitreoretinal disease information from an artificial intelligence chatbot. JAMA Ophthalmol 2023;141:906–907.
Grewal DS, Charles S, Parolini B, et al. Autologous retinal transplant for refractory macular holes: multicenter international collaborative study group. Ophthalmology 2019;126:1399–1408.
Ersoz MG, Karacorlu M, Arf S, et al. Retinal pigment epithelium tears: classification, pathogenesis, predictors, and management. Surv Ophthalmol 2017;62:493–505.
Hua HU, Kaakour AH, Rachitskaya A, et al. Evaluation and comparison of ophthalmic scientific abstracts and references by current artificial intelligence chatbots. JAMA Ophthalmol 2023;141:819–824.
Entry Dates: Date Created: 20240125 Date Completed: 20240520 Latest Revision: 20240520
Update Code: 20240520
DOI: 10.1097/IAE.0000000000004053
PMID: 38271674
Database: MEDLINE