Academic Journal

Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard.

Bibliographic Details
Title: Performance of artificial intelligence chatbots in sleep medicine certification board exams: ChatGPT versus Google Bard.
Authors: Cheong RCT; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK. ryan.cheong@nhs.net., Pang KP; Asia Sleep Centre, Singapore, Singapore., Unadkat S; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK., Mcneillis V; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK., Williamson A; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK., Joseph J; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK., Randhawa P; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK., Andrews P; Otolaryngology-Head and Neck Surgery Department, The Royal National ENT and Eastman Dental Hospitals, University College London Hospitals NHS Foundation Trust, London, UK., Paleri V; Otolaryngology-Head and Neck Surgery Department, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK.
Source: European archives of oto-rhino-laryngology : official journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS) : affiliated with the German Society for Oto-Rhino-Laryngology - Head and Neck Surgery [Eur Arch Otorhinolaryngol] 2024 Apr; Vol. 281 (4), pp. 2137-2143. Date of Electronic Publication: 2023 Dec 20.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Springer International Country of Publication: Germany NLM ID: 9002937 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1434-4726 (Electronic) Linking ISSN: 09374477 NLM ISO Abbreviation: Eur Arch Otorhinolaryngol Subsets: MEDLINE
Imprint Name(s): Original Publication: Heidelberg : Springer International, c1990-
MeSH Terms: Artificial Intelligence*; Physicians*; Humans; Search Engine; Certification; Sleep
Abstract: Purpose: To conduct a comparative performance evaluation of GPT-3.5, GPT-4 and Google Bard in self-assessment questions at the level of the American Sleep Medicine Certification Board Exam.
Methods: A total of 301 text-based single-best-answer multiple choice questions with four answer options each, across 10 categories, were included in the study and transcribed as inputs for GPT-3.5, GPT-4 and Google Bard. The first output responses generated were selected and matched for answer accuracy against the gold-standard answer provided by the American Academy of Sleep Medicine for each question. A global score of 80% and above is required by human sleep medicine specialists to pass each exam category.
Results: GPT-4 successfully achieved the pass mark of 80% or above in five of the 10 exam categories, including the Normal Sleep and Variants Self-Assessment Exam (2021), Circadian Rhythm Sleep-Wake Disorders Self-Assessment Exam (2021), Insomnia Self-Assessment Exam (2022), Parasomnias Self-Assessment Exam (2022) and the Sleep-Related Movements Self-Assessment Exam (2023). GPT-4 demonstrated superior performance in all exam categories and achieved a higher overall score of 68.1% when compared against both GPT-3.5 (46.8%) and Google Bard (45.5%), which was statistically significant (p value < 0.001). There was no significant difference in the overall score performance between GPT-3.5 and Google Bard.
Conclusions: Otolaryngologists and sleep medicine physicians have a crucial role through agile and robust research to ensure the next generation AI chatbots are built safely and responsibly.
(© 2023. Crown.)
Contributed Indexing: Keywords: Artificial intelligence; Certification examinations; ChatGPT; Google Bard; Large language models; Sleep medicine
Entry Date(s): Date Created: 20231220 Date Completed: 20240318 Latest Revision: 20240402
Update Code: 20240402
DOI: 10.1007/s00405-023-08381-3
PMID: 38117307
Database: MEDLINE