ePoster
Presentation Description
An Jian Leung1, Nisha Suyien Chandran1 and Meiqi May Liau1
1 National University Hospital, Singapore
Background:
Artificial intelligence (AI) has advanced dramatically in recent years. ChatGPT is an online chatbot that uses a large language model to deliver fluent, human-like responses. InstructGPT is a sibling model designed to perform natural-language tasks from engineered text prompts. These language models have previously passed the United States Medical Licensing Examination. We were curious whether they could pass the Dermatology Specialty Certificate Examination (SCE), a postgraduate qualification offered by the Membership of the Royal College of Physicians (MRCP) United Kingdom (UK).
Aim:
We aimed to evaluate the performance of ChatGPT 3.0 and InstructGPT on questions from the Dermatology SCE.
Methods:
We fed 90 open-source multiple-choice sample questions from the Dermatology SCE, published by MRCP UK, into ChatGPT 3.0 and InstructGPT. Supporting images, such as clinical photographs and histology slides, could not be processed by either AI model.
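As a minimal illustrative sketch only (the questions may well have been entered through the public web interfaces rather than an API, and the model name and prompt wording below are assumptions, not the authors' protocol), one way to submit a multiple-choice question programmatically with the OpenAI Python client would be:

    # Illustrative sketch; not the study's actual method.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask_mcq(stem: str, options: dict[str, str]) -> str:
        """Submit one multiple-choice question and return the model's reply."""
        prompt = (
            stem + "\n"
            + "\n".join(f"{letter}. {text}" for letter, text in options.items())
            + "\nAnswer with the single best option letter."
        )
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed stand-in for "ChatGPT 3.0"
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content

    # Example usage with a hypothetical question:
    # print(ask_mcq("Which of the following is the most likely diagnosis ...?",
    #               {"A": "Option A", "B": "Option B", "C": "Option C"}))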
Results:
ChatGPT answered 76% (66) of the questions correctly. It offered alternative answers for 1.1% (1), declined to answer 1.1% (1) because of insufficient information, and answered 24% (22) incorrectly. All correct answers were accompanied by logical justifications. InstructGPT answered 51% (46) correctly and 49% (44) incorrectly. The typical pass mark for the Dermatology SCE is 70-72%.
Discussion:
ChatGPT 3.0 outperformed InstructGPT and achieved an accuracy sufficient to pass the SCE. The discrepancy between ChatGPT and InstructGPT was expected, as InstructGPT was designed to perform natural-language tasks rather than to answer questions. Both AI models were limited in their ability to process images, a critical component of clinical reasoning in dermatology. Future studies should include a subtype analysis of incorrectly answered questions to delineate the deficiencies of these language models. The ability of an untrained AI model to pass the SCE has wide-ranging implications for dermatological education and medical practice.
References (maximum three)
Passby L, Jenko N, Wernham A. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions. Clin Exp Dermatol. 2023 Jun 2:llad197. doi: 10.1093/ced/llad197. Epub ahead of print. PMID: 37264670.