Ottawa 2024

ChatGPT versus InstructGPT: which is the superior dermatologist? Comparing artificial intelligence language models on Dermatology Specialty Certificate Examination questions.

ePoster Presentation

10:30 am

26 February 2024

Exhibition Hall (Poster 1)

Technology and AI in assessment


Presentation Description

An Jian Leung¹, Nisha Suyien Chandran¹ and Meiqi May Liau¹
¹ National University Hospital, Singapore


Background:
Artificial intelligence (AI) has advanced dramatically in recent years. ChatGPT is an online chatbot built on a large language model that delivers human-like responses. InstructGPT is a sibling model designed to perform natural language tasks from engineered text prompts. Both models have passed the United States Medical Licensing Examination. We investigated whether they could pass the Dermatology Specialty Certificate Examination (SCE), a postgraduate qualification administered by the Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)).


Aim:
We aimed to evaluate the performance of ChatGPT 3.0 and InstructGPT on questions from the Dermatology SCE.


Methods:
We entered 90 open-source multiple-choice sample questions from the Dermatology SCE published by MRCP(UK) into ChatGPT 3.0 and InstructGPT. Neither model was able to process supporting images such as clinical photographs or histology slides, so all questions were answered from their text alone.
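The abstract does not specify how the questions were submitted to the two models (for example, via the web chat interface or the API). As a purely illustrative sketch, assuming OpenAI's legacy Python SDK (openai<1.0), with gpt-3.5-turbo standing in for ChatGPT and text-davinci-003 (an InstructGPT-series model) standing in for InstructGPT, a single text-only question might be submitted as follows; the question text and prompt wording are hypothetical placeholders.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

# Hypothetical text-only MCQ stem in the SCE format (not a real exam item).
question = (
    "Which of the following is the most likely diagnosis? ...\n"
    "A. ...\nB. ...\nC. ...\nD. ...\nE. ...\n"
    "Answer with a single letter and a brief justification."
)

# ChatGPT-style chat model
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
    temperature=0,  # deterministic output simplifies grading
)
print(chat.choices[0].message.content)

# InstructGPT-style completion model
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=question,
    max_tokens=200,
    temperature=0,
)
print(completion.choices[0].text)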


Results:
ChatGPT answered 76% (66) of the questions correctly. It offered alternative answers for 1.1% (1), declined to answer 1.1% (1) due to insufficient information, and answered 24% (22) incorrectly. All correct answers were accompanied by logical justifications. InstructGPT answered 51% (46) correctly and 49% (44) incorrectly. The typical pass mark for the Dermatology SCE is 70-72%.

Discussion:
ChatGPT 3.0 outperformed InstructGPT and achieved an accuracy rate sufficient to pass the SCE. The discrepancy was expected, as InstructGPT was designed to perform general language tasks rather than to answer questions. Both models were unable to process images, a critical component of clinical reasoning in dermatology. Future studies should conduct a subtype analysis of wrongly answered questions to delineate deficiencies in these language models. The ability of an untrained AI model to pass the SCE has wide-ranging implications for dermatological clinical education and medical practice.



References

Passby L, Jenko N, Wernham A. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions. Clin Exp Dermatol. 2023 Jun 2:llad197. doi: 10.1093/ced/llad197. Epub ahead of print. PMID: 37264670. 
