Ottawa 2024

Technology and AI in assessment

ePoster

10:30 am

26 February 2024

Exhibition Hall (Poster 1)

Session Program

An Jian Leung1
Nisha Suyien Chandran1 and Meiqi May Liau1
1 National University Hospital, Singapore 


Background:
Artificial intelligence (AI) has advanced dramatically in recent years. ChatGPT is an online chatbot that uses a large language model to deliver human-like responses. InstructGPT is a sibling model designed to perform natural language tasks from engineered text prompts. These language models have previously passed the United States Medical Licensing Examination. We were curious whether they could pass the Dermatology Specialty Certificate Examination (SCE), a postgraduate qualification administered by the Membership of the Royal Colleges of Physicians of the United Kingdom (MRCP(UK)). 


Aim:
We aim to evaluate the performance of ChatGPT 3.0 and InstructGPT on questions from the Dermatology SCE. 


Methods:
We fed 90 open-source multiple-choice sample questions from the Dermatology SCE published by MRCP(UK) into ChatGPT 3.0 and InstructGPT. Neither AI model could process the supporting images, such as clinical photographs or histology slides. 


Results:
ChatGPT answered 76% (66) of the questions correctly. It offered alternative answers for 1.1% (1), declined to answer 1.1% (1) owing to insufficient information, and gave wrong answers for 24% (22). All correctly answered questions were accompanied by logical justifications. InstructGPT answered 51% (46) correctly and 49% (44) wrongly. The typical pass mark for the Dermatology SCE is 70-72%. 
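
As a minimal, purely illustrative sketch of the scoring arithmetic above (not the authors' actual workflow), recorded chatbot answers can be compared against an answer key and against the quoted 70-72% pass mark; the question IDs, options and counts below are hypothetical.

# Illustrative sketch only: scoring recorded chatbot answers against an
# answer key and the quoted pass range. All data here are hypothetical.

answer_key = {"Q1": "B", "Q2": "D", "Q3": "A"}      # correct option per question
model_answers = {"Q1": "B", "Q2": "C", "Q3": "A"}   # option the chatbot chose

correct = sum(model_answers.get(q) == ans for q, ans in answer_key.items())
accuracy = correct / len(answer_key)

PASS_RANGE = (0.70, 0.72)  # typical Dermatology SCE pass mark quoted above
print(f"{correct}/{len(answer_key)} correct = {accuracy:.1%}; "
      f"pass mark {PASS_RANGE[0]:.0%}-{PASS_RANGE[1]:.0%}")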

Discussion:
ChatGPT 3.0 outperformed InstructGPT and achieved an accuracy sufficient to pass the SCE. The discrepancy between ChatGPT and InstructGPT was expected, as InstructGPT was built to perform general language tasks rather than to answer questions. Both AI models were limited in their ability to process images, a critical component of clinical reasoning in dermatology. Future studies should conduct a subtype analysis of wrongly answered questions to delineate the deficiencies of these language models. The ability of an untrained AI model to pass the SCE has wide-ranging implications for dermatological clinical education and medical practice. 



References (maximum three) 

Passby L, Jenko N, Wernham A. Performance of ChatGPT on dermatology Specialty Certificate Examination multiple choice questions. Clin Exp Dermatol. 2023 Jun 2:llad197. doi: 10.1093/ced/llad197. Epub ahead of print. PMID: 37264670. 

Michelle Mun1,2
Alastair Sloan1 and Samantha Byrne3
1 Melbourne Dental School, University of Melbourne, Melbourne, Australia
2 Centre for Digital Transformation of Health, University of Melbourne, Melbourne, Australia
3 Melbourne Dental School, The University of Melbourne




Background:
The emergence of generative artificial intelligence (AI) systems, including large language models, has garnered significant attention in education and healthcare. Future dentists must be adaptable to changing healthcare landscapes, including AI integration. In dental education, simulation-based learning is grounded in experiential learning theory and has extensive support for use in skills acquisition (Gaba, 2004). Similarly, generative AI systems can be leveraged to create safe environments for learning about the use, benefits and limitations of AI in health education and clinical practice (Mohammad et al., 2023). 


Summary of work:
This project explores student perceptions of using large language models for writing assessments in dental education. First-year dental students are required to complete a 1000-word essay as part of their Population Oral Health subject. In 2023, the essay was designed to incorporate use of AI. Students were required to use the openly available generative AI system, ChatGPT, to formulate, critique, and edit an essay response to a given topic, using a provided rubric. The students were also asked to reflect on the utility and reliability of using AI for generating the essay. Contribution of their assessment submission to the study was voluntary. 


Results and Discussion:
Data collection is ongoing and includes transcription, manual coding and thematic analysis of qualitative data regarding student experience and perceptions of the use of ChatGPT. Key themes, discussion and conclusion resulting from the analysis will be presented at the conference. 


Take-home messages:
In this project we seek to formalise knowledge about the effect of using generative AI systems on learning and assessment for preclinical dental students. This project addresses a gap in the literature regarding AI in health education and student perspectives of AI, and may inform the design and format of learning activities, assessments, guidelines and policy for using generative AI in health education assessment. 


References (maximum three) 

Gaba, D. M. (2004) The future vision of simulation in health care. BMJ Quality & Safety, 13(1). https://doi.org/10.1136/qshc.2004.009878 

Mohammad, B., Supti, T., Alzubaidi, M., Shah, H., Alam, T., Shah, Z. & Househ, M. (2023) The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review. Stud Health Technol Inform. 29;305:644-647. doi: 10.3233/SHTI230580. 

Piyaporn Sirijanchune1
Panomkorn Lakham2
1 Medical Education Center, Chiangrai Prachanukroh Hospital 


Clinical skills remain fundamental to medicine and form a core component of the professional identity of medical students. Artificial intelligence (AI), the simulation of human intelligence by machines, is a rapidly advancing technology with the potential to revolutionize medical education, including by providing personalized assistance with assessment. This study explored the value of an AI program for assessing the clinical skills of medical students and compared it with traditional assessment methods. 

This was a cross-sectional study of 37 fifth-year medical students conducted between June 2022 and 2023. During the clinical skills assessment, students interacted with examination stations for diagnosis and treatment. The formative evaluation in clinical internal medicine comprised six stations categorized into three domains: (1) history taking and physical examination, (2) diagnosis and treatment, and (3) advice and patient education. The AI program scored each student's diagnosis in real time, giving immediate insight into performance. AI scores on the in-person components of the clinical skills assessment were compared with traditional assessment methods. 

The AI group had a higher mean score than the traditional group: 85.27 ± 4.28 versus 81.25 ± 5.12 out of 100 points, respectively, although the difference was not significant (P = 0.07). Inter-rater agreement between AI and traditional assessment was 46.36% (kappa = 0.443). Reported stress was 43% in the AI group compared with 80% in the traditional group. The AI assessment agreed with traditional assessment tools in 80% of cases, with 92% sensitivity and a 90% positive predictive value. 
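
As a hedged illustration of how agreement statistics of this kind are commonly derived (not the study's actual analysis), the sketch below computes percentage agreement, Cohen's kappa, sensitivity and positive predictive value from a 2x2 pass/fail table comparing AI and traditional ratings; the counts are invented placeholders.

# Illustrative only: agreement statistics from a 2x2 table.
#   tp = both raters 'pass', fp = AI 'pass' but traditional 'fail',
#   fn = AI 'fail' but traditional 'pass', tn = both 'fail'.

def agreement_stats(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed = (tp + tn) / n                          # raw percentage agreement
    # chance agreement expected from the marginal totals of the two raters
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)    # Cohen's kappa
    sensitivity = tp / (tp + fn)                      # AI vs traditional as reference
    ppv = tp / (tp + fp)                              # positive predictive value
    return observed, kappa, sensitivity, ppv

# hypothetical counts, not study data
print(agreement_stats(tp=20, fp=2, fn=3, tn=12))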

AI can be used to assess clinical skills with quality and accuracy comparable to traditional methods; it demonstrated favorable performance with consistent results and immediate feedback to students. AI also had the advantage of reducing assessment error, which improved the reliability and validity of the assessment. 




References (maximum three) 

1. Daniel M, Rencic J, Durning S, Holmboe E, Santen S, Lang V, Ratcliffe T, Gordon D, Heist B, Lubarsky S, Estrada C, Ballard T, Artino A, Da Silva A, Cleary T, Stojan J, Gruppen L. Clinical Reasoning Assessment Methods: A Scoping Review and Practical Guidance. Academic Medicine. 2019. doi:10.1097/ACM.0000000000002618. 

2. Ahmed I, Ishtiaq S. Assessment methods in medical education: a review. 2020;6:95-102. 

3. González-Calatayud V, Prendes-Espinosa P, Roig-Vila R. Artificial Intelligence for Student Assessment: A Systematic Review. Applied Sciences. 2021; 11(12):5467. https://doi.org/10.3390/app11125467 

Syed Latifi
Mark Healy1
1 Weill Cornell Medicine-Qatar



Background and Aim 
The advent of Large Language Models (LLMs) has revolutionized the writing process. One intriguing application of LLMs is the generation of vignette-style multiple-choice questions (MCQs). Vignette-style MCQs present a brief scenario, or vignette, followed by a set of multiple-choice options, and can be both contextually rich and diverse in content. 

The aim of this e-poster is to demonstrate the potential of LLMs as a resource-saving educational tool and to compare the abilities of three popular LLMs in generating acceptable vignette-style questions. 


Methodology
Three popular LLMs will be used to generate the questions. Prompts will be developed that give each question context and relevance by mimicking real-world problem-solving scenarios. These initial prompts will be further refined through prompt engineering to generate MCQs with acceptable validity from the perspective of a subject matter expert (faculty). 

Next, each question will be evaluated by the faculty expert using a rubric for item writing flaws [1] and an Item-Writing Flaw Ratio will be computed [2]. The cognitive level of items will also be evaluated using Buckwalter’s rubric [3]. 
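
The precise definition of the Item-Writing Flaw Ratio follows reference [2]; as a loosely assumed illustration only, one plausible tally normalizes the rubric-flagged flaws by the number of items reviewed, as sketched below with hypothetical counts.

# Hypothetical sketch of an item-writing-flaw tally; the ratio definition
# used in the study follows reference [2], so treat this as an assumption.

# flaws flagged per generated item by the faculty reviewer (invented data)
flaws_per_item = {"item_01": 0, "item_02": 2, "item_03": 1, "item_04": 0}

total_flaws = sum(flaws_per_item.values())
flawed_items = sum(1 for n in flaws_per_item.values() if n > 0)

flaw_ratio_per_item = total_flaws / len(flaws_per_item)   # flaws per item reviewed
share_flawed = flawed_items / len(flaws_per_item)         # proportion of items with any flaw
print(f"{flaw_ratio_per_item:.2f} flaws/item; {share_flawed:.0%} of items flawed")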


Results
The study is currently in the design phase. Data will be collected between September and October 2023, when faculty will assess item quality, and the results will be analyzed and reported between November and December 2023. 


Discussion and Implication for future
This study will have significant implications for the design and development of reliable assessments in which the power of machines (LLMs) is harnessed to create draft questions for faculty to review and refine. 


Take home:
It is hypothesized that the questions generated via LLMs will be comparable to human-developed questions in quality and educational effectiveness. This could alleviate the time and resource constraints associated with the conventional process of faculty-led item generation. 



References (maximum three) 

  1. Tarrant, M., & Ware, J. (2008). Impact of item‐writing flaws in multiple‐choice questions on student achievement in high‐stakes nursing assessments. Medical education, 42(2), 198-206. 

  2. Przymuszała, P., Piotrowska, K., Lipski, D., Marciniak, R., & Cerbin-Koczorowska, M. (2020). Guidelines on writing multiple choice questions: a well-received and effective faculty development intervention. SAGE Open, 10(3), 2158244020947432. 

  3. Buckwalter, J. A., Schumacher, R., Albright, J. P., & Cooper, R. R. (1981). Use of an educational taxonomy for evaluation of cognitive performance. Journal of Medical Education, 56(2), 115-21. 

Tracy Mendolia1
Chaya Prasad2 and Fanglong Dong1
1 Western University of Health Sciences
2 AAMC




In the post-COVID-19 era, medical students have increasingly embraced various EdTech modalities, prompting educators to explore new approaches for delivering asynchronous content to enhance learning. Microlearning has shown its effectiveness in facilitating learning, comprehension, and retention. We conducted a study investigating the transition from a traditional pre-recorded lecture format to an online eLearning module (e-Module) segmented into microlearning chunks, and its impact on student-perceived learning effectiveness, appeal, and general satisfaction compared with archived recorded video lectures. 

During 2021 and 2022, a comparative study was conducted with year 2 medical students, who had access to two learning modalities: a microlearning e-module and recorded lectures. The e-module, created using Rise360 software, comprised short video lectures, text, images, and interactive learning assets segmented into microlearning chunks. Students compared their experiences with both modalities on the same topic, and a feedback survey assessed the effectiveness, appeal, and overall satisfaction with each approach. 

The new format was perceived to be more organized for easy search and review of information (26.6% vs 71.9%), easier to locate specific lecture topics (34.8% vs 79.7%), and more visually appealing (28.6% vs 79.7%) and meaningful in conveying written material (27.9% vs 71.6%). Students reported feeling more engaged (32% vs 65.7%), and the online module was seen as more effective in helping them retain information (25.6% vs 62.7%) and better prepare for future exams (18.8% vs 55.2%). 

Our 2-year comparative study highlighted the advantages of microlearning e-modules over traditional recorded lectures. It underscores the importance of innovative instructional approaches to enhance student engagement, motivation, and learning outcomes. Incorporating microlearning strategies into medical education can be a promising step toward fostering a more effective learning environment and better preparation for future exams. 

  • Microlearning e-modules are effective in medical education. 
  • Microlearning e-module positively impacts engagement, motivation, and preparation for future exams. 


References (maximum three) 

Conceicao, S. C. O., Strachota, E., & Schmidt, S. W. (2007). Academy of Human Resource Development International Research Conference in The Americas. In The Development and Validation of an Instrument to Evaluate Online Training Materials. Retrieved 2021, from https://eric.ed.gov/?id=ED504339. 

Sirwan Mohammed, G., Wakil, K., & Sirwan Nawroly, S. (2018). The effectiveness of microlearning to improve students’ learning ability. International Journal of Educational Research Review, 3(3), 32–38. https://doi.org/10.24331/ijere.415824. 

Severine Lancia1 and Catherine Johnson1
1 Elsevier


Technology enhancements can play a critical role in meeting the needs of both educators and learners, presenting real opportunities to deliver personalized assessment and timely feedback. 

Global disruption caused by the Covid-19 pandemic forced institutions to rapidly adopt and adapt online education, while also embracing technology to offer inventive assessment solutions in Health Professions Education. 

There is a lot to learn from that growth in assessment innovation. 

The GME division of Elsevier has been looking at best practices in technology-enhanced assessment (TEA) across the globe post-pandemic (Asia, Africa, Australasia, Europe, North America, South Africa), identifying innovative solutions that advance the authenticity of assessment, engage learners, optimize assessment delivery and track learners' progress, as well as significant challenges, particularly digital inequity amongst learners and institutions. 

Available resources vary greatly between countries and between institutions within the same country, as universities exhibit diversity in terms of digital accessibility, infrastructure for connectivity, capacity of devices, affordability, and students' digital capabilities. 

We’ve seen a rapid growth in computer-based assessment, digital open-book assessment, video-based assessment, gamification in assessment, and BYOD (Bring Your Own Device) assessment. 

The post-pandemic TEA future should focus on promoting digital equity and using technology to drive authentic assessment, supporting compassion, self-care, and self-development. 

Drawing on interviews with educators and students, this work attempts to capture new approaches to assessment post-Covid, providing a common understanding of TEA and taking into consideration technology that promotes equity and accessibility. 



References (maximum three) 

Fuller R, Goddard VCT, Nadarajah VD, Treasure-Jones T, Yeates P, Scott K, Webb A, Valter K, Pyorala E. 2022. Technology enhanced assessment: Ottawa consensus statement and recommendations. Med Teach. 44(8):836-850. 

Fuller R, Joynes V, Cooper J, Boursicot K, Roberts T. 2020. Could Covid-19 be our ‘there is no alternative’ (TINA) opportunity to enhance assessment? Med Teach. 42(7):781-786. 

EDUCAUSE. 2021. Top IT issues, 2021: emerging from the pandemic; [accessed 2022 Jan 21]. https://er.educause.edu/articles/2020/11/top-it-issues-2021-emerging-from-the-pandemic. 

Rungroj Angwatcharaprakan1
Chaowaphon Ittiphanitphong1
1 Sawanpracharak Hospital



Background:
In medical education, assessing students' application of knowledge traditionally relies on methods such as Multiple-Choice Questions (MCQs). The emergence of Large Language Models (LLMs), specifically ChatGPT, can enhance medical educators' ability to formulate a higher volume of quality MCQs. However, concerns exist about LLM capabilities, especially data accuracy and credibility, and validation by medical experts is crucial for data integrity. This research proposes an approach that combines medical educators' expertise with an LLM to enhance MCQ-based assessments. 


Summary of Work:
We refined prompts for communicating with an LLM to create MCQs. The generated MCQs were characterized by an application-of-knowledge style, with the key and distractors coherent in sentence length and category, adaptable question topics, and support for queries on diagnosis, differential diagnosis, further investigations, or management. In a cross-sectional study, we gathered and analyzed clinical-level medical students' responses to the questions, forming the basis of a model for assessment enhancement. 
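
The abstract does not reproduce the prompts used; the sketch below is a hypothetical template, assumed for illustration only, reflecting the constraints described above (application-of-knowledge vignette, key and distractors matched in category and sentence length, selectable question type).

# Hypothetical prompt template; not the prompts actually used in the study.

PROMPT_TEMPLATE = """You are helping a medical educator write one multiple-choice question.
Topic: {topic}
Question type: {question_type} (diagnosis, differential diagnosis, further investigations, or management)
Requirements:
- Write a clinical vignette that tests application of knowledge, not recall.
- Provide five options: one correct key and four distractors.
- Keep the key and distractors in the same category and of similar sentence length.
- End with the correct answer and a one-line rationale for expert review."""

def build_prompt(topic: str, question_type: str) -> str:
    # fill the template for a given topic and question type
    return PROMPT_TEMPLATE.format(topic=topic, question_type=question_type)

print(build_prompt("community-acquired pneumonia", "management"))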


Results:
Our pilot study yielded dual outcomes. First, collaborative MCQ creation demonstrated medical educator-LLM synergy, generating complex, contextually relevant questions and options. Second, student responses showed an optimal acceptability index and efficient distractors, indicating enhanced assessment efficacy. 
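
The abstract does not state how the acceptability index and distractor efficiency were calculated; a common item-analysis convention (assumed here, not confirmed by the source) takes the acceptability/difficulty index as the proportion of students answering correctly and counts a distractor as functional when at least 5% of students select it, as sketched below with hypothetical response counts.

# Illustrative item-analysis sketch; thresholds and counts are assumptions,
# not values reported in the study.

def item_analysis(option_counts: dict, key: str, functional_threshold: float = 0.05):
    n = sum(option_counts.values())
    difficulty = option_counts[key] / n               # proportion answering correctly
    distractors = {opt: c for opt, c in option_counts.items() if opt != key}
    functional = [opt for opt, c in distractors.items() if c / n >= functional_threshold]
    efficiency = len(functional) / len(distractors)   # share of functioning distractors
    return difficulty, efficiency

# hypothetical responses from 40 students to one five-option MCQ (key = "C")
print(item_analysis({"A": 6, "B": 3, "C": 24, "D": 5, "E": 2}, key="C"))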


Discussion:
LLM-generated MCQs displayed notable accuracy in the clinical vignettes but unclear detail in the options. Expert refinement improved accuracy, enabling efficient creation and enhancement of question structures. Responses from medical students demonstrated the efficiency of the questions. 


Conclusions:
This research advances MCQ-based assessment in medical education. Through medical educator-LLM collaboration, the pilot prompts produced enhanced MCQs assessing medical knowledge and clinical reasoning. Preliminary findings encourage further implementation. 


Take-home Messages:
Collaborative MCQ creation with an LLM enhances assessment efficiency and quality, reflecting clinical accuracy and promoting the application of knowledge in medical education. An LLM alone is inadequate for MCQ development; educator expertise is vital for accurate and comprehensive questions. 



References (maximum three) 

Campbell DE. How to write good multiple-choice questions. J Paediatr Child Health. 2011 Jun;47(6):322-5. doi: 10.1111/j.1440-1754.2011.02115.x. Epub 2011 May 25. PMID: 21615597. 

Fozzard, N., Pearson, A., du Toit, E. et al. Analysis of MCQ and distractor use in a large first year Health Faculty Foundation Program: assessing the effects of changing from five to four options. BMC Med Educ 18, 252 (2018). https://doi.org/10.1186/s12909-018-1346-4 

Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ 2023;9:e46885. URL: https://mededu.jmir.org/2023/1/e46885. DOI: 10.2196/46885