Ottawa 2024

Test development and analysis strategies

ePoster

4:00 pm

27 February 2024

Exhibition Hall (Poster 1)

Session Program

Khaled Almisnid1
Matt Homer2, Trudie Roberts3 and Rikki Goddard-Fuller4
1 Medical Education Unit, Unaizah College of Medicine, Qassim University, KSA.
2 Leeds Institute of Medical Education, School of Medicine, University of Leeds, UK 
3 Emeritus Professor of Medical Education, University of Leeds, UK
4 Christie Education, Manchester, UK 



Background


The Objective Structured Clinical Examination (OSCE) is a valuable performance assessment tool widely adopted in medical schools. Despite its significance and widespread use, the literature indicates that many medical schools worldwide struggle to implement effective OSCEs. 


Methodology
This study triangulated data from four sources to generate a comprehensive overview of the processes and procedures required to implement a successful OSCE in a Saudi undergraduate setting: a scoping literature review exploring OSCE implementation across disciplines and regional contexts, documentary analysis, interviews with academic leaders, and focus groups involving clinical teachers who use the OSCE in their practice. 


Results
The analysis adopted a three-stage model for designing the OSCE, encompassing pre-OSCE, peri-OSCE, and post-OSCE processes. Alongside the multiple steps within each stage, three key factors shaped every step: the institutional environment, staff expertise, and the availability of resources and infrastructure. Thoughtful consideration and establishment of these factors are essential to implementing each stage effectively. 


Discussion
Integrating the empirical data aligns the findings with the key existing literature while enriching and contextualising them. This approach offers a more comprehensive and nuanced understanding of OSCE implementation dynamics and their practical implications for emerging medical schools. 


Conclusions
Institutions should be aware of the extensive time and effort necessary to implement an effective OSCE. This study therefore proposes a model to facilitate that pursuit. Furthermore, it underlines that Western models may have limitations when applied to developing effective OSCEs in other contexts, stressing the need for contextually relevant practice models. 


Take-home messages
Although the OSCE is a complex assessment tool, when used properly, it can produce results that meet the criteria for good assessment. However, this study suggests that OSCE implementation is context-dependent, highlighting the need for future research on contextual factors and OSCE implementation. 



References (maximum three) 

Bearman, M., Ajjawi, R., Bennett, S. and Boud, D. 2020. The hidden labours of designing the Objective Structured Clinical Examination: a Practice Theory study. Advances in Health Sciences Education. 26, pp.637-651. 

Harden, R.M., Lilley, P. and Patricio, M. 2015. The Definitive Guide to the OSCE: The Objective Structured Clinical Examination as a performance assessment. 1st ed. Edinburgh: Elsevier. 

Hodges, B. 2003. Validity and the OSCE. Medical Teacher. 25(3), pp.250-255. 

Chi-Wei Lin1,2,3
Kuo-Chin Huang4,5,3, Pei-Chun Kuo1, I-Ting Liu1,2,3 and Ming-Nan Lin6,3
1 E-Da Hospital
2 I-Shou University
3 Taiwan Association of Family Medicine
4 National Taiwan University Hospital
5 National Taiwan University
6 Dalin Tzu Chi Hospital




Background: 
Since 1986, the Taiwan Association of Family Medicine has overseen the training, evaluation, and specialist examination of family medicine physicians in Taiwan. The Family Medicine specialist examination comprises a preliminary test of 125 multiple-choice questions (MCQs), followed by an oral test for those who pass. Despite rigorous MCQ design, issues such as high pass rates and question flaws persisted. This study aims to elucidate the impact of an intervention, initiated in 2018, to enhance question quality. 


Summary of work: 
A five-year program was introduced wherein testing experts trained MCQ composers. Following each exam, questions underwent difficulty and discrimination analysis. Feedback was provided to composers, flagging overly easy, overly difficult, or low-discrimination questions for subsequent revision. 


Results: 
Approximately 130 candidates sit for the board examination annually. Post-intervention data showed that the average discrimination index increased from 0.15 (2017) to 0.23 (2022) (p=0.003). The proportion of questions with low discrimination (D<0.2) decreased from 72% to 46%. Question difficulty remained stable, ranging between 0.70 and 0.75. 
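The difficulty and discrimination figures reported above come from classical item analysis. As an illustration only (the abstract does not specify which formulas the association used), the sketch below assumes dichotomous 0/1 scoring and the common upper-lower method, in which an item's discrimination index D is the difference in proportion correct between the top and bottom 27% of candidates ranked by total score.

```python
def item_statistics(responses, group_fraction=0.27):
    """Classical item analysis for dichotomously scored (0/1) MCQs.

    responses: list of per-candidate lists of 0/1 item scores.
    Returns (difficulty, discrimination): difficulty is the proportion
    of all candidates answering each item correctly; discrimination is
    the upper-lower index D = P_upper - P_lower, comparing the top and
    bottom `group_fraction` of candidates ranked by total score.
    """
    n_items = len(responses[0])
    k = max(1, round(group_fraction * len(responses)))
    ranked = sorted(responses, key=sum)          # weakest candidates first
    lower, upper = ranked[:k], ranked[-k:]

    def p(group, item):
        return sum(candidate[item] for candidate in group) / len(group)

    difficulty = [p(responses, i) for i in range(n_items)]
    discrimination = [p(upper, i) - p(lower, i) for i in range(n_items)]
    return difficulty, discrimination
```

Under this reading, items with D < 0.2 are the ones flagged for revision, and a stable difficulty of 0.70–0.75 corresponds to roughly three-quarters of candidates answering an item correctly.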


Discussion: 
MCQs, pivotal in medical examinations globally, present both benefits (flexibility, control over difficulty) and challenges (a bias towards memory recall, difficulty in creating high-discrimination questions). This study underscores the potential of continuing education and feedback mechanisms to augment the quality of MCQs. 


Conclusions: 
The Taiwan Association of Family Medicine's intervention emphasizes the critical role of ongoing training and feedback in boosting MCQ test quality, suggesting a promising model for other medical education boards globally. 


Take-home messages: 
Addressing challenges in MCQ design, especially those favoring memory recall, requires a dedicated and systematic approach. 

Continued education and feedback mechanisms can significantly enhance the discrimination index and overall quality of MCQs in medical exams. 

The success of the Taiwan Association of Family Medicine's initiative serves as a model for medical education boards globally, emphasizing the value of expert-led training and consistent feedback. 



References (maximum three) 

PA Coughlin, CR Featherstone. How to Write a High Quality Multiple Choice Question (MCQ): A Guide for Clinicians. Eur J Vasc Endovasc Surg. 2017 Nov;54(5):654-658. 

Ghada Khafagy, Marwa Ahmed, Nagwa Saad. Stepping up of MCQs' quality through a multi-stage reviewing process. Educ Prim Care. 2016 Jul;27(4):299-303. 

Piyush Gupta, Pinky Meena, Amir Maroof Khan, Rajeev Kumar Malhotra, Tejinder Singh. Effect of Faculty Training on Quality of Multiple-Choice Questions. Int J Appl Basic Med Res. 2020 Jul-Sep;10(3):210-214. 

Clare Owen1
1 MSC



Background
The Medical Schools Council (MSC) is working with UK medical schools to introduce a national applied knowledge test (AKT) that will form part of the Medical Licensing Assessment regulated by the General Medical Council (GMC). A voluntary Policy Framework has been put in place covering issues such as the number of attempts students are permitted in order to pass the test. 


Summary of work

For the purposes of the voluntary policy it was agreed that students should be granted up to four valid attempts to pass the AKT. MSC seeks to monitor the number of attempts offered across UK medical schools. 


Results

Results will be shared from the 2023 survey of UK medical schools about implementation of the AKT Policy Framework. 


Discussion

When schools are asked for the number of attempts permitted, the results are very different from when they are given models of first-sit and resit cycles to choose from. 

When asked how many valid attempts their students are currently allowed to pass final assessments, 9 schools answered four attempts. But when given this scenario – ‘Where a student fails their first attempt, they are offered a second sit in the same academic year, and if they fail again they must repeat the academic year. They will then have two further attempts in their repeat year, which they must pass or the degree will not be awarded.’ – 14 schools said this matched their process. 
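The mismatch becomes visible once the sits the scenario describes are enumerated; the counting below is a sketch of that reading, not part of the survey instrument.

```python
# Enumerate the sits described in the survey scenario: a first sit and
# one resit in the original academic year, then two further sits in the
# repeat year, after which the degree is not awarded.
attempt_schedule = [
    ("original year", "first sit"),
    ("original year", "second sit (resit)"),
    ("repeat year", "third sit"),
    ("repeat year", "fourth sit (final)"),
]
valid_attempts = len(attempt_schedule)  # 4
```

Read this way, the scenario amounts to exactly four valid attempts, so the two question formats describe the same policy even though schools answered them differently.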


Conclusions

The term ‘valid attempt’ is not well understood by medical schools, and care must be taken to specify exactly what is meant when seeking to describe practice within institutions. 


Take-home messages / implications for further research or practice
Researchers seeking to understand medical school practice must be as precise as possible when asking questions. Even when the question appears straightforward! 

References (maximum three) 

MSC website: https://www.medschools.ac.uk/our-work/medical-licensing-assessment (accessed August 2023) 

Sarunyapong Atchariyapakorn1
Farsai Chiewbangyang1, Yanisa Srisomboon1, Kanik Sritara1, Thananop Pothikamjorn1 and Vorapol Jaroonvanichkul1,2
1 Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
2 Office of Academic Affairs, Chulalongkorn University, Bangkok, Thailand




Background:
The Faculty of Medicine, Chulalongkorn University, is currently undertaking a curriculum revision aimed at ensuring students' well-being, cultivating competent medical graduates, and fulfilling stakeholders' needs. Additionally, the school intends to explore the latest trends in student assessment and grading systems to improve its assessment policy. This study aims to identify the most appealing features of, and to classify patterns in, the grading systems of renowned medical schools. 


Summary of work:
Published documents, curricula, and student guides from a range of global and local medical schools were reviewed. We searched for global perspectives and appealing features in the assessment systems of ASPIRE Award winners and globally top-ranked medical schools. Additionally, executives from five leading Thai medical schools were interviewed to understand local trends in curriculum design and assessment. 


Results:
The findings revealed two main grading system types: letter grades and non-letter grades. Letter grades are used for assessment across the whole curriculum or in parts of the program, particularly clinical clerkships. In contrast, non-letter grading systems are common in Western medical schools, though some apply them only in the early years. We also classified five types of non-letter grading, subcategorised by the number of tiers in the system: 2-tier, 3-tier, 4-tier, credit/non-credit, and optional systems, which were found in many schools. In terms of student assessment trends, low-stakes examinations, workplace-based assessments, and formative assessments are recommended. 


Discussion and Conclusions:
Letter grades persist mainly during the clinical clerkship years. Nevertheless, non-letter grading is the more prevalent global trend, featuring a variety of tiers. While some schools restrict it to the pre-clerkship curriculum, most institutions apply it throughout the entire program. 


Take-home messages:
Non-letter grading systems are a worldwide trend, showing a tendency towards increasing adoption. 



References (maximum three) 

1. Association of American Medical Colleges. Grading systems used in medical school programs [Internet]. [cited 2023 Aug 9]. Available from: https://www.aamc.org/data-reports/curriculum-reports/data/grading-systems-used-medical-school-programs 

2. Bloodgood R, Short J, Jackson J, Martindale J. A change to pass/fail grading in the first two years at one medical school results in improved psychological well-being. Academic Medicine. 2009;84(5):655–62. doi:10.1097/acm.0b013e31819f6d78 

3. Durning SJ, Hemmer PA. Commentary: Grading. Academic Medicine. 2012;87(8):1002–4. doi:10.1097/acm.0b013e31825d0b3 

Jillian Yeo1
Dujeepa Samarasekera1, Gominda Ponnamperuma2, Ruth Lim3 and Sabrina Wong4
1 Centre for Medical Education (CenMED), Yong Loo Lin School of Medicine, National University of Singapore
2 Department of Medical Education, Faculty of Medicine, University of Colombo
3 Ministry of Health, Singapore
4 National Healthcare Group Polyclinics, Singapore 



The Modified Angoff (MA) method is an extensively studied standard-setting technique. However, the literature reports variations in cut scores when the standard setters (i.e., judges) change (Tavakol & Dennick, 2017; Taylor et al., 2017). When applying MA, judges face two challenges: (a) defining the knowledge/skills required of borderline candidates vis-a-vis exam items; and (b) estimating the probability of a borderline candidate answering an item correctly (Tavakol & Dennick, 2017). This study aimed to determine which of the two poses the greater challenge to judges. 

The study was conducted with 15 judges for a 65-item Family Medicine exit examination. All judges used three formats of MA. 

Format 1: Traditional MA standard setting method. 

Format 2: After the initial steps of MA, judges rated each item based on three criteria: frequency of the item encountered in practice; clinical/practical relevance of the item; and difficulty of the item. Both relevance and difficulty were rated on a 5-point scale while frequency was rated on a 4-point scale. Then, judges provided a probability estimate per item considering the three ratings they gave. 

Format 3: Same as Format 2, but the judges' probability estimates were not used. Instead, a probability estimation guide converted the three Format 2 ratings for a given item into a probability. 

The cut score generated by Format 1 (61.8) differed from that of Format 2 (66.1). This suggests that in traditional MA, judges applied different criteria and/or attached different weights to those criteria. Format 3 generated a cut score of 65.6, closer to the standard generated by Format 2. 
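All three formats ultimately aggregate per-item probability estimates into a single cut score. The sketch below shows the usual Modified Angoff aggregation (the mean, across judges, of each judge's summed item estimates); the judge labels and numbers are hypothetical, not the study's data.

```python
def angoff_cut_score(ratings):
    """Aggregate Modified Angoff ratings into a cut score.

    ratings: dict mapping judge -> list of per-item probabilities that
    a borderline candidate answers the item correctly. The cut score
    is the mean across judges of each judge's summed estimates.
    """
    judge_totals = [sum(estimates) for estimates in ratings.values()]
    return sum(judge_totals) / len(judge_totals)

# Hypothetical ratings from two judges on a three-item paper.
ratings = {
    "judge_1": [0.6, 0.8, 0.7],   # total 2.1
    "judge_2": [0.5, 0.7, 0.6],   # total 1.8
}
cut = angoff_cut_score(ratings)   # approx. 1.95 marks out of 3 (65%)
```

Because each format changes only how the per-item probabilities are produced, not this aggregation step, differences between the three cut scores isolate how judges form their estimates.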

These results suggest that defining uniform criteria to judge an item is the major challenge for the judges, rather than converting decisions on those criteria into a probability. 



References (maximum three) 

Tavakol, M., & Dennick, R. (2017). The foundations of measurement and assessment in medical education. Medical Teacher, 39(10), 1010–1015. doi: 10.1080/0142159X.2017.1359521 

Taylor, C. A., Gurnell, M., Melville, C. R., Kluth, D. C., Johnson, N., & Wass, V. (2017). Variation in passing standards for graduation-level knowledge items at UK medical schools. Medical Education, 51(6), 612–620. doi: 10.1111/medu.13240