Presentation Description
Peter Yeates1
Adriano Maluf2, Natalie Cope1, Gareth McCray1, Kathy Cullen3, Vikki O'Neill3, Rhian Goodfellow4, Rebecca Vallander4, Ching-wa Chung5 and Richard Fuller6
1 Keele University
2 De Montfort University
3 Queen's University Belfast
4 Cardiff University
5 University of Aberdeen
6 Christie Hospitals NHS Foundation Trust
Introduction
Ensuring inter-institutional equivalence of graduation-level OSCE decisions is critical to fairness and patient safety; however, methodological challenges mean it is rarely studied. Recently, an innovation called video-based examiner score comparison and adjustment (VESCA)(1) has enabled linked comparison of examiners within distributed OSCEs. Since prior research has hinted at potentially substantial inter-institutional differences(2), we used VESCA to determine the equivalence of different parallel groups (“examiner-cohorts”) within and between UK medical schools, and the impact of adjusting for any differences on students’ pass rates.
Methods
We ran the same 6-station formative OSCE at four UK medical schools(3). After examining live performances, examiners additionally scored three station-specific comparison videos, which provided 1/ a controlled comparison of examiners’ scoring between schools and 2/ data linkage within a linear mixed model. The impact of adjusting for examiner variations on students’ pass/fail outcomes and ranks was calculated.
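To illustrate the kind of linkage and adjustment described above, the sketch below shows one way a linear mixed model could estimate examiner-cohort effects from scores linked by shared comparison videos and subtract them from live scores. This is a hypothetical illustration, not the study’s analysis code: the column names ('score', 'student', 'station', 'examiner_cohort', 'is_video'), the file 'osce_scores_long.csv', and the model specification (cohort as a fixed effect, student as a random intercept) are assumptions; the published VESCA analyses may specify the model differently.

```python
# Hypothetical sketch of video-linked examiner-cohort adjustment.
# Live and comparison-video scores are stacked in one long table; the shared
# videos (scored by every cohort) identify each cohort's relative stringency.
import pandas as pd
import statsmodels.formula.api as smf

scores = pd.read_csv("osce_scores_long.csv")  # assumed file name and layout

# Station and examiner-cohort as fixed effects, student as a random intercept.
model = smf.mixedlm(
    "score ~ C(station) + C(examiner_cohort)",
    data=scores,
    groups="student",
)
fit = model.fit()

# Estimated cohort effects relative to the reference cohort (reference = 0).
cohort_fx = fit.fe_params.filter(like="examiner_cohort")
effects = {name.split("[T.")[1].rstrip("]"): value for name, value in cohort_fx.items()}

# Subtract each cohort's estimated effect from the live-performance scores,
# giving examiner-adjusted scores from which pass/fail and ranks can be recomputed.
live = scores[scores["is_video"] == 0].copy()
live["adjusted_score"] = (
    live["score"] - live["examiner_cohort"].astype(str).map(effects).fillna(0.0)
)
```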
Results
Controlled comparison of examiners’ scores differed between schools by up to 16.3%, from 16.52/27 (95% CIs 15.52-17.52) to 19.96/27 (95% CIs 18.94-20.97), p<0.001. Examiner-cohorts varied more between schools than within schools (16.3% vs 8.8%). Students’ unadjusted scores suggested inter-school variation in students’ performances of up to 10.8% (17.65 (16.87-18.43) to 19.91 (19.13-20.69), p<0.001), which was no longer present after adjusting for examiner differences (18.38 (17.25-19.52) to 19.14 (18.19-20.10), 3.62% difference, p=0.69), suggesting the apparent difference was attributable to examiner, rather than student, variation. Failure rates varied between schools and were substantially altered by score adjustment (e.g. school 2: observed-score failure rate=39.1%, adjusted failure rate=8.7%; school 4: observed=0.0%, adjusted=21.7%).
Discussion and Conclusions
We found substantial inter-institutional differences in examiner stringency which would challenge the equivalence of outcomes if replicated within a summative setting. These apparent variations in graduation-level expectations warrant prospective investigation in summative settings to safeguard equivalence nationally. VESCA offers a feasible method to perform these comparisons.
References
1. Yeates P, Moult A, Cope N, McCray G, Xilas E, Lovelock T, et al. Measuring the Effect of Examiner Variability in a Multiple-Circuit Objective Structured Clinical Examination (OSCE). Academic Medicine. 2021;96(8):1189–96.
2. Sebok SS, Roy M, Klinger DA, De Champlain AF. Examiners and content and site: Oh My! A national organization’s investigation of score variation in large-scale performance assessments. Adv Health Sci Educ. 2015;20(3):581–94.
3. Yeates P, Maluf A, Kinston R, Cope N, McCray G, Cullen K, et al. Enhancing Authenticity, Diagnosticity and Equivalence (AD-Equiv) in multi-centre OSCE exams in Health Professionals Education: protocol for a complex intervention study. BMJ Open. 2022;12:e064387. doi: 10.1136/bmjopen-2022-064387