Presentation Description
Peter Yeates1
Adriano Maluf2, Natalie Cope1, Gareth McCray1, Kathy Cullen3, Vikki O'Neill3, Rhian Goodfellow4, Rebecca Vallander4, Ching-wa Chung5 and Richard Fuller6
1 Keele University
2 De Montfort University
3 Queen's University Belfast
4 Cardiff University
5 University of Aberdeen
6 Christie Hospitals NHS Foundation Trust
Introduction
Ensuring inter-institutional equivalence of graduation-level OSCE decisions is critical to fairness and patient safety; however, methodological challenges mean it is rarely studied. Recently, an innovation called video-based examiner score comparison and adjustment (VESCA)(1) has enabled linked comparison of examiners within distributed OSCEs. Since prior research has hinted at potentially substantial inter-institutional differences(2), we used VESCA to determine the equivalence of different parallel groups (“examiner-cohorts”) within and between UK medical schools, and the impact of adjusting for any differences on students’ pass rates.
Methods
We ran the same 6-station formative OSCE at four UK medical schools(3). After examining live performances, examiners additionally scored three station-specific comparison videos, which provided 1/ a controlled comparison of examiners’ scoring between schools and 2/ data linkage within a linear mixed model. The impact of adjusting for examiner variations on students’ pass/fail outcomes and ranks was calculated.
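To illustrate the kind of linkage and adjustment described above, the sketch below shows one way a linear mixed model could estimate examiner-cohort effects from scores linked by shared comparison videos and subtract them from live scores. This is a hypothetical illustration, not the study’s analysis code: the column names ('score', 'student', 'station', 'examiner_cohort', 'is_video'), the file 'osce_scores_long.csv', and the model specification (cohort as a fixed effect, student as a random intercept) are assumptions; the published VESCA analyses may specify the model differently.

```python
# Hypothetical sketch of video-linked examiner-cohort adjustment.
# Live and comparison-video scores are stacked in one long table; the shared
# videos (scored by every cohort) identify each cohort's relative stringency.
import pandas as pd
import statsmodels.formula.api as smf

scores = pd.read_csv("osce_scores_long.csv")  # assumed file name and layout

# Station and examiner-cohort as fixed effects, student as a random intercept.
model = smf.mixedlm(
    "score ~ C(station) + C(examiner_cohort)",
    data=scores,
    groups="student",
)
fit = model.fit()

# Estimated cohort effects relative to the reference cohort (reference = 0).
cohort_fx = fit.fe_params.filter(like="examiner_cohort")
effects = {name.split("[T.")[1].rstrip("]"): value for name, value in cohort_fx.items()}

# Subtract each cohort's estimated effect from the live-performance scores,
# giving examiner-adjusted scores from which pass/fail and ranks can be recomputed.
live = scores[scores["is_video"] == 0].copy()
live["adjusted_score"] = (
    live["score"] - live["examiner_cohort"].astype(str).map(effects).fillna(0.0)
)
```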
Results
Controlled comparison of examiners’ scores differed between schools by up to 16.3%, from 16.52/27 (95% CIs 15.52-17.52) to 19.96/27 (95% CIs 18.94-20.97), p<0.001. Examiner-cohorts varied more between schools than within schools (16.3% vs 8.8%). Students’ unadjusted scores suggested inter-school variation in students’ performances of up to 10.8% (17.65 (16.87-18.43) to 19.91 (19.13-20.69), p<0.001), which was no longer present after adjusting for examiner differences (18.38 (17.25-19.52) to 19.14 (18.19-20.10), 3.62% difference, p=0.69), suggesting the apparent difference was attributable to examiner, rather than student, variation. Failure rates varied between schools and were substantially altered by score adjustment (e.g. school 2: observed-score failure rate=39.1%, adjusted failure rate=8.7%; school 4: observed=0.0%, adjusted=21.7%).
Discussion and Conclusions
We found substantial inter-institutional differences in examiner stringency which would challenge the equivalence of outcomes if replicated within a summative setting. These apparent variations in graduation-level expectations warrant prospective investigation in summative settings to safeguard equivalence nationally. VESCA offers a feasible method to perform these comparisons.
References
1. Yeates P, Moult A, Cope N, McCray G, Xilas E, Lovelock T, et al. Measuring the Effect of Examiner Variability in a Multiple-Circuit Objective Structured Clinical Examination (OSCE). Academic Medicine. 2021;96(8):1189–96.
2. Sebok SS, Roy M, Klinger DA, De Champlain AF. Examiners and content and site: Oh My! A national organization’s investigation of score variation in large-scale performance assessments. Adv Health Sci Educ. 2015;20(3):581–94.
3. Yeates P, Maluf A, Kinston R, Cope N, McCray G, Cullen K, et al. Enhancing Authenticity, Diagnosticity and Equivalence (AD-Equiv) in multi-centre OSCE exams in Health Professionals Education: protocol for a complex intervention study. BMJ Open. 2022;12:e064387. doi: 10.1136/bmjopen-2022-064387