Ottawa 2024

Technical matters in OSCEs

Oral Presentation

10:00 am

28 February 2024

M204

Session Program

Matt Homer1
1 University of Leeds, School of Medicine 



Background and context 
In many high-stakes settings, there is an internet-based industry supporting candidates to succeed in OSCE-type assessments. There is a natural concern that some exam material might, over time, become available to prospective candidates, and that this might systematically affect station and exam difficulty. This study investigates whether there is any evidence of this happening in practice in the PLAB2 exam in the UK, taken by international graduates who want to work in the UK NHS. There is only limited, and partially relevant, literature on this issue (e.g. McKinley and Boulet 2004; Baig and Violato 2012). 


Data and methods 
The quantitative analysis models variation in station-level facility (n > 18,000 station-level observations across more than 750 different stations over the period 2016 to 2023) and controls for the following (a model sketch in Python follows the list): 

  • different underlying station facility and examiner stringency (via random intercepts) 

  • date (to see whether there is an overall upward or downward trend in facility over time), and 

  • the number of times a station has been used up to that point. 
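
As a concrete illustration of this kind of model, here is a minimal sketch in Python using statsmodels, with crossed random intercepts for station and examiner entered as variance components. The synthetic data and all column names (facility, days, n_prior_uses, station, examiner) are illustrative assumptions, not the study's actual data or code.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 500  # toy number of station-level observations

    # Synthetic stand-in for the real dataset (illustrative only).
    station_effect = rng.normal(0, 8, 40)    # underlying station difficulty
    examiner_effect = rng.normal(0, 6, 60)   # underlying examiner stringency
    df = pd.DataFrame({
        "station": rng.integers(0, 40, n),
        "examiner": rng.integers(0, 60, n),
        "days": rng.integers(0, 2500, n),        # days since first administration
        "n_prior_uses": rng.integers(0, 30, n),  # prior uses of the station
    })
    df["facility"] = (
        60
        + station_effect[df["station"]]
        + examiner_effect[df["examiner"]]
        - 0.003 * df["days"]            # small downward time trend
        + 0.1 * df["n_prior_uses"]      # small re-use effect
        + rng.normal(0, 5, n)           # residual noise
    )
    df["const"] = 1  # one overall group, so both random effects are crossed

    # Random intercepts for station and examiner via variance components,
    # with fixed effects for date and number of prior uses.
    model = smf.mixedlm(
        "facility ~ days + n_prior_uses",
        data=df,
        groups="const",
        vc_formula={"station": "0 + C(station)", "examiner": "0 + C(examiner)"},
    )
    print(model.fit().summary())

The estimated variance components in the summary correspond to the station and examiner shares of variance discussed in the results below.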


Results 
Once other factors are accounted for, 27% of the variance in facility is attributable to station and 21% to examiner. Date has a very small negative effect on facility, indicating an overall decline of around 1% a year in facility across all stations. Finally, there is weak evidence of a slight increase in station facility with repeated use: for every 100 uses of a station, its facility increases by an average of 11% (roughly 0.1% per use). 
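
For readers less familiar with variance partitioning: the 27% and 21% figures are proportions of total variance in the usual intraclass-correlation form. In generic notation (mine, not the paper's):

    \rho_{\text{station}} = \frac{\sigma^2_{\text{station}}}{\sigma^2_{\text{station}} + \sigma^2_{\text{examiner}} + \sigma^2_{\text{residual}}} \approx 0.27,
    \qquad
    \rho_{\text{examiner}} = \frac{\sigma^2_{\text{examiner}}}{\sigma^2_{\text{station}} + \sigma^2_{\text{examiner}} + \sigma^2_{\text{residual}}} \approx 0.21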


Discussion/Conclusion/Take home messages 
This work suggests that 'station leakage' over time likely has only a minimal impact on station difficulty as a function of how often stations have been used. However, policy might be strengthened, and the public better reassured, by retiring stations once they reach a certain level of re-use. 



References (maximum three) 

Baig LA, Violato C. 2012. Temporal stability of objective structured clinical exams: a longitudinal study employing item response theory. BMC Medical Education. 12(1):121. https://doi.org/10.1186/1472-6920-12-121 

McKinley DW, Boulet JR. 2004. Detecting score drift in a high-stakes performance-based assessment. Advances in Health Sciences Education. 9(1):29–38. https://doi.org/10.1023/B:AHSE.0000012214.40340.03 

Peter Yeates1
Becky Edwards1 and Richard Hays2
1 Keele University
2 JCU Murtupuni Centre for Rural & Remote Health




Background
Equivalence can be defined as the tendency for a given student to reach the same outcome in an assessment regardless of where or when they are examined. Ensuring equivalence in high-stakes performance assessments (for example, OSCEs or standardized patient exams) is vital for patient safety and candidate fairness, and is a growing focus of regulators. Equivalence matters both within institutions that run large or distributed exams and between institutions nationally. 


Why is the topic important for research and / or practice? 
Several countries or regions internationally have announced, or are moving towards, national licensing exams, or share OSCE stations to aid alignment. Testing clinical skills or performance at this scale often requires that candidates are examined across several locations, or at multiple times with different groups of examiners. Ensuring alignment of the judgements made by different groups of examiners (at separate times or in different locations) is critical to the fairness of large-scale exams but has traditionally been challenging both to investigate and to support. Recent innovations in this space, video-based examiner score comparison and adjustment (VESCA) (1) and video-based benchmarking (VBB) (2), offer novel approaches to calibrating, comparing or even equating for examiner differences, which can supplement traditional approaches to faculty development. 
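
To make the idea of score adjustment concrete, the following is a deliberately simplified toy sketch of adjusting for examiner differences via commonly scored benchmark videos. It is not the published VESCA procedure (which uses a formal linking design and more robust modelling); all data and column names are invented for illustration.

    import pandas as pd

    # Benchmark phase: every examiner scores the same station videos.
    video = pd.DataFrame({
        "examiner": ["A", "A", "B", "B", "C", "C"],
        "video_id": [1, 2, 1, 2, 1, 2],
        "score":    [14, 16, 12, 13, 16, 18],
    })

    # Live exam: each candidate is scored by a single examiner.
    live = pd.DataFrame({
        "examiner":  ["A", "B", "C"],
        "candidate": ["c1", "c2", "c3"],
        "score":     [15, 11, 19],
    })

    # Estimate each examiner's stringency as their mean deviation from the
    # per-video average on the commonly scored videos.
    video["video_mean"] = video.groupby("video_id")["score"].transform("mean")
    stringency = (video["score"] - video["video_mean"]).groupby(video["examiner"]).mean()

    # Remove the estimated examiner effect from the live scores.
    live["adjusted"] = live["score"] - live["examiner"].map(stringency)
    print(live)

In this toy example, examiner B is harsher than average on the benchmark videos, so their candidate's live score is adjusted upwards; the published approaches estimate such effects far more carefully.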


Workshop format, including participant engagement methods 
Workshop facilitators will invite participants to share their experiences, opinions and recommendations for enhancing OSCE equivalence. Building from current best-practice recommendations for enhancing graduation level performance assessment(3), facilitators will use case studies to illustrate the practical use of two approaches to enhancing equivalence: 

  1. examiner calibration (VBB) and 

  2. score adjustment (VESCA). 

Using data from recently completed and ongoing research, facilitators will illustrate the extent and impact of examiner variability across locations within an OSCE, and its implications for equivalence. They will present data on the opportunities and challenges of using these relatively novel approaches to either calibrate examiners or equate for their differences, including data on the accuracy and effectiveness of each approach and on how participants interact with, use and trust them. 

Working in groups, participants will critically reflect on the relative merits of trying to calibrate examiners versus adjusting for their differences. Participants will work together to produce actionable plans for enhancing their own OSCEs. 


Who should participate? 
People involved in or interested in OSCEs, particularly when run across multiple parallel tracks or different locations. 


Level of workshop 
All


Workshop outcomes:

By the end of the workshop, participants will be able to: 

  • Describe the importance and challenges of equivalence in distributed OSCEs. 
  • Understand the roles and implications of novel approaches to supporting examiner equivalence. 
  • Critically reflect on the opportunities and challenges of these approaches in relation to their own situation. 
  • Plan potential methods to investigate or support equivalence in their own OSCE settings.


References (maximum three) 

  1. Yeates P, Moult A, Cope N, McCray G, Xilas E, Lovelock T, et al. Measuring the Effect of Examiner Variability in a Multiple-Circuit Objective Structured Clinical Examination (OSCE). Academic Medicine. 2021; 96(8):1189–96. 

  2. Edwards R, Yeates P, Lefroy J, McKinley R. Addressing OSCE Examiner Variability: A Video-Based Benchmarking Approach. In: Association for Medical Education in Europe Annual Conference. 2021. p. 1.1.2: 8682. 

  3. Malau-Aduli BS, Hays RB, D’Souza K, Saad SL, Rienits H, Celenza A, et al. Twelve tips for improving the quality of assessor judgements in senior medical student clinical assessments. Med Teach. 2023;(26):1–5. 

Jinelle Ramlackhansingh1
Fern Brunger1
1 Memorial University 


Background 
This work is part of a critical ethnography examining the professional identity development of pre-clinical medical students at one Canadian medical school. The research examined the hidden curriculum conveyed through the OSCE evaluation process. 

Summary
Focus groups with students were conducted every six weeks. Faculty and administrative staff were interviewed. The data were supplemented by participant observation of some classes and governance meetings. The theoretical frameworks of Bourdieu and Foucault were used in the analysis. 

Results
Students described the OSCE as a space in which they were placed on display for the purpose of judging. As "Mila" put it, "you have to knock on the door and go in, there's people watching you behind the glass... ...watching you in the corner and grading you..." 

Discussion
The students unknowingly identified Foucault’s description of the panopticon in their discussion of the OSCE experience. They described how they are under surveillance and are self-disciplined as they are watched and judged during their examination. The power of the panopticon ensures that the students' performance/examination is correctly completed: disciplinary power is exercised through hierarchical observation and examination of students to confirm that the standardized performance is acceptable. 

Conclusions
The OSCE perpetuates a disciplinary discourse, as students are under surveillance. Students are disciplined into the standardization of patient management and run the risk of developing pseudo-competence and of concealing discrimination against patients in their OSCE performance. 

Take home
Medical educators should consider that during the OSCE, students are self-disciplined to perform correctly. This surveillance risks students hiding unprofessional prejudices to perform what they understand to be the required mould of a "good doctor." 


References (maximum three) 

Foucault, M. (1979). Discipline and punish: The birth of the prison (Vintage Books ed.). Vintage Books. 

Foucault, M. (1982). The subject and power. In H. Dreyfus & P. Rabinow (Eds.), Michel Foucault: Beyond structuralism and hermeneutics (pp. 208–226). University of Chicago Press. 

Hyslop-Margison, E., & Rochester, R. (2016). Assessment or surveillance? Panopticism and higher education. Philosophical Inquiry in Education, 24(1), 102–109. https://doi.org/10.7202/1070559ar 

Richard Hankins1
Matt Homer2 and Javier Caballero1
1 General Medical Council, UK
2 University of Leeds


Abstract 

In high-stakes clinical assessments, it is common to require candidates to pass a defined proportion of stations as well as to achieve the overall pass mark. This secondary hurdle exists to limit compensation between stations and to ensure that candidates have the required breadth of knowledge. Whereas the primary standard-setting methods used in modern clinical assessments, such as borderline regression, are designed to compensate for variation in exam-form difficulty, the secondary hurdle is usually fixed across test administrations. This study assesses whether it is practical and beneficial to calculate the conjunctive standard for each diet dynamically, taking into account variation in test-form difficulty to produce a more consistent standard. Homer (2023) has proposed methods for dynamically calculating the conjunctive standard in a way that accounts for differences in station difficulty and examiner stringency between administrations. 

Method: Using data from the OSCE component of the PLAB exam (a medical licensing exam for the UK), we calculated the conjunctive standard dynamically over a period of three months covering more than 90 exam administrations. We calculated the diet-specific conjunctive standard by regressing the total number of stations passed on total score, thereby obtaining the borderline standard for each administration. 

We compare the outcomes of the dynamic approach to calculating the conjunctive standard with those of a fixed hurdle, and consider the effect this has on fairness, defensibility and validity. While many in assessment consider a conjunctive standard necessary in a high-stakes setting to limit compensation and to ensure candidates have the range of skills to work safely, it remains little studied and poorly understood. Most conjunctive standards are arbitrary and fixed in nature, and this study tests, for the first time, the practical implementation of an objective method of calculating this hurdle. 
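
As an illustration of the dynamic calculation described above, the following sketch regresses the number of stations passed on total score within a single administration and reads off the predicted stations-passed at that diet's pass mark. The data, function name and pass mark are invented for illustration and are not the authors' code or results.

    import numpy as np

    def dynamic_conjunctive_standard(total_scores, stations_passed, pass_mark):
        """Predicted stations passed for a borderline candidate scoring at pass_mark."""
        # Simple linear (borderline-style) regression of stations passed on total score.
        slope, intercept = np.polyfit(total_scores, stations_passed, deg=1)
        return slope * pass_mark + intercept

    # Hypothetical single-diet data: candidate total scores, stations passed,
    # and a pass mark assumed to have been set by borderline regression.
    totals = np.array([61.0, 74.5, 68.2, 80.1, 55.3, 70.0])
    passed = np.array([9, 14, 12, 15, 7, 12])
    print(dynamic_conjunctive_standard(totals, passed, pass_mark=65.0))

The resulting value (rounded as policy dictates) would serve as that diet's minimum-stations-passed hurdle, moving with test-form difficulty rather than staying fixed across administrations.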

References (maximum three) 

Homer, M. 2023. Setting defensible minimum-stations-passed standards in OSCE-type assessments. Medical Teacher, pp. 1–7.