Ottawa 2024

“Which way were you leaning?” The impact of two borderline categories in borderline regression standard setting

Oral Presentation


2:00 pm

27 February 2024

M209

Standard setting and validity

Presentation Description

Jacob Pearce1
Vernon Mogol1, Gabes Lau2, Barry Soans2 and Anne Rogan2
1 Australian Council for Educational Research
2 Royal Australian and New Zealand College of Radiologists




Borderline regression standard setting is considered best practice for determining pass marks in Objective Structured Clinical Examinations (OSCEs).(1) Candidates receive question-based marks for stations, and examiners also provide a global rating of each candidate's performance. The global scales themselves may be purely categorical, but are often 5-point ordinal scales. Recent work has interrogated whether these scales are also interval (of equal distance) in practice.(2) However, the impact of the choice of category labels on the validity of the standard setting process is under-researched.

A 6-point categorical scale was applied during borderline regression for the Royal Australian and New Zealand College of Radiologists (RANZCR) OSCERs. This scale was based on a number of similar categorical scales from the assessment literature, and did not involve numbers.(3) The scale comprised three ‘passing’ categories: Outstanding, Clear Pass and Borderline Pass, and three ‘failing’ categories: Borderline Fail, Clear Fail and Significant Concerns. We hypothesised that two borderline categories would be more helpful to examiners than the one borderline category used in the previous RANZCR Viva Examinations: when examiners are pressed on a borderline rating, they can often tell you which way they are leaning. Examiners underwent training and calibration, and were advised that ‘Borderline Pass’ should be considered a ‘minimally competent candidate’.

The borderline regression standard setting data were highly detailed and psychometrically robust. Separating the borderline ratings into two categories worked well in practice, and examiners found the scale straightforward to apply. The data demonstrated an empirical difference between the two borderline categories, and provided more nuanced assessment evidence for review by the OSCER panel.

The precise wording used in categorical rating scales does impact standard setting outcomes. But the more important factor is how examiners conceptualise the minimally competent candidate, and how well they appreciate the differences between the levels of candidate performance captured in the rating scale.



References (maximum three) 

1. Boursicot, K., Kemp, S., Wilkinson, T., Findyartini, A., Canning, C., Cilliers, F., & Fuller, R. (2021). Performance assessment: Consensus statement and recommendations from the 2020 Ottawa Conference. Med Teach. 43(1):58-67. DOI: 10.1080/0142159X.2020.1830052

2. McGown, P.J., Brown, C.A., Sebastian, A., et al. (2022). Is the assumption of equal distances between global assessment categories used in borderline regression valid? BMC Med Educ. 22:708. DOI: 10.1186/s12909-022-03753-5

3. Pearce, J., Reid, K., Chiavaroli, N., Hyam, D. (2021). Incorporating aspects of programmatic assessment into examinations: aggregating rich information to inform decision-making. Med Teach. 43(5):567-574. DOI:10.1080/0142159X.2021.1878122 
