Ottawa 2024

Standard setters’ challenge in defining criteria and its impact on the cut-score

ePoster Presentation

4:25 pm

27 February 2024

Exhibition Hall (Poster 1)

Test development and analysis strategies


Presentation Description

Jillian Yeo1
Dujeepa Samarasekera1, Gominda Ponnamperuma2, Ruth Lim3 and Sabrina Wong4
1 Centre for Medical Education (CenMED), Yong Loo Lin School of Medicine, National University of Singapore
2 Department of Medical Education, Faculty of Medicine, University of Colombo
3 Ministry of Health, Singapore
4 National Healthcare Group Polyclinics, Singapore 



The Modified Angoff (MA) method is an extensively studied standard-setting technique. However, the literature reports variations in cut-scores when the standard setters (i.e., judges) change (Tavakol & Dennick, 2017; Taylor et al., 2017). When applying the MA, judges face two challenges: (a) defining the knowledge/skills required of borderline candidates vis-à-vis the exam items; and (b) estimating the probability of a borderline candidate answering an item correctly (Tavakol & Dennick, 2017). This study aimed to determine which of these two poses the greater challenge to judges. 

The study was conducted with 15 judges for a 65-item Family Medicine exit examination. All judges applied three formats of the MA. 

Format 1: The traditional MA standard-setting method. 

Format 2: After the initial steps of the MA, judges rated each item on three criteria: the frequency with which the item is encountered in practice; the clinical/practical relevance of the item; and the difficulty of the item. Relevance and difficulty were rated on 5-point scales, while frequency was rated on a 4-point scale. Judges then provided a probability estimate per item, taking their three ratings into account. 

Format 3: Same as Format 2, but the judges' probability estimates were not used. Instead, a probability estimation guide converted the three ratings of Format 2 for a given item into a probability. 
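The aggregation underlying all three formats can be sketched in code. This is a minimal illustration, assuming the usual Angoff convention that each judge's cut-score is the sum of their per-item probability estimates and the panel cut-score is the mean across judges; the `guide_probability` mapping is entirely hypothetical, since the study's actual probability estimation guide is not reported in the abstract.

```python
def angoff_cut_score(prob_estimates):
    """Panel cut-score from per-judge, per-item probability estimates.

    prob_estimates: list of lists, one inner list per judge, each holding
    the judge's probability (0..1) that a borderline candidate answers
    each item correctly. Returns the mean of the judges' summed totals,
    i.e., a cut-score on the raw-score scale of the exam.
    """
    judge_totals = [sum(items) for items in prob_estimates]
    return sum(judge_totals) / len(judge_totals)


def guide_probability(frequency, relevance, difficulty):
    """Illustrative (hypothetical) Format 3 guide: map the three ratings
    to a probability. frequency: 1-4, relevance: 1-5, difficulty: 1-5
    (5 = hardest). The coefficients below are invented for illustration.
    """
    base = 0.5 + 0.05 * (frequency - 1) + 0.04 * (relevance - 1)
    # Harder items lower the borderline candidate's chance of success;
    # clamp to a plausible probability range.
    return max(0.1, min(0.95, base - 0.08 * (difficulty - 3)))
```

For example, two judges who rate a two-item test at 0.5/0.5 and 0.7/0.7 respectively would yield a cut-score of 1.2 out of 2.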

The cut-score generated by Format 1 (61.8) differed from that of Format 2 (66.1). This suggests that, in the traditional MA, judges used different criteria and/or attached different weights to those criteria. Format 3 generated a cut-score of 65.6, close to that of Format 2. 

These results suggest that defining uniform criteria for judging an item, rather than converting decisions on those criteria into a probability, is the major challenge for judges. 



References 

Tavakol, M., & Dennick, R. (2017). The foundations of measurement and assessment in medical education. Medical Teacher, 39(10), 1010–1015. doi: 10.1080/0142159X.2017.1359521 

Taylor, C. A., Gurnell, M., Melville, C. R., Kluth, D. C., Johnson, N., & Wass, V. (2017). Variation in passing standards for graduation-level knowledge items at UK medical schools. Medical Education, 51(6), 612–620. doi: 10.1111/medu.13240 
