Ottawa 2024

Standard setters’ challenge in defining criteria and its impact on the cut-score

ePoster Presentation

4:25 pm

27 February 2024

Exhibition Hall (Poster 1)

Test development and analysis strategies


Presentation Description

Jillian Yeo1
Dujeepa Samarasekera1, Gominda Ponnamperuma2, Ruth Lim3 and Sabrina Wong4
1 Centre for Medical Education (CenMED), Yong Loo Lin School of Medicine, National University of Singapore
2 Department of Medical Education, Faculty of Medicine, University of Colombo
3 Ministry of Health, Singapore
4 National Healthcare Group Polyclinics, Singapore 



The Modified Angoff (MA) method is an extensively studied standard-setting technique. However, the literature reports variations in cut-scores when the standard setters (i.e., judges) change (Tavakol & Dennick, 2017; Taylor et al., 2017). When applying the MA, judges face two challenges: (a) defining the knowledge/skills required of borderline candidates vis-à-vis the exam items; and (b) estimating the probability of a borderline candidate answering an item correctly (Tavakol & Dennick, 2017). This study aimed to determine which of these two poses the greater challenge to judges. 

The study was conducted with 15 judges for a 65-item Family Medicine exit examination. All judges applied three formats of the MA. 

Format 1: The traditional MA standard-setting method. 

Format 2: After the initial steps of the MA, judges rated each item on three criteria: the frequency with which the item is encountered in practice; the clinical/practical relevance of the item; and the difficulty of the item. Relevance and difficulty were rated on 5-point scales, while frequency was rated on a 4-point scale. Judges then provided a probability estimate per item, taking their three ratings into account. 

Format 3: Same as Format 2, but the judges' probability estimates were not used. Instead, a probability estimation guide converted the three ratings of Format 2 for a given item into a probability. 
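The aggregation underlying all three formats can be sketched in code. This is a minimal illustration, assuming the usual Angoff convention that each judge's cut-score is the sum of their per-item probability estimates and the panel cut-score is the mean across judges; the `guide_probability` mapping is entirely hypothetical, since the study's actual probability estimation guide is not reported in the abstract.

```python
def angoff_cut_score(prob_estimates):
    """Panel cut-score from per-judge, per-item probability estimates.

    prob_estimates: list of lists, one inner list per judge, each holding
    the judge's probability (0..1) that a borderline candidate answers
    each item correctly. Returns the mean of the judges' summed totals,
    i.e., a cut-score on the raw-score scale of the exam.
    """
    judge_totals = [sum(items) for items in prob_estimates]
    return sum(judge_totals) / len(judge_totals)


def guide_probability(frequency, relevance, difficulty):
    """Illustrative (hypothetical) Format 3 guide: map the three ratings
    to a probability. frequency: 1-4, relevance: 1-5, difficulty: 1-5
    (5 = hardest). The coefficients below are invented for illustration.
    """
    base = 0.5 + 0.05 * (frequency - 1) + 0.04 * (relevance - 1)
    # Harder items lower the borderline candidate's chance of success;
    # clamp to a plausible probability range.
    return max(0.1, min(0.95, base - 0.08 * (difficulty - 3)))
```

For example, two judges who rate a two-item test at 0.5/0.5 and 0.7/0.7 respectively would yield a cut-score of 1.2 out of 2.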

The cut-score generated by Format 1 (61.8) differed from that of Format 2 (66.1). This suggests that, in the traditional MA, judges used different criteria and/or attached different weights to those criteria. Format 3 generated a cut-score of 65.6, close to that of Format 2. 

These results suggest that defining uniform criteria for judging an item, rather than converting decisions on those criteria into a probability, is the major challenge for judges. 



References 

Tavakol, M., & Dennick, R. (2017). The foundations of measurement and assessment in medical education. Medical Teacher, 39(10), 1010–1015. doi: 10.1080/0142159X.2017.1359521 

Taylor, C. A., Gurnell, M., Melville, C. R., Kluth, D. C., Johnson, N., & Wass, V. (2017). Variation in passing standards for graduation-level knowledge items at UK medical schools. Medical Education, 51(6), 612–620. doi: 10.1111/medu.13240 
