Ottawa 2024

Identifying and Evaluating Rater Effects: A Primer with Examples from a Formative OSCE Designed to Assess Clinical Reasoning

Conference Workshop

Presentation Description

Thai Ong1
Su Somay1
1 National Board of Medical Examiners




Background and Importance of Topic 
Performance assessments and human raters are often employed in medical education to evaluate critical medical competencies (e.g., clinical skills via objective structured clinical examinations [OSCEs]). Although human raters provide insights not easily captured by automated algorithms, “... raters are human, and they are therefore subject to all the errors to which humankind must plead guilty” (Guilford, 1936, p. 272). For this reason, human-generated scores, even with well-developed rater training and scoring guidelines, are prone to various rater effects and may be confounded with construct-irrelevant variance, posing a threat to the validity of score interpretations (AERA, APA, & NCME, 2014). Rater effects, or systematic differences in how raters evaluate performance, can stem from different sources, such as implicit biases and/or unfamiliarity with the rubric (Myford & Wolfe, 2003). Given this threat to validity, medical schools that rely on human raters should investigate rater effects and their impact on inferences made from the scores.
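As a deliberately simple illustration of one common rater effect, severity or leniency can surface as a rater whose mean score sits well above or below the other raters' means. The sketch below is a hypothetical example, not part of the workshop materials: the scores and the flagging threshold are invented, and the comparison is only meaningful when raters scored comparable examinees.

```python
# Minimal sketch: flagging potentially severe or lenient raters by
# comparing each rater's mean score to the mean of all rater means.
# Data and the z-score cut-off are illustrative assumptions; the check
# also assumes each rater scored a comparable sample of examinees.
from statistics import mean, stdev

# Hypothetical OSCE checklist scores, keyed by rater.
ratings = {
    "rater_A": [7, 8, 6, 9, 7],
    "rater_B": [4, 3, 5, 4, 4],   # noticeably harsher
    "rater_C": [7, 6, 8, 7, 8],
}

rater_means = {r: mean(scores) for r, scores in ratings.items()}
grand_mean = mean(rater_means.values())
spread = stdev(rater_means.values())

for rater, m in rater_means.items():
    z = (m - grand_mean) / spread
    if abs(z) > 1:  # illustrative cut-off; a real program needs a defensible criterion
        label = "lenient" if z > 0 else "severe"
        print(f"{rater}: mean={m:.2f}, z={z:+.2f} -> potentially {label}")
    else:
        print(f"{rater}: mean={m:.2f}, z={z:+.2f}")
```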

Several statistical approaches to identifying and evaluating rater effects exist, ranging from simple to complex methods. The approach chosen for an assessment program depends on various factors, such as the number of raters, the number of cases, and the rater effects of interest. Determining the most appropriate statistical approach therefore requires both knowledge of the assessment program (which participants bring) and a framework for selecting among the available methods (which the workshop provides).
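To make that range concrete: at the simple end, descriptive comparisons like the sketch above may suffice, while designs with sufficient overlap between raters and cases support model-based approaches such as mixed-effects models or many-facet Rasch measurement (Myford & Wolfe, 2003). The sketch below is a hypothetical illustration of the mixed-model option on simulated data; the data, column names, and model specification are assumptions for illustration, not the workshop's actual materials.

```python
# Minimal sketch: a model-based look at rater severity/leniency.
# Simulated data; effect sizes, column names, and the model choice are
# illustrative assumptions only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rater_shift = {"A": 0.0, "B": -1.5, "C": 0.5, "D": 0.2}  # simulated severity/leniency

# Fully crossed design: every rater scores every examinee.
rows = []
for examinee in range(50):
    ability = rng.normal(6, 1)
    for rater, shift in rater_shift.items():
        rows.append({
            "examinee": examinee,
            "rater": rater,
            "score": ability + shift + rng.normal(0, 0.5),
        })
data = pd.DataFrame(rows)

# Rater as a fixed effect, examinee as a random intercept: the C(rater)
# coefficients estimate each rater's severity/leniency relative to rater A.
result = smf.mixedlm("score ~ C(rater)", data, groups=data["examinee"]).fit()
print(result.summary())
```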


Workshop Format 
In this workshop, we aim to provide participants with a) a primer on common rater biases and b) a primer on several statistical methods to identify and evaluate rater effects, using applied examples from a formative OSCE designed to assess clinical reasoning skills. The workshop will be interactive: 

  1. Introduction (5 minutes) 

  2. Didactic presentation on common rater biases (10 minutes) 

  3. Group discussion (5 minutes) 

     a. How does your school/program currently evaluate rater bias? 

  4. Didactic presentation on statistical approaches to evaluating rater effects (30 minutes) 

  5. Group discussion (5 minutes) 

     a. How can you apply what you learn today to your own program? 

     b. What are some potential obstacles to applying it? 


Participants 
We encourage medical and assessment professionals at any level of experience with performance assessments that use human raters to evaluate students to attend the workshop. 


Level of Workshop 
Workshop materials will be pitched at the introductory to intermediate level. 


Workshop Outcomes 
By the end of the workshop, participants will be able to:
 1. Understand sources of common rater biases
 2. Understand several statistical methods to evaluate rater effects


Maximum Number of Participants
30 



References (maximum three) 
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association. 

Guilford, J. P. (1936). Psychometric methods. McGraw-Hill. 

Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4, 386–422. 
