Presentation Description
Thai Ong1
Su Somay2
1 National Board of Medical Examiners
2 NBME
Background and Importance of Topic
Performance assessments and human raters are often employed in medical education to evaluate critical medical competencies (e.g., clinical skills via objective structured clinical examination [OSCE]). Although human raters provide insights not easily captured by automated algorithms, “... raters are human, and they are therefore subject to all the errors to which humankind must plead guilty” (Guilford, 1936, p. 272). For this reason, human-generated scores, even with well-developed rater training and scoring guidelines, are prone to various rater effects and may be confounded with construct-irrelevant variance, posing a threat to the validity of score interpretations (AERA, APA, & NCME, 2014). Rater effects, or systematic differences in how raters evaluate performance, can stem from different sources, such as implicit biases and/or lack of familiarity with the rubric (Myford & Wolfe, 2003). Given this threat to validity, medical schools relying on human raters should investigate rater effects and their impact on inferences made from the scores.
Several statistical approaches to identifying and evaluating rater effects exist, ranging from simple to complex methods. The statistical approach chosen for an assessment program depends on various factors, such as the number of raters, the number of cases, and the rater effects of interest. Therefore, combining knowledge of the assessment program (from the participants) with a framework for selecting among the various statistical approaches (from the workshop) is ideal for determining the most appropriate statistical approach for a given program.
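As a minimal illustration of the simplest end of this spectrum, rater leniency/severity can be screened by comparing each rater's mean score against the overall mean. The sketch below uses hypothetical checklist scores; it is not a method specifically taught in the workshop, and real designs must also account for which examinees each rater happened to score (e.g., via many-facet Rasch measurement, as in Myford & Wolfe, 2003).

```python
# Minimal sketch: screening for rater leniency/severity by comparing each
# rater's mean OSCE checklist score to the overall mean across all raters.
# All scores below are hypothetical (0-10 checklist points).
from statistics import mean

scores_by_rater = {
    "Rater A": [7, 8, 9, 8, 7],   # consistently high: possible leniency
    "Rater B": [5, 6, 5, 6, 5],   # consistently low: possible severity
    "Rater C": [6, 7, 6, 7, 7],
}

overall_mean = mean(s for scores in scores_by_rater.values() for s in scores)

# Positive deviation suggests leniency; negative suggests severity.
leniency = {
    rater: round(mean(scores) - overall_mean, 2)
    for rater, scores in scores_by_rater.items()
}
print(leniency)  # e.g. {'Rater A': 1.2, 'Rater B': -1.2, 'Rater C': 0.0}
```

A screen like this can flag raters for follow-up, but because raters usually do not score identical sets of examinees, mean differences confound rater severity with examinee ability, which is exactly the problem the more complex methods address.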
Workshop Format
In this workshop, we aim to provide participants with (a) a primer on common rater biases and (b) a primer on several statistical methods to identify and evaluate rater effects, through applied examples from a formative OSCE designed to assess clinical reasoning skills. The workshop will be interactive:
- Introduction (5 Minutes)
- Didactic presentation on common rater bias (10 minutes)
- Group discussion (5 minutes)
  - How does your school/program currently evaluate rater bias?
- Didactic presentation on statistical approaches to evaluating rater effects (30 minutes)
- Group discussion (5 minutes)
  - How can you apply what you learned today to your own program?
  - What are some potential obstacles to applying these methods?
Participants
We encourage all medical and assessment professionals, at any level of experience with rater-based performance assessment of students, to attend the workshop.
Level of Workshop
We intend to create workshop material at the introductory to intermediate level.
Workshop Outcomes
By the end of the workshop, participants will be able to:
1. Understand sources of common rater biases
2. Understand several statistical methods to evaluate rater effects
Maximum Number of Participants
30
References (maximum three)
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Guilford, J. P. (1936). Psychometric methods. McGraw-Hill.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4, 386–422.