Presentation Description
Thai Ong1
Su Somay2
1 National Board of Medical Examiners
2 NBME
Background and Importance of Topic
Performance assessments and human raters are often employed in medical education to evaluate critical medical competencies (e.g., clinical skills via objective structured clinical examination [OSCE]). Although human raters provide insights not easily captured by automated algorithms, “... raters are human, and they are therefore subject to all the errors to which humankind must plead guilty” (Guilford, 1936, p. 272). For this reason, human-generated scores, even with well-developed rater training and scoring guidelines, are prone to various rater effects and may be confounded with construct-irrelevant variance, posing a threat to the validity of score interpretations (AERA, APA, & NCME, 2014). Rater effects, or systematic differences in how raters evaluate performance, can stem from different sources, such as implicit biases and/or lack of familiarity with the rubric (Myford & Wolfe, 2003). Given this threat to validity, medical schools relying on human raters should investigate rater effects and their impact on inferences made from the scores.
Several statistical approaches to identifying and evaluating rater effects exist, ranging from simple to complex methods. The statistical approach chosen for an assessment program depends on various factors, such as the number of raters, the number of cases, and the rater effects of interest. Therefore, combining knowledge of the assessment program (from the participants) with a framework for selecting among the various statistical approaches (from the workshop) is ideal for determining the most appropriate statistical approach for a given program.
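As a minimal illustration of the simplest end of this spectrum, rater leniency/severity can be screened by comparing each rater's mean score against the overall mean. The sketch below uses hypothetical checklist scores; it is not a method specifically taught in the workshop, and real designs must also account for which examinees each rater happened to score (e.g., via many-facet Rasch measurement, as in Myford & Wolfe, 2003).

```python
# Minimal sketch: screening for rater leniency/severity by comparing each
# rater's mean OSCE checklist score to the overall mean across all raters.
# All scores below are hypothetical (0-10 checklist points).
from statistics import mean

scores_by_rater = {
    "Rater A": [7, 8, 9, 8, 7],   # consistently high: possible leniency
    "Rater B": [5, 6, 5, 6, 5],   # consistently low: possible severity
    "Rater C": [6, 7, 6, 7, 7],
}

overall_mean = mean(s for scores in scores_by_rater.values() for s in scores)

# Positive deviation suggests leniency; negative suggests severity.
leniency = {
    rater: round(mean(scores) - overall_mean, 2)
    for rater, scores in scores_by_rater.items()
}
print(leniency)  # e.g. {'Rater A': 1.2, 'Rater B': -1.2, 'Rater C': 0.0}
```

A screen like this can flag raters for follow-up, but because raters usually do not score identical sets of examinees, mean differences confound rater severity with examinee ability, which is exactly the problem the more complex methods address.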
Workshop Format
In this workshop, we aim to provide participants with (a) a primer on common rater biases and (b) a primer on several statistical methods to identify and evaluate rater effects, through applied examples from a formative OSCE designed to assess clinical reasoning skills. The workshop will be interactive:
- Introduction (5 Minutes)
- Didactic presentation on common rater bias (10 minutes)
- Group discussion (5 minutes)
  - How does your school/program currently evaluate rater bias?
- Didactic presentation on statistical approaches to evaluating rater effects (30 minutes)
- Group discussion (5 minutes)
  - How can you apply what you learned today to your own program?
  - What are some potential obstacles to applying these methods?
Participants
We encourage all medical and assessment professionals, at any level of experience with rater-based performance assessment of students, to attend the workshop.
Level of Workshop
We intend to create workshop material at the introductory to intermediate level.
Workshop Outcomes
By the end of the workshop, participants will be able to:
1. Understand sources of common rater biases
2. Understand several statistical methods to evaluate rater effects
Maximum Number of Participants
30
References (maximum three)
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Guilford, J. P. (1936). Psychometric methods. McGraw-Hill.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4, 386–422.