Presentation Description
Joanna McFarlane1
Marcus Edwards1
1 Australian Pharmacy Council
This presentation describes a trial of a psychometric theory(1) in high-stakes exams, and how software design can help ease the bottleneck in exam content development.
In early 2021 our psychometric consultants suggested ‘pairwise scaling’ as a way to ease the bottleneck in adding new items to our Intern Written exam. Drawing on a comparative judgement technique used for marking(2), we designed a tool in which subject matter experts (SMEs) compare new items with anchor items. The resulting dataset is used to calculate a perceived scale rating for each new item, so that it can be used as a scored item in live exams.
The method has been trialled in two remote workshops, which produced comparison data used to determine scale values for new items in our 2021 and 2022 exams. Workshop data was analysed for judge consistency and yielded a perceived difficulty for each new item, informed by the anchor items. These values were used when developing our 2022 exam forms.
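As an illustration of how pairwise judgements against anchor items can yield a scale value for a new item, the sketch below fits a Bradley-Terry-Luce style model by gradient ascent, holding the anchor difficulties fixed. This is a minimal sketch under stated assumptions and not a description of the analysis our consultants performed: the function name, item identifiers, anchor values, and judgement counts are all hypothetical.

```python
import math


def estimate_difficulties(comparisons, anchors, new_items,
                          iterations=500, step=0.1):
    """Estimate scale values for new items from pairwise judgements.

    comparisons: list of (harder, easier) item-id pairs, one per judgement.
    anchors:     dict of item-id -> known scale value (held fixed).
    new_items:   list of item-ids whose scale values are estimated.

    Assumed model (hypothetical, for illustration only):
        P(i judged harder than j) = 1 / (1 + exp(-(d_i - d_j)))
    """
    d = dict(anchors)
    d.update({item: 0.0 for item in new_items})  # start new items at 0
    for _ in range(iterations):
        grad = {item: 0.0 for item in new_items}
        for harder, easier in comparisons:
            p = 1.0 / (1.0 + math.exp(-(d[harder] - d[easier])))
            # Gradient of the log-likelihood for this judgement.
            if harder in grad:
                grad[harder] += 1.0 - p
            if easier in grad:
                grad[easier] -= 1.0 - p
        for item in new_items:
            d[item] += step * grad[item]  # anchors are never updated
    return {item: d[item] for item in new_items}


# Hypothetical data: two anchors and one new item "N", ten judgements per pair.
anchors = {"A": -1.0, "B": 1.0}
judgements = (
    [("N", "A")] * 9 + [("A", "N")] * 1    # N usually judged harder than A
    + [("N", "B")] * 7 + [("B", "N")] * 3  # N often judged harder than B
)
print(estimate_difficulties(judgements, anchors, ["N"]))
```

Because the anchor values stay fixed, the estimate for the new item is expressed on the same scale as the anchors, which is what allows it to be used alongside existing scored items.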
The data showed that an SME's speed during the task was a key factor in how closely their responses aligned with those of the group, regardless of clinical background. Analysis and evaluation of all exam sessions in 2022 shows consistency between SME data and live candidate data. We identified outliers among the items presented in 2022; these were evaluated and addressed by revising our training messages for SMEs in subsequent workshops.
We believe our application of pairwise scaling is an effective way to ease the bottleneck in developing content for our exams. We invite discussion and suggestions during our presentation on the application of the process and on the data evaluation and analysis.
Our experience can inform other health programs that are seeking innovative methods and want to understand how effective pairwise scaling is as a method for developing new items for scaled exams.
References
1. Andrich D. (1978) 'Relationships between the Thurstone and Rasch approaches to item scaling', Applied Psychological Measurement, 2(2), 451-462.
2. Bramley T. (2005) 'A Rank-Ordering Method for Equating Tests by Expert Judgement', Journal of Applied Measurement, 6(2), 202-223.