Times are shown in GMT.
MCQs
Oral Presentation
10:00 am
28 February 2024
M214
Session Program
10:00 am
Daniel Nguyen1
Lambert Schuwirth1
1 Flinders University
Multiple-choice questions (MCQs) are effective assessment tools, but when poorly constructed they can contain errors (so-called item-writing flaws, IWFs) that negatively affect pass-fail outcomes in high-stakes assessments. Item-writing guidelines therefore exist to mitigate these errors, yet the empirical evidence supporting them is limited; only a few studies explore whether and how IWFs produce an effect. Our study investigates the effect of IWFs on student scores. Two groups of effects are studied: false-positive effects, where students guess a question correctly despite lacking the knowledge, and the opposite, false-negative effects. Two sub-studies were conducted. The first contained easily answerable questions with IWFs likely to lead to false-negative responses; the same questions without the IWF served as controls. The second sub-study explored false-positive effects using nonsense items, unanswerable with any amount of knowledge, that contained obvious IWF cues pointing towards the correct option. For the first sub-study, the difference in correct answers between the IWF version of an item and its correctly formulated counterpart was used as the outcome measure. For the second, the outcome was whether the p_correct value was higher than chance. Overall conclusions were drawn at the level of the whole test, across all items in each sub-test. Preliminary data showed 17/21 items with a false-positive effect (binomial p = 0.0044) and 11/15 items with an indication of a false-negative effect (binomial p = 0.06). We conclude that IWFs impact student scores in the false-positive direction. In the false-negative direction the effect does not reach the 5% threshold, but the numbers in the preliminary data are still too small to claim the absence of an effect with any certainty; our power analysis indicated that n = 60 is needed. Even if only a false-positive effect exists, it would be meaningful enough to warrant careful item-review procedures.
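The whole-test binomial logic can be reproduced in a few lines. The sketch below assumes a one-sided binomial test against a null proportion of 0.5; the abstract does not state the exact null model or sidedness used, so the reported p-values (0.0044 and 0.06) may have been computed slightly differently.

```python
# Hedged sketch of the whole-test binomial check described above.
# Assumption: one-sided test against a null proportion of 0.5.
from scipy.stats import binomtest

# Sub-study 2: 17 of 21 nonsense items answered above chance
false_positive = binomtest(17, n=21, p=0.5, alternative="greater")

# Sub-study 1: 11 of 15 flawed items indicating a false-negative effect
false_negative = binomtest(11, n=15, p=0.5, alternative="greater")

print(f"false-positive: p = {false_positive.pvalue:.4f}")  # ~0.004
print(f"false-negative: p = {false_negative.pvalue:.4f}")  # ~0.059
```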
References (maximum three)
Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133-143.
Schuwirth, L., & Pearce, J. (2014). Determining the Quality of Assessment Items in Collaborations: Aspects to Discuss to Reach Agreement Developed by the Australian Medical Assessment Collaboration. https://research.acer.edu.au/higher_education/42
10:15 am
Slavko Rogan1
Eefje Luijckx1, Annika Stampfler1, Caroline Aubry1, Antonia Hauswirth1, Evert Zinzen2, Prasath Jayakaran3 and Angela Blasimann1
1 Bern University of Applied Sciences, Department of Health Professions, Division of Physiotherapy
2 Vrije Universiteit Brussel, Faculty of Physical Education and Physiotherapy
3 University of Otago, School of Physiotherapy
Background:
In higher education, assessments play an important role in evaluating learning outcomes. Multiple-choice questions (MCQs) are a favored tool for evaluating factual knowledge.
Summary of work:
The focus of this systematic review was to provide an overview of current practice and recommendations on designing MCQs in health professional education.
Methods:
The PICo tool (P: Population, Problems; I: Intervention or Phenomena of Interest; Co: Context) was used to formulate a research question regarding criteria for MCQs. Potential articles were identified by a Boolean search of the PubMed database with "multiple choice question", "item analysis OR number of items", and "students" as search terms. In addition, hand searches of the reference lists of the included studies were completed. Studies with qualitative, quantitative, and mixed-method designs were included.
Results:
Twenty-four articles were included, from which eight main categories were identified. The MCQs should 1) contain clinical vignettes (n=2), 2) avoid sources of error, so-called "cues" (n=1), 3) be fair (e.g. avoid complex language or sentence structure; the question should match the case vignette) (n=3), 4) prefer a 3-option MC item (one correct option, the key, plus two distractors) (n=6), 5) include 30 MC questions or more (n=3), 6) be item-analyzed to improve validity and reliability (n=5; see the sketch below), 7) be made available to students to improve learning outcomes (n=2), and 8) use number-right scoring (n=5).
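Category 6, item analysis, is commonly operationalized with a per-item difficulty index (proportion correct) and a discrimination index such as the corrected item-total correlation. The review does not prescribe particular statistics, so the following Python sketch is only an illustration using these two common choices:

```python
# Illustrative item analysis on a 0/1 response matrix
# (rows = students, columns = items); the statistics shown are
# common choices, not ones mandated by the review.
import numpy as np

def item_analysis(responses):
    """Per-item difficulty and corrected item-total discrimination."""
    totals = responses.sum(axis=1)       # each student's raw score
    difficulty = responses.mean(axis=0)  # proportion correct per item
    discrimination = np.array([
        np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])                                   # item vs. rest-of-test score
    return difficulty, discrimination

rng = np.random.default_rng(0)
demo = (rng.random((100, 30)) > 0.4).astype(int)  # 100 students, 30 items
difficulty, discrimination = item_analysis(demo)
print(difficulty[:3], discrimination[:3])
```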
Conclusion:
In the education of medical doctors and other health professionals, MCQs should contain clinical vignettes, omit cues, be fair in their language, use 3-option items, include 30 or more questions, be item-analyzed, and be made available to students. Number-right scoring should be used to compute the test score: correct answers are given a positive score, and the sum of the points for correct answers constitutes the test score.
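As a generic illustration of number-right scoring (not code from any of the reviewed studies): each answer matching the key scores one point, and wrong or blank answers score zero.

```python
# Generic number-right scoring sketch: only correct answers count;
# wrong or blank responses contribute nothing to the test score.
def number_right_score(responses, key):
    return sum(r == k for r, k in zip(responses, key))

key       = ["B", "A", "C", "A", "B"]
responses = ["B", "C", "C", None, "B"]   # one wrong, one blank
print(number_right_score(responses, key))  # 3
```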
References (maximum three)
Al-Wardy, N. M. (2010). Assessment methods in undergraduate medical education. Sultan Qaboos University Medical Journal, 10(2), 203.
Palmer, E., & Devitt, P. (2006). Constructing multiple choice questions as a method for learning. Annals of the Academy of Medicine, Singapore, 35(9), 604.
Stern, T. (2014). What is good action research? Reflections about quality criteria. In Action Research, Innovation and Change: International perspectives across disciplines (pp. 202-220). Routledge.
10:30 am
Quang Ngo1
Keyna Bracken2, Helen Neighbour1, Mike Lee-Poy1, Rebecca Long1, Jeffrey McCarthy1, Jeremy Sandor1 and Matthew Sibbald1
1 DeGroote School of Medicine, McMaster University
2 Michael G. DeGroote School of Medicine, McMaster University, AMEE member
Multiple choice exams (MCEs) in medical education are an efficient means to assess knowledge and clinical reasoning while maintaining standardization.
There is debate in the literature as to whether MCEs should penalize wrong answers. Proponents argue that a penalty rewards accuracy while maintaining test validity; those against state that it unfairly penalizes risk-takers who are making educated, rather than uninformed, guesses. Evidence is emerging that such penalties may contribute to gender inequity in MCEs.
The undergraduate medical education program at McMaster University uses a longitudinal progress test, the Personal Progress Inventory (PPI), to monitor knowledge acquisition; students take it eight times over the course of the program. Traditionally, there has been a 0.25-mark penalty for incorrect answers.
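Concretely, this traditional PPI scoring is classic formula scoring ("correction for guessing"). A minimal sketch follows; note that a 0.25 penalty equals 1/(k-1) for five-option items, but the option count is an inference from the penalty value, not something stated in the abstract.

```python
# Formula scoring as traditionally used on the PPI: each wrong
# answer costs 0.25 marks, blanks score zero. A 0.25 penalty equals
# 1/(k - 1) for k = 5 options (the option count is an assumption).
def ppi_score(n_correct, n_wrong, penalty=0.25):
    return n_correct - penalty * n_wrong

print(ppi_score(90, 20))  # 90 - 0.25 * 20 = 85.0
```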
The penalty for guessing was removed in January 2023 due to concerns the penalty unfairly targeted risk averse groups and increased test anxiety without improving validity. This natural experiment allowed us to compare the effects of the penalty on exam outcomes.
Means of the class scores before and after removal of the guessing penalty were compared using ANOVA. Data were available for two sittings of the PPI (February and May 2023) for the first- and second-year cohorts. Compared to matched historical cohorts, the means were higher after elimination of the penalty for both sittings of the PPI and for both cohorts (p<0.05). No interaction was found between penalty and year.
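A hedged sketch of this kind of comparison with synthetic scores (the study's actual data, matched historical cohorts, and the penalty-by-year interaction term are not reproduced here):

```python
# Synthetic before/after comparison of class means; illustrative only.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
with_penalty = rng.normal(62, 8, 200)     # hypothetical scores, 0.25 penalty
without_penalty = rng.normal(65, 8, 200)  # hypothetical scores, no penalty

stat, p = f_oneway(with_penalty, without_penalty)
print(f"F = {stat:.2f}, p = {p:.4f}")
```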
The reasons for the change in scores are likely multifactorial. Anxiety is known to hinder performance, and reducing anxiety could contribute to the improvement. With respect to test validity, scores continue to increase over time regardless of penalty, suggesting that the expertise gradient is preserved. Cut scores for this exam are norm-referenced, and the number of below-threshold students has not changed. Future work should identify risk-averse groups (e.g. by gender or specialty choice).
References (maximum three)
Betts, L. R., Elder, T. J., Hartley, J., & Trueman, M. (2009). Does correction for guessing reduce students' performance on multiple-choice examinations? Yes? No? Sometimes? Assessment & Evaluation in Higher Education, 34(1), 1-15. https://doi.org/10.1080/02602930701773091
Coffman, K. B., & Klinowski, D. (2020). The impact of penalties for wrong answers on the gender gap in test scores. Proceedings of the National Academy of Sciences, 117(16). PMID: 32253310.