Developing and using test data
Workshop
11:00 am
27 February 2024
M203
Themes
Theme 8: Evaluation
Session Program
11:00 am
Terry Judd
Lachlan McOmish1, Simone Elliott1, David Swanson1, and Anna Ryan1
1 The University of Melbourne
Background
Assessment item development is a key aspect of designing effective medical and health professional curricula. Multiple-choice questions (MCQs) are commonly used to evaluate students' knowledge and critical thinking abilities. However, crafting high-quality MCQs can be time-consuming and challenging for educators. Generative language models, and ChatGPT in particular, have attracted considerable attention for their ability to pass MCQ-based exams such as the USMLE (Gilson et al., 2023), but they can also be leveraged to create assessment items (Biswas, 2023; Agarwal, Sharma & Goswami, 2023). This workshop introduces an innovative tool called the 'Item Writer's Workbench' that harnesses and scaffolds the power of ChatGPT to assist in the development of assessment items for use in health professional curricula.
Importance for Research and Practice
Efficiently generating well-constructed assessment items is essential for maintaining the rigour and validity of educational assessments. The 'Item Writer's Workbench' addresses the need for a time-saving solution while ensuring the quality and relevance of MCQs. Leveraging ChatGPT's text generation capabilities, the tool enhances the item writing process by providing a user-friendly interface and a range of customisable templates optimised for various item types.
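To illustrate the kind of query statement such templates produce, the sketch below assembles a templated prompt and sends it to a ChatGPT model via the OpenAI Python client. The template wording, field names and model choice are illustrative assumptions only, not the workbench's actual implementation.

    # Hypothetical sketch only -- the template text and fields are illustrative,
    # not the Item Writer's Workbench's actual templates.
    from openai import OpenAI

    TEMPLATE = (
        "Write one single-best-answer multiple-choice question for {audience} "
        "on the topic of {topic}, targeting the '{level}' level of Bloom's "
        "taxonomy. Provide a vignette-style stem, one correct answer, four "
        "plausible distractors, and a brief rationale for the correct answer."
    )

    client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

    query = TEMPLATE.format(
        audience="third-year medical students",
        topic="the pharmacology of beta-blockers",
        level="application",
    )
    response = client.chat.completions.create(
        model="gpt-4",  # any ChatGPT-family model; the choice is illustrative
        messages=[{"role": "user", "content": query}],
    )
    print(response.choices[0].message.content)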
This workshop presents an opportunity for health professional educators to explore and utilize the tool and to assess whether it, and the use of ChatGPT more generally, can enhance their item writing capacity and skills.
Workshop Format
The workshop will be interactive and hands-on, combining presentations, demonstrations, and practical exercises. Following a short presentation on the fundamentals of item design and an introduction to the workbench tool, participants will first explore its capabilities, learning how to create detailed and targeted query statements using its customisable templates, and then apply their learning to develop draft items relevant to their own assessment contexts. The workshop will conclude with a discussion on best practices for utilising ChatGPT appropriately and effectively in assessment development processes.
Target Audience
This workshop is designed for educators and curriculum and assessment developers involved in health professional education. The workshop is open to participants of all expertise levels, catering to both novice and experienced item writers. Educators new to assessment item writing will benefit from the tool's guidance and assistance in constructing MCQs, while seasoned practitioners can explore how the tool streamlines their existing workflows and enhances their item development strategies.
Outcomes and Implications for Further Practice
By attending this workshop, participants will:
· Gain familiarity with the 'Item Writer's Workbench' tool and its features (participants will continue to have access to an online version of the tool beyond the workshop).
· Understand how to utilise customisable templates to generate detailed query statements for ChatGPT.
· Experience the practical application of ChatGPT in generating draft assessment items for use in health professional curricula.
· Explore best practices for integrating ChatGPT into their own item development processes.
References
Gilson, A., Safranek, C. W., Huang, T., Socrates, V., Chi, L., Taylor, R. A., & Chartash, D. (2023). How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Medical Education, 9(1), e45312.
Biswas, S. (2023). Passing is great: Can ChatGPT conduct USMLE exams? Annals of Biomedical Engineering, 1-2.
Agarwal, M., Sharma, P., & Goswami, A. (2023). Analysing the applicability of ChatGPT, Bard, and Bing to generate reasoning-based multiple-choice questions in medical physiology. Cureus, 15(6).
12:00 pm
Neville Chiavaroli1
Clare Ozolins1
1 Australian Council for Educational Research
Background
The importance of validity in assessment is widely accepted, as is the notion of validity as an argument based on evidence (Cook et al., 2015). One of the key forms of evidence for validity arguments is item analysis data, especially for MCQ exams. While many forms of data are available, the most common statistics for written examinations are item facility/difficulty, the discrimination index and distractor analyses (Tavakol & Dennick, 2016). These data are an essential part of quality assurance for any high-stakes examination and for making defensible decisions based on the results of the examination. While such data are increasingly available through online teaching and testing platforms, their interpretation and application are not always made clear or adequately supported.
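As a concrete illustration of how simply the two core statistics can be computed, the sketch below derives item facility and an upper-lower discrimination index from a scored (0/1) response matrix. The data and variable names are illustrative and not tied to any particular testing platform.

    # Minimal sketch of classical item analysis from a scored response matrix.
    # Rows are candidates, columns are items; 1 = correct, 0 = incorrect.
    import numpy as np

    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 1],
        [1, 1, 1, 0],
        [0, 0, 0, 1],
    ])  # illustrative data only

    # Item facility (difficulty): proportion of candidates answering correctly.
    facility = responses.mean(axis=0)

    # Discrimination index by the upper-lower groups method: the difference
    # in facility between the highest- and lowest-scoring candidates.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n = max(1, len(totals) * 27 // 100)  # conventional 27% groups
    discrimination = (responses[order[-n:]].mean(axis=0)
                      - responses[order[:n]].mean(axis=0))

    print("facility:", facility)
    print("discrimination:", discrimination)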
Workshop format and participants
This workshop will explore the nature of these statistics, how they can be calculated using simple software, and most importantly, how they can be interpreted and applied to participants’ own testing contexts to evaluate the quality of examination items and contribute to the validity of results and decisions. The workshop will also briefly demonstrate other statistics that can be calculated using more advanced approaches and software and that provide further value from a validity perspective, such as mean ability, item characteristic curves, and differential item functioning (Hope et al., 2018). While focussed primarily on the use of item analysis for MCQs, the workshop will also demonstrate how to apply and interpret similar data for short answer questions.
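To give a sense of the more advanced output, an item characteristic curve plots the probability of a correct response against candidate ability; under a two-parameter logistic IRT model this is P(theta) = 1 / (1 + exp(-a(theta - b))), sketched below with illustrative parameter values.

    # Sketch of an item characteristic curve under a two-parameter logistic
    # (2PL) IRT model; a = item discrimination, b = item difficulty.
    import numpy as np

    def icc(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    theta = np.linspace(-3, 3, 7)    # range of candidate abilities
    print(icc(theta, a=1.2, b=0.5))  # illustrative item parameters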
The workshop will include several practical activities guided by the facilitators, including calculating key statistics with sample data using basic spreadsheet software; group review of sample items with corresponding item analysis data for interpretation practice; and demonstration of further statistical and visual data available with more advanced item analysis software. Participants are encouraged to bring their own laptops and item data for practice purposes, but these are not necessary for attending the workshop.
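For the distractor analysis activity, the same upper-lower grouping can be applied to raw option choices. The sketch below tabulates, for a single hypothetical five-option item, how often each option is chosen overall and within the top and bottom scoring groups; all values are invented for illustration. A well-functioning distractor should attract proportionally more candidates from the lower group.

    # Hypothetical distractor analysis for one five-option item (key = 'C').
    from collections import Counter

    choices = ['C', 'A', 'C', 'B', 'C', 'D', 'A', 'C', 'E', 'C']  # option chosen
    scores  = [38, 12, 35, 18, 40, 15, 22, 33, 10, 37]            # total scores

    ranked = sorted(zip(scores, choices))        # candidates ordered by score
    n = max(1, len(ranked) * 27 // 100)          # conventional 27% groups
    lower = Counter(c for _, c in ranked[:n])    # options chosen by bottom group
    upper = Counter(c for _, c in ranked[-n:])   # options chosen by top group

    print("overall:", Counter(choices))
    print("upper:", dict(upper), "lower:", dict(lower))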
This workshop will be useful for health professional educators who are involved in constructing written examinations and who wish to have a sound understanding of how item analysis can be used to guide item selection and support decisions based on examination results. The workshop is primarily aimed at educators who have limited experience in using and/or generating item analyses.
Workshop outcomes
After attending this workshop, participants will be able to:
- calculate item facility/difficulty, discrimination indices and distractor analysis using simple software;
- understand the meaning and significance of these statistics for evaluating the quality of test items; and
- appreciate the value of these statistics in relation to validity arguments for decisions based on examination results.
References
- Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education, 49(6), 560-575.
- Hope, D., Adamson, K., McManus, I. C., Chis, L., & Elder, A. (2018). Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment. BMC Medical Education, 18(1), 1-7.
- Tavakol, M., & Dennick, R. (2016). Postexamination analysis: a means of improving the exam cycle. Academic Medicine, 91, 1324.