Times are shown in your local time zone GMT
Artificial intelligence
Oral Presentation
4:00 pm
27 February 2024
M209
Session Program
4:00 pm
Carmel Tepper1
1 Bond University
Background:
In 2022, Bond transitioned high-stakes written clinical examinations to Progress Tests: longitudinal knowledge examinations of ‘exit-level’ content that interns are required to know. As use of online resources is now considered common practice - and sometimes best practice - in the clinical workplace, the Progress Tests were conducted Open Book. January 2023 brought the rise of Generative Artificial Intelligence (Gen-AI) technology with the potential to disrupt assessments, particularly knowledge-based tests with open internet access.
Summary of work:
We opted to embrace this technology and give students clear permission to access Gen-AI during Open Book written exams. We then advised students on effective use of Gen-AI for clinical exam questions and warned them of the tool’s current limitations. I will share the steps taken to secure the integrity of our Open Book examinations in the presence of Gen-AI.
Results:
We used a variety of methods during construction of our exam questions to maximise the requirement for students to activate their clinical reasoning before using any online tools. The aim was to ensure that our written exams remained fit for purpose in determining students’ clinical knowledge competency for progression. Performance data from two Open Book Progress Tests will be shared, indicating that we need not fear Gen-AI but can work with our medical students to teach them to use these tools effectively, best preparing them for authentic clinical work practices.
Conclusions:
Early data suggest that Gen-AI provides minimal advantage to medical students in Open Book Progress Tests. Gen-AI continues to evolve and grow in the online space. We can best prepare medical students for their authentic future professional activity by educating them on effective use of Gen-AI to support their core medical knowledge and clinical reasoning.
References (maximum three)
1. Loh E. BMJ Leader. Published Online First. Accessed 2 August 2023. doi:10.1136/leader-2023-000797
4:15 pm
Natasa Lazarevic1
Libby Newton2
1 RACP
2 Royal Australasian College of Physicians
The advent of, and widespread access to, large language models such as ChatGPT and AI writing co-pilots has disrupted traditional methods of teaching and assessment in medical education [1]. The ability of these models to respond to complex prompts and generate high-quality written content such as research articles has led some experts to suggest that this signals the end of traditional assessments [2]. Ultimately, AI needs to be accepted as a fait accompli: it will continue to disrupt medical education in both positive and negative ways, raising opportunities for innovation as well as concerns about academic assessment and integrity [3].
The Royal Australasian College of Physicians (RACP), like many other organisations, is appraising the impact of AI on medical education. As such, the RACP is developing a response framework to guide organisational activities, one that accepts the inevitable influence of AI and seeks to capitalise on opportunities while mitigating threats. In developing the framework, we first identified available position statements and guidelines related to the use of AI in education, particularly medical education. We analysed these to extract key insights and identify commonalities and variations in philosophy and actions. Building on this, we are working with stakeholders to identify priorities, opportunities, and challenges, and to outline recommendations for next steps for our organisation.
Thus far, the key challenges identified include impacts on cultural safety, implicit bias, ethics, research and academic integrity, credibility, and data privacy. The two most significant recommendations we are developing within our response framework are: (1) shifts in assessment design and suggestions for how to approach this, and (2) the need to develop governance processes and policies that regulate the responsible use of AI in medical education. This presentation will summarise the RACP’s evolving response framework, the development process, and lessons learned along the way.
References (maximum three)
[1] Masters, Ken. 2023. “Ethical Use of Artificial Intelligence in Health Professions Education: AMEE Guide No. 158.” Medical Teacher 45 (6): 574–84. https://doi.org/10.1080/0142159X.2023.2186203.
[2] Devlin, Hannah. 2023. “AI Likely to Spell End of Traditional School Classroom, Leading Expert Says.” The Guardian, July 7, 2023, sec. Technology. https://www.theguardian.com/technology/2023/jul/07/ai-likely-to-spell-end-of-traditional-school-classroom-leading-expert-says.
[3] Cotton, Debby R. E., Peter A. Cotton, and J. Reuben Shipway. 2023. “Chatting and Cheating: Ensuring Academic Integrity in the Era of ChatGPT.” Innovations in Education and Teaching International 0 (0): 1–12. https://doi.org/10.1080/14703297.2023.2190148.
4:30 pm
Leo Morjaria1
Levi Burns1, Keyna Bracken1,2, Quang Ngo1,2, Mark Lee2 and Matthew Sibbald1,2
1 Michael G. DeGroote School of Medicine, McMaster University
2 McMaster Education Research, Innovation and Theory (MERIT) Program
Background:
Following its launch [1], ChatGPT achieved passing grades on question subsets of standardized medical licensing exams such as the USMLE [2,3], posing a threat to the validity of medical student assessment. This study evaluates the extent of this threat to short-answer assessment problems that are used as important learning benchmarks for pre-clerkship students.
Summary of work:
Using 40 problems from past student assessments, 30 responses were generated by ChatGPT, and 10 minimally passing responses were drawn from past students. Problems were selected to encompass both lower and higher-order cognitive domains. Minimally passing responses were chosen as they reflect the standard of competency expected of students in the program. Six experienced tutors graded all 40 responses. Standard statistical techniques were applied to compare performance between student-generated and ChatGPT-generated answers. ChatGPT performance was also compared with historical student averages at our institution.
Results:
ChatGPT-generated responses received a mean score of 3.29 out of 5 (n=30, 95% CI 2.93–3.65), compared with 2.38 for minimally passing students (n=10, 95% CI 1.94–2.82), representing significantly stronger performance (p=0.008, η²=0.169). However, ChatGPT was outperformed by historical class averages (mean 3.67, p=0.018) when all past responses were included regardless of student performance level. There was no significant trend in performance across domains of Bloom’s Taxonomy.
Discussion:
ChatGPT was effective in answering short-answer assessment problems across the pre-clerkship curriculum. Human assessors were often unable to distinguish between responses generated by ChatGPT and those produced by students.
Conclusions:
While ChatGPT was able to reach passing grades in our short-answer assessments, it outperformed only underperforming students, and failed to outperform the historical class average.
Take-Home Messages/Implications:
Risks to assessment validity include uncertainty in identifying struggling students. Areas of future research include: ChatGPT's performance in higher cognitive tasks, its role as a learning tool, and its potential in evaluating assessments.
References (maximum three)
[1] Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare (Basel). 2023;11(6):887. doi:10.3390/healthcare11060887
[2] Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. doi:10.1371/journal.pdig.0000198
[3] Gilson A, Safranek CW, Huang T, et al. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Med Educ. 2023;9:e45312. doi:10.2196/45312
4:45 pm
Lisa Purdy1
Shirley Schipper1 and Shelley Ross1
1 University of Alberta
Background:
Advances in technology can spark many emotions for health professions education (HPE) learners, teachers, and programs, ranging from excitement to anxiety. While some advances are rapidly embraced, such as technologies that increase the affordability and fidelity of simulation, others are viewed with caution. The explosive advances in artificial intelligence (AI) and machine learning, especially the emergence of generative large language models (LLMs) like OpenAI’s ChatGPT, have unprecedented potential but also pose ethical challenges and other dangers to HPE. In this study, we conducted a narrative review of recent literature to identify advantages and dangers of AI in HPE.
Summary of work:
Search terms for the narrative review included: AI and subdomains (e.g., machine learning); and HPE and subdomains (e.g., medical education). Databases were MEDLINE, EBSCOHOST, and Web of Science; the search was restricted to 2015-2023 to capture recent technology developments. Included articles were read in full by all team members. Relevant themes were determined through discussion and consensus.
Results:
We identified two main themes: benefits and dangers. Benefits included: using AI to design and implement automated grading systems; create simulated patient encounters for progress testing; and safe integration of productive failure for learners to facilitate development of adaptive expertise. Dangers included: risk of dehumanizing patient interactions; data privacy and security; and potential for bias and plagiarism inherent in the way that LLMs generate content.
Discussion:
AI holds enormous potential for use in HPE. However, these positives must be balanced by an awareness of the ethical issues and potential detrimental impacts on learning that accompany these technological advances.
Conclusions and take-home message:
Education leaders in HPE are in uncharted territory, with few comparable situations to draw upon for guidance. The first step in being prepared for the impact of AI in HPE is to become informed about potentials and pitfalls.
References (maximum three)
1. Rampton V, Mittelman M, Goldhahn J. Implications of artificial intelligence for medical education. The Lancet Digital Health. 2020 Mar 1;2(3):e111-2.
2. Lee J, Wu AS, Li D, Kulasegaram KM. Artificial intelligence in undergraduate medical education: a scoping review. Academic Medicine. 2021 Nov 1;96(11S):S62-70.