Last modified: 2023-08-14
Abstract
Theme II - Artificial Intelligence in Medicine
TITLE
How much ChatGPT can you get away with?
Detection of different levels of AI use in medical education
AIM
To assess human and software detection of different levels of AI text generation/enhancement in the medical education context
INTRODUCTION
New artificial intelligence (AI) based tools such as ChatGPT have shown great potential for meaningful change in healthcare and medical education [1]. Detection of AI use in text generation, by both humans and automatic classifiers, has attracted particular interest from the general public and the research community and has been explored in several publications [2]. An important and still unexplored question is how well AI detection performs across different degrees of AI use in text generation: text generated by AI alone; AI-generated text guided by task-specific human examples; human text with AI enhancement; and text written by humans alone. This question has become considerably more pressing since the release of GPT-4 in March 2023, which substantially increased the text generation capabilities available to the general public. Very few academic publications currently address the potential impact of this more powerful AI tool on medical education.
BRIEF DESCRIPTION
In this work we explore AI detection in a real-world medical education context, assessing how well medical school professors identify AI-generated text, with and without the aid of the best-known and easiest-to-use automatic text classifiers available today. Approximately 150 first-year medical students wrote a 500-word subjective essay on their motivation to enroll in medical school. These “human only” texts were written before ChatGPT became available, making it unlikely that AI was used to generate or enhance them. Using the same essay instructions, AI-generated and AI-enhanced texts were produced with ChatGPT-4. AI detection performance by professors is being compared with that of publicly available automatic AI classifiers. The performance of a 2-step detection procedure (automatic AI screening followed by human assessment of potentially AI-produced texts) is also being assessed. Preliminary results indicate that discerning human from AI-generated/enhanced text is substantially harder with GPT-4 than with GPT-3.5. This work will yield a detailed real-world understanding of academic medical school professors’ performance in detecting different levels of AI-generated/enhanced subjective text. Such insight will help shape the future use of AI in medical education.
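The 2-step procedure described above can be sketched in a few lines: an automatic classifier screens all essays, and only those exceeding an AI-likelihood threshold are passed to human reviewers. The sketch below is purely illustrative; `ai_score` is a hypothetical stand-in for whichever public AI text classifier is used, and the threshold and the toy scoring heuristic are assumptions, not part of the study.

```python
def ai_score(text: str) -> float:
    """Placeholder: return an AI-likelihood score in [0, 1].

    A real classifier (e.g. a public AI detector) would be called here;
    this toy heuristic exists only to make the sketch runnable.
    """
    return min(1.0, text.lower().count("delve") / 3)


def two_step_screen(essays: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Step 1: automatic screening.

    Returns the IDs of essays flagged as potentially AI-produced,
    which then go to step 2: human assessment by professors.
    """
    return [eid for eid, text in essays.items() if ai_score(text) >= threshold]


# Hypothetical essays keyed by an anonymized student ID.
essays = {
    "s01": "I want to delve into medicine, delve deeper, and delve further.",
    "s02": "I chose medicine after volunteering at a local clinic.",
}
flagged = two_step_screen(essays)  # only flagged essays reach human review
```

The design point is that the automatic screen reduces the number of essays professors must read, while the final human judgment guards against classifier false positives.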
Key words: artificial intelligence, medical education, ChatGPT-4
References:
1 - Sallam M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare (Basel). 2023 Mar 19.
2 - De Angelis et al. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023 Apr 25.