Title (English)
Evaluation of ChatGPT-4o in oral and maxillofacial surgery examinations: a comparative study of performance on U.S. Dental Decks and Chinese Dental Licensing Examination practice questions
Article info
- Authors: Hao Zheng, Zohreh Yousefian Zare, Menghong Li, Yuzhu Pan, Shuyao Ren, Haoyue Cui, Yuexiao Li
- Journal: BMC Oral Health
- Published: 2026-05-11
- DOI: 10.1186/s12903-026-08551-9
- Source: OpenAlex
Abstract (English)
Background: This study evaluates the performance of ChatGPT-4o on practice questions from the U.S. Integrated National Board Dental Examination (INBDE) and the Chinese Dental Licensing Examination to assess its potential and limitations in a high-stakes dental context.

Methods: ChatGPT-4o and human participants (ten dental undergraduate students, ten graduate students specializing in oral and maxillofacial surgery (OMS), and ten OMS specialists with over five years of clinical experience) were tested on all available questions categorized under oral and maxillofacial surgery from the U.S. Dental Decks online platform (English) and the 2024 Chinese Dental Licensing Exam Question Bank (simplified Chinese). Accuracy (percentage of correct responses) was compared between groups using chi-squared tests.

Results: ChatGPT-4o achieved an overall accuracy of 90% (n = 252) on the U.S. Dental Decks, outperforming all human participants on English-language questions (p < 0.001), with a moderate magnitude of difference. In contrast, its accuracy on the Chinese question bank was 71% (n = 567, p < 0.001 vs. English performance), a level comparable to that of dental undergraduate and OMS graduate students, but lower than that of OMS specialists (87%, p < 0.001), with differences of modest magnitude. By question type, accuracy on the U.S. Dental Decks was consistent across all formats. In the Chinese question bank, performance varied modestly by question type, with the highest accuracy observed for A1-type questions (76%) and the lowest for A2-type questions (61%).

Conclusion: ChatGPT-4o demonstrated higher accuracy on English-language questions than on Chinese questions, suggesting a potential language-related performance gap. This finding highlights the importance of localized training data for Artificial Intelligence (AI) systems. Its performance in non-English contexts should be interpreted with caution, as it may not yet reach specialist-level expertise.
Read full article
This post was generated automatically by the Ortho OA Fetcher plugin. Images (if any) are from PubMed Central. Content was retrieved from an open-access source and machine-translated; it is provided for reference only.