Title (English)
Evaluation of ChatGPT-4o in oral and maxillofacial surgery examinations: a comparative study of performance on U.S. Dental Decks and Chinese Dental Licensing Examination practice questions
Article info
- Authors: Hao Zheng, Zohreh Yousefian Zare, Menghong Li, Yuzhu Pan, Shuyao Ren, Haoyue Cui, Yuexiao Li
- Journal: BMC Oral Health
- Published: 2026-05-11
- DOI: 10.1186/s12903-026-08551-9
- Source: OpenAlex
Abstract (English)
Background: This study evaluates the performance of ChatGPT-4o on practice questions from the U.S. Integrated National Board Dental Examination (INBDE) and the Chinese Dental Licensing Examination to assess its potential and limitations in a high-stakes dental context.

Methods: ChatGPT-4o and human participants (ten dental undergraduate students, ten graduate students specializing in oral and maxillofacial surgery (OMS), and ten OMS specialists with over five years of clinical experience) were tested on all available questions categorized under oral and maxillofacial surgery from the U.S. Dental Decks online platform (English) and the 2024 Chinese Dental Licensing Exam Question Bank (simplified Chinese). Accuracy (percentage of correct responses) was compared between groups using chi-squared tests.

Results: ChatGPT-4o achieved an overall accuracy of 90% (n = 252) on the U.S. Dental Decks, outperforming all human participants on English-language questions (p < 0.001), with a moderate magnitude of difference. In contrast, its accuracy on the Chinese question bank was 71% (n = 567, p < 0.001 vs. English performance), a level comparable to that of dental undergraduate and OMS graduate students, but lower than that of OMS specialists (87%, p < 0.001), with differences of modest magnitude. By question type, accuracy on the U.S. Dental Decks was consistent across all formats. In the Chinese question bank, performance varied modestly by question type, with the highest accuracy observed for A1-type questions (76%) and the lowest for A2-type questions (61%).

Conclusion: ChatGPT-4o demonstrated higher accuracy on English-language questions than on Chinese questions, suggesting a potential language-related performance gap. This finding highlights the importance of localized training data for Artificial Intelligence (AI) systems. Its performance in non-English contexts should be interpreted with caution, as it may not yet reach specialist-level expertise.
Read full article
This post was generated automatically by the Ortho OA Fetcher plugin. Images (if any) are from PubMed Central. Content was retrieved from an open-access source and machine-translated; it is provided for reference only.