Do large language models differ in their pharmacology-related response quality for oral and maxillofacial surgery? A blinded expert benchmark study

Article info

  • Authors: Mustafa Isleyen, Asenur Aydemir
  • Journal: BMC Oral Health
  • Published: 2026-07-04
  • DOI: 10.1186/s12903-026-09182-w
  • Source: OpenAlex

Abstract (English)

Background: Pharmacology represents the lowest-performing subcategory in oral and maxillofacial surgery (OMFS) evaluations of large language models (LLMs), yet no study has simultaneously compared the leading commercial LLMs across multiple pharmacological domains and question formats. This study evaluated ChatGPT 5.3, Gemini 3.1 Pro, and Claude 4.6 Sonnet in OMFS pharmacology.

Methods: Thirty-six OMFS pharmacology questions spanning five clinical domains (antibiotic prophylaxis, analgesics, drug–drug interactions, anesthetic pharmacology, special populations) and three formats (open-ended, multiple-choice, true/false; n = 12 each) were submitted to each LLM using a standardized role-conditioning prompt. The 108 responses were independently and blindly evaluated by two oral and maxillofacial surgeons (one specialist and one resident) on three 5-point Likert criteria. Inter-rater reliability was quantified using ICC(2,1) and Cohen's κ_w. Inter-model differences were assessed using Friedman tests; format effects were assessed using Kruskal–Wallis tests with Bonferroni-corrected post-hoc comparisons.

Results: Inter-rater reliability was excellent (ICC = 0.828; κ_w = 0.827; exact agreement 91.0%). A robust hierarchy emerged: Claude > Gemini > ChatGPT (χ²(2) = 47.91, p < 0.001, W = 0.665), with all pairwise comparisons significant. Gemini and Claude did not differ significantly in any format section, indicating clinical equivalence. ChatGPT exhibited a significant decline on open-ended, integrative-reasoning items (H(2) = 17.04, p < 0.001, ε² = 0.456), absent in Gemini and Claude. Significant positive correlations among the evaluation criteria within the ChatGPT data indicated convergence among the three scoring dimensions.

Conclusion: Claude 4.6 Sonnet and Gemini 3.1 Pro achieved near-maximal scores on this structured pharmacology benchmark, while ChatGPT 5.3 showed a significant decline in open-ended reasoning. Current LLMs should be regarded as adjunctive tools requiring expert verification for high-risk OMFS pharmacological decisions.
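The two reliability statistics named in the Methods can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code or software. It assumes the ratings are laid out as one row per rated response and one column per rater, with integer scores on the 1 to 5 Likert scale; the function names `icc_2_1` and `weighted_kappa` are hypothetical, and because the abstract does not say whether linear or quadratic weights were used for κ_w, both options are offered.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) after Shrout & Fleiss: two-way random effects,
    absolute agreement, single rater.

    `ratings` has one row per rated response and one column per rater
    (a layout assumed for this sketch).
    """
    n = len(ratings)       # number of rated responses
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: responses (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Rater (column) variance enters the denominator, so systematic offsets
    # between raters lower the coefficient (absolute agreement).
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def weighted_kappa(
    a: Sequence[int],
    b: Sequence[int],
    n_categories: int = 5,
    weights: str = "linear",
) -> float:
    """Cohen's weighted kappa for two raters scoring on 1..n_categories."""
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(a)
    K = n_categories
    # Observed joint proportions of (rater A score, rater B score).
    obs = [[0.0] * K for _ in range(K)]
    for x, y in zip(a, b):
        obs[x - 1][y - 1] += 1 / n
    # Marginal score distribution of each rater.
    p_a = [sum(obs[i]) for i in range(K)]
    p_b = [sum(obs[i][j] for i in range(K)) for j in range(K)]

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for maximal disagreement.
        d = abs(i - j) / (K - 1)
        return d if weights == "linear" else d ** 2

    observed = sum(disagreement(i, j) * obs[i][j] for i in range(K) for j in range(K))
    expected = sum(disagreement(i, j) * p_a[i] * p_b[j] for i in range(K) for j in range(K))
    return 1 - observed / expected
```

As a usage example, when one rater always scores exactly one point higher than the other, `icc_2_1([[1, 2], [2, 3], [3, 4]])` returns 2/3 rather than 1: an absolute-agreement ICC penalizes the constant offset, whereas a consistency-type ICC would not.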

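The effect sizes in the Results can be cross-checked against the reported test statistics with a short arithmetic sketch. It assumes that the Friedman test used the 36 questions as blocks and the k = 3 models as treatments, and that the Kruskal–Wallis test compared ChatGPT's 36 responses across the three formats (12 per format). Neither assumption is stated explicitly in the abstract, and the helper names are hypothetical. Under these assumptions, W = χ²/(N(k − 1)) reproduces the reported W = 0.665. For the format effect, the (H − k + 1)/(n − k) formula, often labelled η²_H, reproduces 0.456, whereas the ε² = H/(n − 1) form would give about 0.487. The abstract does not state which formula the authors used.

```python
def kendalls_w(chi2: float, n_blocks: int, k: int) -> float:
    """Kendall's W from a Friedman chi-square: W = chi2 / (N * (k - 1))."""
    return chi2 / (n_blocks * (k - 1))


def epsilon_squared(h: float, n: int) -> float:
    """Epsilon-squared for Kruskal-Wallis: H / ((n**2 - 1) / (n + 1)) = H / (n - 1)."""
    return h / (n - 1)


def eta_squared_h(h: float, n: int, k: int) -> float:
    """Eta-squared based on the H statistic: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


# Test statistics from the abstract; N, n and k are assumptions (see above).
print(round(kendalls_w(47.91, n_blocks=36, k=3), 3))   # 0.665
print(round(eta_squared_h(17.04, n=36, k=3), 3))       # 0.456
print(round(epsilon_squared(17.04, n=36), 3))          # 0.487
```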


This post was generated automatically by the Ortho OA Fetcher plugin. Images (if any) are from PubMed Central. The content is taken from open-access sources and machine-translated; it is provided for reference only.
