Revolutionizing Patient Education: Artificial Intelligence Versus Experts in Ocular Dyskinesia Responses

PubMed ID: 40865167

Author(s): Bahir D, Hartstein M, Burkat C, Ezra D, Wulc AE, Zloto O, Holds J, Hamed Azzam S. Revolutionizing Patient Education: Artificial Intelligence Versus Experts in Ocular Dyskinesia Responses. Ophthalmic Plast Reconstr Surg. 2026 Mar-Apr 01;42(2):172-182. doi: 10.1097/IOP.0000000000003046. Epub 2025 Aug 27. PMID: 40865167.

Journal: Ophthalmic Plastic and Reconstructive Surgery 42(2):172-182

Purpose: Ocular dyskinesia, including dystonic blepharospasm and hemifacial spasm, significantly impacts patient quality of life. This study evaluates the effectiveness of advanced artificial intelligence models (ChatGPT-3.5, GPT-4o, Gemini, and Gemini Advanced) compared with expert ophthalmologists in providing accurate, reliable, and patient-focused answers to common ocular dyskinesia-related questions.

Methods: A panel of oculoplastic surgeons developed 13 clinically relevant questions addressing symptoms, treatments, and posttreatment care for ocular dyskinesia. Anonymized responses from 4 artificial intelligence models (ChatGPT-3.5, GPT-4o, Gemini, and Gemini Advanced) and experts were evaluated by a panel of 7 international oculoplastic surgeons for correctness and reliability using a 7-point Likert scale. Statistical analyses were performed to identify differences among groups.

Results: ChatGPT-3.5 emerged as the top-performing model, achieving the highest correctness (mean score: 5.80) and reliability (5.68) scores, surpassing both GPT-4o (5.58/5.38) and the expert panel (5.56/5.31). GPT-4o closely mirrored expert performance, while Gemini and Gemini Advanced consistently lagged, reflecting lower correctness (4.67 and 5.03, respectively) and reliability scores. Statistical analysis confirmed significant differences across groups (p < 0.001).

Conclusions: ChatGPT-3.5 demonstrates exceptional potential in transforming patient education regarding ocular dyskinesia, delivering highly accurate and patient-accessible responses. While GPT-4o and experts offer strong, clinically sound insights, the Gemini models require refinement to meet higher benchmarks. These findings underscore the potential role of artificial intelligence in complementing human expertise, paving the way for innovative and collaborative approaches to patient care and education.