Longitudinal Comparison of Geographic Atrophy Enlargement Using Manual, Semiautomated, and Deep Learning Approaches.

PubMed ID: 40469899

Author(s): Bogost J, Linderman RE, Slater R, Saunders TF, Pacheco C, Pak J, Voland R, Blodi B, Domalpally A. Longitudinal Comparison of Geographic Atrophy Enlargement Using Manual, Semiautomated, and Deep Learning Approaches. Ophthalmol Sci. 2025 Apr 7;5(5):100787. doi: 10.1016/j.xops.2025.100787. eCollection 2025 Sep-Oct. PMID 40469899

Journal: Ophthalmology Science, Volume 5, Issue 5, 2025

OBJECTIVE To compare a fully automated artificial intelligence (AI) model, a semiautomated method, and manual planimetry in the longitudinal assessment of geographic atrophy (GA) using fundus autofluorescence images.

DESIGN A retrospective analysis of 3 GA assessment methods: AI, Heidelberg Eye Explorer semiautomated software (RegionFinder), and manual planimetry.

SUBJECTS AND CONTROLS One hundred eight patients (185 eyes) with GA from a phase IIb clinical trial by GlaxoSmithKline, which evaluated an experimental drug that did not reduce GA enlargement compared with the placebo.

METHODS Fundus autofluorescence images of 185 eyes were annotated using manual planimetry, semiautomated RegionFinder, and a fully automated AI model trained and validated on manual planimetry annotations at screening, year 1, and year 2. Artificial intelligence masks were compared with human-guided methods, and regression errors were assessed by stacking masks from consecutive visits. Agreement between methods was assessed using Bland-Altman plots, Dice similarity coefficient (DSC), and comparisons of GA growth rates. Artificial intelligence performance was evaluated based on its need for human edits and frequency of regression errors.
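
The two mask-based checks named above can be sketched in a few lines. This is only an illustration, not the authors' pipeline. It assumes the masks are binary grids that have already been registered to a common frame, and it assumes a "regression error" means a pixel marked as atrophy at an earlier visit that is missing from the later mask (the abstract does not give the exact definition). The names `Mask`, `dice`, and `regressed_pixels` are hypothetical.

```python
from typing import List

# Binary mask as rows of 0/1 values: 1 = atrophy pixel, 0 = background.
Mask = List[List[int]]


def dice(a: Mask, b: Mask) -> float:
    """Dice similarity coefficient, 2|A and B| / (|A| + |B|), for same-shaped masks."""
    intersection = sum(x & y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    total = sum(map(sum, a)) + sum(map(sum, b))
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def regressed_pixels(earlier: Mask, later: Mask) -> int:
    """Count pixels marked as atrophy at the earlier visit but not at the later one.

    Assumption: stacking consecutive masks and looking for lost area is how a
    regression error would be detected; the images must already be aligned.
    """
    return sum(
        1
        for row_e, row_l in zip(earlier, later)
        for e, l in zip(row_e, row_l)
        if e and not l
    )


# Toy 3x3 masks (made-up data, not from the study).
manual = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
ai = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(round(dice(manual, ai), 3))

visit_1 = [[0, 1, 0], [0, 1, 0]]
visit_2 = [[0, 1, 1], [0, 0, 0]]
print(regressed_pixels(visit_1, visit_2))
```

In practice a small tolerance on the regressed pixel count would likely be needed, since minor boundary jitter between visits is not a true regression.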

MAIN OUTCOME MEASURES Agreement between methods was evaluated using Bland-Altman plots, DSC, and intraclass correlation coefficients (ICCs). The mean GA growth rate (mm²/year) and square root transformation of GA size were compared across methods. Artificial intelligence performance was assessed by the percentage of acceptable masks and the frequency of longitudinal regression errors.
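
As a rough sketch of how two of these outcome measures are commonly computed: the Bland-Altman summary below uses the conventional bias plus or minus 1.96 standard deviations of the paired differences, and the growth rate is a simple two-visit slope, optionally on square-root-transformed area. Both are assumptions about the general technique; the abstract does not specify the study's exact statistical procedure. The function names and example values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements x and y.

    Bias is the mean of x - y; limits of agreement are bias +/- 1.96 * SD.
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation; needs at least 2 pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def growth_rate(area_start: float, area_end: float, years: float,
                sqrt_transform: bool = False) -> float:
    """Two-visit growth rate: mm^2/year, or mm/year if square-root transformed."""
    if sqrt_transform:
        area_start, area_end = math.sqrt(area_start), math.sqrt(area_end)
    return (area_end - area_start) / years


# Made-up areas in mm^2 for three eyes measured by two methods.
method_a = [5.0, 7.5, 10.0]
method_b = [6.0, 8.5, 11.0]
print(bland_altman(method_a, method_b))

# One hypothetical eye growing from 4 to 9 mm^2 over 2 years.
print(growth_rate(4.0, 9.0, 2.0))
print(growth_rate(4.0, 9.0, 2.0, sqrt_transform=True))
```

The square root transformation is typically used because it makes enlargement less dependent on baseline lesion size, which is why both raw and transformed rates are worth reporting side by side.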

RESULTS At screening, the mean GA area was 7.22 mm² with RegionFinder, 8.37 mm² with AI, and 8.66 mm² with manual planimetry. RegionFinder measured smaller GA areas than both AI and manual planimetry, with a mean difference of -1.45 mm² (95% confidence interval [CI]: -1.56, -1.35) versus AI (ICC = 0.945) and -1.87 mm² (95% CI: -1.99, -1.75) versus manual planimetry (ICC = 0.920). Growth rates were comparable between RegionFinder (1.54 mm²/year), AI (1.68 mm²/year), and manual planimetry (1.80 mm²/year) (P = 0.25). Artificial intelligence masks were deemed acceptable in 84.8% of visits, and 81.4% of cases showed no regression over time.

CONCLUSIONS Artificial intelligence accurately measures GA in approximately 85% of cases, requiring human intervention in only 15%, indicating potential to streamline GA measurement in clinical trials while maintaining human oversight.

FINANCIAL DISCLOSURES The author(s) have no proprietary or commercial interest in any materials discussed in this article.

© 2025 by the American Academy of Ophthalmology.