Addressing Artificial Intelligence Bias in Retinal Disease Diagnostics

Philippe Burlina, Neil Joshi, William Paul, Katia D. Pacheco, Neil M. Bressler

Few studies of deep learning systems (DLS) have addressed issues of artificial intelligence bias for retinal diagnostics. This study evaluated novel AI and deep learning generative methods to address bias in retinal diagnostic applications, specifically diabetic retinopathy (DR). A baseline DR diagnostic DLS, designed to solve the two-class problem of referable vs. not referable DR, was applied to the public-domain EyePACS dataset (88,692 fundus images from 44,346 individuals), expanded to include clinician-annotated labels for race. Training data included diseased whites, healthy whites, and healthy blacks, but lacked training exemplars for diseased blacks.

Results: Accuracy (95% confidence interval [CI]) was 73.0% (66.9%, 79.2%) for whites vs. 60.5% (53.5%, 67.3%) for blacks, demonstrating a disparity (Welch t-test, t = 2.670, P = .008) in AI performance, as measured by accuracy, across races. By contrast, an AI approach leveraging generative models was used to train a debiased diagnostic DLS with additional synthetic data for the missing subpopulation (diseased blacks); this DLS achieved an accuracy of 77.5% (71.7%, 83.3%) for whites and 70.0% (63.7%, 76.4%) for blacks, demonstrating closer parity in accuracy across races (Welch t-test, t = 1.70, P = .09). The debiased DLS also improved sensitivity for blacks by over 21%, at the same level of specificity, compared with the baseline DLS. These findings demonstrate how data imbalance can lead to bias and inequality of accuracy depending on race, and illustrate the potential benefits of using novel generative methods for debiasing AI.

Translational Relevance: These methods might decrease AI bias in other retinal and ophthalmic diagnostic DLS.
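The disparity test reported above can be illustrated with a small sketch. This is not the study's code or data: the sample sizes and per-image correctness indicators below are simulated at roughly the baseline accuracies quoted in the abstract (73.0% for whites, 60.5% for blacks), and the two-sided P value uses a normal approximation rather than the exact t distribution.

```python
import math
import random

random.seed(0)

def welch_t(a, b):
    """Welch's t statistic: two-sample t-test without assuming equal variances."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)  # sample variance, group 1
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)  # sample variance, group 2
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Per-image correctness indicators (1 = DLS prediction correct, 0 = incorrect),
# simulated at approximately the baseline accuracies reported in the abstract.
correct_white = [1 if random.random() < 0.730 else 0 for _ in range(300)]
correct_black = [1 if random.random() < 0.605 else 0 for _ in range(300)]

t = welch_t(correct_white, correct_black)
# Two-sided P via a normal approximation (adequate for large samples).
p = math.erfc(abs(t) / math.sqrt(2))
print(f"Welch t = {t:.3f}, two-sided P ~ {p:.4f}")
```

A large positive t with a small P, as in the baseline DLS (t = 2.670, P = .008), indicates the accuracy gap between subgroups is unlikely to be sampling noise; the debiased DLS's smaller t (1.70, P = .09) reflects the closer parity the authors report.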
