Provably Good Early Detection of Diseases using Non-Sparse Covariance-Regularized Linear Discriminant Analysis

Haoyi Xiong, Yanjie Fu, Wenqing Hu, Guanling Chen, Laura E. Barnes

To improve the performance of Linear Discriminant Analysis (LDA) for early detection of diseases using Electronic Health Records (EHR) data, we propose \TheName{}, a novel framework for \emph{\underline{E}HR based \underline{E}arly \underline{D}etection of \underline{D}iseases} built on \emph{Covariance-Regularized} LDA models. Specifically, \TheName\ employs a \emph{non-sparse} estimator of the inverse covariance matrix (i.e., the precision matrix) derived from the graphical lasso, and incorporates this estimator into LDA classifiers to improve classification accuracy. Theoretical analysis of \TheName\ shows that, under certain assumptions, it bounds the expected error rate of LDA classification. Finally, we conducted extensive experiments on CHSN, a large-scale real-world EHR dataset, comparing our solution with other regularized LDA models and downstream classifiers. The results show that \TheName\ outperforms all baselines and support our theoretical analysis.
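
As a rough illustration of how a precision-matrix estimate enters an LDA classifier, the following sketch states the standard graphical lasso objective and the standard two-class plug-in LDA rule. The notation is an assumption introduced here and does not come from the abstract: $S$ is the pooled sample covariance, $\lambda > 0$ a regularization parameter, $\hat{\mu}_0, \hat{\mu}_1$ the class means, and $\hat{\pi}_0, \hat{\pi}_1$ the class priors. The abstract does not specify how the non-sparse estimator is derived from the graphical lasso, so only the plain graphical lasso estimate is shown.

\[
\hat{\Theta} \;=\; \arg\min_{\Theta \succ 0} \; \operatorname{tr}(S\Theta) \;-\; \log\det\Theta \;+\; \lambda \,\lVert \Theta \rVert_{1}
\]

\[
\hat{y}(x) \;=\; \mathbb{1}\!\left[\, \Bigl(x - \tfrac{1}{2}(\hat{\mu}_1 + \hat{\mu}_0)\Bigr)^{\!\top} \hat{\Theta}\, (\hat{\mu}_1 - \hat{\mu}_0) \;+\; \log\frac{\hat{\pi}_1}{\hat{\pi}_0} \;>\; 0 \,\right]
\]

Here $\lVert \Theta \rVert_{1}$ is the elementwise $\ell_1$ norm (some formulations penalize only the off-diagonal entries). In a covariance-regularized LDA, the estimate $\hat{\Theta}$ takes the place of the inverse sample covariance $S^{-1}$, which is poorly conditioned or undefined when the number of features is large relative to the number of records.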
