Risk-Monotonicity via Distributional Robustness

Zakaria Mhammedi, Hisham Husain

Acquisition of data is a difficult task in most applications of Machine Learning (ML), and it is only natural that one hopes and expects lower populating risk (better performance) with increasing data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms such as the Empirical Risk Minimizer (ERM). Non-monotonic behaviour of the risk and instability in training have manifested and appeared in the popular deep learning paradigm under the description of double descent. These problems not only highlight our lack of understanding of learning algorithms and generalization but rather render our efforts at data acquisition in vain. It is, therefore, crucial to pursue this concern and provide a characterization of such behaviour. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, consequently resolving an open problem (Viering et al. 2019) on how to avoid non-monotonic behaviour of risk curves. Our algorithms make use of Distributionally Robust Optimization (DRO) -- a technique that has shown promise in other complications of deep learning such as adversarial training. Our work makes a significant contribution to the topic of risk-monotonicity, which may be key in resolving empirical phenomena such as double descent.

Knowledge Graph



Sign up or login to leave a comment