Deep Network Approximation Characterized by Number of Neurons

Zuowei Shen, Haizhao Yang, Shijun Zhang

This paper quantitatively characterizes the approximation power of deep feed-forward neural networks (FNNs) in terms of the number of neurons, i.e., the product of the network width and depth. It is shown by construction that ReLU FNNs with width $\mywidth$ and depth $9L+12$ can approximate an arbitrary H\"older continuous function of order $\alpha$ with a Lipschitz constant $\nu$ on $[0,1]^d$ with a tight approximation rate $5(8\sqrt{d})^\alpha\nu N^{-2\alpha/d}L^{-2\alpha/d}$ for any given $N,L\in \N^+$. The constructive approximation is a corollary of a more general result for an arbitrary continuous function $f$ in terms of its modulus of continuity $\omega_f(\cdot)$. In particular, the approximation rate of ReLU FNNs with width $\mywidth$ and depth $9L+12$ for a general continuous function $f$ is $5\omega_f(8\sqrt{d} N^{-2/d}L^{-2/d})$. We also extend our analysis to the case when the domain of $f$ is irregular or localized in an $\epsilon$-neighborhood of a $d_{\mathcal{M}}$-dimensional smooth manifold $\mathcal{M}\subseteq [0,1]^d$ with $d_{\mathcal{M}}\ll d$. Especially, in the case of an essentially low-dimensional domain, we show an approximation rate $3\omega_f\big(\tfrac{4\epsilon}{1-\delta}\sqrt{\tfrac{d}{d_\delta}}\big)+5\omega_f\big(\tfrac{16d}{(1-\delta)\sqrt{d_\delta}}N^{-2/d_\delta}L^{-2/d_\delta }\big)$ for ReLU FNNs to approximate $f$ in the $\epsilon$-neighborhood, where $d_\delta=\OO\big(d_{\mathcal{M}}\tfrac{\ln (d/\delta)}{\delta^2}\big)$ for any given $\delta\in(0,1)$. Our analysis provides a general guide for selecting the width and the depth of ReLU FNNs to approximate continuous functions especially in parallel computing.

Knowledge Graph

arrow_drop_up

Comments

Sign up or login to leave a comment