DiCENet: Dimension-wise Convolutions for Efficient Networks

Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari

We introduce a novel and generic convolutional unit, the DiCE unit, built from dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor, while dimension-wise fusion efficiently combines these dimension-wise representations, allowing the DiCE unit to efficiently encode the spatial and channel-wise information contained in the input tensor. The DiCE unit is simple and can be easily plugged into any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks, including image classification, object detection, and semantic segmentation. On the ImageNet dataset, DiCENet delivers the same or better performance than existing models with fewer floating-point operations (FLOPs). Notably, for a network size of about 70 MFLOPs, DiCENet outperforms MNASNet, a state-of-the-art model found by neural architecture search, by 4% on the ImageNet dataset. Our code is open source and available at \url{https://github.com/sacmehta/EdgeNets}
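To make the idea concrete, the following is a minimal PyTorch sketch of the concept described above, not the authors' implementation (see the linked EdgeNets repository for that). It applies a grouped (depth-wise) convolution along each of the three tensor dimensions by permuting the tensor so that the target dimension plays the role of the channel axis, then fuses the three resulting representations with a 1x1 convolution as a simple stand-in for dimension-wise fusion. The class name `DimWiseConvSketch` and the fixed `height`/`width` arguments are assumptions made for this illustration.

```python
import torch
import torch.nn as nn


class DimWiseConvSketch(nn.Module):
    """Illustrative sketch of dimension-wise convolution + fusion.

    A grouped convolution with groups == channels is applied three times,
    once per tensor dimension (channel, height, width), by permuting the
    input so the target dimension sits on the channel axis. A 1x1
    convolution then fuses the three dimension-wise representations.
    """

    def __init__(self, channels: int, height: int, width: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Depth-wise conv over the channel dimension (standard NCHW layout).
        self.conv_c = nn.Conv2d(channels, channels, kernel_size, padding=pad, groups=channels)
        # Depth-wise conv over the height dimension (height acts as channels).
        self.conv_h = nn.Conv2d(height, height, kernel_size, padding=pad, groups=height)
        # Depth-wise conv over the width dimension (width acts as channels).
        self.conv_w = nn.Conv2d(width, width, kernel_size, padding=pad, groups=width)
        # 1x1 conv as a simple stand-in for dimension-wise fusion.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        y_c = self.conv_c(x)
        # (N, C, H, W) -> (N, H, C, W): convolve along height, permute back.
        y_h = self.conv_h(x.permute(0, 2, 1, 3)).permute(0, 2, 1, 3)
        # (N, C, H, W) -> (N, W, H, C): convolve along width, permute back.
        y_w = self.conv_w(x.permute(0, 3, 2, 1)).permute(0, 3, 2, 1)
        # Concatenate the three dimension-wise representations and fuse.
        return self.fuse(torch.cat([y_c, y_h, y_w], dim=1))


if __name__ == "__main__":
    unit = DimWiseConvSketch(channels=8, height=16, width=16)
    out = unit(torch.randn(2, 8, 16, 16))
    print(out.shape)  # torch.Size([2, 8, 16, 16])
```

Note that because the height- and width-wise convolutions treat spatial extents as channels, this naive sketch ties the module to a fixed input resolution; the actual DiCE unit is engineered to avoid such limitations.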
