Recovering the 3D phase features of complex, multiple-scattering biological samples traditionally trades computational efficiency and processing speed for physical model accuracy and reconstruction quality. This trade-off hinders the rapid analysis of the living, dynamic samples that are often of greatest interest in biological research. Here, we overcome this bottleneck by combining annular intensity diffraction tomography (aIDT) with an approximant-guided deep learning framework. Using a novel physics-simulator-based learning strategy, with training performed entirely on natural image datasets, we show that our network can robustly reconstruct complex 3D biological samples of arbitrary size and structure. This approach demonstrates that large-scale multiple-scattering models can be used in place of experimentally acquired datasets to obtain highly generalizable deep learning models. We devise a new model-based data normalization pre-processing procedure that homogenizes the sample contrast and yields uniform prediction quality regardless of scattering strength. To achieve highly efficient training and prediction, we implement a lightweight 2D network structure that uses a multi-channel input to encode the axial information. We demonstrate this framework's capabilities on experimental measurements of epithelial buccal cells and Caenorhabditis elegans worms. We show the robustness of this approach on dynamic samples by reconstructing a video of a living worm, and we show its generalizability by recovering algae samples imaged with different experimental setups. To assess the prediction quality, we develop a novel quantitative evaluation metric and show that our predictions are consistent with both our experimental measurements and multiple-scattering physics.