Remote-memory-access models, also known as one-sided communication models, are becoming an interesting alternative to traditional two-sided communication models in the field of High Performance Computing. In this paper we extend previous work on an MPI-based, locality-aware remote-memory-access model with a asynchronous progress-engine for non-blocking communication operations. Most previous related work suggests to drive progression on communication through an additional thread within the application process. In contrast, our scheme uses an arbitrary number of dedicated processes to drive asynchronous progression. Further, we describe a prototypical library implementation of our concepts, namely DART, which is used to quantitatively evaluate our design against a MPI-3 baseline reference. The evaluation consists of micro-benchmark to measure overlap of communication and computation and a scientific application kernel to assess total performance impact on realistic use-cases. Our benchmarks shows, that our asynchronous progression scheme can overlap computation and communication efficiently and lead to substantially shorter communication cost in real applications.