MPI is the most widely used data transfer and communication model in High Performance Computing. The latest version of the standard, MPI-3, allows skilled programmers to exploit all hardware capabilities of current and future supercomputing systems. In particular, the revised asynchronous remote-memory-access model, in combination with the shared-memory window extension, allows writing code that hides communication latencies and optimizes communication paths according to the locality of data origin and destination. The latter is particularly important for today's multi- and many-core systems. However, writing such efficient code is highly complex and error-prone. In this paper we evaluate a recent remote-memory-access model, namely DART-MPI. This model claims to hide the aforementioned complexities from the programmer while delivering locality-aware remote-memory-access semantics that outperform MPI-3 one-sided communication primitives on multi-core systems. Conceptually, the DART-MPI interface is simple; at the same time, it takes care of the complexities of the underlying MPI-3 standard and system topology. This makes DART-MPI an interesting candidate for porting legacy applications. We evaluate these claims using a realistic scientific application, specifically a finite-difference stencil code that solves the heat diffusion equation, on a large-scale Cray XC40 installation.