Visual grouping is a key mechanism in human scene perception. There, it belongs to the subconscious, early processing and is key prerequisite for other high level tasks such as recognition. In this paper, we introduce an efficient, realtime capable algorithm which likewise agglomerates a valuable hierarchical clustering of a scene, while using purely local appearance statistics. To speed up the processing, first we subdivide the image into meaningful, atomic segments using a fast Watershed transform. Starting from there, our rapid, agglomerative clustering algorithm prunes and maintains the connectivity graph between clusters to contain only such pairs, which directly touch in the image domain and are reciprocal nearest neighbors (RNN) wrt. a distance metric. The core of this approach is our novel cluster distance: it combines boundary and surface statistics both in terms of appearance as well as spatial linkage. This yields state-of-the-art performance, as we demonstrate in conclusive experiments conducted on BSDS500 and Pascal-Context datasets.