We propose an efficient and scalable method for incrementally building a dense, semantically annotated 3D map in real-time. The proposed method assigns class probabilities to each region, not each element (e.g., surfel and voxel), of the 3D map which is built up through a robust SLAM framework and incrementally segmented with a geometric-based segmentation method. Differently from all other approaches, our method has a capability of running at over 30Hz while performing all processing components, including SLAM, segmentation, 2D recognition, and updating class probabilities of each segmentation label at every incoming frame, thanks to the high efficiency that characterizes the computationally intensive stages of our framework. By utilizing a specifically designed CNN to improve the frame-wise segmentation result, we can also achieve high accuracy. We validate our method on the NYUv2 dataset by comparing with the state of the art in terms of accuracy and computational efficiency, and by means of an analysis in terms of time and space complexity.