The trade-off between coarse- and fine-grained locking is a well understood issue in operating systems. Coarse-grained locking provides lower overhead under low contention, fine-grained locking provides higher scalability under contention, though at the expense of implementation complexity and re- duced best-case performance. We revisit this trade-off in the context of microkernels and tightly-coupled cores with shared caches and low inter-core migration latencies. We evaluate performance on two architectures: x86 and ARM MPCore, in the former case also utilising transactional memory (Intel TSX). Our thesis is that on such hardware, a well-designed microkernel, with short system calls, can take advantage of coarse-grained locking on modern hardware, avoid the run-time and complexity cost of multiple locks, enable formal verification, and still achieve scalability comparable to fine-grained locking.