The ability to record and replay program executions with low overhead enables many applications, such as reverse-execution debugging, debugging of hard-to-reproduce test failures, and "black box" forensic analysis of failures in deployed systems. Existing record-and-replay approaches rely on recording an entire virtual machine (which is heavyweight), modifying the OS kernel (which adds deployment and maintenance costs), or pervasive code instrumentation (which imposes significant performance and complexity overhead). We investigated whether it is possible to build a practical record-and-replay system avoiding all these issues. The answer turns out to be yes --- if the CPU and operating system meet certain non-obvious constraints. Fortunately modern Intel CPUs, Linux kernels and user-space frameworks meet these constraints, although this has only become true recently. With some novel optimizations, our system RR records and replays real-world workloads with low overhead with an entirely user-space implementation running on stock hardware and operating systems. RR forms the basis of an open-source reverse-execution debugger seeing significant use in practice. We present the design and implementation of RR, describe its performance on a variety of workloads, and identify constraints on hardware and operating system design required to support our approach.