Finding cycles in directed graphs enables important applications in various domains such as finance, biology, chemistry, and network science. However, as the size of graph datasets continues to grow, it becomes increasingly difficult to discover cycles within them, which necessitates more efficient algorithms and their parallel implementations. In this work, we propose scalable parallelisation of state-of-the-art sequential algorithms for enumerating simple, temporal, and hop-constrained cycles. First, we focus on the simple cycle enumeration problem and parallelise the algorithms by Johnson and by Read and Tarjan in a fine-grained manner. We theoretically show that our resulting fine-grained parallel algorithms are scalable, with the fine-grained parallel Read-Tarjan algorithm being strongly scalable. In contrast, we show that straightforward coarse-grained parallel versions of these simple cycle enumeration algorithms that exploit edge- or vertex-level parallelism are not scalable. Next, we adapt our fine-grained approach to enable scalable parallelisation of state-of-the-art algorithms for temporal and hop-constrained cycle enumeration. Our evaluation on a cluster with 256 physical cores demonstrates a near-linear scalability of our fine-grained parallel algorithms when enumerating all the aforementioned types of cycles. On the same cluster, our fine-grained parallel algorithms achieve, on average, one order of magnitude speedup compared to the respective coarse-grained parallel versions of the state-of-the-art algorithms for cycle enumeration. The performance gap between the fine-grained and the coarse-grained parallel algorithms increases as we use more CPU cores.