Processing-in-memory (PIM) architectures have seen an increase in popularity recently, as the high internal bandwidth available within 3D-stacked memory provides greater incentive to move some computation into the logic layer of the memory. To maintain program correctness, the portions of a program that are executed in memory must remain coherent with the portions of the program that continue to execute within the processor. Unfortunately, PIM architectures cannot use traditional approaches to cache coherence due to the high off-chip traffic consumed by coherence messages, which, as we illustrate in this work, can undo the benefits of PIM execution for many data-intensive applications. We propose LazyPIM, a new hardware cache coherence mechanism designed specifically for PIM. Prior approaches for coherence in PIM are ill-suited to applications that share a large amount of data between the processor and the PIM logic. LazyPIM uses a combination of speculative cache coherence and compressed coherence signatures to greatly reduce the overhead of keeping PIM coherent with the processor, even when a large amount of sharing exists.We find that LazyPIM improves average performance across a range of data-intensive PIM applications by 19.6%, reduces off-chip traffic by 30.9%, and reduces energy consumption by 18.0%, over the best prior approaches to PIM coherence.