A Primer on Memory Consistency and Cache Coherence

ToRead

Cite

Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2020. A Primer on Memory Consistency and Cache Coherence. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-031-01764-3

Attachments

Collections

Annotations

As part of this consistency model

support, the hardware provides cache coherence (or coherence).

It is worth stressing that unlike consistency which is an architectural specification that defines shared memory correctness, coherence is a means to supporting a consistency model

Essentially, all of the variants make one processor’s write visible to the other processors by propagating the write to all caches

But protocols differ in when and how the syncing happens. There are two major classes of coherence protocols.

In the second approach, the coherence protocol propagates writes to the caches asynchronously, while still honoring the consistency model.

GPUs originally chose not to support hardware cache coherence, since GPUs are designed for embarrassingly parallel graphics workloads that do not synchronize or share data all that much. However, the absence of hardware cache coherence leads to programmability and/or performance challenges when GPUs are used for general-purpose workloads with fine-grained synchronization and data sharing

We classify these protocols into two categories based on the nature of their coherence interfaces—specifically, based on whether there is a clean separation of coherence from the consistency model or whether they are indivisible.

writes are propagated synchronously, the first category presents an interface that is identical to that of an atomic memory system

The cache coherence protocol abstracts away the caches completely and presents an illusion of atomic memory

In the second, more-recent category, writes are propagated asynchronously—a write can thus return before it has been made visible to all processors, thus allowing for stale values (in real time) to be observed.

coherence protocols in this class must ensure that the order in which writes are

eventually made visible adheres to the ordering rules mandated by the consistency model.

We define coherence through the single-writer–multiple-reader (SWMR) invariant.

note that it is possible to to enforce a variety of consistency models, including strong models such as SC and TSO, using this approach

This invariant states that the value of a memory location at the start of an epoch is the same as the value of the memory location at the end of its last read-write epoch.

must appear to execute all threads’ loads and stores to a single memory location in a total order that respects the program order of each thread

Per-location SC=COH

This definition highlights an important distinction between coherence and consistency in the literature: coherence is specified on a per-memory location basis, whereas consistency is specified with respect to all memory locations.

Power is assumed to be incomparable with respect to Alpha, ARM, RMO, and XC until someone proves that one is more relaxed than the other or that the two are equivalent.