Constructive Computer Architecture

## Cache Coherence

#### Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

November 17, 2014

## Shared Memory Systems





Μ





 Inclusion property is maintained between a parent and its children, i.e.,
 Because usual

$$a \in L_i \implies a \in L_{i+1}$$

http://www.

Because usually  $L_{i+1} >> L_i$ 

November 17, 2014

http://www.csg.csail.mit.edu/6.175



November 17, 2014



November 17, 2014

http://www.csg.csail.mit.edu/6.175

## Maintaining Store Atomicity

- Store atomicity requires all processors to see writes occur in the same order
  - multiple copies of an address in various caches can cause this to be violated
- This property can be ensured if:
  - Only one cache at a time has the write permission for an address
  - No cache can have a stale copy of the data after a write to the address has been performed

⇒ cache coherence protocols are used to implement store atomicity

November 17, 2014

### **Cache Coherence Protocols**

#### Write request:

- the address is *invalidated* in all other caches before the write is performed
- Read request:
  - if a dirty copy is found in some cache, that value must be used by doing a write-back and then reading the memory or forwarding that dirty value directly to the reader

Such protocols are called Invalidation-based

November 17, 2014

## State needed to maintain Cache Coherence

- Use MSI encoding in caches where
  - I means this cache does not contain the address
  - S *means* this cache has the address but so may other caches; hence it can only be read
  - M *means* only this cache has the address; hence it can be read and written
- The states M, S, I can be thought of as an order M > S > I
  - A transition from a lower state to a higher state is called an Upgrade
  - A transition from a higher state to a lower state is called a *Downgrade*

November 17, 2014

## Sibling invariant and compatibility

### Sibling invariant:

- Cache is in state  $M \Rightarrow$  its siblings are in state I
- That is, the sibling states are "compatible"
  - IsCompatible(M, M) = False
  - IsCompatible(M, S) = False
  - IsCompatible(S, M) = False

All other cases = True

## **Cache State Transitions**



optimizations

This state diagram is helpful as long as one remembers that each transition involves cooperation of other caches and the main memory to maintain the sibling invariants

November 17, 2014

http://www.csg.csail.mit.edu/6.175

## **Cache Actions**

#### On a read miss (i.e., Cache state is I):

- In case some other cache has the address in state M then write back the dirty data to Memory
- Read the value from Memory and set the state to S

#### On a write miss (i.e., Cache state is I or S):

- Invalidate the address in all other caches and in case some cache has the address in state M then write back the dirty data
- Read the value from Memory if necessary and set the state to M

Misses cause Cache upgrade actions which in turn may cause further downgrades or upgrades on other caches

November 17, 2014

## MSI protocol: some issues

- It never makes sense to have two outstanding requests for the same address from the same processor/cache
- It is possible to have multiple requests for the same address from different processors. Hence there is a need to arbitrate requests
- A cache needs to be able to evict an address in order to make room for a different address
  - Voluntary downgrade
- Memory system (higher-level cache) should be able to force a lower-level cache to downgrade
  - caches need to keep track of the state of their children's caches

November 17, 2014

## Directory State Encoding

Two-level (L1, M) system



For each address in a cache, the directory keeps two types of info

- c.state[a] (sibling info): do c's siblings have a copy of address a; M (means no), S (means maybe)
- m.child[c<sub>k</sub>][a] (children info): the state of child c<sub>k</sub> for address a; At most one child can be in state M

November 17, 2014

## Directory state encoding

#### wait states

#### New states to deal with waiting for responses:

- c.waitp[a] : Denotes if cache c is waiting for a response from its parent
  - No means not waiting
  - Yes (S|I) means waiting for a response to transition to state S or I, respectively
- m.waitc[c<sub>k</sub>][a] : Denotes if memory m is waiting for a response from its child c<sub>k</sub>
  - No | Yes (M|S)
- Cache state in L1:
  - <(M|S|I), (No | Yes(M|S))>

Directory state in home memory (for each child): <[(M|S|I), (No | Yes(S|I))]>

Child's state November 17, 2014 Waiting for downgrade response http://www.csg.csail.mit.edu/6.175



## **Processor Hit Rules**

- Load-hit rule p2m.msg=(Load a) & (c.state[a]>I) → p2m.deq; m2p.enq(c.data[a]);
  - Store-hit rule
  - p2m.msg=(Store a v) &
    c.state[a]=M
    → p2m.deq;
    m2p.enq(Ack);





November 17, 2014

## Processing misses: Requests and Responses



November 17, 2014

# Invariants for a CC-protocol design

- Directory state is always a conservative estimate of a child's state
  - E.g., if directory thinks that a child cache is in S state then the cache has to be in either I or S state
- For every request there is a corresponding response, though sometimes a response may have been generated even before the request was processed
- Communication system has to ensure that
  - responses cannot be blocked by requests
  - a request cannot overtake a response for the same address

At every merger point for requests, we will assume fair arbitration to avoid starvation

November 17, 2014