On-Chip Networks I: Topology/Flow Control

Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T.

http://www.csg.csail.mit.edu/6.823

Sanchez & Emer

# History: From interconnection networks to on-chip networks



#### Focus on on-chip networks connecting caches in shared memory processors

Multi-Chip: Supercomputers, Data Centers, Internet Routers, Servers On-Chip: Servers, Laptops, Phones, HDTVs, Access routers

April 23, 2014

Sanchez & Emer

# What's an on-chip network?



#### It transports cache coherence messages and cache lines between processor cores

holds a copy of address A in its \$

## Designing an on-chip network



#### Interconnection Networks Architecture

- How to connect the nodes up (processors, memories, router line cards, SoC modules) – TOPOLOGY
- Which path should a message take? ROUTING
- How is the message actually forwarded from source to destination – FLOW CONTROL
- How to build the routers ROUTER MICROARCHITECTURE
- How to build the links LINK ARCHITECTURE
- How do nodes talk to the network NETWORK INTERFACE

Topology

# **Topological Properties**

- *Routing Distance* number of links on route
- *Diameter* maximum routing distance
- Average Distance
- A network is *partitioned* by a set of links if their removal disconnects the graph
- Bisection Bandwidth is the bandwidth crossing a minimal cut that divides the network in half

## Linear Arrays and Rings



L in ear Array

Torus

Torus arranged to use short wires

Route A -> B given by relative address R = B-A

| Lin                  | ear Array | Ring (1-D Torus) |
|----------------------|-----------|------------------|
| Diameter?            | Ν         | N/2              |
| Average Distance?    | N/2       | N/4              |
| Bisection bandwidth? | 1         | 2                |

- Iorus Examples:
  - FDDI, SCI, FiberChannel Arbitrated Loop, Intel Xeon

L19-8

#### Multidimensional Meshes and Tori



- *d*-dimensional array
  - $-n = k_{d-1} \times \dots \times k_0$  nodes
  - described by *d*-vector of coordinates  $(i_{d-1}, ..., i_0)$
- *d*-dimensional *k*-ary mesh:  $N = k^d$

 $-\mathbf{k} = d\sqrt{N}$ 

described by *d*-vector of radix k coordinate

• *d*-dimensional *k*-ary torus (or *k*-ary *d*-cube)

#### **Routing & Flow Control Overview**

#### Messages, Packets, Flits, Phits



Packet: Basic unit of routing and sequencing

- Limited size (e.g. 64 bits – 64 KB)

For variable packet sizes

Flit (flow control digit): Basic unit of bandwidth/storage allocation

- All flits in packet follow the same path

Phit (physical transfer digit): data transferred in single clock April 23, 2014 Sanchez

Sanchez & Emer

# Routing vs Flow Control

- Routing algorithm chooses path that packets should follow to get from source to destination
- Flow control schemes allocate resources (buffers, links, control state) to packets traversing the network

- Our approach: Bottom-up
  - Today: Flow control, assuming routes are set
  - Next lecture: Routing algorithms

#### **Properties of Routing Algorithms**

- Deterministic/Oblivious
  - Route determined by (source, dest), not intermediate state (i.e. traffic)
- Adaptive
  - Route influenced by traffic along the way
- Minimal
  - Only selects shortest paths
- Deadlock-free
  - No traffic pattern can lead to a situation where no packets move forward

#### (more in next lecture)

#### **Flow Control**

Sanchez & Emer

L19-15

#### Contention



- Two packets trying to use the same link at the same time
   Limited or no buffering
- Problem arises because we are sharing resources
   Sharing bandwidth and buffering

### Flow Control Protocols

- Bufferless
  - Circuit switching
  - Dropping
  - Misrouting
- Buffered
  - Store-and-forward
  - Virtual cut-through
  - Wormhole
  - Virtual-channel

Complexity & Efficiency

# Circuit Switching

- Form a circuit from source to dest
- Probe to set up path through network
- Reserve all links
- Data sent through links
- Bufferless

# Time-space View: Circuit Switching



- Why is this good? Simple to implement
- Why is it not? Wasteful, increased 3X latency for short packets

### Speculative Flow Control: Dropping

- If two things arrive and I don't have resources, drop one of them
- Flow control protocol on the Internet



L19-19

## Time-space Diagram: Dropping



Sanchez & Emer

L19-20

April 23, 2014

## Less Simple Flow Control: Misrouting

 If only one message can enter the network at each node, and one message can exit the network at each node, the network can never be congested. Right?

#### Wrong! Multiple hops cause congestion

- Philosophy behind misrouting: intentionally route away from congestion
- No need for buffering
- Problems?

Livelock: need to guarantee that progress is made

L19-21

### **Buffered Routing**



- Link-level flow control:
  - Given that you can't drop packets, how to manage the buffers?
     When can you send stuff forward, when not?
- Metrics of interest:
  - Throughput/Latency
  - Buffer utilization (turnaround time)

### Techniques for link backpressure

- Naïve stall-based (on/off):
   Can source send or not?
- Sophisticated stall-based (credit-based):
   How many flits can be sent to the next node?
- Speculative (ack/nack):
  - Guess can always send, but keep copy
  - Resolve if send was successful (ack/nack)
    - On ack drop copy
    - On nack resend

#### Packet-Buffer Flow Control: Store-and-Forward

#### • Strategy:

 Make intermediate stops and wait until the entire packet has arrived before you move on

• Advantage:

– Other packets can use intermediate links

L19-24

#### Time-space View: Store-and-Forward



Could be allocated at a much later time without packet dropping

- Buffering allows packet to wait for channel
- Drawback? Serialization latency experienced at each hop/channel

# Virtual Cut-through

- Why wait till entire message has arrived at each intermediate stop?
- The head flit of the packet can dash off first
- When the head gets blocked, whole packet gets blocked at one intermediate node
- Used in Alpha 21364

#### Time-space View: Virtual Cut-through

| (A) | O H B B B T<br>1 H B B B T<br>2 H B B B T<br>3 H B B B T<br>3 H B B B T                             | <ul> <li>Advantages?</li> <li>Reduced latency</li> </ul>                                                                                       |
|-----|-----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| (B) | 0 1 2 3 4 5 6 7<br>Cycle No breaks<br>allowed<br>0 HBBBT /<br>1 HBBBT /<br>2 HBBBT HBBBT<br>3 HBBBT | <ul> <li>Disadvantages?</li> <li>Allocates channels and<br/>buffers in terms of<br/>packets reducing<br/>utilization and increasing</li> </ul> |
|     | 0 1 2 3 4 5 6 7 8 9 10<br>Cycle                                                                     | buffer contention                                                                                                                              |

# Flit-Buffer Flow Control: Wormhole

- When a packet blocks, just block wherever the pieces (flits) of the message are at that time.
- Operates like cut-through but with channel and buffers allocated to flits rather than packets
  - Channel state (virtual channel) allocated to packet so body flits can follow head flit

#### Time-space View: Wormhole



- Advantages?
- Disadvantages?

Smaller amount of buffer space required May block a channel mid-packet, another packet cannot use bandwidth

# Virtual-Channel (VC) Flow Control

- When a message blocks, instead of holding on to links so others can't use them, hold on to virtual links
- Multiple queues in buffer storage
   Lanes on the highway
- Virtual channel can be thought of as channel state and flit buffers

#### Time-space View: Virtual-Channel



- Advantages?
- Disadvantages?

Significantly reduces blocking

More complex router, fair VC allocation required Next Time:

# Router (Switch) Microarchitecture Routing Algorithms

http://www.csg.csail.mit.edu/6.823

Sanchez & Emer