



## CS230: Digital Logic Design and Computer Architecture Lecture 22: Connecting All the Dots (O3 processor)

https://www.cse.iitb.ac.in/~biswa/courses/CS230/main.html

https://www.cse.iitb.ac.in/~biswa/

Recall the out-of-order (O3) processor:

In-order fetch, out-of-order execute, in-order commit

Now we will discuss it in detail

Static scheduling: the compiler can do this

**Computer Architecture** 

## Recap



## Points to remember

- Control & buffers <u>distributed</u> with Function Units (FU)
  - FU buffers called "<u>reservation stations</u>"; have pending operands
- Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called register renaming
  - avoids WAR, WAW hazards
  - More reservation stations than registers, so can do optimizations compilers can't


- Results go to FUs from RSs, not through registers, over the <u>Common Data Bus</u> (CDB), which broadcasts results to all FUs
- Loads and stores are treated as FUs with RSs as well
- The decode stage of the pipeline becomes two stages:
  - Issue: decode instructions, check for structural hazards
  - Read operands: wait until there are no data hazards, then read operands

Some processors use the terms dispatch and issue for these stages.
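The claim that renaming avoids WAR and WAW hazards but not RAW hazards can be checked with a small sketch. The `(dest, sources)` tuple encoding, the function name `hazards`, the tag name `Add1`, and the three instructions are all assumptions chosen for illustration; they are not taken from the slides.

```python
def hazards(first, second):
    """Classify register hazards between two instructions in program order.

    Each instruction is a (dest, sources) pair; this encoding is an
    assumption made for illustration only.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")  # second reads what first writes: a true dependence
    if dest2 in srcs1:
        found.add("WAR")  # second overwrites a register first still reads
    if dest1 == dest2:
        found.add("WAW")  # both write the same register
    return found


# Hypothetical pair: DIVD F10,F0,F6 followed by ADDD F6,F8,F2.
divd = ("F10", ["F0", "F6"])
addd = ("F6", ["F8", "F2"])
before = hazards(divd, addd)

# Renaming: ADDD's destination becomes the tag of its reservation station.
addd_renamed = ("Add1", ["F8", "F2"])
after = hazards(divd, addd_renamed)

# A true dependence survives renaming: MULTD F0,F2,F4 then DIVD F10,F0,F6.
multd = ("F0", ["F2", "F4"])
raw = hazards(multd, divd)

print(before, after, raw)
```

WAR and WAW are name conflicts, so giving the later writer a fresh name removes them; RAW is a real data flow and still has to be waited for.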

## **Reservation Station Components**

Op: Operation to perform in the unit (e.g., + or –)

Vj, Vk: Values of the source operands

- Store buffers have a V field: the result to be stored

Qj, Qk: Reservation stations producing the source operands (the values still to be written)

- Qj, Qk = 0 => ready
- Store buffers only have Qi, for the RS producing the result

Busy: Indicates that the reservation station or FU is busy

Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
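The fields above can be sketched as a small data structure. This is a minimal model, not a full Tomasulo implementation: `None` stands in for "Q = 0", the register result status is a plain dictionary, and the names `ReservationStation`, `issue`, `broadcast`, `Mult1`, and `Add1` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str                   # tag broadcast on the CDB
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, valid when Q is None
    vk: Optional[float] = None
    qj: Optional[str] = None    # producing station; None plays the role of Q = 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src1, src2, regs, reg_status):
    """Fill a free station: each source becomes a value or a producer tag."""
    rs.busy, rs.op = True, op
    for src, v_field, q_field in ((src1, "vj", "qj"), (src2, "vk", "qk")):
        if src in reg_status:   # a pending instruction will write this register
            setattr(rs, q_field, reg_status[src])
            setattr(rs, v_field, None)
        else:                   # value is available in the register file
            setattr(rs, q_field, None)
            setattr(rs, v_field, regs[src])
    reg_status[dest] = rs.name  # this station now produces dest


def broadcast(stations, tag, value, regs, reg_status):
    """CDB broadcast: waiting stations and registers capture the value."""
    for rs in stations:
        if rs.name == tag:
            rs.busy = False     # producing station is free again
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]


regs = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0}
reg_status = {}
mult1, add1 = ReservationStation("Mult1"), ReservationStation("Add1")

issue(mult1, "MUL", "F0", "F2", "F4", regs, reg_status)  # F0 <- F2 * F4
issue(add1, "ADD", "F6", "F0", "F2", regs, reg_status)   # F6 <- F0 + F2
waiting_on, ready_before = add1.qj, add1.ready()

broadcast([mult1, add1], "Mult1", 8.0, regs, reg_status)
```

After the broadcast, `add1` holds both operand values and becomes ready, and `F0` no longer appears in the register result status.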

## The New Pipeline

In-order instruction fetch

Fetched instructions are enqueued into a queue called the **Instruction Queue (IQ)**.

In-order instruction issue from the IQ

**Out-of-order** execution

The new concept is register renaming through reservation stations, which eliminates WAR and WAW hazards

## An Example



Cycle 1



Latencies: load 2 cycles, FP add 2 cycles, FP multiply 10 cycles, FP divide 40 cycles

Cycle 2



Note: can have multiple loads outstanding

Cycle 3



Note: register names are removed ("renamed") in the reservation stations; MULT is issued

## Register Renaming

- Tomasulo provides implicit register renaming
  - User registers are renamed to reservation station tags
- Explicit register renaming:
  - Use a *physical* register file that is larger than the number of registers specified by the ISA
- Keep a translation table:
  - ISA register => physical register mapping
  - A physical register becomes free when it is no longer used by any instruction in progress. More on this later, after the ROB.

## Explicit Register Renaming

- Rapid access to a table of translations
- A physical register file that has more registers than specified by the ISA
- Ability to figure out which physical registers are free
  - No free registers $\Rightarrow$ stall on issue
- Thus, register renaming doesn't require reservation stations. However:
  - Many modern architectures use explicit register renaming + Tomasulo-like reservation stations to control execution.
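The three requirements above (a translation table, a larger physical register file, and a free list) can be sketched as follows. The class name `RenameTable`, its methods, and the register counts are assumptions for illustration; deciding when an old physical register is safe to release is left to the caller here, whereas real hardware ties it to commit.

```python
from collections import deque


class RenameTable:
    """Sketch of explicit renaming: map table plus free list (hypothetical API)."""

    def __init__(self, isa_regs, num_physical):
        if num_physical <= len(isa_regs):
            raise ValueError("need more physical than ISA registers")
        # Initially ISA register i lives in physical register i.
        self.map = {r: i for i, r in enumerate(isa_regs)}
        self.free = deque(range(len(isa_regs), num_physical))

    def rename(self, dest, srcs):
        """Return (new_phys_dest, phys_srcs, old_phys_dest), or None to stall."""
        if not self.free:
            return None                          # no free registers: stall on issue
        phys_srcs = [self.map[s] for s in srcs]  # read sources before remapping dest
        old = self.map[dest]
        new = self.free.popleft()
        self.map[dest] = new
        return new, phys_srcs, old

    def release(self, phys):
        """Return a physical register to the free list."""
        self.free.append(phys)


table = RenameTable(["R1", "R2", "R3"], num_physical=5)
first = table.rename("R1", ["R2", "R3"])   # R1 <- R2 op R3
second = table.rename("R2", ["R1", "R3"])  # R2 <- R1 op R3, reads the new R1
stalled = table.rename("R1", ["R2"])       # free list is empty
table.release(first[2])                    # old mapping of R1 is no longer needed
resumed = table.rename("R1", ["R2"])
```

The second instruction reads physical register 3, the renamed `R1`, which is how the RAW dependence is preserved while the name `R1` itself is free to be reused.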

## Tomasulo gives out-of-order completion; we need in-order completion (commit)



- Instructions are fetched and decoded into the instruction reorder buffer (ROB) in order
- Execution is out-of-order ( $\Rightarrow$ out-of-order completion)
- Commit (write-back to architectural state, i.e., regfile & memory) is in-order

Temporary storage is needed to hold results before commit (shadow registers and store buffers)

## Dynamic scheduling with speculative execution



Need as many ports on the ROB as on the register file

## Speculative O3 with Tomasulo and ROB

#### 1. Issue: get instruction from FP Op Queue

If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch").

#### 2. Execution: operate on operands (EX)

When both operands are ready, execute; if not, watch the CDB for the result. Waiting until both operands are in the reservation station is what checks RAW hazards (this stage is sometimes called "issue").

#### 3. Write result: finish execution (WB)

Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.

#### 4. Commit: update register with reorder buffer result

When the instruction reaches the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called "graduation").
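The commit rule in step 4 can be sketched with a queue. This is a simplified model under stated assumptions: the names `ROBEntry` and `ReorderBuffer` are made up for the example, only register destinations are handled (no stores), and `commit` retires every finished entry at the head in one call, whereas real hardware retires a bounded number per cycle.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    dest: str
    value: Optional[float] = None
    done: bool = False


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()      # head of the ROB is entries[0]

    def allocate(self, dest):
        """Issue: take a slot in program order, or return None to stall."""
        if len(self.entries) == self.size:
            return None
        entry = ROBEntry(dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        """Write result: may happen in any order."""
        entry.value, entry.done = value, True

    def commit(self, regs):
        """Retire finished entries from the head only; return their dests."""
        committed = []
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            regs[head.dest] = head.value
            committed.append(head.dest)
        return committed

    def flush(self):
        """Mispredicted branch: discard all speculative entries."""
        self.entries.clear()


regs = {}
rob = ReorderBuffer(size=4)
slow = rob.allocate("F0")           # e.g. a long-latency multiply
fast = rob.allocate("F6")           # a younger, shorter operation

rob.write_result(fast, 3.0)         # finishes first, out of order
early = rob.commit(regs)            # head is not done, nothing retires
regs_after_early = dict(regs)

rob.write_result(slow, 8.0)
late = rob.commit(regs)             # both retire, in program order
```

The younger result waits in the ROB until the older instruction at the head has finished, so the register file only ever changes in program order.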







## Memory Disambiguation

- Question: Given a load that follows a store in program order, are the two related?
  - (Alternatively: is there a RAW hazard between the store and the load?)
    - E.g.: st 0(R2),R5 followed by ld R6,0(R3)
- Can we go ahead and start the load early?
  - Answer: we are not allowed to start the load until we know that the addresses differ, $0(R2) \neq 0(R3)$
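The conservative rule above can be written as a small check. This sketch assumes each older, uncommitted store is represented by its effective address, with `None` meaning the address has not been computed yet; the function names and register values are hypothetical. It does not model store-to-load forwarding or speculative loads.

```python
def effective_address(offset, base_value):
    """Address of offset(Rbase), e.g. 0(R2)."""
    return offset + base_value


def load_may_start(load_addr, older_store_addrs):
    """Conservative disambiguation: every older store must have a known,
    different address before the load is allowed to start."""
    for store_addr in older_store_addrs:
        if store_addr is None:       # address unknown: might alias, so wait
            return False
        if store_addr == load_addr:  # same address: RAW hazard through memory
            return False
    return True


# st 0(R2),R5 followed by ld R6,0(R3), with made-up register contents.
regs = {"R2": 100, "R3": 200}
store_addr = effective_address(0, regs["R2"])
load_addr = effective_address(0, regs["R3"])

different = load_may_start(load_addr, [store_addr])
unknown = load_may_start(load_addr, [None])
same = load_may_start(load_addr, [effective_address(0, 200)])
```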

### Intel Core Microarchitecture



Intel Core 2 Architecture



