### Outline

- 1. Design Example: One-shot pulse generator
- 2. Design Example: GCD
- 3. Design Example: UART
- 4. Design Example: SRAM Interface Controller
- 5. Square root approximation circuit

RTL Hardware Design by P. Chu, Chapter 12

# 1. One-shot pulse generator

**Register Transfer Methodology II**

- Sequential circuits are divided into
  - Regular sequential circuit: with regular next-state logic
  - FSM: with random next-state logic
  - FSMD: with both
- The division is a convenience for code development; there is no formal definition
- Some designs can be coded in more than one of these types
- FSMD is the most flexible
- One-shot pulse generator as an example


- Refined block diagram of FSMD (figure)
- Basic block diagram (figure)
- Regular sequential circuit, e.g., mod-10 counter (figure)

- One-shot pulse generator
  - I/O: Input: go, stop; Output: pulse
  - go is the trigger signal, usually asserted for only one clock cycle
  - During normal operation, assertion of go activates pulse for 5 clock cycles
  - If go is asserted again during this interval, it will be ignored
  - If stop is asserted during this interval, pulse will be cut short and return to 0

- FSM implementation
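```
library ieee;
use ieee.std_logic_1164.all;
-- entity reconstructed from the I/O description (go, stop, pulse)
entity pulse_5clk is
   port(
      clk, reset: in std_logic;
      go, stop: in std_logic;
      pulse: out std_logic
   );
end pulse_5clk;

architecture fsm_arch of pulse_5clk is
   type fsm_state_type is
      (idle, delay1, delay2, delay3, delay4, delay5);
   signal state_reg, state_next: fsm_state_type;
begin
   -- state register
   process(clk,reset)
   begin
      if (reset='1') then
         state_reg <= idle;
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
      end if;
   end process;
   -- next-state logic & output logic
   process(state_reg,go,stop)
   begin
      pulse <= '0';
      case state_reg is
         when idle =>
            if go='1' then
               state_next <= delay1;
            else
               state_next <= idle;
            end if;
         when delay1 =>
            if stop='1' then
               state_next <= idle;
            else
               state_next <= delay2;
            end if;
            pulse <= '1';
         when delay2 =>
            if stop='1' then
               state_next <= idle;
            else
               state_next <= delay3;
            end if;
            pulse <= '1';
         when delay3 =>
            if stop='1' then
               state_next <= idle;
            else
               state_next <= delay4;
            end if;
            pulse <= '1';
         when delay4 =>
            if stop='1' then
               state_next <= idle;
            else
               state_next <= delay5;
            end if;
            pulse <= '1';
         when delay5 =>
            state_next <= idle;
            pulse <= '1';
      end case;
   end process;
end fsm_arch;
```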



```
-- numeric_std is needed for the unsigned counter
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

architecture fsmd_arch of pulse_5clk is
   constant P_WIDTH: natural := 5;
   type fsmd_state_type is (idle, delay);
   signal state_reg, state_next: fsmd_state_type;
   signal c_reg, c_next: unsigned(3 downto 0);
begin
   -- state and data registers
   process(clk,reset)
   begin
      if (reset='1') then
         state_reg <= idle;
         c_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
         c_reg <= c_next;
      end if;
   end process;
   -- next-state logic & data path functional units/routing
   process(state_reg,go,stop,c_reg)
   begin
      pulse <= '0';
      c_next <= c_reg;
      case state_reg is
         when idle =>
            if go='1' then
               state_next <= delay;
            else
               state_next <= idle;
            end if;
            c_next <= (others=>'0');
         when delay =>
            if stop='1' then
               state_next <= idle;
            else
               if (c_reg=P_WIDTH-1) then
                  state_next <= idle;
               else
                  state_next <= delay;
                  c_next <= c_reg + 1;
               end if;
            end if;
            pulse <= '1';
      end case;
   end process;
end fsmd_arch;
```

- Comparison:
  - FSMD is the most flexible and the easiest to comprehend
- What happens under the following modifications?
  - The delay is extended from 5 cycles to 100 cycles
  - The stop signal is effective only for the first 2 delay cycles and is ignored otherwise

- "Programmable" one-shot generator
  - The desired width can be programmed
  - The circuit enters the programming mode when both go and stop are asserted
  - The desired width is shifted in via go in the next three clock cycles


- The programmable version can be added easily in an ASMD chart
- How about an FSM or a regular sequential circuit?
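The programmable one-shot generator described above could be coded as an FSMD along the following lines. This is a minimal sketch under stated assumptions, not the slides' own listing: the entity name `pulse_prog`, the state names `shift1` to `shift3`, the 3-bit width register `w_reg`, the reset default of 5 cycles, and the LSB-first shift order are all illustrative choices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; ports follow the one-shot pulse generator.
entity pulse_prog is
   port(
      clk, reset: in std_logic;
      go, stop: in std_logic;
      pulse: out std_logic
   );
end pulse_prog;

architecture fsmd_arch of pulse_prog is
   type fsmd_state_type is (idle, delay, shift1, shift2, shift3);
   signal state_reg, state_next: fsmd_state_type;
   signal c_reg, c_next: unsigned(2 downto 0); -- cycle counter
   signal w_reg, w_next: unsigned(2 downto 0); -- programmed width
begin
   -- state and data registers
   process(clk,reset)
   begin
      if (reset='1') then
         state_reg <= idle;
         c_reg <= (others=>'0');
         w_reg <= to_unsigned(5, 3); -- assumed default width
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
         c_reg <= c_next;
         w_reg <= w_next;
      end if;
   end process;
   -- next-state logic & data path routing
   process(state_reg,go,stop,c_reg,w_reg)
   begin
      pulse <= '0';
      c_next <= c_reg;
      w_next <= w_reg;
      case state_reg is
         when idle =>
            if (go='1' and stop='1') then
               state_next <= shift1;  -- enter programming mode
            elsif go='1' then
               state_next <= delay;
            else
               state_next <= idle;
            end if;
            c_next <= (others=>'0');
         when delay =>
            if (stop='1' or c_reg=w_reg-1) then
               state_next <= idle;
            else
               state_next <= delay;
               c_next <= c_reg + 1;
            end if;
            pulse <= '1';
         when shift1 =>  -- first bit (ends up as LSB)
            w_next <= go & w_reg(2 downto 1);
            state_next <= shift2;
         when shift2 =>
            w_next <= go & w_reg(2 downto 1);
            state_next <= shift3;
         when shift3 =>  -- last bit (MSB)
            w_next <= go & w_reg(2 downto 1);
            state_next <= idle;
      end case;
   end process;
end fsmd_arch;
```

Only one register and three states are added to the fixed-width FSMD. In this sketch a programmed width of 0 is not handled specially: `w_reg-1` wraps around to 7, so the pulse lasts 8 cycles.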




## 2. GCD circuit

- GCD: Greatest Common Divisor
  - E.g., gcd(1, 10) = 1, gcd(12, 9) = 3
- GCD without division:

$$gcd(a,b) = \begin{cases} a & \text{if } a = b \\ gcd(a-b,b) & \text{if } a > b \\ gcd(a,b-a) & \text{if } a < b \end{cases}$$


#### Pseudo algorithm

```
a = a_in;
b = b_in;
while (a /= b) {
    if (a > b) then
        a = a - b;
    else
        b = b - a;
    end if
}
r = a;
```


#### Modified pseudo algorithm without while loop

```
      a = a_in;
      b = b_in;
swap: if (a = b) then
         goto stop;
      else
         if (b > a) then  -- swap a and b (both assignments take effect together)
            a = b;
            b = a;
         end if;
         a = a - b;
         goto swap;
      end if;
stop: r = a;
```
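The swap-and-subtract loop above maps naturally onto an FSMD with three states. The following is a sketch under stated assumptions rather than a definitive implementation: the `start`/`ready` handshake, the 8-bit width, the entity name `gcd` and the architecture name `slow_arch` are illustrative, and both inputs are assumed to be nonzero (a zero operand would keep the loop from terminating).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity; start/ready handshake is an assumption.
entity gcd is
   port(
      clk, reset: in std_logic;
      start: in std_logic;
      a_in, b_in: in std_logic_vector(7 downto 0);
      ready: out std_logic;
      r: out std_logic_vector(7 downto 0)
   );
end gcd;

architecture slow_arch of gcd is
   type state_type is (idle, swap, sub);
   signal state_reg, state_next: state_type;
   signal a_reg, a_next, b_reg, b_next: unsigned(7 downto 0);
begin
   -- state and data registers
   process(clk,reset)
   begin
      if (reset='1') then
         state_reg <= idle;
         a_reg <= (others=>'0');
         b_reg <= (others=>'0');
      elsif (clk'event and clk='1') then
         state_reg <= state_next;
         a_reg <= a_next;
         b_reg <= b_next;
      end if;
   end process;
   -- next-state logic & data path
   process(state_reg,a_reg,b_reg,start,a_in,b_in)
   begin
      a_next <= a_reg;
      b_next <= b_reg;
      case state_reg is
         when idle =>
            if start='1' then
               a_next <= unsigned(a_in);
               b_next <= unsigned(b_in);
               state_next <= swap;
            else
               state_next <= idle;
            end if;
         when swap =>
            if (a_reg=b_reg) then
               state_next <= idle;   -- done: r = a
            else
               if (a_reg < b_reg) then
                  -- register swap: both transfers occur on the same clock edge
                  a_next <= b_reg;
                  b_next <= a_reg;
               end if;
               state_next <= sub;
            end if;
         when sub =>
            a_next <= a_reg - b_reg;
            state_next <= swap;
      end case;
   end process;
   -- output
   ready <= '1' when state_reg=idle else '0';
   r <= std_logic_vector(a_reg);
end slow_arch;
```

Each subtraction costs two clock cycles here (one in `swap`, one in `sub`). With inputs 1 and 255, for example, the loop performs 254 subtractions before `a` equals `b`.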



- What is the problem with this code?
- Another observation:

$$\gcd(a,b) = \begin{cases} a & \text{if } a = b \\ 2 \gcd(\frac{a}{2}, \frac{b}{2}) & \text{if } a \neq b \text{ and } a, b \text{ even} \\ \gcd(a, \frac{b}{2}) & \text{if } a \neq b \text{ and } a \text{ odd, } b \text{ even} \\ \gcd(\frac{a}{2}, b) & \text{if } a \neq b \text{ and } a \text{ even, } b \text{ odd} \\ \gcd(a - b, b) & \text{if } a > b \text{ and } a, b \text{ odd} \\ \gcd(a, b - a) & \text{if } a < b \text{ and } a, b \text{ odd} \end{cases}$$
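As a quick check of the cases above, here is a worked trace for gcd(1, 16); the input values are an illustrative choice:

$$\begin{aligned} \gcd(1, 16) &= \gcd(1, 8) && (a \text{ odd},\ b \text{ even}) \\ &= \gcd(1, 4) && (a \text{ odd},\ b \text{ even}) \\ &= \gcd(1, 2) && (a \text{ odd},\ b \text{ even}) \\ &= \gcd(1, 1) && (a \text{ odd},\ b \text{ even}) \\ &= 1 && (a = b) \end{aligned}$$

Four halvings reach the result, whereas the subtraction-only recurrence needs 15 subtractions for the same inputs.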


- What is the performance now?
- Can we do better with more hardware resources?

## 5. Square root approximation circuit

- An example of a data-oriented (computation-intensive) application
- Equation:

$$\sqrt{a^2+b^2} \approx \max\big((x - 0.125x) + 0.5y,\ x\big), \quad \text{where } x=\max(|a|,|b|) \text{ and } y=\min(|a|,|b|)$$

- 0.125x and 0.5y correspond to right shifts of 3 bits and 1 bit, respectively
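Two worked evaluations may help to see how the approximation behaves when the multiplications are done as truncating right shifts on integers; the input values are illustrative:

$$\begin{aligned} a = 3,\ b = -4:&\quad x = 4,\ y = 3,\ (x \gg 3) = 0,\ (y \gg 1) = 1, &\max\big((4 - 0) + 1,\ 4\big) &= 5 \\ a = b = 100:&\quad x = y = 100,\ (x \gg 3) = 12,\ (y \gg 1) = 50, &\max\big((100 - 12) + 50,\ 100\big) &= 138 \end{aligned}$$

The first result matches $\sqrt{3^2+4^2} = 5$ exactly. The second is about 2.4% below $\sqrt{100^2+100^2} \approx 141.4$.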

- Pseudo code:

```
a = a_in;
b = b_in;
t1 = abs(a);
t2 = abs(b);
x = max(t1, t2);
y = min(t1, t2);
t3 = x*0.125;
t4 = y*0.5;
t5 = x - t3;
t6 = t4 + t5;
t7 = max(t6, x);
r = t7;
```

#### Direct "data-flow" implementation

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity sqrt is
   port(
      a_in, b_in: in std_logic_vector(7 downto 0);
      r: out std_logic_vector(8 downto 0)
   );
end sqrt;

architecture comb_arch of sqrt is
   constant WIDTH: natural:=8;
   signal a, b, x, y: signed(WIDTH downto 0);
   signal t1, t2, t3, t4, t5, t6, t7: signed(WIDTH downto 0);
begin
   a <= signed(a_in(WIDTH-1) & a_in); -- signed extension
   b <= signed(b_in(WIDTH-1) & b_in);
   t1 <= a when a > 0 else
         0 - a;
   t2 <= b when b > 0 else
         0 - b;
   x <= t1 when t1 - t2 > 0 else
        t2;
   y <= t2 when t1 - t2 > 0 else
        t1;
   t3 <= "000" & x(WIDTH downto 3);
   t4 <= "0" & y(WIDTH downto 1);
   t5 <= x - t3;
   t6 <= t4 + t5;
   t7 <= t6 when t6 - x > 0 else
         x;
   r <= std_logic_vector(t7);
end comb_arch;
```

- Requires one adder and six subtractors
- Code contains only concurrent signal assignment statements
- The order of the statements is not important
- The sequence of execution is embedded in the flow of data
- Data flow graph (figure)
  - Shows data dependency
  - Node (circle): an operation
  - Arc: input and output variables
- Note that there is a limited degree of parallelism
  - At most two operations can be performed simultaneously



- RT methodology can be used to share the operators
- Tasks in converting a dataflow graph to an ASMD chart
  - Scheduling: when a function (circle) can start execution
  - Binding: which functional unit is assigned to perform the operation
- In the square root algorithm,
  - All operations can be performed by a modified addition unit
  - No functional unit is needed for shifting (a fixed shift is only wiring)
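To illustrate how one "modified addition unit" can cover several operations, the sketch below wraps a single subtractor with input and output routing so that it computes either an absolute value or a maximum. It is a hypothetical example: the entity `abs_max_unit`, the `sel_max` control signal and the port names are assumptions, and for `max` it assumes nonnegative operands (as with t1 and t2) so the subtraction cannot overflow.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical unit: one subtractor plus routing.
--   sel_max = '0': result = |p|
--   sel_max = '1': result = max(p, q)
entity abs_max_unit is
   generic(WIDTH: natural := 8);
   port(
      sel_max: in std_logic;
      p, q: in signed(WIDTH downto 0);
      result: out signed(WIDTH downto 0)
   );
end abs_max_unit;

architecture sketch_arch of abs_max_unit is
   signal op0, op1, diff: signed(WIDTH downto 0);
begin
   -- input routing: 0-p for abs, q-p for max
   op0 <= q when sel_max='1' else (others=>'0');
   op1 <= p;
   -- the single shared subtractor
   diff <= op0 - op1;
   -- output routing, steered by the sign bit of the difference
   result <= q    when (sel_max='1' and diff(WIDTH)='0') else -- q-p >= 0
             p    when sel_max='1' else                       -- q-p < 0
             diff when diff(WIDTH)='0' else                   -- 0-p >= 0, so |p| = -p
             p;                                               -- p > 0
end sketch_arch;
```

In an FSMD, `sel_max` would be derived from the state register, so binding an operation to this unit amounts to choosing the routing in the corresponding state.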

- Scheduling with two functional units (figure)
- ASMD chart (figure)
- Scheduling with one functional unit (figure)


- Registers can be shared as well
  - Reduce the number of unique variables
  - A variable can be reused if its value is no longer needed
- E.g.,
  - Use r1 to replace a, t1 and y.
  - Use r2 to replace b, t2 and x.
  - Use r3 to replace t5, t6 and t7.





- The data path must be coded manually to ensure functional unit sharing
  - One unit for abs and min
  - One unit for abs, max, - and +
  - Can be implemented by using an adder/subtractor with special input and output routing circuits


```
-- state & data registers
process(clk,reset)
begin
   if reset='1' then
      state_reg <= idle;
      r1_reg <= (others=>'0');
      r2_reg <= (others=>'0');
      r3_reg <= (others=>'0');
   elsif (clk'event and clk='1') then
      state_reg <= state_next;
      r1_reg <= r1_next;
      r2_reg <= r2_next;
      r3_reg <= r3_next;
   end if;
end process;
-- next-state logic and data path routing
-- (process header and default assignments are assumed;
--  the excerpt starts at the case statement)
process(start,state_reg,r1_reg,r2_reg,r3_reg,a_in,b_in,au1_out,au2_out)
begin
   r1_next <= r1_reg;
   r2_next <= r2_reg;
   r3_next <= r3_reg;
   ready <= '0';
   case state_reg is
      when idle =>
         if start='1' then
            r1_next <= signed(a_in(WIDTH-1) & a_in);
            r2_next <= signed(b_in(WIDTH-1) & b_in);
            state_next <= s1;
         else
            state_next <= idle;
         end if;
         ready <= '1';
      when s1 =>
         r1_next <= au1_out; -- t1=|a|
         r2_next <= au2_out; -- t2=|b|
         state_next <= s2;
      when s2 =>
         r1_next <= au1_out; -- y=min(t1,t2)
         r2_next <= au2_out; -- x=max(t1,t2)
         state_next <= s3;
      when s3 =>
         r3_next <= au2_out; -- t5=x-0.125x
         state_next <= s4;
      when s4 =>
         r3_next <= au2_out; -- t6=0.5y+t5
         state_next <= s5;
      when s5 =>
         r3_next <= au2_out; -- t7=max(t6,x)
         state_next <= idle;
   end case;
end process;
```

```
-- arithmetic unit 1
-- subtractor
diff <= sub_op0 - sub_op1;
-- input routing
process(state_reg,r1_reg,r2_reg)
begin
   case state_reg is
      when s1 =>       -- 0-a
         sub_op0 <= (others=>'0');
         sub_op1 <= r1_reg;  -- a
      when others =>   -- s2: t2-t1
         sub_op0 <= r2_reg;  -- t2
         sub_op1 <= r1_reg;  -- t1
   end case;
end process;
```


# High-level synthesis

- Converts "dataflow code" into ASMD-based code (RTL code)
  - RTL code can be optimized for performance (minimum number of clock cycles), area (minimum number of functional units), etc.
- Performs scheduling and binding
- Minimizes the number of registers and muxes
- Mainly for computation-intensive applications (e.g., DSP)
