Archive for October, 2008

Balanced Logic and Glitches

Glitches, the result of different time delays in logic gates, are responsible for part of the dynamic power consumption. A quantitative figure for glitching is difficult to estimate, since it depends heavily on the structure of the arithmetic block. For the same bit length, ripple structures are more affected by glitches than tree structures because of their greater logic depth. In general, dynamic power is proportional to the product of the activity factor and the capacitance, i.e.

P\propto\alpha\cdot C.

Balanced logic is supposed to reduce glitches and therefore power consumption. Most of the time this does not come for free: the total capacitance increases because of the additional transistors. A very simple analysis shows that reducing power by reducing glitching is not an easy task. Let’s assume that

\alpha = (1-\rho)\cdot\alpha_{g}+\alpha_{t},

where \rho is our glitching reduction factor due to a balanced structure, \alpha_g is the glitching activity factor and \alpha_t is the dynamic (transition) activity factor, which depends only on the input signal probabilities. On the other hand,

C = (1+\psi)\cdot C_{g}+C_{w},

where \psi is the capacitance increase factor, C_g is the total gate capacitance and C_w is the total wiring capacitance (we assume that only the gate capacitance is affected).

Let C_T = C_g + C_w and \alpha_T = \alpha_g + \alpha_t be the total capacitance and the total activity factor of the unbalanced structure, respectively. Multiplying the two expressions above, we obtain

\alpha\cdot C=C_T\alpha_T + \{\alpha_T\psi C_g - \rho \alpha_g(C_T+\psi C_g)\}.

Now, to get a reduction in power consumption, the bracketed term on the right-hand side must be negative, i.e. \alpha_T\psi C_g < \rho\alpha_g(C_T+\psi C_g). Solving for \rho, the reduction in glitching activity must satisfy

\rho > (1+\frac{\alpha_t}{\alpha_g})\cdot\frac{1}{1+\frac{1}{\psi}(1+\frac{C_w}{C_g})}.

To get a feeling for the numbers, let’s assume that the wiring capacitance is negligible, C_w=0, and that our balanced logic increases the gate capacitance by 20%, \psi=0.2. The bound then becomes \rho > (1+\alpha_t/\alpha_g)/6, which reaches 1 at \alpha_g=0.2\alpha_t. Since \rho cannot exceed 1, it is IMPOSSIBLE to reduce power consumption if the glitching activity accounts for at most 20% of the transition activity, \alpha_g\le 0.2\alpha_t. (Just for reference, Veendrick, in his textbook, mentions that in an 8-bit ripple adder glitching accounts for as much as 30%.)
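The bound is easy to evaluate numerically. The following Python sketch is not part of the original analysis; the function name `min_glitch_reduction` and its parameter names are made up for illustration. It takes \psi, the ratio C_w/C_g and the ratio \alpha_g/\alpha_t, and returns the break-even value of \rho.

```python
def min_glitch_reduction(psi, cw_over_cg, ag_over_at):
    """Break-even glitching reduction factor rho.

    Implements rho > (1 + alpha_t/alpha_g) / (1 + (1/psi) * (1 + C_w/C_g)).
    A returned value of 1 or more means that no amount of balancing can pay
    off, because rho cannot exceed 1 (all glitches removed).
    """
    activity_term = 1.0 + 1.0 / ag_over_at
    capacitance_term = 1.0 + (1.0 / psi) * (1.0 + cw_over_cg)
    return activity_term / capacitance_term


if __name__ == "__main__":
    # Case from the text: C_w = 0, psi = 0.2, alpha_g = 0.2 * alpha_t
    print(min_glitch_reduction(0.2, 0.0, 0.2))
    # Same capacitances, but with alpha_g = 0.3 * alpha_t (illustrative value)
    print(min_glitch_reduction(0.2, 0.0, 0.3))
```

For the first case the bound evaluates to 1, so breaking even would require removing every glitch. With \alpha_g=0.3\alpha_t it drops to roughly 0.72, which is still a demanding target.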

This also means that in a tree adder such as a Brent-Kung adder, where glitching is lower to begin with, there is little or no benefit in balancing the tree cells.

Even if the wiring capacitance were not negligible (deep-submicron technologies), say C_w = 3C_g, and we were able to remove as much as 50% of the glitching, the power reduction would amount to only about 3.8% (assuming once again \psi=0.2 and \alpha_g=0.2\alpha_t).
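This figure can be reproduced directly from the two definitions of \alpha and C. The sketch below is only a numerical check under the stated assumptions; the function `relative_power` and its argument names are hypothetical, and the quantities are normalised to \alpha_t = 1 and C_g = 1.

```python
def relative_power(rho, psi, cw_over_cg, ag_over_at):
    """Return (alpha * C) / (alpha_T * C_T) for a balanced structure.

    alpha = (1 - rho) * alpha_g + alpha_t and C = (1 + psi) * C_g + C_w,
    normalised here with alpha_t = 1 and C_g = 1.
    """
    alpha_g, alpha_t = ag_over_at, 1.0
    c_g, c_w = 1.0, cw_over_cg
    balanced = ((1.0 - rho) * alpha_g + alpha_t) * ((1.0 + psi) * c_g + c_w)
    unbalanced = (alpha_g + alpha_t) * (c_g + c_w)
    return balanced / unbalanced


if __name__ == "__main__":
    ratio = relative_power(rho=0.5, psi=0.2, cw_over_cg=3.0, ag_over_at=0.2)
    print(f"power saving: {100.0 * (1.0 - ratio):.2f}%")
```

The ratio comes out as 1.1 · 4.2 / (1.2 · 4) = 0.9625, i.e. a saving of 3.75%, which rounds to the 3.8% quoted above.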

Digital Arithmetic in HDL

Digital Arithmetic is mostly about architectures and their implementation. In full-custom design this observation is trivial, but in standard-cell-based design it is not so obvious. Standard-cell-based design makes use of HDL (Hardware Description Language) code written in Verilog or VHDL, which is converted into a netlist (standard cells and nets described in HDL) by an automatic synthesis tool. Since HDL code is essentially code, a good coding style is obviously very important. As can be seen in the following, a good coding style is important, but thinking in terms of the HW implementation proves to be crucial.

As usual, we illustrate by example. A designer is asked to implement an 11-bit counter with an increment of 3 and a wrap-around at 1028. The pseudo-code of a naive designer may look like this:


addr_temp = addr_in + 3;
if (addr_temp == 1028) then       addr_out = 0;
else if (addr_temp == 1029) then  addr_out = 1;
else if (addr_temp == 1030) then  addr_out = 2;
else                              addr_out = addr_temp; 

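Well, the synthesis tool will probably infer three 11-bit equality comparators after the adder, plus a priority decoder (if-then-else). Can we do better? As a first try, we note that 1028 = 11'b100_0000_0100. Assuming addr_in never exceeds 1027, addr_temp never exceeds 1030, so bits 10 and 2 are both set only for 1028, 1029 and 1030, and clearing both of them subtracts 1028. We can therefore write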


addr_temp = addr_in + 3;
addr_out = {(addr_temp[10] & ~addr_temp[2]), addr_temp[9:3],
 (addr_temp[2] & ~addr_temp[10]), addr_temp[1:0]};
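A quick way to gain confidence in such a rewrite is to compare both versions exhaustively in software before touching the HDL. The following Python model is only a sketch: the function names are invented here, and it assumes that addr_in always lies in the valid range 0..1027.

```python
WRAP = 1028  # 11'b100_0000_0100


def next_addr_naive(addr_in):
    """Adder followed by three equality comparisons, as in the first listing."""
    addr_temp = addr_in + 3
    if addr_temp == 1028:
        return 0
    if addr_temp == 1029:
        return 1
    if addr_temp == 1030:
        return 2
    return addr_temp


def next_addr_bitwise(addr_in):
    """Adder followed by two AND gates on bits 10 and 2, as in the second listing."""
    addr_temp = addr_in + 3
    b10 = (addr_temp >> 10) & 1
    b2 = (addr_temp >> 2) & 1
    out10 = b10 & (b2 ^ 1)
    out2 = b2 & (b10 ^ 1)
    # Bits 9:3 and 1:0 pass through unchanged; bits 10 and 2 are replaced.
    passthrough = addr_temp & 0b011_1111_1011
    return (out10 << 10) | (out2 << 2) | passthrough


def mismatches():
    """Return every valid addr_in (0..1027) where the two versions differ."""
    return [a for a in range(WRAP) if next_addr_naive(a) != next_addr_bitwise(a)]


if __name__ == "__main__":
    print("mismatches:", mismatches())
```

The list comes back empty, so the two formulations agree over the valid input range. For addr_in above 1027 they no longer do, which is a constraint the bit-level version silently relies on.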

Can we do much better? Sure. 

What if we have to design the same counter, but with a wrap-around at 1025?
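One possible way to look at the 1025 case is sketched below as a Python bit-level model. It is an assumption about how the structure might be approached, not the author's solution, and all names are hypothetical. It again assumes a valid input range, here 0..1024, and checks the gate-level view against a plain modulo model.

```python
WRAP = 1025  # 11'b100_0000_0001


def next_addr_reference(addr_in):
    """Behavioural model: add 3, then wrap around at 1025."""
    return (addr_in + 3) % WRAP


def next_addr_bitwise(addr_in):
    """One possible gate-level view of the same step (illustrative only)."""
    t = addr_in + 3
    b10, b1, b0 = (t >> 10) & 1, (t >> 1) & 1, t & 1
    # For t in 3..1027, bit 10 is set only for 1024..1027, so the wrap
    # condition (t >= 1025) reduces to b10 AND (b1 OR b0).
    wrap = b10 & (b1 | b0)
    out10 = b10 & (wrap ^ 1)
    # On wrap the two low bits must be decremented: 01->00, 10->01, 11->10.
    out1 = (b1 & b0) if wrap else b1
    out0 = (b1 & (b0 ^ 1)) if wrap else b0
    middle = t & 0b011_1111_1100  # bits 9:2 pass through (all zero when wrapping)
    return (out10 << 10) | middle | (out1 << 1) | out0


def mismatches():
    """Return every valid addr_in (0..1024) where the two models differ."""
    return [a for a in range(WRAP) if next_addr_reference(a) != next_addr_bitwise(a)]


if __name__ == "__main__":
    print("mismatches:", mismatches())
```

In this model the wrap is no longer a matter of clearing two bits: the low bits need a small decrement, so the shortcut used for 1028 does not carry over unchanged.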

Anyway, the point is that describing arithmetic in HDL is more than simply writing functional code; it is about thinking in terms of structures.
