Archive for September, 2008

Fast end-around-carry adders

Both diminished-one addition (modulo 2^n+1) and one’s complement addition (modulo 2^n-1) use this type of adder. The last carry output must be added back into the final sum, hence the name end-around carry. Prefix adders integrate the end-around carry neatly, as shown in the following figure (taken from http://www.iis.ee.ethz.ch/~zimmi/publications/adder_arch.pdf),

where c_{out} is fed back into c_{in} in the last row of muxes. If diminished-one addition is to be implemented, c_{in} must be inverted.

Note that the end-around carry adds just one logic level to the addition. The only disadvantage of a structure like this is the high load seen by c_{in}; therefore, appropriate drivers/buffers are needed.
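As a behavioural sketch of the end-around carry (not of the prefix structure in the figure), the two modular additions can be modelled in a few lines of Python. The function names are made up for illustration, and operands are assumed to be n-bit unsigned integers. The diminished-one model further assumes nonzero operands and a nonzero result, since zero needs separate handling in that representation.

```python
def add_ones_complement(a: int, b: int, n: int) -> int:
    """Modulo 2^n-1 addition: the carry-out is added back into the sum."""
    mask = (1 << n) - 1
    s = a + b                 # plain n-bit addition; carry-out lands in bit n
    c_out = s >> n
    # End-around carry: feed c_out back as c_in.
    # Note that all-ones is a second representation of zero.
    return ((s & mask) + c_out) & mask


def add_diminished_one(a_d: int, b_d: int, n: int) -> int:
    """Modulo 2^n+1 addition on diminished-one operands (A-1, B-1).

    Assumption: operands and result are nonzero, so zero handling is omitted.
    """
    mask = (1 << n) - 1
    s = a_d + b_d
    c_out = s >> n
    # Same structure as above, but the fed-back carry is inverted.
    return ((s & mask) + (1 - c_out)) & mask
```

For example, with n = 4, `add_diminished_one(7, 7, 4)` models 8 + 8 modulo 17: the carry-out is 0, so the inverted carry adds 1 and the result is 15, the diminished-one code for 16.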

For full-custom implementations, the carry-increment adder is my favorite. A 4-bit and a 5-bit example follow.


A 2-level Carry-Increment Adder

If our previous full-custom 1-level carry-increment adder does not achieve the required throughput, we can do much better. The idea is to design a prefix structure with fewer logic levels. At the same time, we must try to keep regularity intact to some extent. By “keeping regularity intact to some extent” we mean

  1. to keep the number of leaf cells to a minimum (minimum design effort),
  2. to have the layout organized into abutted bit-slice cells (small area), and
  3. to have locality, i.e. local routing (low power and high throughput).

In the following we illustrate by example: the starting point being a 32-bit 1-level carry-increment adder (block scheme 1,2,3,4,5,4,4),

Now, by applying the idempotent and associative properties of the prefix operator, we reduce the number of logic levels from 8 to 6,

Note that a new leaf cell (lc_pa2) has been introduced and that the carry at every bit position i is computed from bit positions i-1 down to 0. A more detailed version of the previous prefix structure showing just the first 16 bits follows,

where a carry-in has been added. This figure includes the pre- and post-processing blocks in green, that is, the generation of the g-t-p-signals (generate, transmit and propagate) and the final sum, respectively.
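The three stages can be sketched behaviourally in Python. This is a minimal model under stated assumptions, not the structure in the figure: it evaluates the prefix operator serially, uses a single XOR propagate signal in place of separate transmit and propagate signals, and all names (`prefix_op`, `prefix_add`) are hypothetical. Associativity and idempotence are what allow the same carries to be computed with a different, shallower grouping.

```python
def prefix_op(hi, lo):
    """Prefix operator: (g, p) o (g', p') = (g | p & g', p & p')."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a: int, b: int, n: int, c_in: int = 0):
    """Return (sum, carry_out) of two n-bit operands."""
    # Pre-processing: per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix computation (serial here): carries[i] is the carry into bit i.
    # The carry-in is treated as a generate signal below bit 0.
    acc = (c_in, 0)
    carries = [c_in]
    for i in range(n):
        acc = prefix_op((g[i], p[i]), acc)
        carries.append(acc[0])

    # Post-processing: sum bit i is p[i] XOR the carry into bit i.
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

A quick check of the two properties over all (g, p) pairs: `prefix_op(x, x) == x` for every pair x, and `prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))` for every triple.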

A possible implementation of leaf cell lc_pa2 is shown next.

Leaf cells lc_pa0 and lc_pa1 can be found in a previous post.


Static versus Dynamic CMOS Logic Circuits

Not long ago I had the opportunity to meet Pat Bosshart, a Texas Instruments (TI) Fellow, at a MEAD course in Lausanne, Switzerland. He spent more than an hour explaining the dynamic logic circuits he used in some of the SPARC datapath designs. At the end of the lecture it was “good” to hear that the design effort is many times bigger than that of static logic designs due to its full-custom nature and the lack of automatic tools. Unfortunately, he did not mention when TI gave up using dynamic logic circuits in datapaths (except for latches, storage, etc.), but I guess it has already been a while. (Just a note: the SPARC team who designed the 3rd-generation CMT SPARC processor claim a 50% power and area reduction using semicustom static logic instead of custom dynamic logic in the floating-point adder, all while keeping timing! See IEEE Journal of Solid-State Circuits, vol. 44, no. 1, Jan 2009.)

On the other hand, to the best of my knowledge, IBM has been giving up dynamic logic in datapath designs (high-performance microprocessors) since the mid-nineties for low-power reasons. (See for example http://www.research.ibm.com/journal/rd/414/schwarz.html; sorry, as of Aug 2009, IBM denies free access to its publications.)

It seems to me that the rule in designing datapaths should read as follows: “if delay goals are met using all static CMOS circuits, ok. If not, try hard!”
