High speed self-timed pipelined datapath for square
rooting
G. Cappuccino
G. Cocorullo
P. Corsonello
S. Perri
Abstract: The authors describe a new high
performance self-timed circuit for asynchronous
square rooting. The new architecture is based on
a modified nonrestoring algorithm. An
asynchronous pipelined cellular array without an
auxiliary system for the identification of
exceptions is demonstrated. The self-timing
approach allows overall performance to be
greatly improved with respect to a synchronous
implementation, at the cost of acceptable area
overheads.
1 Introduction
Square-root computations, for continuous operation,
constitute a computational bottleneck in
special-purpose hardware, such as in real-time image
processors and geometric-transform operators. In the
last few years, many square root algorithms have been
studied and fully parallel synchronous hardware
implementations have been proposed.
As is well known, the synchronous design approach
assumes that all circuit events are orchestrated by a
central clock, whose period has to be larger than the
worst-case delay of the slowest module. Very long
clock connections have to be managed, with consequent
power-consuming buffers and unavoidable clock
skew. Moreover, standard synchronous circuits have to
toggle clock lines even in the portions of the circuit
that are unused in the current computation, producing
useless power dissipation. In these systems, detecting
idle blocks and shutting down and restarting high-speed
clocks imply unacceptable hardware and time overheads.
On the other hand, the asynchronous design
approach is characterised by: local synchronisation;
average-case performance of the combinational
modules used in the circuit; and effective shutdown of
the unused modules during computation. Obviously, these
advantages are not free; they come at the expense of
© IEE, 1999
IEE Proceedings online no. 19990271
DOI: 10.1049/ip-cds:19990271
Paper first received 24th March and revised 23rd September 1998
G. Cappuccino, G. Cocorullo and S. Perri are with the Department of
Electronics, Computer Science and Systems, University of Calabria,
Arcavacata di Rende, 87036 Rende (CS), Italy
G. Cocorullo is also with IRECE, National Council of Research, Via
Diocleziano 328, 80125 Napoli, Italy
P. Corsonello is with the Department of Electronic Engineering and
Applied Mathematics, University of Reggio Calabria, Loc. Feo di Vito,
89060 Reggio Calabria, Italy
the silicon area required by the handshaking
logic and end-completion-sensing modules. Moreover,
asynchronous systems are more difficult to design than
synchronous ones. In fact, the designer must pay a
great deal of attention to the dynamic state of the
circuit. Nevertheless, complex asynchronous systems, such
as digital signal processors or microprocessors, have
recently been demonstrated [1].
In many cases, the asynchronous design approach
has been chosen because it reduces power dissipation
and, consequently, reduces thermal problems. In many
other applications, self timing can greatly improve
performance without significantly decreasing power
dissipation. For example, this happens in a pipelined
datapath that often runs in the continuous-operation
mode. In these cases, the computational modules are
never idle. However, self timing can still speed
up the circuit, especially when each of the several
stages of the pipelined datapath computes its output in
a different time. In fact, let $N$ be the number of stages
constituting a generic pipelined datapath. Let
$\tau_1, \tau_2, \ldots, \tau_N$ and
$\tau_{av1}, \tau_{av2}, \ldots, \tau_{avN}$ be the
worst-case delay and the average-case delay of the
several stages, respectively.
Using the synchronous approach, the designer will
obtain a circuit with a latency equal to $N \tau_{clk}$
and a throughput rate of $1/\tau_{clk}$, where
$\tau_{clk} > \max(\tau_1, \tau_2, \ldots, \tau_N)$.
The same circuit designed in asynchronous fashion
will compute the outputs after an average latency equal
to $\sum_{i=1}^{N} \tau_{avi}$ at an average throughput
rate of $1/\max(\tau_{av1}, \tau_{av2}, \ldots, \tau_{avN})$.
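As a rough numerical illustration of the latency and
throughput expressions above, the following Python sketch
evaluates both design styles for one set of stage delays.
The function names, the `margin` parameter and the delay
values are hypothetical and chosen only for illustration;
they are not taken from the circuit described in this
paper, and handshaking and completion-detection overheads
are deliberately ignored.

```python
from typing import Sequence, Tuple


def synchronous_figures(worst_case: Sequence[float],
                        margin: float = 0.0) -> Tuple[float, float]:
    """Return (latency, throughput) of a clocked N-stage pipeline.

    The clock period must exceed the worst-case delay of the slowest
    stage. `margin` is a hypothetical slack added on top of that delay;
    margin = 0 gives the limiting (best possible) clock period.
    """
    t_clk = max(worst_case) + margin
    latency = len(worst_case) * t_clk   # N * t_clk
    throughput = 1.0 / t_clk            # one result per clock period
    return latency, throughput


def self_timed_figures(average_case: Sequence[float]) -> Tuple[float, float]:
    """Return (average latency, average throughput) of a self-timed pipeline.

    Handshake and completion-detection overheads are NOT modelled here.
    """
    latency = sum(average_case)             # sum of average stage delays
    throughput = 1.0 / max(average_case)    # limited by the slowest stage
    return latency, throughput


# Hypothetical stage delays in nanoseconds (illustrative values only).
WORST_CASE_NS = [10.0, 12.0, 8.0, 11.0]
AVERAGE_CASE_NS = [6.0, 7.0, 5.0, 6.0]

if __name__ == "__main__":
    sync_latency, sync_rate = synchronous_figures(WORST_CASE_NS)
    async_latency, async_rate = self_timed_figures(AVERAGE_CASE_NS)
    print("synchronous latency (ns):", sync_latency)   # 48.0
    print("self-timed latency (ns):", async_latency)   # 24.0
    print("throughput ratio:", async_rate / sync_rate)
```

With these assumed delays the self-timed pipeline halves the
latency, but only because the overheads are left out of the
model.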
Once the circuit to be implemented is known,
establishing whether the asynchronous approach is more
convenient than the synchronous one is not a trivial
problem. The designer must ensure that the advantage due to
the average-speed computation of the modules is not
cancelled by the time overheads due to handshaking
and completion-detection circuitry. In addition, the
power consumption due to the handshaking and
completion-sensing modules must be taken into account.
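One simple way to state this check, under an assumed
overhead model that is not taken from this paper, is to
charge each stage $i$ a hypothetical extra time
$\tau_{hi}$ for handshaking and completion detection on
top of its average-case delay $\tau_{avi}$. With
$\tau_{clk}$ the clock period of the synchronous version
and $N$ the number of stages, the self-timed version keeps
its advantage when:

```latex
% Assumed model: stage i costs \tau_{avi} + \tau_{hi} on average.
\begin{align}
  \text{latency:}    \quad & \sum_{i=1}^{N} \left( \tau_{avi} + \tau_{hi} \right) < N \, \tau_{clk} \\
  \text{throughput:} \quad & \max_{1 \le i \le N} \left( \tau_{avi} + \tau_{hi} \right) < \tau_{clk}
\end{align}
```

Setting every $\tau_{hi}$ to zero gives the ideal case in
which no overhead is paid. A real handshake protocol may
overlap part of this overhead with computation, so the
inequalities should be read as a first approximation only.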
Many efficient proposals of self-timed adders,
multipliers and dividers are present in the literature.
This paper deals with a new self-timed pipelined cellular
array for square rooting. The circuit is based on a
modified nonrestoring algorithm previously demonstrated.
The asynchronous-design approach allows general performance
to be improved with respect to the synchronous
solution, with an acceptable area overhead.
2 Background to the algorithm and its
synchronous implementation
Nonrestoring square root algorithms are based on a
step-by-step result digit production by inspecting the
IEE Proc.-Circuits Devices Syst., Vol. 146, No. 1, February 1999