Design and characterisation of a CMOS VLSI self-timed multiplier architecture based on a bit-level pipelined-array structure

A.J. Acosta, R. Jimenez, A. Barriga, M.J. Bellido, M. Valencia, J.L. Huertas

Indexing terms: VLSI electronics, Pipelined arrays, Multipliers, Self-timed circuits

Abstract: The authors describe the design, integration and characterisation of a bit-level pipelined self-timed multiplier architecture. The switched-output differential structure (SODS) is used for the computation blocks and the PLCAR structure (protocol and latching controlled by acknowledge and request) for the interface blocks, both embedded in an array-based architecture. A 4 × 4-bit multiplier has been integrated in a 1.0 µm CMOS technology, and the proposed architecture has been compared with other asynchronous approaches, showing a considerable improvement of up to 50% in area, speed and power consumption. Compared with a synchronous approach, the main advantage of the proposed architecture is lower power consumption for input data rates below a certain value, at the expense of area and speed.

© IEE, 1998
IEE Proceedings online no. 19982125
Paper first received 14th May 1996 and in final revised form 23rd April 1997
The authors are with the Instituto de Microelectronica de Sevilla, Centro Nacional de Microelectronica, Edificio CICA, Avda. Reina Mercedes s/n, 41012 Sevilla, Spain

1 Introduction

Current advances in the scale of integration of electronic devices make timing problems increasingly important in the design of VLSI CMOS digital systems. Problems of synchronous designs, such as uncontrollable clock skew and synchronisation failures, make the asynchronous strategy a valid alternative to synchronous implementations [1-3]. Among the asynchronous techniques, the self-timed one is very promising. Self-timed circuits need no global clock signal; the technique can instead be seen as locally synchronous, with each block generating the clock signal for the following block. This asynchronous strategy offers potentially higher average speed, because each module operates at its highest possible speed; better adaptation to variations in supply voltage and temperature and to deviations in the technological process; and lower power consumption, because each module operates only when there are data to be processed [1-3].

This paper has two basic objectives. The first is to present an array-based architecture, including the design of the pipeline stages, the data-latching schedule and the handshaking protocol. The second is to apply the proposed architecture to the design of a self-timed 4 × 4-bit bit-level pipelined multiplier with single-rail data transmission, integrated in a 1.0 µm CMOS technology. For these purposes, the linear PLCAR architecture [4] has been extended to the two-dimensional case. The overall aim is a self-timed design with reduced area and improved performance. A further aim is to fabricate a working self-timed integrated circuit, demonstrating that self-timed circuits can be designed robustly with a highly automated layout tool (LAS, from CADENCE). Finally, the design is compared with other asynchronous and synchronous designs, demonstrating its improvement in power consumption below a certain input-data rate.

2 Basic blocks and array architecture

Among current design techniques for self-timed arrays, those best suited to CMOS VLSI design explicitly separate the computation part, which performs the logic function, from the interface part, which controls the data flow between computation blocks [2, 3]. Based on this approach, the SODS circuit [5] was chosen for the computation block, and a modified version of the PLCAR architecture [4] was used as the interface circuit.
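The separation between computation and interface parts described above can be illustrated with a small behavioural model. This is a hypothetical Python sketch, not the SODS or PLCAR circuitry: it assumes a generic differential computation block that precharges both output rails low and, on evaluation, raises exactly one rail per output bit, plus a generic four-phase request/acknowledge interface traced sequentially for a single transfer. The names `DifferentialBlock`, `HandshakeInterface` and `multiplier_cell` are illustrative, and the cell logic (a partial product followed by a full adder, as in a carry-save array multiplier) is an assumption rather than the cell used in the paper.

```python
from typing import Callable, List, Tuple

Rail = Tuple[int, int]        # (true rail, complement rail)
PRECHARGED: Rail = (0, 0)     # both rails low: output not yet valid


def encode(bit: int) -> Rail:
    """Differential encoding produced at the end of evaluation."""
    return (1, 0) if bit else (0, 1)


def decode(rail: Rail) -> int:
    """Recover the bit from a valid (evaluated) rail pair."""
    if rail == PRECHARGED or rail == (1, 1):
        raise ValueError(f"rail pair {rail} carries no valid data")
    return rail[0]


class DifferentialBlock:
    """Hypothetical computation block with precharge/evaluate phases.

    `logic` is a Boolean function on bits; it stands in for the
    transistor-level differential network (an assumption).
    """

    def __init__(self, logic: Callable[[List[int]], List[int]], n_out: int):
        self.logic = logic
        self.outputs: List[Rail] = [PRECHARGED] * n_out

    def precharge(self) -> None:
        self.outputs = [PRECHARGED] * len(self.outputs)

    def evaluate(self, inputs: List[int]) -> None:
        self.outputs = [encode(b) for b in self.logic(inputs)]

    def complete(self) -> bool:
        # Completion signal: every output pair has exactly one rail high.
        return all(t != f for t, f in self.outputs)


class HandshakeInterface:
    """Hypothetical interface: sequential trace of one four-phase cycle."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.latch: List[int] = []

    def transfer(self, block: DifferentialBlock, data: List[int]) -> List[int]:
        self.events.append("req+")          # new data offered
        block.evaluate(data)                # evaluation phase
        if not block.complete():
            raise RuntimeError("completion not reached")
        self.latch = [decode(r) for r in block.outputs]  # latch valid data
        self.events.append("ack+")          # data accepted
        self.events.append("req-")          # request withdrawn
        block.precharge()                   # precharge phase
        if block.complete():
            raise RuntimeError("precharge did not reset completion")
        self.events.append("ack-")          # ready for the next datum
        return self.latch


def multiplier_cell(bits: List[int]) -> List[int]:
    """Assumed carry-save cell: partial product a&b added to s_in and c_in."""
    a, b, s_in, c_in = bits
    pp = a & b
    s_out = pp ^ s_in ^ c_in
    c_out = (pp & s_in) | (pp & c_in) | (s_in & c_in)
    return [s_out, c_out]


if __name__ == "__main__":
    cell = DifferentialBlock(multiplier_cell, n_out=2)
    iface = HandshakeInterface()
    print(iface.transfer(cell, [1, 1, 1, 0]))  # [0, 1]
    print(iface.events)  # ['req+', 'ack+', 'req-', 'ack-']
```

Keeping the Boolean function inside `DifferentialBlock` and the sequencing inside `HandshakeInterface` mirrors the computation/interface split: either side can be replaced without touching the other. A real pipeline runs these handshakes concurrently between neighbouring stages; the sequential trace only shows the order of events in one transfer.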
The PLCAR architecture is itself a modification of a well-known self-timed architecture [3], with an improved data-storage scheme.

2.1 Computation block

Computation blocks are usually realised with differential structures, which generate both the true and the complemented outputs together with a completion signal indicating when the logic function is valid. Such structures operate in two phases: precharge and evaluation. The chosen structure (Fig. 1) is switched-

241 IEE Proc.-Circuits Devices Syst., Vol. 145, No. 4, August 1998