A CMOS flip-flop featuring embedded Threshold logic functions

Marius Padure, Sorin Cotofana, and Stamatis Vassiliadis
Computer Engineering Laboratory, Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
Email: {marius,sorin,stamatis}@ce.et.tudelft.nl

Abstract: This paper describes a semi-dynamic CMOS flip-flop family featuring embedded Threshold Logic functions. First, we describe the new Threshold Logic flip-flop concept and circuit operation. Second, we present the concepts of embedded Threshold Logic and run-time reprogrammability. Finally, Spice simulation results show that wide (up to 8 inputs) AND/OR Boolean functions can be embedded in the newly proposed Threshold Logic flip-flop with up to less total latency when compared with the conventional flip-flop featuring the same embedded Boolean functions. The proposed flip-flop is therefore very attractive for high-performance pipelined arithmetic units.

Keywords: CMOS digital design, flip-flops, Threshold logic, computer arithmetic

I. INTRODUCTION

The continual push for higher clock rates and higher performance has led microprocessor designers in recent years to design super-pipelined machines with multiple functional units that can execute operations concurrently. High clock rates in these machines are often achieved with high-granularity pipelining, in which there are relatively few levels of logic gates per pipeline stage. One direct consequence of this design trend is that pipeline overhead is becoming more significant. This pipeline overhead is primarily due to the latency of the flip-flop or latch used and the clock skew of the system. While the clock skew varies, the latency of the flip-flop cannot be hidden. As an example, assuming that the flip-flop latency is four gate delays and that the clock cycle of a state-of-the-art microprocessor is 20 gate delays, the flip-flop overhead amounts to 20% of the cycle time.
This is a substantial penalty that degrades the overall performance of the system, since no useful logic operation is performed on the data while it is being latched.

The idea of incorporating logic functions into storage elements to improve the critical path latency has emerged in the last decade as a potential alternative for meeting the cycle time goal of processors [5], [1]. The challenge has been to develop latch structures that can embed logic functions efficiently, in terms of both total latency (defined as the sum of setup time and clock-to-output latency) and area. While previously published flip-flops have embedded simple Boolean functions (AND/OR of a few inputs), no attempt has been made to incorporate Threshold Logic functions into the storage elements.

It is well known that Threshold Logic (TL) is fundamentally more powerful than Boolean logic, since the TL gate (when envisioned as a combinational element) can perform more complex and wider functions than the usual Boolean CMOS gates (e.g., NAND, OR) can. More formally, a Threshold Logic Gate (TLG) is defined as an $n$-input processing element such that its output performs the following Boolean function (footnote 1):

$$F(X) = \operatorname{sgn}\{\mathcal{F}(X)\} = \begin{cases} 0 & \text{if } \mathcal{F}(X) < 0 \\ 1 & \text{if } \mathcal{F}(X) \ge 0 \end{cases} \qquad (1)$$

$$\mathcal{F}(X) = \sum_{i=1}^{n} w_i x_i - T \qquad (2)$$

where $X = (x_1, \ldots, x_n)$, $W = (w_1, \ldots, w_n)$, and $T$ are the set of Boolean input variables, the set of fixed signed integer weights associated with the data inputs, and the fixed signed integer threshold, respectively [2].

Several recent theoretical investigations [6], [7] have indicated that computer arithmetic building blocks (e.g., adders and multipliers) can be implemented in TL with a smaller number of logic gates and fewer logic stages than their traditional Boolean logic counterparts. Therefore, embedding TL functions in the storage elements may have a direct impact on the pipeline overhead.

In this paper we present a new class of flip-flop featuring embedded Threshold Logic functions to reduce the pipeline overhead. The main features of the basic design are short latency and a single phase clock scheme.
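To make the threshold gate definition concrete, the following is a minimal behavioral sketch in Python, not a circuit model and not the authors' implementation. It assumes the usual convention that the gate outputs 1 when the weighted sum of the inputs minus the threshold is non-negative, and 0 otherwise. The function names (`threshold_gate`, `and_gate`, `or_gate`, `agrees_with`) and the particular weight and threshold choices are illustrative assumptions rather than values taken from the paper.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Evaluate an n-input threshold gate: 1 if sum(w_i * x_i) - T >= 0, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum - threshold >= 0 else 0


def and_gate(inputs):
    # Illustrative choice (assumption): unit weights, T = n,
    # so the output is 1 only when every input is 1.
    return threshold_gate(inputs, [1] * len(inputs), len(inputs))


def or_gate(inputs):
    # Illustrative choice (assumption): unit weights, T = 1,
    # so a single input at 1 is enough to raise the output.
    return threshold_gate(inputs, [1] * len(inputs), 1)


def agrees_with(gate, reference, n):
    """Exhaustively compare a gate with a Boolean reference over all n-bit inputs."""
    return all(
        gate(bits) == int(reference(bits))
        for bits in product((0, 1), repeat=n)
    )
```

For example, `agrees_with(and_gate, all, 8)` and `agrees_with(or_gate, any, 8)` check the 8-input AND and OR cases exhaustively. Keeping unit weights and changing only the threshold yields other functions, such as a 3-input majority with `threshold_gate(bits, [1, 1, 1], 2)`, which illustrates why a single threshold gate can cover functions that would otherwise need several Boolean gates.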
Footnote 1: All the operators are algebraic.

Furthermore, this flip-flop has the capability of incorporating reconfigurable Threshold Logic functions with a small total latency when compared with conventional flip-flops featuring embedded function capability. This feature greatly reduces the pipeline overhead, since each flip-flop can be