A CMOS flip-flop featuring embedded Threshold logic functions

Marius Padure, Sorin Cotofana, and Stamatis Vassiliadis
Computer Engineering Laboratory, Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
Email: {marius,sorin,stamatis}@ce.et.tudelft.nl

Abstract: This paper describes a semi-dynamic CMOS flip-flop family featuring embedded Threshold Logic functions. First, we describe the new Threshold Logic flip-flop concept and circuit operation. Second, we present the concepts of embedded Threshold Logic and run-time reprogrammability. Finally, SPICE simulation results show that wide (up to 8 inputs) AND/OR Boolean functions can be embedded in the proposed Threshold Logic flip-flop with less total latency than a conventional flip-flop featuring the same embedded Boolean functions. The proposed flip-flop is therefore very attractive for high-performance pipelined arithmetic units.

Keywords: CMOS digital design, flip-flops, Threshold logic, computer arithmetic

I. INTRODUCTION

The continual push for higher clock rates and higher performance has led microprocessor designers in recent years to design super-pipelined machines with multiple functional units that can execute operations concurrently. High clock rates in these machines are often achieved with high-granularity pipelining, in which there are relatively few levels of logic gates per pipeline stage. One direct consequence of this design trend is that pipeline overhead is becoming more significant. This pipeline overhead is primarily due to the latency of the flip-flop or latch used and the clock skew of the system. While the clock skew varies, the latency of the flip-flop cannot be hidden. As an example, assuming that the flip-flop latency is four gate delays and that the clock cycle of a state-of-the-art microprocessor is 20 gate delays, the flip-flop overhead amounts to 20% of the cycle time.
This is a substantial penalty that degrades the overall performance of the system, since no useful logic operation is performed on the data while it is being latched.

The idea of incorporating logic functions into storage elements to improve the critical path latency has emerged in the last decade as a potential alternative for meeting the cycle time goal of processors [5], [1]. The challenge has been to develop latch structures that can embed logic functions efficiently, in terms of both total latency (defined as the sum of setup time and clock-to-output latency) and area. While previously published flip-flops have embedded simple Boolean functions (AND/OR of few inputs), no attempt has been made to incorporate Threshold Logic (TL) functions into the storage elements.

It is well known that TL is fundamentally more powerful than Boolean logic, since a TL gate (when envisioned as a combinatorial element) can perform more complex and wider functions than the usual Boolean CMOS gates (e.g., NAND, OR) can. More formally, a Threshold Logic Gate (TLG) is defined as an $n$-input processing element such that its output performs the following Boolean function¹:

$$F(X) = \operatorname{sgn}\{\mathcal{F}(X)\} = \begin{cases} 0 & \text{if } \mathcal{F}(X) < 0 \\ 1 & \text{if } \mathcal{F}(X) \geq 0 \end{cases} \qquad (1)$$

$$\mathcal{F}(X) = \sum_{i=1}^{n} w_i x_i - \psi \qquad (2)$$

where $X = (x_1, \ldots, x_n)$, $\Omega = (w_1, \ldots, w_n)$, and $\psi$ are the set of Boolean input variables, the set of fixed signed integer weights associated with the data inputs, and the fixed signed integer threshold, respectively [2].

Several recent theoretical investigations [6], [7] have indicated that computer arithmetic building blocks (e.g., adders and multipliers) can be implemented in TL with a smaller number of logic gates and fewer logic stages than their traditional Boolean logic counterparts. Therefore, embedding TL functions in the storage elements may have a direct impact on the pipeline overhead.

In this paper we present a new class of flip-flops featuring embedded Threshold Logic functions to reduce the pipeline overhead. The main features of the basic design are short latency and a single-phase clock scheme.
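As a purely behavioral illustration of the threshold gate definition in Eqs. (1) and (2), the following minimal Python sketch evaluates a TLG in software. It says nothing about the circuit itself. The function names (`threshold_gate`, `and_gate`, `or_gate`, `majority_gate`, `truth_table`) are hypothetical, and the unit-weight settings used for AND, OR, and majority are the usual textbook choices, assumed here rather than taken from this paper.

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if sum(w_i * x_i) - threshold >= 0, else 0 (Eqs. (1)-(2))."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum - threshold >= 0 else 0


def and_gate(inputs):
    # Assumed setting: unit weights, threshold n; fires only when all n inputs are 1.
    return threshold_gate(inputs, [1] * len(inputs), len(inputs))


def or_gate(inputs):
    # Assumed setting: unit weights, threshold 1; fires when at least one input is 1.
    return threshold_gate(inputs, [1] * len(inputs), 1)


def majority_gate(inputs):
    # Assumed setting: unit weights, threshold floor(n/2) + 1; fires on a strict majority of 1s.
    return threshold_gate(inputs, [1] * len(inputs), len(inputs) // 2 + 1)


def truth_table(gate, n):
    """List the gate output for every n-bit input, in lexicographic order."""
    return [gate(bits) for bits in product((0, 1), repeat=n)]


if __name__ == "__main__":
    print(truth_table(and_gate, 2))       # [0, 0, 0, 1]
    print(truth_table(or_gate, 2))        # [0, 1, 1, 1]
    print(truth_table(majority_gate, 3))  # [0, 0, 0, 1, 0, 1, 1, 1]
```

In this sketch the same evaluation routine realizes AND, OR, or majority depending only on the weights and the threshold, which is one way to see why a single TLG can cover wider functions than a single Boolean gate.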
Furthermore, this flip-flop can incorporate reconfigurable Threshold Logic functions with a small total latency when compared with conventional flip-flops that support embedded functions. This feature greatly reduces the pipeline overhead, since each flip-flop can be
¹ All the operators are algebraic.