A METAL and VIA Maskset Programmable VLSI Design Methodology using PLAs

Nikhil Jayakumar, Sunil P Khatri
Department of EE, Texas A&M University, College Station TX 77843.

Abstract

In recent times there has been a substantial increase in the cost and complexity of fabricating a VLSI chip. The lithography masks themselves can cost between $1M and $3M. It is conjectured that, because of these increasing costs, the number of ASIC starts has declined over the last few years. In this paper, we address this problem by using an array of dynamic PLAs that requires only METAL and VIA mask customization to implement a new design. This would allow several similar-sized designs to share the same base set of masks (right up to the metal layers), differing only in their METAL and VIA masks. We have implemented our methodology for both combinational and sequential designs, and demonstrate that our approach strikes a reasonable compromise between ASIC and field-programmable design methodologies in terms of placed-and-routed area and delay. Compared to standard cells, our method has a 2.89× (3.58×) delay overhead and a 4.96× (3.44×) area overhead for combinational (sequential) designs.

1 Introduction

With the relentless reduction in the minimum feature sizes of modern Deep Sub-micron (DSM) VLSI fabrication processes, the complexity of fabrication is increasing at an alarming rate. Simultaneously, the number of masks in a full set, and their cost, have been increasing rapidly. It is not uncommon for a full set of lithography masks to cost between $1M and $3M, or more [1, 2]. This change has contributed to a roughly 25% reduction [2] in Application Specific Integrated Circuit (ASIC) design starts over the last 7 years. It is believed that cell-based ASICs are becoming prohibitively expensive except for very high volume applications [2]. In this paper, we introduce a new VLSI design approach that addresses this problem and minimizes the non-recurring expense (NRE) involved in IC design.
Our approach uses an array of precharged Programmable Logic Arrays (PLAs), with flip-flops co-located at their outputs, as its underlying circuit structure. We envision that a manufacturer would stock such arrays (of varying sizes), pre-processed up to the metalization step. To create an ASIC for a given design, the manufacturer would technology map the design to the smallest available array. After technology mapping and routing of the design, the METAL and VIA masks (the only masks that require changes) would be generated and used to customize, or personalize, the array to implement the design. At this point, the manufacturer could complete processing with the remaining masks to obtain the final design. Alternatively, the manufacturer could perform all processing steps from the start, using the existing masks for all other layers and the new METAL and VIA masks to customize the design. The latter option might be used by manufacturers who do not have experience in warehousing partially completed wafers (footnote 1).

Footnote 1: Also, as the industry starts to move toward highly absorptive and fragile low-k dielectric materials in the metal stack, the shelf life of partially processed wafers and the nature of their contamination risks are not well known [3].

Since all masks other than the METAL and VIA masks remain unaltered, the manufacturer can realize the design at low cost by amortizing the bulk of the NRE over a large number of designs. Further, the manufacturer could spend considerable effort optimizing these designs for maximum yield, and this effort would likewise be amortized over the large number of designs that share the common masks. Additionally, such an approach could reduce the processing time for a new design. Processing for a modern IC can take anywhere from 3 weeks to a few months [4]. This methodology can therefore help reduce design turnaround time by stockpiling wafers that have been processed up to the metalization step. This methodology also simplifies engineering changes.
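To make the personalization step more concrete, the following is a minimal logic-level sketch, in Python, of how METAL and VIA customization could program one PLA of a pre-fabricated array, and of picking the smallest stocked array that fits a design. It is an illustration under stated assumptions, not the authors' tool: the names `PlaSlot`, `Personality`, `truth_table` and `smallest_array`, the slot dimensions and the catalog of array sizes are all hypothetical. It assumes each '1' or '0' in a product term corresponds to one programmed AND-plane connection (one literal) and each cube index listed for an output corresponds to one OR-plane connection. Precharge/evaluate timing and the output flip-flops are not modeled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PlaSlot:
    """Hypothetical fixed footprint of one PLA in the pre-fabricated array."""
    n_inputs: int
    n_terms: int    # product-term rows available in the AND plane
    n_outputs: int  # output columns available in the OR plane


@dataclass
class Personality:
    """Sketch of the METAL/VIA personalization of a single PLA (assumed model).

    cubes:   one string per product term over {'1', '0', '-'}; every '1' or
             '0' stands for one programmed AND-plane connection (a literal).
    outputs: outputs[j] is the set of cube indices ORed into output j; every
             index stands for one programmed OR-plane connection.
    """
    cubes: list
    outputs: list

    def __post_init__(self):
        if len({len(c) for c in self.cubes}) > 1:
            raise ValueError("all cubes must have the same number of inputs")
        for c in self.cubes:
            if set(c) - set("10-"):
                raise ValueError(f"invalid cube {c!r}")
        for out in self.outputs:
            if any(not 0 <= i < len(self.cubes) for i in out):
                raise ValueError("output refers to a missing product term")

    @property
    def n_inputs(self):
        return len(self.cubes[0]) if self.cubes else 0

    def fits(self, slot):
        """True if this personality can be programmed into `slot`."""
        return (self.n_inputs <= slot.n_inputs
                and len(self.cubes) <= slot.n_terms
                and len(self.outputs) <= slot.n_outputs)

    def literal_count(self):
        """Number of programmed AND-plane connections."""
        return sum(ch != "-" for c in self.cubes for ch in c)

    def evaluate(self, bits):
        """Logic-level sum-of-products evaluation; timing is not modeled."""
        if len(bits) != self.n_inputs:
            raise ValueError("wrong number of input bits")
        # AND plane: a product term is true if every used literal matches.
        terms = [all(ch == "-" or int(ch) == b for ch, b in zip(c, bits))
                 for c in self.cubes]
        # OR plane: each output is the OR of its selected product terms.
        return tuple(int(any(terms[i] for i in out)) for out in self.outputs)


def truth_table(pla):
    """Map every input combination to the PLA's output tuple."""
    return {bits: pla.evaluate(bits)
            for bits in product((0, 1), repeat=pla.n_inputs)}


def smallest_array(array_sizes, plas_needed):
    """Smallest stocked array (by PLA count) that fits, or None if none does."""
    return min((n for n in array_sizes if n >= plas_needed), default=None)


if __name__ == "__main__":
    # Hypothetical example: a full adder (inputs a, b, cin) in one PLA.
    fa = Personality(
        cubes=["100", "010", "001", "111", "11-", "1-1", "-11"],
        outputs=[{0, 1, 2, 3}, {4, 5, 6}],  # sum, carry
    )
    print(fa.literal_count())                     # 18
    print(fa.fits(PlaSlot(8, 16, 4)))             # True
    print(smallest_array([64, 256, 1024], 100))   # 256
```

In this model the base array (the `PlaSlot` grid) never changes; only the set of programmed connections does, which mirrors the idea that a new design needs only new METAL and VIA masks. The literal count of the cover directly gives the number of AND-plane connections to program.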
When a bug is discovered and the design needs to be modified, our methodology would reduce the cost and time required to modify the design, since only the METAL and VIA masks need to change.

After METAL and VIA mask customization, the design is transformed into a network of precharged PLAs [5]. Such an implementation methodology was demonstrated to be fast and area-efficient compared to a standard cell approach. As shown in [5], for a network of PLAs there is a more direct relationship between the cost function optimized during logic synthesis (literal count) and the actual PLA implementation. In a standard cell based flow, there is an intervening technology mapping step, which often negates the benefits of technology-independent logic optimization. A network of PLAs, on the other hand, allows us to carry forward the benefits of technology-independent multi-level logic synthesis. We leverage this feature of the network-of-PLAs design style in our work. In contrast to the work of [5], we are able to handle both combinational and sequential designs. Also, the mask programmability of our approach makes our PLA design and layout quite different from those in [5].

In recent times, PLAs have seen renewed interest as a circuit implementation style for high-performance designs. The IBM Gigahertz processor [6] used PLAs (footnote 2) to implement control logic, because of their high speed and because they make it possible to quickly implement and modify a design.

The remainder of this paper is organized as follows. Section 2 discusses approaches similar to our own. Section 3 describes our design flow, while Section 4 presents our experimental results. Finally, in Section 5, we make concluding comments and discuss further work that needs to be done in this area.

2 Previous Work

In the past, gate arrays [7, 8] have been used as an implementation method in which a design can be personalized by VIA and METAL customization.
This approach was popular until standard cell based design became the dominant means of designing ICs. The speed of our approach derives from the fact that wiring is embedded inside our PLAs, which is not true for gate arrays. Also, the P and N diffusions in any row of a gate array need to be separated, resulting in larger area

Footnote 2: Note that single PLAs were used, as opposed to a network of PLAs.

0-7803-8702-3/04/$20.00 ©2004 IEEE. 590