Design of a Low Power, Relative Timing based Asynchronous MSP430 Microprocessor

Dipanjan Bhadra, Kenneth S. Stevens
University of Utah

Abstract—Power dissipation is one of the primary design constraints in modern digital circuits. From a multitude of hand-held portable devices to big data analytics using high-performance computing, low energy dissipation is a key requirement for most modern devices. This paper showcases an elegant low power circuit design methodology based on Relative Timing driven asynchronous techniques. A low power MSP430 microprocessor design based on a novel asynchronous finite state machine implementation is presented. The design showcases the power benefits of the proposed asynchronous implementation over its synchronous counterpart and avoids major architectural modifications that would directly influence performance or power consumption. The implemented asynchronous MSP430 exhibits at least an 8× power benefit over the synchronous design for an almost identical pipeline structure and comparable throughput. The paper further elaborates on the novel asynchronous state machine implementation used for the design and presents an efficient method to design communicating asynchronous finite state machines in clock-less systems.

Index Terms—MSP430, Microprocessor, Low-power, Asynchronous Circuits, Relative Timing

I. INTRODUCTION

Power dissipation has become a primary concern for designers targeting circuits for handheld devices and parallel computing systems for big data computations. Asynchronous designs provide an elegant solution to low power circuit design. Handshake signals ensure that asynchronous circuits operate only when provided with valid data, and also provide a modular interface that operates independently of frequency domains. Synchronization across clock domains for large SoCs has emerged as a correctness, design time, and power challenge.
The global clock distribution in modern designs accounts for considerable power dissipation and design effort. Asynchronous approaches mitigate the energy overhead of crossing clock domain boundaries and of global clock distribution, save power because circuits are only active when they are assigned a task, and simplify IP integration and validation.

Researchers have documented the challenges and advantages of asynchronous designs. Over the years the Caltech Asynchronous Microprocessor [1], the Caltech R3000 [2], the Lutonium [3], the DLX design [4], the Amulets [5], [6], the asynchronous 80C51 [7], and the Intel Pentium [8] have showcased different design methodologies and techniques for the design and optimization of asynchronous circuits. Some of these designs come at a considerable overhead in added design effort in the absence of an automated design flow. Others, like the 80C51 [7], use custom automated design flows such as Tangram [9], [10] and Balsa [11]. Some designs go so far as to use custom gate libraries for computation. Instead, this work leverages relative timing in order to use industry standard synchronous CAD tools and HDL [12].

II. BACKGROUND

This paper reports on a new design approach for implementing communicating asynchronous state machines to build a power efficient MSP430. The MSP430 is a simple 16-bit microprocessor used for low-cost, low-power embedded systems. The synchronous design is built with an emphasis on low power operation. It provides multiple operating modes to reduce power consumption in idle states. The synchronous design is based on the openMSP430 from the Opencores repository [13]. To provide a fair comparison, no architectural modifications were made to the clocked data path in the asynchronous design. A custom control network replaces the clock network for the asynchronous design. The same EDA flow was used for both designs to keep the comparison fair.
A previous asynchronous MSP430 implementation uses the Balsa tool flow [14]. That design uses a back-end retargeting method to map the design to specific libraries. However, the design optimization phase does not use the library information to perform timing driven optimizations on the circuit. Another design is implemented using the Tangram flow [15]. The TiDE design uses a similar HDL to generate the required design. In this work the entire control network for the AFSM design is built using a graph based method, which results in a completely asynchronous state machine. The state machines use standard Verilog implementations of combinational logic and registers.

A major advantage of our design over the past work is in the design methodology. This MSP430 was designed using a standard commercial VLSI tool flow. The design is specified in Verilog. Some minor additions to the flow to support relative timing constraints allow the commercial EDA tools to perform internal timing driven circuit optimizations. This has enabled us to create a novel method of implementing the pipelined interaction between the decode and execute state machines that differs vastly from previous design approaches.

III. DESIGN METHODOLOGY

The asynchronous design implementation uses a return-to-zero bundled data design, shown in Fig. 1. Combinational functions are behaviorally specified between pipeline stages, which are controlled by a handshaking network for both sequencing and delay. Relative timing (RT) constraints are used to optimize the circuit for performance and to make hazards in the control circuitry unreachable. One such RT constraint is shown in the figure.

[Figure 1 diagram: latches L_i and L_{i+1} around combinational logic CL with n-bit data paths; control blocks Ctl_i and Ctl_{i+1}; handshake signals req_i/ack_i through req_{i+2}/ack_{i+2}; delay element]

Fig. 1. Timed (bundled data) handshake design with the related relative timing constraint $req_i\uparrow \rightarrow L_{i+1}/D + \mathit{margin} \prec L_{i+1}/CLK\uparrow$ highlighted.
The maximum delay from $req_i\uparrow$ to data arriving at the downstream latch data input, plus a margin m, must be less than the minimum delay from $req_i\uparrow$ to the rising clock edge arriving at the downstream latch clock pin.

978-3-9815370-8-6/17/$31.00 © 2017 IEEE
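The relative timing constraint in the Fig. 1 caption can be read as a simple inequality on path delays. The following is a minimal Python sketch of that check, not the authors' tool flow: the class name, field names, and the delay values in the usage note are hypothetical and chosen only for illustration, and all delays are assumed to share one unit (for example, nanoseconds).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelativeTimingConstraint:
    """Illustrative model of: req_i rising -> L_{i+1}/D + margin < L_{i+1}/CLK rising.

    All names here are hypothetical; delays share one unit (e.g. ns).
    """

    max_data_delay: float   # worst-case delay from req_i rising to L_{i+1}/D
    min_clock_delay: float  # best-case delay from req_i rising to L_{i+1}/CLK rising
    margin: float           # required safety margin m

    def slack(self) -> float:
        """Clock-path delay left over after the data path and the margin."""
        return self.min_clock_delay - (self.max_data_delay + self.margin)

    def holds(self) -> bool:
        """True when data plus margin arrives strictly before the clock edge."""
        return self.slack() > 0.0
```

As a usage example with made-up numbers, `RelativeTimingConstraint(max_data_delay=1.25, min_clock_delay=1.75, margin=0.25)` has a positive slack and therefore holds, whereas a constraint with `max_data_delay=2.0`, `min_clock_delay=2.0`, and `margin=0.25` has a negative slack, which indicates that the delay element on the request path would need to be lengthened.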