International Conference on High Performance Computing (HiPC 2005)

Design of an Efficient Architecture for Advanced Encryption Standard Algorithm Using Systolic Structures

Suresh Sharma (1), T S B Sudarshan (2)
(1) Student, Computer Science & Engineering, IIT, Kharagpur
(2) Assistant Professor, Computer Science & Information Systems, BITS, Pilani

Abstract: This paper presents a systolic architecture for the Advanced Encryption Standard (AES). The systolic architecture reduces hardware complexity and increases the rate of encryption and decryption. Similarities between encryption and decryption are exploited to provide high performance with an efficient architecture. The design is efficient because it uses short, balanced combinational paths. The encryption or decryption rate is 3.2 bits per clock cycle, and the pipelined systolic architecture and balanced combinational paths allow a high maximum clock frequency.

Index terms:- Advanced Encryption Standard (AES), systolic architecture, processing elements, regularity, combinational paths.

1. Introduction:-

The Advanced Encryption Standard (AES) [1,12,13] is the successor of the Data Encryption Standard (DES) [2,12,13]. The symmetric block cipher Rijndael [3] was standardized by the National Institute of Standards and Technology (NIST) as AES in November 2001. Because hardware implementation is of practical importance, the different candidates have been implemented on FPGAs [4,5,6,9] and on ASICs [7,8,10]. This paper presents a simple and regular hardware architecture, based on systolic architecture [11], that provides a throughput of 3.2 bits per clock cycle for AES-128 encryption and decryption.

Systolic architecture is used for constructing high-speed, special-purpose devices.
In a systolic system, data flows from the computer memory in a rhythmic fashion, passing through many processing elements before it returns to memory. A systolic system consists of a set of interconnected cells, each capable of performing some simple operation; the cells are typically connected to form a systolic array or a systolic tree. Information flows between cells in a pipelined fashion, and communication with the memory or external devices occurs only through the boundary cells.

The architecture uses similarities between encryption and decryption to provide a high level of performance while keeping the chip size small. The architecture is highly regular and scalable. The key size can easily be changed from 128 to 192 or 256 bits by making very few changes in the design.

The paper is organized as follows. Section 2 gives a brief overview of the AES algorithm. Section 3 summarizes available AES architectures and describes the proposed AES hardware architecture. Section 4 compares the performance of the architecture with other implementations. Concluding remarks are given in Section 5.

2. AES Algorithm:

AES takes a 128-bit data block as input and performs several transformations to encrypt or decrypt the data. In the case of encryption, the input block is called the plaintext and the returned block is called the ciphertext. All intermediate blocks are called states. Each state is represented as a two-dimensional array of bytes. AES encryption and decryption are based on four transformations that are performed repeatedly in a certain sequence. The number of repetitions depends on the key size.

2.1. AddRoundKey transformation:-

In the AddRoundKey transformation, a Round Key is added to the State by a simple bitwise XOR operation. The AddRoundKey transformation is self-inverting.

2.2. SubBytes transformation:-

The SubBytes() transformation is a non-linear byte substitution that operates independently on each byte of the state using a substitution table (S-box). This S-box, which is invertible, is constructed by composing two transformations: