Square Interconnection Network for Data Permutation

Zbigniew Kokosiński
Cracow University of Technology, Institute of Control Engineering
ul. Warszawska 24, 31-155 Kraków, Poland; zk@usk.pk.edu.pl

Abstract

In this paper a square cellular network for data permutation in a SIMD model is described. It consists of 2-permuters only and realizes an arbitrary permutation pattern in two passes. A programming algorithm with O(n) sequential time complexity is provided for this network. Due to its regular cellular structure, the square network is suitable for VLSI implementation.

1. Introduction

In multiprocessor architectures the overall system performance is closely related to the structure of the interconnections between processors and memory modules. A number of network configurations have been characterized in survey papers [2, 4, 7, 10, 13].

The most common measures characterizing interconnection networks (INs) are: asymptotic hardware complexity, complexity of the setup algorithm, and total delay time. A number of SSI-related assessments of hardware complexity have been verified in a VLSI environment. Franklin [5] showed that for both banyan and crossbar INs the chip area grows as O( ). Propagation delay grows as O(n) for the crossbar and approximately as O( ) for the banyan. Chen et al. [3] derived an O( ) estimate of the area required for a VLSI realization of the Omega multistage IN. Lin and Shin [9] proved an O( ) lower bound on wire area for the shuffle-exchange and cube-connected multistage INs. They also found a high occurrence of long paths in both types of networks. Layouts that consume a larger amount of chip area are more expensive to fabricate and less reliable. Long wires increase propagation delays and hence reduce the throughput of the system.

As the scale of integration increases, cellular interconnection networks (CINs) can be a good alternative in the design of multiprocessor architectures (e.g. SIMD computers).
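To make the idea of programming a cellular array built only from 2-permuters more concrete, the following is a minimal Python sketch under stated assumptions. It is not the square network of this paper and not its two-pass O(n) setup algorithm. Instead it uses a standard stand-in, the odd-even transposition array: n columns of 2-permuters acting on adjacent lines (about n²/2 cells), which likewise has only short local connections and realizes any permutation in a single pass. Each 2-permuter is assumed to have a single control bit (straight or cross). All function names (`two_permuter`, `apply_stage`, `program_odd_even`, `route`) are hypothetical. The settings are found by sorting destination tags, which costs O(n²) rather than O(n).

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def two_permuter(a: T, b: T, cross: bool) -> Tuple[T, T]:
    """One 2-permuter cell (assumed model): pass (a, b) straight or exchange them."""
    return (b, a) if cross else (a, b)


def apply_stage(data: Sequence[T], settings: Sequence[bool], offset: int) -> List[T]:
    """Apply one column of 2-permuters to adjacent line pairs starting at `offset`."""
    out = list(data)
    for k, i in enumerate(range(offset, len(out) - 1, 2)):
        out[i], out[i + 1] = two_permuter(out[i], out[i + 1], settings[k])
    return out


def program_odd_even(dest: Sequence[int]) -> List[List[bool]]:
    """Hypothetical setup routine for an odd-even transposition array
    (NOT the paper's square network). dest[i] is the output line for
    input line i and must be a permutation of 0..n-1. Visits O(n^2) cells."""
    n = len(dest)
    tags = list(dest)  # destination tag currently carried by each line
    stages: List[List[bool]] = []
    for s in range(n):  # n alternating columns suffice to sort any n tags
        offset = s % 2
        settings: List[bool] = []
        for i in range(offset, n - 1, 2):
            cross = tags[i] > tags[i + 1]  # exchange if the pair is out of order
            settings.append(cross)
            if cross:
                tags[i], tags[i + 1] = tags[i + 1], tags[i]
        stages.append(settings)
    return stages


def route(data: Sequence[T], stages: Sequence[Sequence[bool]]) -> List[T]:
    """Pass data through the programmed array, one column after another."""
    out = list(data)
    for s, settings in enumerate(stages):
        out = apply_stage(out, settings, s % 2)
    return out
```

For example, `route(list("abcd"), program_odd_even([2, 0, 3, 1]))` yields `["b", "d", "a", "c"]`: the item entering on line i leaves on line `dest[i]`. The same switch settings that sort the tags are applied to the data, which is what "programming" the array means in this sketch.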
CINs have many interesting properties from the point of view of VLSI design, namely a regular form, short local connections between adjacent cells, easy fabrication, and simplified testing and diagnosis. The main families of CINs are: triangular, diamond, rectangular, rhomboidal, pruned rectangular, approximately square, cascaded, etc. [1, 6, 8, 11]. O(n log n) programming algorithms for the triangular and diamond CINs were described in [12]. Similarly, O(n) programming algorithms for the triangular and cascaded CINs were developed in [11].

The present paper shows that a ( ) ( ) cellular array is sufficient to realize an arbitrary permutation when two passes through the network are allowed. The square array provides efficient two-phase interprocessor communication in a model of a SIMD computer. It has neither long connections nor criss-crossing between cells. Below we give a simple O(n) setup algorithm for this network.

2. A model of SIMD computer

One model of SIMD multiprocessor architecture is presented in Fig. 1. In this model, interprocessor communication consists of two steps. First, data are written by the PEs (Processing Elements) into the registers of the SM (Shared Memory). Then, in the second step, the information is read by the PEs from the SM registers. In each step the IN realizes a different mapping of inputs onto outputs. The final permutation of data is a

Figure 1. A model of SIMD architecture: processing elements PE 1..n, each with a local memory LM, connected through the IN (INPUT and OUTPUT sides) to the shared memory registers R 1..n. (LM - local memory, SM - shared memory, PE - processing element, IN - interconnection network, R - shared memory register.)