An Efficient Parallel Algorithm for Latin Square Design: A Multi-Core CPU Approach

Abhay B. Rathod
Department of Electronics and Telecomm. Engg
Jawaharlal Darda Institute of Engg. and Tech.
Yavatmal, India.

Sanjay M. Gulhane
Department of Electronics and Telecomm. Engg
Jawaharlal Darda Institute of Engg. and Tech.
Yavatmal, India.

DOI: 10.24178/ijsms.2017.2.2.27 International Journal of System Modeling and Simulation (ISSN Online: 2518-0959) IJSMS Vol 2(2) Jun 2017 IJSMS 27

Abstract: The theory of Latin squares is a very important tool in design theory. Like much of design theory, Latin squares have applications in statistics, finite geometries and experimental design, to name a few. In this paper, we propose an efficient parallel algorithm for designing Latin squares that have desirable properties for parallel array access. These squares provide conflict-free access to various subsets of an $n \times n$ array using $n$ memory modules. A transversal of such a square is a set of $n$ entries such that no two entries share the same row, column or symbol. We present a general construction method for building parallel Latin squares of order $n^2$ for all $n$. The proposed algorithm provides a fast parallel method for producing a Latin square design and for conflict-free parallel access of data in a SIMD system. The simulation results of the proposed parallel algorithm were compared with those of the traditional sequential Latin square design algorithm in terms of speedup and efficiency. The results for the parallel Latin square design are very promising and show that this design could be applied successfully to parallel routing problems for conflict-free data access. Finally, the results show that parallel versions of the former sequential algorithm, with simple modifications, achieve superlinear speedup of up to 200 times for a matrix size of 256.

Index Terms: Latin square, multi-core processor, parallel processing, simulation, parallel memory system, skewing scheme, multistage interconnection network.

I. INTRODUCTION

A Latin square of order $n$ is an $n \times n$ matrix $A = (a_{ij})$ with entries $a_{ij} \in \{1, 2, \ldots, n\}$. It contains $n$ distinct symbols, and each symbol appears exactly once in each row and exactly once in each column. It is called a Latin square because its symbols can be taken to be Latin letters. Such a square may have a counterpart written in Greek letters. When the two squares are superimposed, every pair of symbols occurs exactly once, which is the all-pairs property [1]. Euler was the first to define Latin squares and to investigate their properties mathematically. In this paper, we present a parallel Latin square design algorithm and compare it with the sequential one on a multi-core processor.

In the memory modules of SIMD processors, conflict-free parallel access to shared data plays a vital role in overall system performance [2]. The parallel array access problem is to store an $n \times n$ array in $m$ memory modules so that no memory conflict occurs when various subsets of size $n$ of the array are accessed [3]. These subsets include rows, columns, diagonals and $\sqrt{n} \times \sqrt{n}$ subarrays. The main focus of this paper is storing a matrix of data in memory modules so that various portions of it can be accessed with high parallelism. To achieve conflict-free parallel access to shared data, we use perfect Latin squares, which have very useful properties for parallel array access. When perfect Latin squares are used as skewing schemes, many interesting subsets of an array can be accessed without memory conflicts [3]. These subsets include rows, columns, diagonals and subsquares.

A skewing scheme is a mapping of the elements of an $n \times n$ array to memory modules that provides conflict-free access to various subsets of the array. Subsets of particular importance are rows, columns, diagonals and subarrays, because they are frequently used in many scientific computations [3]. Suppose the entry $a_{ij}$ of a Latin square is taken as the module that stores array element $(i, j)$. Then every row and every column of the array is spread over all $n$ modules, so row and column accesses are conflict-free. The cyclic square $a_{ij} = ((i + j) \bmod n) + 1$ is the simplest example. A good data skewing scheme should have the following features [2]:

1. Given the row and column indices of a data element, computing its address should be fast and efficient and should require very little hardware. The address consists of the memory module in which the element is stored and its location within that module.
2. It should provide simultaneous data paths between processors and memory modules during parallel access of data.
3. The skewing scheme should be represented compactly so that overheads are minimized.
4. The skewing scheme should allow efficient re-skewing of data between phases of computation when necessary.

Much analytical work has been done on Latin square design and skewing schemes. However, to our knowledge, no one has presented simulation results of Latin square design on multi-core processors, which is the main focus of this paper.

The paper is organized as follows. Section II surveys previous work. Section III describes the Latin square design and the skewing scheme used to achieve conflict-free parallel access of data. Simulation results are presented in Section IV. Finally, Section V concludes the paper with a discussion and directions for further research.

II. RELATED WORKS

Y. Yuanyuan and J. Wang [8] presented a method for building a Latin square matrix for a given multistage network. On such a network, data can be routed using the tag information represented by the elements of the Latin square matrix. In their technical report [2], R. V. Boppana and C. S. Raghavendra presented a generalized solution to the problem of conflict-free access to various templates of data of a matrix in a SIMD multiprocessor system. To this end, the authors used scrambled skewing schemes. Further, the authors report