IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 3, Issue 1 (July-Aug. 2012), PP 20-23
www.iosrjournals.org

Modified Pure Radix Sort for Large Heterogeneous Data Set

A. Avinash Shukla 1, B. Anil Kishore Saxena 2

Abstract: We propose a Modified Pure Radix Sort for large heterogeneous data sets. In this paper we discuss the problems of radix sort, briefly survey previous work on radix sort, and present a new modified pure radix sort algorithm for large heterogeneous data sets. The algorithm attempts to address the known problems of radix sort. It relies on distributed computing and is implemented on the principle of the divide and conquer method.

I. Introduction

Sorting is a computational building block of fundamental importance and is one of the most widely studied algorithmic problems. The importance of sorting has led to the design of efficient sorting algorithms for a variety of architectures. Many applications rely on the availability of efficient sorting routines as a basis for their own efficiency, while some algorithms can be conveniently phrased in terms of sorting.

Radix sort is an algorithm that sorts numbers by processing individual digits. A list of n numbers consisting of k digits each is sorted in O(n · k) time. Radix sort can process the digits of each number starting from either the least significant digit (LSD) or the most significant digit (MSD). The LSD algorithm first sorts the list by the least significant digit using a stable sort, so that keys with equal digits keep their relative order. It then sorts by the next digit, and so on from the least significant to the most significant digit, ending with a sorted list. While LSD radix sort requires a stable sort for each digit pass, MSD radix sort does not (unless a stable result is desired); common in-place MSD implementations are not stable.
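The LSD procedure described above can be illustrated with a minimal sketch. It is not the modified algorithm proposed in this paper. It assumes non-negative integer keys and base-10 digits, and the function names `counting_sort_by_digit` and `lsd_radix_sort` are illustrative only. Each pass is a stable counting sort on one digit, so k passes over n keys give the O(n · k) behaviour for a fixed base.

```python
def counting_sort_by_digit(values, exp, base=10):
    """Stable counting sort of non-negative integers on the digit selected by exp."""
    counts = [0] * base
    for v in values:
        counts[(v // exp) % base] += 1

    # Prefix sums turn the digit counts into the start index of each digit's block.
    starts = [0] * base
    for d in range(1, base):
        starts[d] = starts[d - 1] + counts[d - 1]

    output = [0] * len(values)
    for v in values:  # left-to-right scan keeps keys with equal digits in input order
        d = (v // exp) % base
        output[starts[d]] = v
        starts[d] += 1
    return output


def lsd_radix_sort(values, base=10):
    """LSD radix sort; assumes every key is a non-negative integer."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    exp = 1
    # One stable pass per digit, from least significant to most significant.
    while largest // exp > 0:
        result = counting_sort_by_digit(result, exp, base)
        exp *= base
    return result
```

For example, `lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66])` returns `[2, 24, 45, 66, 75, 90, 170, 802]` after three passes, one for each digit of the largest key.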
It is common for counting sort to be used internally by radix sort. A hybrid sorting approach, such as using insertion sort for small bins, improves the performance of radix sort significantly.

II. Review Of Related Literature

Rajeev Raman [1] illustrated the importance of reducing misses in the standard implementation of least-significant-bit-first (LSB) radix sort. The proposed techniques simultaneously reduce cache and TLB misses for LSB radix sort, and the resulting implementations are reported to outperform implementations of LSB radix sort and comparison-based sorting algorithms.

Danial [2] explained the Communication and Cache Conscious Radix sort algorithm (C3-Radix sort). C3-Radix sort uses the distributed shared memory parallel programming models, exploiting memory hierarchy locality and reducing the amount of communication on distributed memory computers. C3-Radix sort is implemented and analysed on the SGI Origin 2000 NUMA multiprocessor, with results for up to 16 processors and 64M 32-bit keys. The results show that for data sets that are small relative to the number of processors the MPI implementation is faster, while for large data sets the shared memory implementation is faster.

Shin-Jae Lee [3] addressed the load imbalance problem present in parallel radix sort. By redistributing the keys in each round of radix sort, each processor holds exactly the same number of keys, which reduces the overall sorting time. Load balanced radix sort was reported as the fastest internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time becomes the bottleneck of the overall sorting performance. The proposed algorithm preprocesses the keys by redistribution to eliminate the communication time. Once the keys are localized to each processor, sorting is confined within the processor, which eliminates the need for global redistribution of keys and enables well balanced communication and computation across processors.
Experimental results with various key distributions indicate significant improvements over balanced radix sort.

Jimenez-Gonzalez [4] introduced a new algorithm called Sequential Counting Split Radix sort (SCS-Radix sort). The three important features of SCS-Radix sort are the dynamic detection of data skew, the exploitation of the memory hierarchy, and the stability of execution time when sorting data sets with different characteristics. The authors claim the algorithm to be 1.2 to 45 times faster than Radix sort or Quick sort.

Navarro & Josep [5] focused on the improvement of data locality. CC-Radix improves data locality by dynamically partitioning the data set into subsets that fit in the L2 cache. Once in that cache level, each subset is sorted with Radix sort. The proposed algorithm is about 2 and 1.4 times faster than Quick sort and Explicit Block Transfer Radix sort, respectively.

Nadathur Satish [6] proposed high-performance parallel radix sort and merge sort routines for many-core GPUs, taking advantage of the full programmability offered by CUDA. The authors report their radix sort as the fastest GPU sort and their merge sort as the fastest comparison-based sort in the literature. For optimal performance, the algorithms exploit substantial fine-grained parallelism and decompose the computation into independent tasks. They target GPUs by exploiting the high-speed on-chip shared memory provided by NVIDIA's GPU architecture and efficient data-parallel primitives, particularly parallel scan.

N. Ramprasad and Pallav Kumar Baruah [7] suggested an optimization for the parallel