International Journal of Electrical and Computer Engineering (IJECE)
Vol. 11, No. 1, February 2021, pp. 375~382
ISSN: 2088-8708, DOI: 10.11591/ijece.v11i1.pp375-382
Journal homepage: http://ijece.iaescore.com

Straggler handling approaches in mapreduce framework: a comparative study

Anwar H. Katrawi 1, Rosni Abdullah 2, Mohammed Anbar 3, Ibrahim AlShourbaji 4, Ammar Kamal Abasi 5
1,3 National Advanced IPv6 Center (Nav6), Universiti Sains Malaysia, Malaysia
2,5 School of Computer Sciences, Universiti Sains Malaysia, Malaysia
4 Department of Computer and Engineering, Jazan University, Saudi Arabia

Article Info
Article history: Received Mar 22, 2020; Revised May 21, 2020; Accepted Aug 5, 2020
Keywords: Big data, Hadoop, MapReduce, Spark, Straggler

ABSTRACT
The proliferation of information technology produces a huge amount of data, called big data, that cannot be processed by traditional database systems. These various types of data come from different sources. However, stragglers are a major bottleneck in big data processing, and hence the early detection and accurate identification of stragglers can have an important impact on the performance of big data processing. This work aims to assess five straggler identification methods: Hadoop native scheduler, LATE Scheduler, Mantri, MonTool, and Dolly. The performance of these techniques was evaluated on three benchmarks: Sort, Grep, and WordCount. The results show that the LATE Scheduler performs the best and would be an efficient way to obtain better results for straggler identification.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Anwar H. Katrawi,
National Advanced IPv6 Center (Nav6), Universiti Sains Malaysia,
11800 USM, Penang, Malaysia.
Email: akatrawi@student.usm.my

1.
INTRODUCTION
With the rapid growth of information and data, analysis has become more challenging and complex because of the increasing volume of structured and unstructured data produced by the internet of things (IoT), social media, multimedia, and other sources. MapReduce is a fault-tolerant, scalable, and simple data processing framework that enables its users to process these massive amounts of data effectively [1, 2]. MapReduce is an important model for processing and generating very large data sets, because it offers a simple environment to use and supports ad hoc tasks such as data sorting and web indexing, among several others. MapReduce is used in big data applications at large companies such as Yahoo and Google, among several others. The MapReduce is unlisted as a section of one structure or the other. Stragglers arise from differences in CPU availability, I/O contention, or network traffic. A MapReduce job is complete only when its map and reduce phases are complete [3, 4]; that is, the job is not finished until every map and reduce task has finished. Moreover, the quantity of the stragglers weakens with the wide range of the time occupation [5-8]. In a heterogeneous environment, some compute nodes are faster than others. The slower compute nodes are called straggler nodes; the faster nodes finish their tasks early and then wait for the stragglers to finish.
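
The effect described above can be illustrated with a minimal Python sketch. It assumes one task per node and a list of hypothetical task durations in seconds. The function names and the 1.5x-median threshold are assumptions made only for this illustration; they are not the detection rule of Hadoop, LATE, Mantri, MonTool, or Dolly.

```python
import statistics


def job_completion_time(task_durations):
    """A job finishes only when its slowest task finishes."""
    if not task_durations:
        raise ValueError("a job needs at least one task")
    return max(task_durations)


def idle_time_per_node(task_durations):
    """Time each node spends waiting for the slowest task (one task per node)."""
    finish = job_completion_time(task_durations)
    return [finish - d for d in task_durations]


def find_stragglers(task_durations, slowdown_factor=1.5):
    """Flag tasks whose duration exceeds slowdown_factor times the median.

    The threshold is a hypothetical heuristic chosen for illustration only.
    """
    median = statistics.median(task_durations)
    return [i for i, d in enumerate(task_durations)
            if d > slowdown_factor * median]


if __name__ == "__main__":
    durations = [10.0, 11.0, 9.0, 30.0]  # hypothetical durations in seconds
    print("job completion time:", job_completion_time(durations))
    print("idle time per node:", idle_time_per_node(durations))
    print("straggler task indices:", find_stragglers(durations))
```

In this example, a single slow task determines the completion time of the whole job, and the three faster nodes sit idle for most of it.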