Reliability-Aware Genetic Algorithm for Scheduling Independent Tasks in Grid Computing Environment

Wael Abdulal, S. Ramachandram
Osmania University, CSE Dept., EC Hyderabad 500-007, India
wael.abdulal@ymail.com, schandram@gmail.com

Abstract: The main issues in grid systems are performance and reliability. Achieving high-performance Grid Computing requires techniques to efficiently and adaptively allocate tasks and applications to available resources in a large-scale, highly heterogeneous, reliable and dynamic environment. Because operational grid technology expands the range and scale of grid applications, operational grid systems must exhibit high reliability; that is, they must be able to provide correct service continuously. These goals become harder to achieve as grid systems grow in scale and become more heterogeneous and dynamic in nature. This paper proposes a novel Reliability-Aware Genetic Scheduling Algorithm for the Grid environment. The algorithm minimizes Makespan, Flowtime and Time To Release while maximizing the Reliability of Grid resources. It takes transmission time and waiting time in the resource queue into account. Moreover, it uses Stochastic Universal Sampling or Rank Roulette Wheel Selection together with Single Exchange Mutation, which speeds up convergence and yields better solutions than other Genetic Algorithms. Interestingly, the Genetic Algorithm based on Stochastic Universal Sampling produces better solutions than all the other Genetic Algorithms considered. According to simulation results, the proposed algorithm reduces the total execution time of tasks, increases the Reliability of the whole Grid system and boosts user satisfaction.

Keywords: Genetic Algorithm (GA), Makespan, Grid, Time To Release (TTR), Reliability, Flowtime.

I. INTRODUCTION

Grid technology has emerged as an important tool for solving compute-intensive problems.
Because operational grid technology expands the range and scale of grid applications, operational grid systems must exhibit high reliability; that is, they must be able to provide correct service continuously. Moreover, it is important that the specifications used to build these systems fully support reliable grid services. With the increased use of grid technology, these goals become harder to achieve as grid systems grow in scale and become more heterogeneous and dynamic in nature.

Efforts to develop reliability methods for large-scale, heterogeneous, dynamic grid environments are still in progress. These efforts have focused on the following distinct functional areas of grid systems:
• Reliability of the computational hardware and software that comprise the grid and provide the means to execute user applications,
• Reliability capabilities initiated by end users from within the applications they submit to the grid for execution, and
• Reliability of grid networks for messaging and data transport across communication links [1].

Ensuring reliability has centered on providing fault tolerance, defined as the ability to ensure continuity of service in the presence of faults, that is, events that cause a system to operate erroneously. The emphasis on fault tolerance is partly due to the characteristics of grid system environments, which tend toward a higher likelihood of failures, and partly due to the existence of redundant resources in grid systems, which provide opportunities to switch to functioning resources when failures occur.

The main issues in grid systems are performance and reliability. Achieving high-performance Grid Computing requires techniques to efficiently and adaptively allocate tasks and applications to available resources in a large-scale, highly heterogeneous, reliable and dynamic environment. Nowadays, it is not possible to guarantee that a set of tasks running on a large system will not crash because of a hardware failure.
Several approaches can be used to address this problem. One is task duplication, in which each task is executed more than once; this lowers the probability of failure but increases the number of required resources. Alternatively, it is possible to checkpoint the set of tasks of an application and restart the application after a failure. However, when a failure occurs, the application is delayed further by the restart mechanism, which must restart the application on a subset of resources and repeat some communications and computations. Therefore, to minimize the effect of the restart mechanism, it is important to reduce the failures of the Grid system. Moreover, even when there is no checkpoint-restart mechanism, it is better to ensure that the Reliability of resources is kept as high as possible.

It is hard to capture all of these aspects in a single objective. A multiobjective formulation often gives a better solution for the problem considered. The experimental results of this study show