Cluster Comput
DOI 10.1007/s10586-017-1018-x

Modeling and predicting execution time of scientific workflows in the Grid using radial basis function neural network

Farrukh Nadeem 1 · Daniyal Alghazzawi 1 · Abdulfattah Mashat 2 · Khalid Fakeeh 1 · Abdullah Almalaise 1 · Hani Hagras 3

Received: 1 November 2016 / Revised: 14 April 2017 / Accepted: 24 June 2017
© Springer Science+Business Media, LLC 2017

Abstract With the maturity of electronic science (e-science), scientific applications are growing more complex: they are composed of sets of coordinating tasks with complex dependencies among them, referred to as workflows. For optimized execution of workflows in the Grid, the high-level middleware services (such as the task scheduler, resource broker, and performance steering service) need in-advance estimates of workflow execution times. However, modeling and predicting workflow execution time in the Grid is complex because a workflow comprises several tasks, these tasks execute in a distributed manner on multiple heterogeneous Grid-sites, and the shared Grid resources behave dynamically. In this paper, we describe a novel method based on a radial basis function neural network to model and predict workflow execution time in the Grid. We model workflow execution time in terms of attributes describing workflow structure and execution runtime information. To further refine our models, we employ principal component analysis to eliminate attributes of lesser importance. We recommend a set of only 14 attributes (compared with 21 in total) to effectively model workflow execution time.
Our reduced set of attributes improves the prediction accuracy by 16%. Results of our prediction experiments for three real-world scientific workflows are presented to show that our predictions are more accurate than the two best methods from related work so far.

Keywords Scientific workflow applications · Distributed execution of scientific workflows · Workflow execution time prediction in the Grid

Farrukh Nadeem
fabdullatif@kau.edu.sa
Abdulfattah Mashat
asmashat@uj.edu.sa
Hani Hagras
hani@essex.ac.uk

1 Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
2 University of Jeddah, Jeddah, Saudi Arabia
3 School of Computer Science and Electronic Engineering, The Computational Intelligence Centre, University of Essex, Colchester, UK

1 Introduction

Computational Grids enable application developers to aggregate heterogeneous computational and storage resources scattered around the globe for large-scale scientific and engineering research. Scientific workflow applications (hereafter referred to as scientific workflows or just workflows) have recently emerged as an important paradigm for representing and managing complex scientific computations. Typically, a workflow consists of a set of tasks (software executions or data transfers), which are coordinated by control and data flow dependencies to solve a complex problem. Workflow applications are usually executed on the Grid through workflow management systems such as Askalon [14], GridFlow [4], and Pegasus [10], which automate their execution. The runtime environment of such a workflow management system schedules and manages the execution of workflow tasks on multiple Grid-sites with the objective of minimizing the workflow execution time. Workflow execution time is widely considered as a metric to measure workflow performance [14].
Workflow schedulers, enactment engines, and performance analysis services are commonly part of the runtime environments that rely on execution time modeling of scientific applications in order to make crucial strategic decisions and to determine the causes of performance problems.