International Journal of Computer Applications (0975-8887), Volume 139, No. 12, April 2016

Team Selection Strategy in IPL 9 using Random Forests Algorithm

C. Deep Prakash
Deptt. of Electrical Engg., Dayalbagh Educational Institute, Agra-282005, India

C. Patvardhan
Deptt. of Electrical Engg., Dayalbagh Educational Institute, Agra-282005, India

C. Vasantha Lakshmi
Deptt. of Physics and CS, Dayalbagh Educational Institute, Agra-282005, India

ABSTRACT
IPL 9 is scheduled to be held in April 2016. T20 cricket is relatively new, and its strategies and techniques are still evolving. This is evident in the better performances by both bowlers and batsmen in successive IPL seasons. This paper presents a detailed analysis of IPL data up to season 8 and of the overall T20 career data of players up to January 2016 in order to design performance indices for batsmen and bowlers in IPL 9. Players are categorized based on their roles in the team, and the indices are determined separately for each category using the Random Forests algorithm. A heuristic is designed to select the best playing 11 from the available team using these performance indices. The algorithm is effective in enabling the best 11 to be selected within the constraints of the IPL tournament rules.

Keywords
IPL 9, Team selection, Random Forests, Heuristic

1. INTRODUCTION
Test cricket, ODI and Twenty20 (T20) are the three facets of the game of cricket. T20 is a 20-overs-a-side match that is usually over in 4 hours. T20 cricket was an immediate success in England, where it was first introduced in 2003. It has become extremely popular, especially because the short format allows one to enjoy the complete match in one evening. Each batting side is allowed 20 overs (20 x 6 = 120 legal deliveries) in which to score, provided it still has wickets in hand. The side that scores more runs within the stipulated overs wins.
The Indian Premier League (IPL), a T20 tournament, was started in 2008 by the Board of Control for Cricket in India (BCCI) [1]. The IPL created eight franchises assigned to eight of the largest cities in India. The teams were franchisee-driven, and the players were selected through competitive bidding from a pool of available players. The BCCI has organized the IPL T20 cricket tournament every year since. Eight IPL tournaments have been held to date, and the 9th edition is scheduled to begin in April 2016.

Analytical methods are very useful in cricket. Batting, bowling and fielding are the three main departments of the game. There is a huge demand for cricket-related statistical studies because of the popularity of the game and the staggering amounts of money involved. These statistics give a clear picture of the performance of various players. Followers of the game, especially in India, also follow its statistics keenly.

Some studies related to cricket reported in the literature are as follows. Clarke [2] developed optimal batting strategies using a dynamic programming model. Kimber and Hansford [3] and Damodaran [4] proposed alternative batting averages for one-day cricket that account for innings in which the batsman remains not out. Barr and Kantor [5] proposed a method based on batting averages and strike rates. Borooah and Mangan [6] explored batting performance in test matches. Norman and Clarke [7] and Ovens and Bukiet [8] applied a mathematical modeling approach to optimize the batting order of a team. Lewis [9] analyzed player performance using Duckworth/Lewis percentage values. Van Staden [10] used a graphical method to analyze batting and bowling performance in cricket. Lakkaraju and Sethi [11] described a Sabermetrics-style principle for analyzing batting performance in cricket. Lemmer [12-14] considered performance analysis using averages and strike rates for bowling and batting. Saikia et al. [15] evaluated the performance of all-rounders in the IPL.
IPL season 9 is to start in April 2016. There is a huge buzz regarding the players to watch in this season of the IPL. A lot of money is involved in the IPL, and every cricket fan has his own set of favorite players to watch. Before the start of the season, evaluating the teams and understanding how they stand in terms of their likelihood of winning is useful not only as a favorite pastime of the fans but also commercially. In this work, an attempt is made to apply well-known analytical techniques towards this end.

The batting and bowling performance of players is predicted based on their past IPL performances and overall T20 career performances. This work can help the franchises select their best 11 for the tournament in order to maximize their chances of winning. A heuristic-based approach is designed for selecting the best possible playing 11 for each team. These teams are then used for predicting the match results using the results of the eighth season. Some of the results obtained from the detailed mathematical analysis are quite different from what could be expected from a cursory glance at the teams. An attempt is made to explain these results and provide insights into the factors that affect the performance of the teams. These are useful for gaining a better understanding of the underlying mechanics of T20 cricket outcomes.

The rest of the paper is organized as follows. In section 2, the statistics of previous IPL matches are examined to find the changing scenarios and trends. In section 3, the relative importance of the factors that define batting and bowling performances is determined using a machine learning based approach, and a composite performance index is defined. The top batsmen and bowlers are identified according to these indices. In section 4, a heuristic for selecting the playing eleven that attempts to maximize the batting and bowling performance of the team is proposed.
Some conclusions and insights from this analysis are presented in section 5.
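
To make the outline above more concrete, the following Python fragment is a minimal sketch of how a composite performance index and a constrained selection of a playing eleven might fit together. It is not the heuristic of section 4. The names `Player`, `composite_index` and `select_eleven`, the role labels, the per-role minimums and the greedy strategy are all hypothetical and chosen only for illustration. The weights passed to `composite_index` stand in for factor importances that would come from a learned model. The only tournament rule encoded is the IPL limit of four overseas players in a playing eleven.

```python
from dataclasses import dataclass

MAX_OVERSEAS = 4   # IPL rule: at most four overseas players in a playing eleven
TEAM_SIZE = 11


@dataclass(frozen=True)
class Player:
    name: str
    role: str       # hypothetical labels, e.g. "batsman", "bowler", "allrounder", "keeper"
    overseas: bool
    index: float    # hypothetical composite performance index (higher is better)


def composite_index(stats, weights):
    """Weighted average of already-normalised stats.

    The weights are an assumption: they stand in for learned factor importances.
    """
    total = sum(weights.values())
    return sum(weights[key] * stats[key] for key in weights) / total


def select_eleven(squad, min_per_role):
    """Greedy sketch: meet role minimums first, then fill by index.

    Assumes the role minimums sum to at most TEAM_SIZE. The result respects
    the overseas cap but is not guaranteed to be the optimal eleven.
    """
    ranked = sorted(squad, key=lambda p: p.index, reverse=True)
    chosen = []

    def can_add(player):
        overseas_count = sum(q.overseas for q in chosen)
        if player in chosen:
            return False
        return not player.overseas or overseas_count < MAX_OVERSEAS

    # Pass 1: take the best eligible players for each role minimum.
    for role, need in min_per_role.items():
        for player in ranked:
            if need == 0:
                break
            if player.role == role and can_add(player):
                chosen.append(player)
                need -= 1

    # Pass 2: fill the remaining slots with the best eligible players overall.
    for player in ranked:
        if len(chosen) == TEAM_SIZE:
            break
        if can_add(player):
            chosen.append(player)
    return chosen
```

A greedy pass is used here only because it is short and easy to follow. Since it commits overseas slots in the order the roles are listed, it can miss better combinations that an exhaustive or optimization-based search would find.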