ELEKTROTEHNIŠKI VESTNIK 86(4): 197-202, 2019
ORIGINAL SCIENTIFIC PAPER

Importance of the training dataset length in basketball game outcome prediction by using naive classification machine learning methods

Tomislav Horvat 1, Josip Job 2

1 University North, Department of Electrical Engineering, 104. brigade 3, 42000, Croatia
2 Josip Juraj Strossmayer University in Osijek, Chair of Visual Computing, Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Kneza Trpimira 2B, 31000 Osijek

E-mail: tomislav.horvat@unin.hr

Abstract. The paper focuses on using naive machine learning algorithms to predict NBA game outcomes. To obtain convincing results, data from nine full NBA seasons are scraped for training the proposed model and evaluating the results. The aim of the paper is to present the possibilities of naive machine learning methods and to determine the optimal lengths of the training phase and of the evaluation phase for predicting NBA game outcomes. The research serves as the initial stage of a doctoral dissertation on outcome prediction in sport. The proposed supervised classification machine learning methods are used, and two possible outcomes (win or loss) are predicted. Data segmentation is used as the evaluation method, with the training dataset occurring chronologically before the testing dataset. The best results are achieved using a single training season, one to three evaluation seasons, and all games played during the training phase.

Keywords: basketball, classification, machine learning, NBA, outcome prediction

The influence of test data length in a naive machine learning algorithm on basketball game outcome prediction

In this paper, we present a naive machine learning algorithm for predicting game outcomes in the NBA basketball league. In developing the algorithm, we used game results from nine seasons.
The aim of the paper is to present the possibilities of naive machine learning methods and to determine the lengths of the training and evaluation phases that are optimal for predicting NBA game outcomes. The proposed supervised classification machine learning methods are used, and two outcomes (win or loss) are predicted. Data segmentation is used in the evaluation, with learning taking place before testing. The best results are achieved with data from one training season, one to three evaluation seasons, and all games played in the learning phase.

1 INTRODUCTION

Nowadays, sport outcome prediction is very popular among fans and sports professionals around the world, especially in sports betting. This is particularly evident for the most popular sports, such as basketball, football and soccer. Many researchers have proposed various algorithms to predict game outcomes, but their prediction accuracy varies not only from sport to sport but also between leagues and seasons of the same sport. Since it is almost impossible to determine the limits of what can be predicted, it is important to establish the prediction results that simple prediction methods can achieve. The number of possible outcomes, the competitiveness of a sport, and thus the possibilities of prediction vary from sport to sport. Therefore, what counts as a satisfactory prediction result depends on the type of sport and also on the competitiveness of the competition itself. The paper presents initial NBA game outcome prediction results, which will serve as a starting point for proposing a more advanced NBA prediction algorithm. The research is the initial stage of a doctoral dissertation on outcome prediction in sport. Supervised machine learning methods are used, more precisely classification methods that predict one of two possible outcomes.
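The chronological data segmentation summarized in the abstract, in which every training season precedes every evaluation season, can be illustrated with a minimal Python sketch. The `Game` record, its fields, and the `chronological_split` helper are hypothetical names introduced for illustration; the paper's actual data format and scraping pipeline are not assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    """Hypothetical game record; not the paper's data format."""
    season: int     # assumed: starting year of the season, e.g. 2009 for 2009-10
    home: str
    away: str
    home_win: bool  # class label: True = home team won, False = home team lost


def chronological_split(games, first_season, n_train, n_test):
    """Return (train, test) so that all training seasons precede all test seasons."""
    train_end = first_season + n_train
    test_end = train_end + n_test
    train = [g for g in games if first_season <= g.season < train_end]
    test = [g for g in games if train_end <= g.season < test_end]
    return train, test


# Usage: one training season followed by two evaluation seasons.
games = [
    Game(2009, "BOS", "LAL", True),
    Game(2010, "LAL", "BOS", True),
    Game(2011, "MIA", "BOS", False),
    Game(2012, "BOS", "MIA", True),
]
train, test = chronological_split(games, first_season=2009, n_train=1, n_test=2)
print(len(train), len(test))  # 1 2
```

In this sketch, the abstract's best-performing configuration (a single training season and one to three evaluation seasons) corresponds to `n_train=1` with `n_test` between 1 and 3; sweeping both parameters over the available seasons is one way to compare training and evaluation lengths.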
Arthur Samuel, one of the founders of machine learning, defined machine learning as a field of computer science that gives computers the ability to learn without being explicitly programmed [1]. A more recent definition describes machine learning as programming computers to optimize a performance criterion using example data or past experience [2]. There are various types of machine learning, but outcome prediction in sport mostly relies on supervised machine learning. The goal of supervised learning is to develop a predictive model that is trained on both input data and the corresponding output data (labels) and then predicts outcomes for previously unseen data. Sports prediction is usually treated as a classification problem in which one class is predicted [3], and only in rare cases as a problem of predicting numerical values.

Received 30 May 2019 Accepted 11 July 2019
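To make the supervised classification setting concrete (learn from labelled past games, then predict unseen ones), the following sketch implements a deliberately simple, hypothetical baseline: each team's win rate is learned from the training games, and a home win is predicted whenever the home team's rate is at least the away team's. This is an illustrative assumption, not the classifier used by the authors; games are represented as `(home, away, home_win)` tuples, and all function names are invented for the example.

```python
from collections import defaultdict


def train_win_rates(train_games):
    """Learn each team's win rate from labelled (home, away, home_win) games."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for home, away, home_win in train_games:
        winner, loser = (home, away) if home_win else (away, home)
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {team: wins[team] / played[team] for team in played}


def predict_home_win(rates, home, away, default=0.5):
    # Predict a home win when the home team's training win rate is at least
    # the away team's; teams not seen in training get a neutral default rate.
    return rates.get(home, default) >= rates.get(away, default)


def accuracy(rates, test_games):
    """Share of unseen games whose win/loss label is predicted correctly."""
    correct = sum(
        predict_home_win(rates, home, away) == home_win
        for home, away, home_win in test_games
    )
    return correct / len(test_games)


# Usage: train on earlier games, evaluate on later (unseen) games.
train = [("BOS", "LAL", True), ("LAL", "MIA", True), ("MIA", "BOS", False)]
test = [("LAL", "BOS", False), ("MIA", "LAL", True)]
rates = train_win_rates(train)
print(rates)                  # {'BOS': 1.0, 'LAL': 0.5, 'MIA': 0.0}
print(accuracy(rates, test))  # 0.5
```

The `>=` comparison breaks ties in favour of the home team, a design choice assumed here for simplicity; it keeps the model a two-class (win or loss) classifier whose only learned parameters are the per-team win rates from the training phase.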