Indonesian Journal of Electrical Engineering and Computer Science
Vol. 29, No. 3, March 2023, pp. 1750~1757
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v29.i3.pp1750-1757
Journal homepage: http://ijeecs.iaescore.com

Prediction of movie success based on machine learning and twitter sentiment analysis using internet movie database data

Jyoti Tripathi, Sunita Tiwari, Anu Saini, Sunita Kumari
Department of Computer Science and Engineering, G.B. Pant DSEU Okhla I Campus, New Delhi, India

Article Info

Article history:
Received Jun 25, 2022
Revised Oct 27, 2022
Accepted Oct 30, 2022

Keywords:
Decision tree
Entertainment industry
Naïve Bayes
Random forest
Regression
Supervised learning
Support vector machines

ABSTRACT

Nowadays, predicting the success of a new movie is a crucial task. In this work, a hybrid approach considers movie features as well as the sentiment expressed in movie reviews to predict the success rate of a movie. Multiple movie features such as title, director, star cast, and writer are considered for prediction. The related raw data is collected from the internet movie database (IMDb) website and, after pre-processing, is used to train supervised machine learning models. Different supervised learning models are compared and the one with the best results is used further. The mean squared error, root mean squared error and R2 score of the generated models are comparable with those of existing models. Further, sentiment analysis of movie-related tweets is performed. The accuracy of the best sentiment analysis model is 88.47%. Finally, the two models are combined to give the success prediction rating of new movies, and the results of the hybrid model are encouraging. The proposed model may be used to find the top-rated movies of a particular calendar year.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Anu Saini
Department of Computer Science and Engineering, G.B.
Pant DSEU Okhla I Campus
New Delhi-110020, India
Email: anuanu16@gmail.com

1. INTRODUCTION

Movies, online videos and television are the most popular sources of entertainment across the globe, especially in India [1]. The movie industry involves huge investments of money, time and effort [2], and it produces hundreds of movies every year. Therefore, it is crucial to predict the success of a movie in its early stages. The success or failure of a movie depends on multiple factors. A huge amount of movie-related information, such as actors, directors, critic reviews, user reviews, ratings, writer, budget, genre, Facebook likes, number of YouTube views of the movie trailer, and fan following on Twitter, is available on the web. The success of a movie in this era depends on the revenue generated in the first few weeks [3]. Revenue in these initial weeks is greatly influenced by online reviews and ratings of the movie. Since the first few weeks are crucial for the success of a movie, production teams put considerable effort into publicity and into shaping people’s opinion.

In this work, we aim to use the available information to predict the success rate of a movie in its early stages. The internet movie database (IMDb) is a rich source of information that contains data about almost all movies. To predict the success of a movie, supervised machine learning algorithms are used. Different machine learning algorithms are used to build the prediction model, and the results obtained from each model are compared using root mean squared error (RMSE), mean squared error (MSE) and R2 score. Further, social media platforms such as Twitter, Instagram and Facebook have become a great source of influence on people’s opinion. A huge amount of data is generated through such sources, and they are an important means of gathering the movie