An Empirical Study to Predict the Quality of Wikipedia Articles

Imran Khan 1(✉), Shahid Hussain 2, Hina Gul 3, Muhammad Shahid 4, and Muhammad Jamal 4

1 Virtual University of Pakistan, Lahore, Pakistan
ikniazi786@gmail.com
2 COMSATS University, Islamabad, Pakistan
shussain@comsats.edu.pk
3 IQRA National University, Peshawar, Pakistan
hinaafridi1984@gmail.com
4 Govt College No 1, DIKhan, Pakistan
bluefiber08@gmail.com, mustafvimasood@gmail.com

Abstract. Wikipedia is widely regarded as a more effective way to deliver content than other types of encyclopedia. However, the quality of Wikipedia articles remains a concern. The aim of this research is to perform an empirical study to predict the quality of Wikipedia articles. In the proposed methodology, we consider a few metrics, namely article length (total number of words in an article), number of edits, article age (in days), and article ranking, and perform a few statistical tests to analyze the quality of Wikipedia articles. Moreover, we observe a significant correlation between the proposed metrics and the rating of articles, which helps to identify their quality.

Keywords: Wikipedia · Correlation · Linear regression · Article length · Number of edits · Article age

1 Introduction

Wikipedia is one of the most trusted online, open-source resources worldwide. It is run by a nonprofit organization, hosts a large number of articles covering almost every topic, and has a huge viewership (35,147,128 registered users in total). Wikipedia is one example of socially produced Big Data. According to Liu and Ram [8], in September 2017 more than 5,472,000 articles were available on English Wikipedia. Data is now produced faster than ever: to date, more than 2.5K petabytes of data are generated per day, which has given rise to the widely circulated idea of Big Data. By the end of 2018, English Wikipedia contained more than about 5,763,800 unique articles¹.
¹ https://en.wikipedia.org/wiki/Special:Statistics

© Springer Nature Switzerland AG 2019
Á. Rocha et al. (Eds.): WorldCIST'19 2019, AISC 932, pp. 485–492, 2019.
https://doi.org/10.1007/978-3-030-16187-3_47

Wikipedia is a highly dynamic system; article content can be changed as often as desired. Accordingly, the quality of an article is a time-dependent function, and a single article may contain high- and low-quality content in different ranges of its