International Journal of Computer Applications (0975 – 8887) Volume 135 – No.4, February 2016

Analyzing Social Media Data to Explore Students’ Academic Experiences

Priya Lande
Vidyalankar Institute of Technology, Mumbai

Vipul Dalal
Vidyalankar Institute of Technology, Mumbai

ABSTRACT
The casual conversational style students use in an open, informal environment can reveal a great deal about their learning process. Data collected from such an open environment can bring out many important and previously unknown factors about students’ behaviour, opinions, feelings and concerns regarding their educational system. Inspecting such data can be thought-provoking. However, students’ feelings expressed on social networks must be interpreted by a human reader, which is feasible only up to a point because the data keeps growing. In this paper, the problems of engineering students are considered by analysing engineering students’ tweets under the hashtag #enggproblems on Twitter. The analysis was carried out over 15,000 tweets. The problems identified relate to heavy study load, negative emotions, sleep problems, lack of social engagement, diversity issues, etc. A multi-label classifier was used to classify and categorize the tweets. This technique can dig into students’ casual conversations and reveal the factors that affect their learning process.

General Terms
Multi-label classification using a Naïve Bayes classifier.

Keywords
Naïve Bayes multi-label classifier, Twitter analysis, education.

1. INTRODUCTION
Social media has become an integral part of urban life. The internet is widely used because of its ease of access. With a single click, people can post the views and feelings on their minds online. People aged 15-45 are the most active users of the internet. These consist mainly of students, businessmen, etc.
Students have many reasons to access the internet, such as project work, form filling and looking up study-related information; besides all this, they also turn to it for entertainment. For today’s youth, entertainment means clicking, posting and sharing: if they like something they post about it, and if they do not like something they post about that too. Online social networking sites are thus a completely democratic platform for students and everybody else. The most popular social networking websites are Facebook, Twitter and Instagram. On average, around 6,000 tweets are posted on Twitter every second, which corresponds to over 350,000 tweets per minute, roughly 500 million tweets per day and roughly 200 billion tweets per year. The data generated is effectively unbounded in both scale and vocabulary. Students post their views spontaneously online, so their posts carry the same vocabulary overhead and scalability issues. Inspecting such data offers immense scope for understanding students’ feelings, concerns and opinions. A completely manual analysis cannot keep up with the ever-growing data [1]. On the other hand, a fully automatic algorithm cannot capture the in-depth meaning of the data.

2. LITERATURE SURVEY
Earlier, offline procedures were used to study such problems [2][3]. These procedures included surveys, focus groups, interviews and other classroom programs. Such programs are generally carried out in a front stage environment. A front stage environment is a controlled environment, where a person is likely to express themselves superficially rather than transparently [4][5], whereas a backstage environment is a relaxed environment, where one feels no pressure to answer a question in a particular way. An online social network such as Facebook or Twitter, which students use very frequently and spontaneously, can be such a backstage platform. Twitter is one of the many popular social networking websites.
Twitter provides a free API that can be used to stream data, so tweets can be collected and analysed readily. Twitter limits each tweet to 140 characters, and this conciseness also makes the data easy to stream. A hashtag is a word that begins with ‘#’; all content tagged with a hashtag is grouped under that hashtag.

The analysis was carried out on engineering students because engineers are said to be the future of any nation. Their learning process has to be strong and has to be upgraded for better adaptability to technology [6]. The hashtag #enggproblems was selected and examined, since students used it mostly to post about the problems they faced in their learning system. The tweets were sorted into categories such as heavy study load, diversity issues, lack of social engagement, negative emotion and sleep problems. These categories were built by human examination of the tweets under #enggproblems. Such human inspection of data is framed as inductive content analysis [7].

The main goals of this study are:
1. To categorize and correctly classify students’ tweets into the proper category. This helps to understand the problems faced by the students in their learning process.
2. To provide a statistical study of the classification that can help the educational system make the necessary improvements so that students’ learning experience is hassle-free.

In [8], automated identification and classification of diverse types of sentiment is carried out on short fragments of text extracted from Twitter. The paper proposes a supervised classification framework which exploits Twitter smileys and hashtags as labels for learning. The processed Twitter data allowed for sentiment type identification. Here, the Twitter data is classified as smileys where mixed