Mining Sentiments from Songs Using Latent Dirichlet Allocation

Govind Sharma and M. Narasimha Murty
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560012, Karnataka, India
{govindjsk,mnm}@csa.iisc.ernet.in

Abstract. Song selection and mood are interdependent. If we capture a song's sentiment, we can determine the listener's mood, which can serve as a basis for recommendation systems. Songs are generally classified by genre, which does not entirely reflect sentiment; thus, we require an unsupervised scheme to mine it. Depending on the application, sentiments are classified into either two classes (positive/negative) or multiple classes (happy/angry/sad/...). We are interested in analyzing the feelings a song invokes, which involves multi-class sentiments. To mine the hidden sentimental structure behind a song in terms of "topics", we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods, and the topics mined by LDA can represent those moods; this yields a scheme for grouping songs of similar mood. For validation, we use a dataset of songs annotated with six moods by users of a particular website.

Keywords: Latent Dirichlet Allocation, music analysis, sentiment mining, variational inference.

1 Introduction

With the swift increase in digital data, technology for data mining and analysis must keep pace. Effective mining techniques need to be developed in all fields, be it education, entertainment, or science. Data in the entertainment industry is mostly multimedia (songs, movies, videos, etc.). Applications such as recommender systems are being developed for such data; they suggest new songs (or movies) to the user based on previously accessed ones. Listening to songs has a strong relation with the mood of the listener. A particular mood can drive us to select a certain song, and a song can invoke sentiments in us that change our mood.
Thus, song selection and mood are interdependent. Generally, songs are classified into genres, which do not reflect the sentiment behind them and thus cannot accurately estimate the mood of the listener. There is therefore a need to build an unsupervised system for mood estimation, which can in turn help in recommending songs. Subjective analysis of multimedia data is cumbersome in general. When it comes to songs, we have two parts, viz. melody and lyrics. Mood of a listener depends on

J. Gama, E. Bradley, and J. Hollmén (Eds.): IDA 2011, LNCS 7014, pp. 328–339, 2011.
© Springer-Verlag Berlin Heidelberg 2011
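The idea of treating each song's lyrics as a mixture over latent mood "topics" can be sketched with an off-the-shelf LDA implementation. This is a minimal illustration, not the authors' code: the toy lyric snippets, the choice of three topics, and the use of scikit-learn are all assumptions made for demonstration.

```python
# Sketch (hypothetical data): mining mood "topics" from lyrics with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder lyric fragments; a real experiment would use full lyric texts.
lyrics = [
    "sunshine smile dance happy love joy",
    "tears rain alone sad goodbye cry",
    "fight rage burn angry scream fire",
    "happy dance sunshine joy smile",
    "cry alone tears sad rain",
]

# Bag-of-words representation of each song's lyrics
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(lyrics)

# Fit LDA with 3 topics, standing in for moods such as happy/sad/angry
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(X)  # row i: song i's distribution over topics

# Each song is a probability distribution over topics (moods), so songs
# sharing a dominant topic can be grouped as similar-mood songs.
for song, dist in zip(lyrics, doc_topic):
    print(song.split()[0], dist.round(2))
```

Grouping songs by their dominant topic then gives the unsupervised mood clusters the paper aims for, with the topic-word distributions suggesting a mood label for each cluster.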