Sentiment Analysis in Code Review Comments

Anne Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan

Abstract— This paper proposes a novel method to predict emotions and team dynamics by inspecting code review comments. Code reviews covering the past three years of development activity were extracted from the source control repositories of two open source projects. The comments were tokenized and features were extracted to detect sentiments such as encouragement, annoyance and fear. The emotion polarity of each code review comment was measured as neutral vs rude (harshness). The technique recognized harsh comments with an accuracy of 67%.

Keywords— Token, sentiment, review, code, polarity, SVM

I. INTRODUCTION

Many community-supported sites contain code review comments written by volunteer programmers who contribute to the code base. Code review comments can sometimes sound harsh and may result in misunderstandings. This paper examines automatic sentiment analysis of code reviewer comments and predicts the emotional polarity of each comment as neutral vs rude (or harsh).

II. METHOD

The code review comments of 3 open source software code bases were downloaded, for a total of 1536 comments. Each comment was manually annotated by expert human annotators as neutral vs harsh. Three annotators performed the annotation to limit the effect of inter-annotator disagreement. The feature defined for the text analysis was the frequency of harsh words. The master list of harsh words used as reference was constructed by applying the bag of words method to data sets for angry, rude and harsh sentiments. The number of syntax related comments, the number of comments on the code, and the number of revisions and rollbacks were also tracked for each code change. The resulting feature vector was used to train an SVM classifier to detect the class associated with rude emotion polarity.

III. CONCLUSIONS

The automatic classification of code review comments as neutral vs harsh showed 67% accuracy. The accuracy degraded by 3% when the analysis was performed on test data from another code review repository website that was not used during initial training. This was because the bag of words did not contain some of the words found in the test data. Additionally, the language in some of the code reviews was technical and syntax related, with programming language specific jargon and abbreviations, which introduced many misclassifications. There were also differences in grammar and in the positioning of contradiction phrases. Future work would include evaluation across international reviews so that subtle differences between American English and British English are captured.

REFERENCES

[1] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Drunken Abnormal Human Gait Detection using Sensors,” Computer Science and Emerging Research Journal, vol. 1, 2013.
[2] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Fear Detection with Background Subtraction from RGB-D data,” Computer Science and Emerging Research Journal, vol. 1, 2013.
[3] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Code Definition Analysis for Call Graph Generation,” Computer Science and Emerging Research Journal, vol. 1, 2013.
[4] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Multi-View Point Drowsiness and Fatigue Detection,” Computer Science and Emerging Research Journal, vol. 2, 2014.
[5] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Group Emotion Detection using Edge Detection Mesh Analysis,” Computer Science and Emerging Research Journal, vol. 2, 2014.
[6] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Polarity Analysis of Restaurant Review Comment Board,” Computer Science and Emerging Research Journal, vol. 2, 2014.
[7] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Sentiment Analysis in Code Review Comments,” Computer Science and Emerging Research Journal, vol. 3, 2015.
[8] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Temporal Analysis of News Feed Using Phrase Position,” Computer Science and Emerging Research Journal, vol. 3, 2015.
[9] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Decision Rule Driven Human Activity Recognition,” Computer Science and Emerging Research Journal, vol. 3, 2015.
[10] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Depression and Sadness Recognition in Closed Spaces,” Computer Science and Emerging Research Journal, vol. 4, 2016.
[11] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Dynamic Probabilistic Network Based Human Action Recognition,” Computer Science and Emerging Research Journal, vol. 4, 2016.
[12] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Fight and Aggression Recognition using Depth and Motion Data,” Computer Science and Emerging Research Journal, vol. 4, 2016.
[13] A. Veenendaal, Elliot Daly, Eddie Jones, Zhao Gang, Sumalini Vartak, Rahul S Patwardhan, “Sensor Tracked Points and HMM Based Classifier for Human Action Recognition,” Computer Science and Emerging Research Journal, vol. 5, 2016.
[14] A. S. Patwardhan, “Structured Unit Testable Templated Code for Efficient Code Review Process,” PeerJ Computer Science (in review), 2016.
[15] A. S. Patwardhan and R. S. Patwardhan, “XML Entity Architecture for Efficient Software Integration,” International Journal for Research in Applied Science and Engineering Technology (IJRASET), vol. 4, no. 6, June 2016.
[16] A. S. Patwardhan and G. M. Knapp, “Affect Intensity Estimation Using Multiple Modalities,” Florida Artificial Intelligence Research Society Conference, May 2014.
[17] A. S. Patwardhan, R. S. Patwardhan, and S. S. Vartak, “Self-Contained Cross-Cutting Pipeline Software Architecture,” International Research Journal of Engineering and Technology (IRJET), vol. 3, no. 5, May 2016.
[18] A. S. Patwardhan, “An Architecture for Adaptive Real Time Communication with Embedded Devices,” LSU, 2006.
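
As an illustration of the feature extraction described in Section II, the following is a minimal sketch and not the authors' implementation. The harsh-word lexicon, the function names, the parameter names, and the majority-vote rule for combining the three annotators' labels are all assumptions made for this example; the paper does not publish its word list or state how annotator labels were reconciled. The resulting vectors would then be passed to an SVM implementation of the reader's choice.

```python
import re
from collections import Counter

# Hypothetical lexicon: the paper's master list of harsh words is not given.
HARSH_WORDS = frozenset(
    {"stupid", "terrible", "useless", "awful", "lazy", "garbage"}
)


def tokenize(comment):
    """Lowercase a comment and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())


def harsh_word_frequency(comment, lexicon=HARSH_WORDS):
    """Return the fraction of tokens that appear in the harsh-word lexicon."""
    tokens = tokenize(comment)
    if not tokens:
        return 0.0
    return sum(token in lexicon for token in tokens) / len(tokens)


def feature_vector(comment, syntax_comments, code_comments, revisions, rollbacks):
    """Combine the text feature with the per-change counts named in Section II.

    The ordering of the features is an assumption of this sketch.
    """
    return [
        harsh_word_frequency(comment),
        float(syntax_comments),
        float(code_comments),
        float(revisions),
        float(rollbacks),
    ]


def majority_label(labels):
    """Resolve annotator labels by majority vote (an assumed aggregation rule)."""
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    comment = "This loop is useless and the naming is terrible."
    print(feature_vector(comment, syntax_comments=1, code_comments=2,
                         revisions=3, rollbacks=0))
    print(majority_label(["harsh", "harsh", "neutral"]))
```

Because the harsh-word feature is a ratio while the other features are raw counts, a practical pipeline would likely scale the features before training the classifier.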