Depression Detection Using Sentiment Analysis Techniques in Arabic Text

Afaf Hussein Abdelrahman 1, Doaa Elzanfaly 2, Mai El Defrawi 3, Samah Hamed 4
Department of Information System, Helwan University, Egypt
1 afafmseleem@gmail.com, 2 doaa.saad@fci.helwan.edu.eg, 3 mai.eldefrawi@gmail.com

Abstract: Depression is a mood disorder marked by persistent melancholy and loss of interest. It is one of the most dangerous mental health issues that humans may encounter. Analyzing depression is not a simple task: most people are unwilling to visit a psychiatrist, or they may not even know that they suffer from depression. Machine learning can be used to detect depression. Sentiment analysis usually relies on machine learning to analyze human texts. Its main goal is to determine whether the general sentiment of a piece of text is positive, negative, or neutral. However, other factors should also be considered, including the frequency and redundancy of the text. Most of the existing work is done in English, and only a few studies address Arabic. In this paper, Arabic tweets are analyzed to detect depression. This research proposes a new model that identifies the level of depression based on several elements, focusing on the timing and frequency of the tweets. A dataset is created using the Tweepy API to determine the level of depression. The model achieves a precision of 0.97, a recall of 1.00, and an F1-score of 0.98. Our results outperform current techniques by about 30%.

Index Terms: Depression, sentiment analysis, Twitter, supervised learning, machine learning

I. INTRODUCTION

Depression has become a prominent global public health concern, especially in low- and middle-income countries [1]. Major depressive disorder (MDD) is the most prevalent in the world. According to the World Health Organization, about 350 million people in the world are affected by this condition [2, 3, 4].
To be classified as depressed, a person may exhibit any combination of the following: little interest or pleasure in doing things; poor appetite or overeating; feeling bad about oneself, or feeling like a failure or that one has let oneself or one's family down; trouble falling or staying asleep, or sleeping too much; and trouble concentrating on things [5].

Social media has become very important. People of all ages use platforms such as Twitter and Facebook to express their thoughts and feelings. In Arab countries, many people still fear the idea of visiting a psychiatrist, and they resort to social media to express their feelings when they perceive a problem. Due to social media's widespread use, there may be ways to lessen the prevalence of undiagnosed mental illnesses like stress, anxiety, and depression [6]. Users' tweets are analyzed using sentiment analysis and machine learning to identify signs of depression.

Sentiment analysis involves the use of natural language processing (NLP) techniques with machine learning to automatically determine the sentiment expressed in a given text, such as positive, negative, or neutral [7]. Machine learning employs two primary techniques: supervised learning, which constructs models from labeled data with predefined input-output pairs for training, and unsupervised learning, which builds models using only input data, uncovering patterns within unlabeled datasets without specified output tags [8], as shown in Fig. 1.

Fig. 1. Machine learning techniques [8]

Supervised learning involves training a model with input/output pairs, or labeled data, to predict future outcomes accurately based on past data [9]. Supervised learning produces accurate predictions [10], and its two main uses are regression analysis and classification. Classification is a supervised learning approach used to analyze a given data set and to build a model that separates data into a desired and distinct number of classes [11].
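As a rough illustration of supervised classification on labeled text, the sketch below trains a tiny multinomial Naive Bayes classifier on input/output pairs and then predicts the class of unseen text. Naive Bayes is used here only because it is short enough to write in plain Python; it is a stand-in, not the model proposed in this paper. The training sentences, the label names `depressed` and `not_depressed`, and the whitespace tokenizer are all hypothetical placeholders, written in English for readability rather than taken from the Arabic tweet dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled pairs (text, label); NOT the paper's dataset.
TRAIN = [
    ("i feel sad and hopeless", "depressed"),
    ("i cannot sleep and feel like a failure", "depressed"),
    ("nothing gives me pleasure anymore", "depressed"),
    ("had a great day with friends", "not_depressed"),
    ("i love this sunny weather", "not_depressed"),
    ("excited about the new job", "not_depressed"),
]


def tokenize(text):
    """Very naive whitespace tokenizer (an assumption; real Arabic text needs more)."""
    return text.lower().split()


def train_naive_bayes(pairs):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in pairs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability under the model."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        # Log prior of the class.
        score = math.log(class_docs[label] / total_docs)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "i feel hopeless and cannot sleep"))
print(predict(model, "great day with friends"))
```

The same train-then-predict pattern applies to the classifiers used in practice; only the model behind `train_naive_bayes` and `predict` changes.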
In classification, efficiency is generally higher when the number of classes is smaller, so most research focuses on a maximum of three classes. According to [12], Support Vector Machine (SVM), Convolutional Neural Network (CNN), Logistic Regression (LR), Random Forests (RF), and Decision Trees (DT) are common techniques used for classification.

International Journal of Computer Science and Information Security (IJCSIS), Vol. 22, No. 5, October 2024
https://google.academia.edu/JournalofComputerScience
https://sites.google.com/site/ijcsis/
ISSN 1947-5500
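The precision, recall, and F1-score quoted in the abstract are linked by a standard formula: F1 is the harmonic mean of precision and recall. The short sketch below shows how the three values are computed for a binary classifier and checks that the reported figures are mutually consistent. The confusion counts (`tp=97`, `fp=3`, `fn=0`) are hypothetical, chosen only so that the ratios match the reported precision and recall; they are not taken from the paper's experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts that reproduce precision 0.97 and recall 1.00.
p, r, f1 = precision_recall_f1(tp=97, fp=3, fn=0)
print(round(p, 2), round(r, 2), round(f1, 2))
```

With a precision of 0.97 and a recall of 1.00, the harmonic mean is 1.94 / 1.97, which rounds to the F1-score of 0.98 stated in the abstract.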