Spam Detection using Reference Text: A Preliminary Study for Spam Ground Truth Generation

Arunabha Tarafdar 1,3 (arunabhat.phd19.cs@nitp.ac.in), Chayan Halder 2, and Dinesh Dash 1

1 National Institute of Technology, Patna, India
2 Ramakrishna Mission Vivekananda Centenary College, Rahara
3 Jain (Deemed-to-be) University, Bangalore

June 19, 2023

Abstract

Spam detection is a large area of study that has been approached from many different angles. Spam has been a threat to the normal operation of the internet since the late 1990s and remains one today. It is no longer confined to email; it also affects several other platforms, including social media and web chat platforms. In recent years, both the variety and the meaning of spam have changed significantly. In this paper, we shed light on word spam in digital photographs distributed through an online chat platform. We discuss spam texts and how to detect them.

Keywords: Spam, common substring, wishes, advertisements.

1 Introduction

Unwanted and unsolicited messages or content of various types, sent in bulk or singly through known or unknown sources, are considered spam. Simply put, spam is undesired in all its forms [1]. A pool of similar data may include texts or images that match the data but are not required by the user. An end user mainly receives two types of spam: one generated from unknown sources and the other from known sources. Spam messages from unknown sources can be filtered by detecting the domain from which a message originates and labelling the message as spam depending on the trustworthiness of that domain. Filtering spam based on the contents of the messages, however, is a different task. Consider one text message among many that might notify you that you have won a particular lottery [2].
To elaborate on this scenario, here we are concerned with the contents of the message rather than its origin domain. In cases like these, the text length varies for each message, which makes the