International Journal of Computer Applications (0975-8887)
Volume 100, No. 10, August 2014

Sentiment and Emotion Analysis for Context Sensitive Information Retrieval of Social Networking Sites: A Survey

D.I. George Amalarethinam, Ph.D.
Director MCA and Associate Professor of Computer Science, Jamal Mohamed College, Tiruchirappalli, Tamil Nadu, India

V. Jude Nirmal
Assistant Professor, Department of IT, St. Joseph’s College (Autonomous), Tiruchirappalli, Tamil Nadu, India

ABSTRACT
Context Sensitive Information Retrieval (CSIR) is a challenging problem because of the complexities involved, which range from semantics and ontology to the large processing capacity required to perform it in real time. Understanding the semantic gap (where context is neglected) plays a major role in eliminating false positives and increasing true positives in the information retrieval process. With big data becoming ubiquitous, owing to the volume, velocity and variety of data presented and analysed in almost every domain today, context sensitive analysis and interpretation of big data has become important. This paper presents a comprehensive survey of existing techniques for big data analysis based on massively parallel processing, such as GPGPUs (CUDA) and Hadoop MapReduce, as well as Data Warehousing. It also discusses the datasets available for research and the applications that context sensitive analysis of social media data makes possible. Finally, it provides research directions for context sensitive information retrieval and sentiment analysis in big data based on massively parallel processing architectures.

Keywords
Context Sensitive Information Retrieval, Sentiment Analysis, Emotion Analysis, CUDA, Hadoop, Parallel mining

1. INTRODUCTION
The information age has brought about an explosion of data collected from various sources.
These data are information rich, but it is difficult to process them efficiently and extract valuable information from them. Information extraction was traditionally carried out on the basis of syntactic structures rather than semantics, so the ability to extract information based on context has become important, and Context Sensitive Information Retrieval has gained prominence in recent years. With the vast amount of data available, meaningful information extraction can help in a variety of ways, which are discussed in later sections.

Semantics is the study of meaning; semantic computing is, in other words, meaningful computing. It uses Natural Language Processing to support the process of information retrieval. To extract meaningful information, a semantic system uses the content of the search, location, word variation, the intent of the text, synonyms, concept matching and natural language queries. When given a semantic query, the IR system analyzes the searcher’s intent and the contextual meaning to provide more relevant results.

Information is usually retrieved from a document or a set of documents by querying. In a context sensitive system a single query is often not sufficient to retrieve the desired results, and more than one query may be required. Hence a feedback mechanism is usually imposed on the system to ensure high accuracy. As described by Shen et al. [1], feedback in an information retrieval system can be either implicit or explicit. Explicit, or relevance, feedback requires the user to provide inputs about the retrieval process explicitly: users are asked to rank documents, mark similar documents or categorize documents. Although this method [1] is effective, it often fails in practice because users seldom volunteer such information. Implicit feedback, on the other hand, evaluates the queries and query modifications provided by the user.
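The explicit and implicit feedback mechanisms described above can be illustrated with a minimal sketch. It uses the classic Rocchio update over bag-of-words vectors, a standard relevance feedback technique; it is not the method of Shen et al. [1]. For implicit feedback it simply treats the user's earlier queries in the session as positive evidence. The tokenizer, the weights `alpha`, `beta` and `gamma`, the function names and the example documents are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector using a naive lowercase whitespace tokenizer (an assumption)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average a list of term-weight vectors; an empty list yields an empty vector."""
    if not vectors:
        return {}
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Explicit feedback: move the query toward relevant texts and away from non-relevant ones."""
    q = tf_vector(query)
    pos = centroid([tf_vector(d) for d in relevant])
    neg = centroid([tf_vector(d) for d in nonrelevant])
    updated = {}
    for term in set(q) | set(pos) | set(neg):
        weight = alpha * q.get(term, 0) + beta * pos.get(term, 0) - gamma * neg.get(term, 0)
        if weight > 0:  # negative weights are clipped, as is common with Rocchio
            updated[term] = weight
    return updated


def implicit_feedback(query, session_history, alpha=1.0, beta=0.75):
    """Implicit feedback (hypothetical helper): earlier session queries act as positive evidence."""
    return rocchio(query, session_history, [], alpha=alpha, beta=beta, gamma=0.0)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, docs):
    """Return documents sorted by cosine similarity to the query vector, highest first."""
    return sorted(docs, key=lambda d: cosine(query_vec, tf_vector(d)), reverse=True)


if __name__ == "__main__":
    docs = ["jaguar car speed engine", "jaguar cat jungle habitat", "car engine repair"]
    # Explicit feedback: the user marks docs[0] relevant and docs[1] non-relevant.
    explicit = rocchio("jaguar", [docs[0]], [docs[1]])
    # Implicit feedback: earlier in the session the user searched for "fast car engine".
    implicit = implicit_feedback("jaguar", ["fast car engine"])
    print(rank(implicit, docs))
    # → ['jaguar car speed engine', 'car engine repair', 'jaguar cat jungle habitat']
```

On its own, the ambiguous query "jaguar" scores the car and the animal documents equally; the earlier query supplies the missing context, so the car-related document moves to the top. A real system would typically use tf-idf or language-model weighting rather than raw term counts.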
Users do not always get the required result with a single query; they typically modify their queries several times to retrieve the desired result. These modifications provide essential information to the system. In short, implicit feedback exploits the available information by analysing the user’s history data.

1.1. Multi-lingual Semantics
Language is the means of communication, so performing searches in one’s own language has clear advantages and can provide more accurate results. A language mediated search is not only tempting but also effortless and effective. The Web inherently supports multiple languages, but no data integration mechanism is present [2]. The problem is to bridge the gap between the language specific information needs of users and a language independent semantic context, which can help provide universal access to data. The most apparent hindrance is that ontologies are language specific, which makes incorporating multi-lingual searches difficult if not impossible.

1.2. Social Network Outburst
The first documented social networking site, classmates.com, was launched in 1995; as of November 2011 it reported 50,000,000 registered users. Fig 1 shows that social networking grew by about 203% in 2012-2013. Because there are many undocumented sites, the beginning of social networking cannot be accurately dated. Sites that gained prominence include Friendster, Hi5 and LinkedIn, which were launched during 2002-2003. The real outburst occurred after the introduction of MySpace, Orkut and Facebook (2003-2004). And now