SEMO: Searching Majority Opinions on Movies using SNS QA Threads

Jukyoung Lee, Korea University, Seoul, Korea, rudolph0724@korea.ac.kr
Yonghwa Choi, Korea University, Seoul, Korea, yonghwachoi@korea.ac.kr
Suhkyung Kim, Korea University, Seoul, Korea, suhkyungkim@korea.ac.kr
Seongsoon Kim, Korea University, Seoul, Korea, seongkim@korea.ac.kr
Jaewoo Kang, Korea University, Seoul, Korea, kangj@korea.ac.kr

ABSTRACT
Many people seek majority opinions by searching for questions and answers uploaded by others, or by uploading their own questions on social media sites. However, people have to read through a large number of documents returned by search services to find the majority opinions. Moreover, even when users upload questions on social media sites, they cannot immediately obtain answers. To address these problems, we present Searching Majority Opinions System (SEMO), a novel majority opinion-based search system that uses QA threads uploaded on SNS and cQA websites. SEMO returns entities based on majority opinions for opinion-finding queries in real time. We also tackle a data sparsity problem using a novel query component expansion approach. To demonstrate SEMO’s usefulness in finding majority opinions, we implemented a prototype of SEMO for the movie domain. We believe that our method can cause a paradigm shift in opinion-finding query search and help people make decisions. SEMO is available at http://semo.korea.ac.kr/

Keywords
Majority Opinions Search, Social Question Answer, Entity Search

1. INTRODUCTION
People are often curious about the opinions of others when making decisions. Nowadays, many depend heavily on commercial search engines for this reason [3]. When using web search engines, users scan through a list of documents provided by a search engine and find an answer in one of them. This process is quite effective for “fact-finding” queries, such as “What is the capital of Canada?” or “Who was the president of the U.S.
in 1975?” since users are likely to obtain answers after reading only a small number of documents.

However, this process fails to deliver satisfactory results for opinion-finding queries such as “What is a good thriller movie?” or “What is the best Christopher Nolan movie?” Unlike fact-finding queries, opinion-finding queries do not have a single correct answer. To obtain relevant answers to opinion-finding questions, we have to find majority opinions. However, due to time constraints, users tend to derive answers from a limited number of documents. Therefore, the answers may not represent the opinions of the majority.

One possible way to find majority opinions is to use community Question Answering (cQA) sites or social networking service (SNS) sites. A large number of questions and answers written by users exist on cQA sites such as Yahoo! Answers¹ and Reddit². Moreover, most cQA sites provide search services, so users can explore questions related to their interests. In addition, SNS sites such as Facebook and Twitter are used as a means of obtaining information. Morris et al. pointed out that 10% of SNS users have posted questions on SNS sites, and more than 40% of these were questions asking for opinions or recommendations [4].

To find the opinions of the majority, users may use the search services provided by cQA or SNS sites. However, like commercial search engines, these sites return a large number of documents and leave users with the burden of reading and analyzing them. Additionally, these sites may not have answers to users’ questions. Many of the questions posted on cQA or SNS sites are answered late or remain unanswered.

Copyright is held by the author/owner(s). WWW 2016 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04. http://dx.doi.org/10.1145/2872518.2890553.
¹ https://answers.yahoo.com/
² https://www.reddit.com/

To address these problems and help users of SNS and cQA sites find the opinions of the majority, we propose the following two methods. First, after processing and analyzing the data, we return entities as the result rather than numerous text documents. We expect that this will save users time and thus help them make decisions, since they do not have to read countless documents. Second, we expand an input query into subqueries and aggregate the results of the subqueries. We expect this method to handle cases where no existing question matches the input query.
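
The two methods above can be illustrated with a minimal sketch. It is not SEMO's implementation: the text so far does not specify how subqueries are formed or how their results are aggregated, so the sketch assumes that a query is a set of components (for example a genre and a director), that its subqueries are all non-empty subsets of those components, and that each entity mention in a matching thread counts as one vote. The data in `THREADS` and the names `expand_query` and `search_majority` are hypothetical and exist only for this illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy QA threads: the components tagged on each question and
# the movie titles mentioned across its answers (one entry per mention).
THREADS = [
    {"components": {"thriller"}, "answers": ["Memento", "Se7en"]},
    {"components": {"christopher nolan"},
     "answers": ["Memento", "Inception", "Memento"]},
    {"components": {"thriller", "1990s"}, "answers": ["Heat"]},
]


def expand_query(components):
    """Assumed expansion: every non-empty subset of the query components,
    largest subsets first."""
    ordered = sorted(components)
    subqueries = []
    for size in range(len(ordered), 0, -1):
        for subset in combinations(ordered, size):
            subqueries.append(frozenset(subset))
    return subqueries


def search_majority(components, threads, top_k=3):
    """Return the top_k entities ranked by votes summed over all subqueries."""
    votes = Counter()
    for subquery in expand_query(components):
        for thread in threads:
            # A thread matches when it carries every component of the subquery.
            if subquery <= thread["components"]:
                # Each mention of an entity in the answers is one vote.
                votes.update(thread["answers"])
    return votes.most_common(top_k)


if __name__ == "__main__":
    query = {"thriller", "christopher nolan"}
    print(search_majority(query, THREADS, top_k=1))  # [('Memento', 3)]
```

In this toy data no thread matches the full query, yet the single-component subqueries still produce a ranked entity. A plain sum of votes can also surface entities that satisfy only one component, so a real system would likely need a stricter aggregation rule than the one assumed here.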