Exploring Effective Methods for On-line Societal Risk Classification and Feature Mining

Nuo Xu 1,2 and Xijin Tang 1,2(✉)

1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
xunuo1991@amss.ac.cn, xjtang@iss.ac.cn
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract. China faces many societal conflicts during its period of social and economic transformation. Detecting societal risk accurately is crucial to the goal of building a harmonious society. On-line community concerns have been mapped onto respective societal risks, and a support vector machine model has been used for multi-class risk classification of Baidu hot news search words (HNSW). Unlike traditional text classification, societal risk classification is a more complicated problem that relates to socio-psychology. A conditional random fields (CRFs) model is applied to capture societal risk perception more accurately. We regard the risks of all the terms in a hot search word as a sequential flow of risks. The experimental results show that the CRFs model achieves superior performance by capturing the contextual constraints in HNSW. In addition, state features can be extracted from the CRFs model to study the distribution of terms in each risk category. The distribution rules of geographical terms are identified and summarized.

Keywords: Societal risk classification · HNSW · Paragraph Vector · Conditional random fields · Feature mining

1 Introduction

In the Web 2.0 era, Internet users are both content viewers and content producers. Search engines have become the most common tool for accessing information. The content with high search volume on a search engine reflects netizens' attention. Baidu is now the biggest Chinese search engine.
Baidu hot news search words (HNSW) are based on the real-time search behavior of hundreds of millions of Internet users and are released at the Baidu News Portal, reflecting current Chinese concerns and ongoing societal topics. We therefore use HNSW as a perspective for analyzing societal risk, which refers to risk problems that raise the concern of the whole society. Traditional research on societal risk was conducted from the angle of cognitive psychology, based on the psychometric paradigm and questionnaires [1], which are generally expensive and time-consuming to conduct. Zheng et al. constructed a framework of societal risk indicators comprising 7 categories, which are national security, economy/finance, public morals, daily life,

© Springer Nature Singapore Pte Ltd. 2017
X. Cheng et al. (Eds.): SMP 2017, CCIS 774, pp. 65–76, 2017.
https://doi.org/10.1007/978-981-10-6805-8_6