Special Speech Synthesis for Social Network Websites

Csaba Zainkó, Tamás Gábor Csapó, and Géza Németh

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
{zainko,csapot,nemeth}@tmit.bme.hu

Abstract. This paper gives an overview of the design concepts and implementation of a Hungarian microblog reading system. Speech synthesis of such special text requires some special components. First, an efficient diacritic reconstruction algorithm was applied. The accuracy of a former dictionary-based method was improved by machine learning to handle ambiguous cases properly. Second, an unlimited-domain text-to-speech synthesizer was applied with extensions for emotional and spontaneous styles. Chat or blog texts often contain "emoticons" which mark the emotional state of the user. Therefore, an expressive speech synthesis method was adapted to a corpus-based synthesizer. Four emotions were generated and evaluated in a listening test: neutral, happy, angry and sad. The results of the experiments showed that happy and sad emotions can be generated with this algorithm, with the best accuracy for the female voice.

Key words: diacritic restoration, emotional speech synthesis, microblog reading system, chat-to-speech

1 Introduction

This paper gives an overview of the design concepts and implementation steps of a Hungarian microblog text-to-speech reading system. Microblog websites (e.g. Twitter, http://twitter.com) and chat-like talking applications are very popular nowadays. In chat applications, where short conversational messages are written, it is advantageous to use speech instead of constantly keeping track of the dialog. The user can do something other than looking at the screen and will still know what is being said in the chat channel. This scenario is mainly useful when messages do not arrive very often.
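The dictionary-based diacritic restoration mentioned in the abstract can be illustrated with a minimal sketch: bare (ASCII) word forms are mapped back to their accented counterparts, and only the ambiguous cases are deferred to a separate classifier. The word list and the fallback step below are illustrative placeholders, not the paper's actual dictionary or machine-learning model.

```python
# Minimal sketch of dictionary-based diacritic restoration for Hungarian.
# The vocabulary is a hypothetical example; a real system would use a
# large lexicon and a trained classifier for the ambiguous cases.
import unicodedata

def strip_diacritics(word):
    """Map an accented word to its bare ASCII form, e.g. 'kérek' -> 'kerek'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def build_lookup(vocabulary):
    """Group accented word forms under their diacritic-stripped key."""
    lookup = {}
    for word in vocabulary:
        lookup.setdefault(strip_diacritics(word), set()).add(word)
    return lookup

def restore(word, lookup):
    """Return (restored_word, ambiguous_flag) for a bare input word."""
    candidates = lookup.get(strip_diacritics(word), {word})
    if len(candidates) == 1:
        return next(iter(candidates)), False
    # Ambiguous: several accented forms share this bare form; a machine-
    # learning model would choose among them using context.
    return word, True
```

For example, with a toy vocabulary of ["kerek", "kérek", "kerék", "szép"], the bare form "szep" restores unambiguously to "szép", while "kerek" matches three accented candidates and is flagged for disambiguation.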
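The emoticon handling described in the abstract can likewise be sketched as a lookup from emoticon tokens to the four emotion labels used in the listening test (neutral, happy, angry, sad). The emoticon inventory below is an assumption for illustration; the paper's actual mapping is not shown.

```python
# Illustrative emoticon-to-emotion mapping for a chat-to-speech front end.
# The emoticon list is a hypothetical example, not the paper's inventory;
# the four labels match the emotions evaluated in the listening test.
EMOTICON_EMOTION = {
    ":)": "happy", ":-)": "happy", ":D": "happy",
    ":(": "sad", ":-(": "sad",
    ">:(": "angry", ":@": "angry",
}

def detect_emotion(message):
    """Return the emotion of the first emoticon found, else 'neutral'."""
    for token in message.split():
        if token in EMOTICON_EMOTION:
            return EMOTICON_EMOTION[token]
    return "neutral"
```

The detected label would then select the corresponding expressive style in the corpus-based synthesizer, with "neutral" as the default when a message contains no emoticon.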
A microblog reader system can also be useful in a mobile environment, where the user cannot continuously watch the display or does not have a free hand to handle the device (e.g. while driving a car, or during sport activities such as running). Another possible situation is when the user is working with a full-screen desktop application and needs real-time information from social networks. Reading aloud demands only short-time attention, and task switching is not necessary. This system is very useful for visually impaired and blind people as well. However, chat-to-speech synthesis poses some new problems. In current web-based social networks people tend to use the special form of their language