An End-to-End Scalable Iterative Sequence Tagging with Multi-Task Learning

Lin Gui 1,2, Jiachen Du 1, Zhishan Zhao 3, Yulan He 2, Ruifeng Xu 1(B), and Chuang Fan 1

1 Harbin Institute of Technology (Shenzhen), Shenzhen, China
xuruifeng@hit.edu.cn
2 Aston University, Birmingham, UK
3 Baidu Inc., Beijing, China

Abstract. Multi-task learning (MTL) models, which pool examples drawn from several tasks, have achieved remarkable results in language processing. However, multi-task learning is not always more effective than single-task methods in sequence tagging. One possible reason is that existing approaches to multi-task sequence tagging often rely on lower-layer parameter sharing to connect different tasks. The lack of interaction between tasks results in limited performance improvement. In this paper, we propose a novel multi-task learning architecture that iteratively and explicitly utilizes the prediction results of each task. We train our model on part-of-speech (POS) tagging, chunking, and named entity recognition (NER) tasks simultaneously. Experimental results show that, without any task-specific features, our model obtains state-of-the-art performance on both chunking and NER.

Keywords: Multi-task learning · Interactions · Sequence tagging

1 Introduction

Sequence tagging is one of the most important topics in Natural Language Processing (NLP), encompassing tasks such as part-of-speech (POS) tagging, chunking, and named entity recognition (NER). In recent years, neural network (NN) based models have achieved impressive results on various sequence tagging tasks, including POS tagging [1,2], chunking [3,4], and NER [5,6].

One of the challenges for sequence tagging tasks is that there is often not enough training data to train a good model. Heavy handcrafted features and language-specific knowledge resources are costly to develop for new sequence tagging tasks [5].
To overcome this problem, multi-task learning (MTL) models have been proposed. MTL is an important mechanism that aims to improve the generalization performance of a model by learning a task together with other related tasks [7]. Several NN based MTL models have been applied to various sequence tagging tasks.

© Springer Nature Switzerland AG 2018
M. Zhang et al. (Eds.): NLPCC 2018, LNAI 11109, pp. 288–298, 2018.
https://doi.org/10.1007/978-3-319-99501-4_25
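To make the lower-layer parameter sharing criticized above concrete, the following is a minimal sketch (all names, dimensions, and tag inventories are hypothetical, not taken from the paper): a single shared encoder feeds independent task-specific heads, so the tasks interact only through the shared weights, never through each other's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical).
EMB_DIM, HID_DIM = 8, 16
N_POS_TAGS, N_CHUNK_TAGS, N_NER_TAGS = 5, 4, 3

# Shared lower layer: one projection reused by every task
# (this is the "hard parameter sharing" the text describes).
W_shared = rng.normal(size=(EMB_DIM, HID_DIM))

# Task-specific output heads: the only per-task parameters.
heads = {
    "pos": rng.normal(size=(HID_DIM, N_POS_TAGS)),
    "chunk": rng.normal(size=(HID_DIM, N_CHUNK_TAGS)),
    "ner": rng.normal(size=(HID_DIM, N_NER_TAGS)),
}

def tag(embeddings, task):
    """Tag a token sequence: shared encoding, then a task-specific head."""
    hidden = np.tanh(embeddings @ W_shared)  # representation shared by all tasks
    logits = hidden @ heads[task]            # task-specific tag scores
    return logits.argmax(axis=-1)            # one predicted tag id per token

sentence = rng.normal(size=(6, EMB_DIM))     # 6 tokens as random embeddings
print(tag(sentence, "pos").shape)            # one POS tag per token
print(tag(sentence, "ner").shape)            # one NER tag per token
```

Note that the POS prediction never reaches the NER head here; that missing cross-task signal is exactly the limitation the proposed iterative architecture is meant to address.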