Research Article
EAI Endorsed Transactions on Scalable Information Systems, Online First

Supervised Urdu Word Segmentation Model Based on POS Information

Sadiq Nawaz Khan 1,*, Khairullah Khan 1 and Wahab Khan 2

1 Department of Computer Science, University of Science & Technology Bannu, Pakistan
2 Department of Computer Science & Software Engineering, IIU, Islamabad 44000, Pakistan

Abstract

Urdu is the national language of Pakistan and one of the most widely spoken and understood languages in the world. Successful Urdu NLP requires robust, high-performance tools and resources. Word segmentation plays a central role for morphologically rich languages such as Urdu across diverse NLP domains, including named entity recognition, sentiment analysis, part-of-speech tagging and information retrieval. The morphological richness of Urdu adds to the challenges of the word segmentation task, because a single word can be composed of zero or more prefixes, a stem, and zero or more suffixes. In this paper we present a supervised Urdu word segmentation scheme based on the part-of-speech (POS) information of the corresponding words. For the experiments, conditional random fields (CRF) with contextual features are used. The performance of the proposed system is evaluated on 300K words, and the results show evident improvements over the baseline approach.

Keywords: Urdu, word segmentation, supervised learning, conditional random fields

Received on 10 May 2018, accepted on 04 September 2018, published on 10 September 2018

Copyright © 2018 Sadiq Nawaz Khan et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.

doi: 10.4108/eai.19-6-2018.155444

* Corresponding author. Sadiqnawaz97@gmail.com

1. Introduction

Nowadays Natural Language Processing (NLP) plays a vital role in every field of computer science. Researchers are trying to simulate human knowledge in computer systems. For this purpose, NLP researchers work to equip computers with the knowledge they need to understand and use natural language. To achieve the desired tasks, different types of advanced tools and procedures are applied to make computer systems more cognitively capable. NLP draws on various disciplines, such as electronic and electrical engineering, linguistics, information and computer sciences, mathematics, psychology and artificial intelligence (AI) [1].

NLP applications are widely used and span different fields of study, such as word segmentation, speech recognition, text processing and summarization, cross-language information retrieval (CLIR), user interfaces, voice recognition and artificial intelligence. Information retrieval (IR) finds the desired information in a huge collection of data, while information extraction (IE) processes one or more documents to identify pre-specified entities or events. Artificial intelligence is a sub-field of computer science that studies the development of hardware and software that simulate human intelligence.

Word segmentation plays a vital role in every NLP application. It separates written or spoken text into meaningful word tokens, that is, it identifies the word boundaries in a language. In