Conference Proceedings NCETCS1469 NCETCS’14 [276]

SLP: A Novel way of Telugu Linguistics Processing using Semantic Web Technologies

Teja Santosh Dandibhotla, Dept. of Computer Science & Engineering, GITAM University, Hyderabad, santosh87@gitam.edu
Srinivas Reddy Annappalli, Dept. of Computer Science & Engineering, GITAM University, Hyderabad, srinivas.reddy@gitam.edu

Abstract - Current Natural Language Processing (NLP) techniques support part-of-speech (PoS) tagging through various open-source tools such as WordNet 2.0, and programmed web applications obtain the same support through APIs and analyzers. Building on the benefits and liabilities of these works, this work extends them towards Semantic Web (Web 3.0) technologies, since traditional linguistic processing tasks can be mapped to Web 3.0 technologies, leading to efficient information retrieval. Telugu words are first annotated with the help of “AnnCorra” PoS tagging. RDF is then developed using the “Relationship” vocabulary, and the obtained PoS tags are mapped to the Penn RDF/OWL vocabulary, so that the machine can understand the Telugu-specific entities (resources) over the web. SPARQL queries can then be posed to retrieve the relationships. The obtained relationships are correct because they are understood by the machine.

Keywords: PoS tagging, Web 3.0, RDF, OWL, SPARQL

I. INTRODUCTION

Natural Language Processing techniques are heavily used in many of today’s applications, such as online reviews [1], speech recognition [2] and text retrieval [3], so that data can be correctly identified and retrieved as relevant content from unstructured documents. For the retrieved data to be understood in regional languages, language-interoperable user interface APIs are mapped to the data. With the profound impact of Google’s Transliteration and its support for regional languages (Indian languages in particular), a deluge of documents has come into existence.
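The pipeline summarised in the abstract (annotate Telugu words with PoS tags, store them as RDF-style statements, then query them) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the namespace `http://example.org/slp#`, the property names `form` and `hasPoS`, the three hand-tagged words, and the coarse AnnCorra-to-Penn tag mapping are all hypothetical, and a plain Python list of triples with a pattern matcher stands in for a real RDF store and SPARQL engine.

```python
# Stand-in for an RDF graph: a list of (subject, predicate, object) triples.
# All names below are hypothetical and chosen only for illustration.
EX = "http://example.org/slp#"
FORM = EX + "form"        # links a token to its Telugu surface form (a literal)
HAS_POS = EX + "hasPoS"   # links a token to its PoS tag

# Assumed, coarse mapping from AnnCorra-style tags to Penn-style tags.
ANNCORRA_TO_PENN = {"NN": "NN", "NNP": "NNP", "VM": "VB"}


def annotate(tagged_words):
    """Turn (word, AnnCorra tag) pairs into triples, one token node per word."""
    triples = []
    for i, (word, tag) in enumerate(tagged_words, start=1):
        token = f"{EX}token{i}"
        triples.append((token, FORM, word))  # Telugu word kept as a literal
        triples.append((token, HAS_POS, ANNCORRA_TO_PENN.get(tag, tag)))
    return triples


def match(triples, s=None, p=None, o=None):
    """Yield triples matching a pattern; None plays the role of a variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def words_with_tag(triples, tag):
    """Roughly: SELECT ?w WHERE { ?t ex:hasPoS tag . ?t ex:form ?w }"""
    tokens = {s for s, _, _ in match(triples, p=HAS_POS, o=tag)}
    return [o for s, _, o in match(triples, p=FORM) if s in tokens]


# Hand-tagged illustrative sentence ("Ramudu read a book").
tagged = [("రాముడు", "NNP"), ("పుస్తకం", "NN"), ("చదివాడు", "VM")]
graph = annotate(tagged)
proper_nouns = words_with_tag(graph, "NNP")
```

Here `words_with_tag` performs the two-pattern join that a SPARQL basic graph pattern would express; in a real setting the triples would be serialised as RDF and queried with an actual SPARQL engine.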
A great deal of research has been done, and is still ongoing, on identifying the documents related to regional search query keyword(s) without compromising the IR parameters precision and recall. Popular techniques such as Latent Semantic Indexing [4] are used in current IR, combined with clustering, to achieve this. The documents retrieved follow the standard NLP technique of PoS tagging [5]. PoS tagging allows “word sense disambiguation”, which can increase IR performance to a great extent. This feature greatly supports the development of Web 3.0, data retrieval in particular. Various applications have been developed to annotate the part of speech of a word, namely DBpedia Spotlight [6], OpenNLP [7], GATE [8], OpenCalais [9], etc. These use various NLP APIs (either singly or in combination) to annotate PoS words efficiently, but they are limited to the English language only. For efficient annotation of a given Telugu query, we take the support of the “AnnCorra” [10] work to annotate Telugu words and include them as RDF literals, which are treated as PoS through the Penn RDF/OWL [11] vocabulary. When entity extraction is complete, a SPARQL query [12] is executed to verify the results.

II. RELATED WORK