Research Cell: An International Journal of Engineering Sciences, Issue December 2016, ISSN: 2229-6913 (Print), ISSN: 2320-0332 (Online), Web Presence: http://www.ijoes.vidyapublications.com
© 2016 Vidya Publications. Authors are responsible for any plagiarism issues.

Culling Scientific and Technical Terms from Text Corpora for Compiling a TermBank in Bangla

Niladri Sekhar Dash
Linguistic Research Unit
Indian Statistical Institute
Kolkata, India
Email: ns_dash@yahoo.com

ABSTRACT

In this paper I describe the steps that we adopt to develop a digital TermBank by culling Scientific and Technical Terms (STTs) from a text corpus of Bangla. Following several stages of corpus processing and analysis, we have successfully developed a TermBank that now contains nearly 10,000 terms for use in various works of linguistics and language technology. The strategy we use can be applied effectively to corpora of other Indian languages for the same purpose, which confirms its utility and relevance in NLP work for Indian languages.

Keywords: Scientific and technical terms, corpus, POS tagging, collocation, lemmatization, TreeBank, terminology, frequency

1. Introduction

The development of a comprehensive digital database of scientific and technical terms (STTs) in a language is important for many works of linguistics and language technology, such as termbank compilation, linguistic resource generation, machine translation, machine learning, information retrieval, knowledge representation, text classification, language planning, online language education, dictionary compilation, text composition, and mass literacy (Sager 1994). Keeping such activities in mind, we have developed, as part of our NLP activities, a comprehensive database of nearly 10,000 STTs. The terms are extracted from a Bangla corpus of scientific texts, which is compiled with data collected from the TDIL corpus developed for the language.
To be precise in presentation, we first define the concepts of scientific term (Section 2) and technical term (Section 3) to draw a line of distinction between the two. Next, we describe the methods we use to process the corpus (Section 4) and the architecture we use for TermBank compilation (Section 5). In conclusion (Section 6), we identify people who use this TermBank to address various needs of linguistics and language technology (Wright and Budin, 1997, p. 370).

2. Scientific Term

The expression scientific term refers to single-word and multiword units that are used in specialized senses in different scientific texts. Although the literal meaning of the expression points to specialized terms used in scientific texts, it is not confined to the fields of science only. Rather, it encompasses all the specialized terms used in any discipline of human knowledge.