Advancement of Computer Technology and its Applications, Volume 4, Issue 3
HBRP Publication, Page 1-12, 2021. All Rights Reserved.

An Implementation of Advanced NLP for High-Quality Text-To-Speech Synthesis

Sharmi Islam 1, Mustahid Hasan 2, Md. Ismail Jabiullah 3*
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh.
* Corresponding Author E-mail Id:- drismail.cse@diu.edu.bd

ABSTRACT
In this paper, we use Bengali voice data (from Bangladesh and Kolkata) and convert it into text. A speech-to-text conversion framework requires two key parts: an NLP (Natural Language Processing) stage, which operates on the information in the input speech, and a text generation stage, which produces the desired output. These two distinct stages must exchange both data and commands to produce text. Because completing this task relies on many distinct scientific areas, any progress toward standardization can reduce the effort and improve the results. Advances in AI (especially machine learning and deep learning) have led researchers to the convolutional neural network (CNN), which is attracting attention because of its high performance. However, the most common issue with deep learning architectures such as CNNs is that they require a large amount of training data. This paper gives an overview of the NLP stage in the speech-to-text framework for the Bangla language built by our group, and describes its integration into the database.

Keywords:- Speech, databases, speech recognition, training, natural language processing, speech classification, speech enhancement, CNN

INTRODUCTION
According to the World Health Organization [1], the number of visually impaired people is growing rapidly; the main causes of vision impairment include uncorrected refractive errors and cataracts.
Globally, it is estimated that approximately 253 million people live with some form of vision impairment, as shown below in Fig 1. According to the latest survey provided by the World Health Organization, there are more than 750,000 people with visual impairment in Bangladesh. Many solutions have been devised; however, they are either too expensive, which makes them unavailable and unaffordable to most people, or inefficient products that cannot be used by people in rural areas of Bangladesh. According to the National Academies Press (US), the biggest challenge for a visually impaired person, especially someone with total vision loss, is to navigate new places safely and, as mentioned by Abbas et al. [2], to achieve independence, which is one of the ultimate goals that a person with a disability might strive for. Many solutions have been proposed to achieve this goal, but unfortunately most of them are very expensive, which makes them inaccessible to most people, especially in countries with high poverty rates. A further challenge is that most Bengali speakers do not know English very well. In this paper, we propose a voice assistant [4] using a pre-trained CNN [5,6], a type of deep learning method [7], for