International Journal of Recent Advances in Engineering & Technology (IJRAET)
ISSN (Online): 2347-2812, Volume-4, Issue-7, 2016

Speech Feature Extraction and Matching Technique

Suhasini S Goilkar
Assistant Professor, Department of Electronics and Telecommunication Engineering,
Finolex College of Engineering and Technology, Ratnagiri, Maharashtra, India- 400077

Abstract - The ultimate goal of the present investigation is to study speech coding techniques for a better understanding of natural spoken language, considering the obvious constraints such as speaker dependency, isolated words, limited vocabulary and artificial grammar. Speech communication technology between human and computer is experiencing revolutionary progress in the information industry. For analysis, synthesis, coding and recognition purposes, speech signals have to be converted into digital form. Speech signals are continuous-time, continuous-amplitude waveforms, which are therefore sampled and quantized. In the present work, two techniques have been used: linear predictive coding for feature extraction, and Dynamic Time Warping for matching. Dynamic Time Warping is a cost-minimization matching technique in which a test signal is stretched or compressed according to a reference template.

Keywords: Speech Recognition, Linear Predictive Coding, Dynamic Time Warping.

I. INTRODUCTION

Speech is one of the most convenient and efficient communication tools between human beings. In particular, as personal computers become popular and able to process multimedia information, speech attracts extensive interest as a means of friendly user interfaces. High-performance speech recognition and unlimited text-to-speech synthesis are the key factors for a successful human interface using speech.
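The abstract describes Dynamic Time Warping as a cost-minimization technique that stretches or compresses a test signal against a reference template. The following is a minimal sketch of that idea for one-dimensional sequences; the toy sequences and the absolute-difference local cost are illustrative assumptions, not values or choices taken from this paper.

```python
# Minimal sketch of Dynamic Time Warping (DTW): the test sequence is
# stretched or compressed against a reference template by minimising
# the cumulative alignment cost over all monotone warping paths.

def dtw_distance(test, ref):
    """Cumulative cost of the best alignment between two 1-D sequences."""
    n, m = len(test), len(ref)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(test[i - 1] - ref[j - 1])   # local distance
            d[i][j] = cost + min(d[i - 1][j],      # stretch the test signal
                                 d[i][j - 1],      # compress the test signal
                                 d[i - 1][j - 1])  # diagonal match step
    return d[n][m]

# The same rising contour spoken at two different rates: DTW aligns
# them despite the length difference.
template = [0, 1, 2, 3, 4]
utterance = [0, 0, 1, 2, 2, 3, 4]
print(dtw_distance(utterance, template))  # 0.0 : perfect warped match
```

Because the warping path may repeat indices of either sequence, the two inputs need not have the same length, which is exactly the property that makes DTW suitable for comparing utterances of varying duration.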
The speech waveforms are continuous in time and amplitude, and are therefore sampled and quantized. Dynamic Time Warping is an efficient technique for isolated word recognition and can be adapted to connected word recognition. It is a good method for determining the similarity between two temporal sequences, owing to its capacity to align sequences of different lengths [1].

II. SPEECH ANALYSIS AND SYNTHESIS

Speech analysis and synthesis consists of the study of speech production, finding the parameters of speech, analyzing and synthesizing these parameters, and finding efficient ways of computing them. It may also include speech compression, spectral analysis of speech, linear predictive analysis, sampling of speech, analog-to-digital and digital-to-analog conversion, and the digitization of speech.

2.1 Speech production

Speech sounds are air pressure vibrations produced when air exhaled from the lungs is modulated and shaped by the vibrations of the glottal cords and the vocal tract as it is pushed out through the lips and nose. Speech signals, in addition to communicating linguistic information, convey a multitude of other information, including the gender, age, accent, intent, emotion, humor and state of health of the speaker. All this information is conveyed primarily within the traditional telephone bandwidth of 4 kHz.

2.2 Speech communication

Sounds are the most natural form of communication for humans and many animal species. Speech sounds have a rich temporal-spectral variation. In contrast, animals can produce only a relatively small repertoire of basic sound units. Animal sounds have a less varied spectral-temporal composition than speech, and consist of a series of signaling calls or howls instead of a complex language with a grammar. Just as written language is a sequence of elementary alphabetic symbols, speech is a sequence of elementary acoustic symbols, i.e. phonemes, that convey the spoken form of the language.
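Section II lists linear predictive analysis among the tools of speech analysis, and the abstract names linear predictive coding as the feature-extraction step of this work. The sketch below shows the standard autocorrelation method with the Levinson-Durbin recursion; the predictor order, frame length, and the synthetic test frame are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of LPC feature extraction via the autocorrelation
# method and the Levinson-Durbin recursion. One frame of speech is
# modelled as a linear combination of its `order` previous samples;
# the resulting coefficients describe the vocal-tract filter.
import math

def autocorr(frame, lag):
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))

def lpc(frame, order):
    """Return `order` LPC coefficients for one windowed speech frame."""
    r = [autocorr(frame, k) for k in range(order + 1)]
    a = [0.0] * order      # predictor coefficients, grown stage by stage
    err = r[0]             # residual prediction error energy
    for i in range(order):
        # reflection coefficient for stage i+1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a

# Toy frame: a damped sinusoid standing in for one voiced speech frame
# (240 samples, roughly a 30 ms frame at 8 kHz).
frame = [math.sin(0.3 * n) * (0.99 ** n) for n in range(240)]
coeffs = lpc(frame, order=10)
print(len(coeffs))  # 10 coefficients summarise the frame
```

In a full recognizer, each frame of the utterance would be reduced to such a coefficient vector, and the sequence of vectors would then be compared against stored templates by the matching stage.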
Speech signals convey more than spoken words. The information conveyed in speech includes acoustic-phonetic symbols, gender, age, accent, the speaker's identity, emotion, health, and prosody.

2.3 Isolated word recognition

This is also called a discrete recognition system. In this system there has to be a pause between uttered words, so the system does not have to find boundaries between words. Uttered words are analyzed and compared to prepared models of the words in the vocabulary. This makes the system ideal for telephony and command applications where only a specific digit or word needs to be dictated. Such systems, which require short pauses between spoken words, are primarily used in small-vocabulary command-and-control applications such as name dialing, Internet navigation, and voice control of computer menus or accessories in a car. Isolated word recognition systems may use models trained on whole-word examples or constructed from concatenation of sub-