AN FPGA-BASED VITERBI ALGORITHM IMPLEMENTATION FOR SPEECH RECOGNITION SYSTEMS

Fabian Luis Vargas (1), Rubem Dutra Ribeiro Fagundes (2), Daniel Barros Junior (3)
vargas@ee.pucrs.br, rubem@ee.pucrs.br, dbarros@ee.pucrs.br

DEE, Department of Electrical Engineering, School of Engineering
Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)
Av. Ipiranga 6681, Building 30, Room 152, Porto Alegre, RS, Brazil

(1) Professor, Electrical Engineering Department, School of Engineering, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, Brazil
(2) Professor, Electrical Engineering Department, School of Engineering, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, Brazil
(3) Researcher, Electrical Engineering Department, School of Engineering, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, Brazil

ABSTRACT

This work proposes a speech recognition system based on a hardware/software co-design approach. The main advantage of this approach is a significant reduction in speech recognition processing time, because part of the system is implemented in dedicated hardware. This work also discusses an alternative way to implement Hidden Markov Models (HMMs), a probabilistic model used extensively in speech recognition systems. In this approach, the Viterbi algorithm, used to compute the HMM likelihood score, is “built in” together with the HMM structure designed in hardware, implementing probabilistic state machines that run as parallel processes, one for each word in the vocabulary handled by the system. So far, we have obtained a dramatic speed-up, with measurements around 500 times faster than a classic implementation, and recognition accuracy comparable to that of other isolated-word recognition systems.
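The central idea above is that each vocabulary word has its own HMM, every model is scored with the Viterbi algorithm, and the best-scoring word is chosen. The following Python sketch illustrates that computation in software only; it is not the authors' FPGA design. It assumes discrete-observation HMMs whose inputs are integer labels (for example, vector-quantizer codebook indices) and works with log probabilities to avoid numerical underflow. The names `viterbi_log_score`, `recognize` and `MODELS`, the two toy words `word_a`/`word_b`, and all probability values are hypothetical, chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_score(pi, a, b, observations):
    """Log-probability of the best state path through one discrete HMM.

    pi[j]   -- initial probability of state j
    a[i][j] -- transition probability from state i to state j
    b[j][k] -- probability that state j emits label k
    observations -- non-empty sequence of integer labels
    """
    n = len(pi)
    # Initialisation: delta_1(j) = log pi_j + log b_j(o_1)
    delta = [safe_log(pi[j]) + safe_log(b[j][observations[0]]) for j in range(n)]
    # Recursion: delta_t(j) = max_i [delta_{t-1}(i) + log a_ij] + log b_j(o_t)
    for o in observations[1:]:
        delta = [
            max(delta[i] + safe_log(a[i][j]) for i in range(n)) + safe_log(b[j][o])
            for j in range(n)
        ]
    # Termination: best score over all final states
    return max(delta)


def recognize(models, observations):
    """Score every word model and return (best_word, scores).

    The word models are independent, so they could be evaluated in parallel;
    this sketch simply scores them one after another.
    """
    scores = {word: viterbi_log_score(*m, observations) for word, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical 2-state left-to-right models over a 3-label codebook.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]),
}

if __name__ == "__main__":
    best, scores = recognize(MODELS, [0, 0, 2, 2])
    print(best)  # word_a
```

Because the per-word scores do not depend on each other, the dictionary comprehension in `recognize` is the natural place where a hardware implementation could run one state machine per word concurrently, as the abstract describes; the final `max` plays the role of the decision step.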
1. INTRODUCTION

1.1 Speech Recognition System Structure

A speech recognition system (SRS) is essentially a pattern recognition system dedicated to detecting speech or, in other words, to identifying the words of a language in a sound signal acquired as input from the environment.

[Figure 1: Speech Recognition System (speech -> signal analysis/feature extraction -> pattern matching against reference patterns (HMM) -> decision logic -> words)]

Figure 1 shows the main processing steps of a speech recognition system. In the signal analysis step, the speech signal is sampled with an A/D converter, and the samples are processed to extract relevant features from the input speech signal [FAGU1993] [RABI1993]. The next step, pattern matching, compares the input parameter set with reference patterns (also sets of signal parameters) previously stored in the system, and scores the likelihood of each reference pattern against the input set. The final step, decision logic, chooses the reference set that best matches the signal parameter set extracted from the input (usually called the “test set”).

2. SECTION II

2.1 Signal Analysis Implementation

Signal analysis is responsible for sampling the signal, converting it into a digital representation, and performing vector quantization. At the end of this step, the speech signal is replaced by a sequence of label codes (Figure 2).

[Figure 2: Signal Analysis main tasks (input: speech sound; blocks: low-pass filter, sampling at rate Fs, pre-emphasis filter, windowing, LPC/cepstral analysis, VQ with vector codebook; output: code sequence)]

Figure 2 shows the six sequential tasks executed in the signal analysis step, which are: