IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 44, NO. 9, SEPTEMBER 1997

Auditory Feature Extraction Using Self-Timed, Continuous-Time Discrete-Signal Processing Circuits

Nagendra Kumar, Student Member, IEEE, Gert Cauwenberghs, Member, IEEE, and Andreas G. Andreou, Member, IEEE

Abstract—A compact integrated subsystem for accurate real-time measurement of level-crossing time intervals, suitable for multiresolution feature extraction from an analog cochlear filter bank, is presented. The subsystem is inspired by the function of the inner hair cells in the mammalian cochlea and is based on continuous-time discrete-signal processing circuits. Experimental results from a fabricated array of nine elements demonstrate instantaneous frequency-to-voltage conversion over a range covering the audio band. The power consumption is less than 20 μW per cell from a 5-V supply when the system is biased to operate over the speech frequency range.

Index Terms—Analog integrated circuits, neural network hardware, very-large-scale integration.

I. INTRODUCTION

VERY-LARGE-SCALE-INTEGRATION signal processing systems are often classified as analog, digital, or mixed-mode. With an emphasis on low-power real-time VLSI, there has been intense discussion of how much processing should be done in analog and how much in digital to achieve optimum performance [1]. Missing from most of these discussions is the fact that it is the design of the algorithm that gives the advantage to either an analog or a digital implementation: certain algorithms map well to analog hardware, while others map well to digital hardware. There is another class of signal processing algorithms [2]–[6] that requires an entirely new hardware design paradigm. These algorithms call for an event-based, asynchronous signal processing approach in which the signal takes only discrete values but is continuous in time.
For example, the signal could be quantized to two discrete levels and change value at the event of a zero crossing. This paradigm is called continuous-time discrete-signal (CTDS) processing. It is a mixed-mode approach whereby algorithms exploit the robustness of discrete signal representations but preserve the continuity of events in the time domain. CTDS signal processing can only be approximated in a digital implementation by using a fast clock, i.e., by oversampling in time. By using event-based computation, unnecessary power dissipation can be eliminated through the avoidance of high-speed global clocks and the associated superfluous switching events. The signal representation, and the system organization that follows from it, is similar to self-timed asynchronous digital design methodologies [7] and to the address-event representation (AER) [8] for interchip communication [9], [10]. A small CTDS system for centroid computation of visual stimuli was presented in [11]. An analog VLSI event-based system for speech processing and feature extraction has also been reported in [12]. In the latter system, however, a digital clock is used to provide a time stamp for each zero crossing, and the spectral shape is extracted by relying on the tuning characteristics of the cochlear filters.

Manuscript received June 15, 1995; revised August 16, 1996. This work was supported by the National Science Foundation under Grant ECS-9313934 (P. Werbos, Program Monitor), by the Center for Language and Speech Processing at Johns Hopkins University, and by the Lockheed-Martin Corporation. This paper was recommended by Associate Editor S. Kiaei. The authors are with the Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218 USA. Publisher Item Identifier S 1057-7130(97)06581-6.
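To make the two-level CTDS representation concrete, the following is a minimal discrete-time software sketch (not the authors' continuous-time circuit, which operates without any sampling clock): the signal is reduced to its sign, the zero-crossing instants are located with linear interpolation between samples, and the inter-crossing intervals are measured. For a pure tone, consecutive zero crossings are half a period apart, so the reciprocal of twice the mean interval recovers the tone frequency. The function name and test tone below are illustrative choices, not from the paper.

```python
import numpy as np

def zero_crossing_events(signal, t):
    """Two-level (sign) quantization: return the times at which the signal
    crosses zero, refined by linear interpolation between samples.
    A sampled-data approximation of the CTDS event representation."""
    s = np.sign(signal)
    idx = np.where(np.diff(s) != 0)[0]           # sample pairs straddling zero
    # linear interpolation gives a sub-sample estimate of each crossing time
    frac = signal[idx] / (signal[idx] - signal[idx + 1])
    return t[idx] + frac * (t[idx + 1] - t[idx])

# Illustrative example: a 1 kHz tone sampled at 100 kHz for 10 ms.
fs, f0 = 100_000.0, 1000.0
t = np.arange(0.0, 0.01, 1.0 / fs)
x = np.sin(2.0 * np.pi * f0 * t)

events = zero_crossing_events(x, t)
intervals = np.diff(events)                      # inter-crossing time intervals
# Consecutive zero crossings of a tone are half a period apart,
# so the dominant frequency is 1 / (2 * mean interval).
f_est = 1.0 / (2.0 * intervals.mean())           # f_est ≈ 1000.0 Hz
```

In the CTDS hardware, of course, no such sampling step exists: the crossing events themselves drive the computation, and the interval measurement is performed directly in continuous time.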
There is evidence that biological systems employ an analogous representation [13], which is natural when computation must rely on individual components (neurons) that have limited intrinsic bandwidth, have a limited dynamic range, and operate without global clocks. In particular, the locations of the zero crossings of a signal in the time domain reveal much of its characteristics in the spectral domain [14]. An appropriately smoothed version of the time interval between consecutive zero crossings yields the inverse of a dominant frequency present in the signal [2]. Inner hair cells attached to the basilar membrane [15], [16] are believed to encode the formant and tone information in audio signals through such zero-crossing intervals. Neural models of auditory processing in the inner hair cells using level-crossing signal representations have been presented in the literature [5], [6]. In Ghitza's model [5], outputs from time-interval measurements between level crossings are aggregated across cochlear channels to produce an ensemble interval histogram (EIH) spectral measure that has robust properties in the presence of noise.

This paper presents a compact subsystem for accurate, real-time measurement of level-crossing time intervals. The subsystem is suitable for multiresolution feature extraction from an analog cochlear filter bank. The CTDS approach deviates from the standard approach, in which audio signals are digitized by an analog-to-digital converter and the signal processing is performed by a specialized digital signal microprocessor [17]. The inputs of the subsystem are analog, as it interfaces to a silicon cochlea [18], [19], and the outputs are also analog, as they will be subsequently processed by