Scalable Connectivity Processor for Computer Music Performance Systems

Rimas Avizienis, Adrian Freed, Takahiko Suzuki & David Wessel
Center for New Music and Audio Technologies (CNMAT)
University of California Berkeley
1750 Arch St., Berkeley, CA 94709
{rimas, adrian, takahiko, wessel}@cnmat.berkeley.edu, www.cnmat.berkeley.edu

Abstract

Standard laptop computers are now capable of substantial amounts of sound synthesis and sound processing, but low-latency, high-quality, multichannel audio I/O has not been possible without a cumbersome external card cage. CNMAT has developed a solution using ubiquitous 100BaseT Ethernet that supports up to 10 channels of 24-bit audio, 64 channels of sample-synchronous control-rate gesture data, and 4 precisely time-stamped MIDI I/O streams. Latency measurements show that we can get signals into and back out of Max/MSP in under 7 milliseconds. The central component of the device is a field programmable gate array (FPGA). In addition to providing a variety of computer interface capabilities, the device can function as a cross-coder for a variety of protocols, including GMICS. This paper outlines the motivation, design, and implementation of the connectivity processor.

1. Context and Prior Work

Hardware development for computer music performance systems has followed the standard pattern of technology evolution, passing through the first two phases of design focus, function and price, to the final phase: usability. We have identified size and connectivity as the primary usability issues for computer music performance systems. Laptop computers with fast signal processing capabilities have recently become available at moderate cost, and although their size makes them very attractive for musical performance, their constrained expansion capabilities limit connectivity. Currently there are no commercially available, low-latency, high-reliability, compact, multichannel audio solutions for laptop computers.
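A back-of-the-envelope calculation shows why Fast Ethernet comfortably carries the audio channel count quoted in the abstract. This sketch is illustrative only: the paper does not state a sample rate, so 48 kHz is an assumption, and real frames add packet-header overhead on top of the raw payload.

```python
# Illustrative bandwidth check for 10 channels of 24-bit audio over
# 100BaseT. The 48 kHz sample rate is an assumption, not from the paper;
# Ethernet framing overhead is ignored here.
CHANNELS = 10
BITS_PER_SAMPLE = 24
SAMPLE_RATE_HZ = 48_000
LINK_MBPS = 100  # 100BaseT Fast Ethernet line rate

payload_mbps = CHANNELS * BITS_PER_SAMPLE * SAMPLE_RATE_HZ / 1e6
fraction_of_link = payload_mbps / LINK_MBPS

print(f"raw audio payload: {payload_mbps:.2f} Mbit/s "
      f"({fraction_of_link:.1%} of the link)")
```

Even with generous framing overhead, the raw audio payload uses only a small fraction of the 100 Mbit/s link, leaving room for the gesture channels and MIDI streams.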
Furthermore, the architecture of current laptops and most other computers makes it impossible to synchronize acquired gestural data and sound I/O tightly enough to satisfy the latency/jitter requirement for satisfactory reactive performance systems: 10±1 ms (Freed, et al., 1997). The 10 ms latency criterion is not difficult to meet, but a maximum latency variation of ±1 ms is difficult to achieve, especially when the stimulus gesture is represented as a MIDI event or as a low-rate signal from a non-sample-synchronous input source, such as a data acquisition card, rather than as a sample-synchronized audio input signal. The only computers with the requisite unified clock management and operating system support for such tight synchronization are from Silicon Graphics. Unfortunately, even the smallest configurations of their machines, the O2 and Octane, are too large and expensive for most performing musicians. One of our key design goals was therefore to eliminate virtually all latency variation in low-sample-rate inputs such as those from gestural input devices.

2. Introduction

We introduce here a new connectivity processor that solves the synchronization problem described above and connects readily to a laptop or any computer system with a 100BaseT Fast Ethernet port or a digital audio port such as ADAT or AES/EBU. The conventional approach to building a system that integrates gesture and sound is to combine a microcontroller, a DSP chip with A/D and D/A converters, and network interface chips for MIDI, Ethernet, AES/EBU, and so on. Although this approach leverages the strengths of each chip, each processor comes with its own specialized low-level programming tools and development systems, which complicates development; it also adds cost, because each chip has its own requirements for surrounding memory and associated glue chips to integrate these components.
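The advantage of sample-synchronous gesture acquisition can be sketched numerically: latching gesture events on a control-rate clock derived from the audio sample clock bounds the timing uncertainty by one control period, whereas an unsynchronized source inherits the host's unpredictable scheduling jitter. The specific rates below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical rates for illustration; the paper does not specify them.
AUDIO_RATE_HZ = 44_100
CONTROL_DIVISOR = 64  # assume one gesture frame per 64 audio samples

control_period_ms = 1000 * CONTROL_DIVISOR / AUDIO_RATE_HZ  # ~1.45 ms

def latch_to_control_frame(event_time_ms: float) -> float:
    """Latch an asynchronous event at the next control-frame boundary,
    as a sample-synchronous acquisition path effectively does."""
    frames = math.ceil(event_time_ms / control_period_ms)
    return frames * control_period_ms

# The added latency is at most one control period, and -- crucially for
# the ±1 ms jitter budget -- the timing uncertainty is bounded by that
# same period, independent of host scheduling.
worst_case_jitter_ms = control_period_ms
```

With these assumed rates the bound is about 1.45 ms of worst-case uncertainty, and raising the control rate tightens it further; an OS-scheduled input path offers no such bound.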
Our alternative, more flexible approach supports scalable implementations, from a few channels of audio and gestures to hundreds of channels, by integrating all digital functions on a single field programmable gate array (FPGA). These functions are determined by compiling high-level hardware descriptions written in VHDL into FPGA configurations (Skahill, 1996). VHDL is an acronym for VHSIC Hardware Description Language, where VHSIC in turn stands for Very High Speed Integrated Circuits. This approach allows the considerable investment in developing the interface logic for each peripheral to be leveraged easily across a wide variety of FPGAs from different vendors and of different sizes, up to millions of gates. FPGAs are a better match for this application than signal processors or general-purpose microprocessors because most of the communication protocols needed in multimedia and gesture systems are bit-serial; signal processors and general-purpose processors operate on bytes and words and are not as well adapted to high-speed bit-serial protocols.

We have developed and tested VHDL descriptions for processing serial audio data for the SSI, S/PDIF, AES/EBU, AES-3, and ADAT industry standards. For gestural transduction we have VHDL modules that communicate with multichannel A/D converters, MIDI, and RS232 and RS422 serial ports. Although others have previously developed hardware-language descriptions of many of these protocols for proprietary systems, our library of modules represents the first complete, independent suite available in VHDL. Our library also includes the glue logic that conforms the asynchronous clocks required by each module to a unified synchronous sample-rate clock. Our suite makes possible some unusual cross-coding strategies, such as embedding gestural data in audio streams, thereby increasing temporal precision by exploiting isochronous data paths in the control processor.
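The cross-coding idea of embedding gestural data in an audio stream can be illustrated at the data-layout level: if a gesture word rides in a spare slot of each interleaved audio frame, it travels on the same isochronous path as the audio and is therefore pinned to an exact sample time. The frame layout below is a hypothetical sketch, not the device's actual wire format.

```python
# Illustrative cross-coding sketch: carry one gesture word per interleaved
# audio frame, so gesture timing inherits the audio clock's precision.
# The (left, right, gesture) frame layout is an assumption for this example.
def embed_gestures(audio_frames, gesture_words):
    """audio_frames: list of (left, right) sample pairs.
    gesture_words: one control-rate value per frame.
    Returns frames of (left, right, gesture); because the gesture word
    occupies a fixed slot in each frame, its arrival time is known to
    within one audio sample period."""
    assert len(audio_frames) == len(gesture_words)
    return [(l, r, g) for (l, r), g in zip(audio_frames, gesture_words)]

frames = embed_gestures([(100, -100), (101, -99)], [7, 8])
```

A receiver that de-interleaves such frames recovers both streams with a shared time base, which is exactly the temporal precision advantage the isochronous path provides over an out-of-band control channel.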