IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-19, NO. 3, MAY 1973 257

Information Theory in the Sixties

Invited Paper

ANDREW J. VITERBI

Abstract-This retrospective survey describes the major trends in information theory research during the decade of the sixties, with emphasis on channel and source coding.

INTRODUCTION AND CATEGORIZATION

Nostalgia on the origins of information theory after a quarter of a century is permissible, and even fashionable. But a retrospective on a decade just completed may be premature, if not presumptuous. Yet this was the period during which information theory came of age figuratively as well as temporally. For, while in the late fifties a few prophets and optimists foresaw the potential technological impact of the field, it took another decade to complete the clarification of the theory and the education of at least a nucleus of communication engineers who could go forth and apply the fundamental concepts to real-world problems. In this sense, the sixties were much less an era of basic innovation than the preceding decade, but rather a period of digestion, understanding, and of considerable construction on previously established foundations.

Any survey of this varied field, with its several ramifications, must begin with a categorization of the subject matter. We shall do this by arbitrarily, but hopefully objectively, dividing the field into central and noncentral areas. Central, of course, is the Shannon theory with its direct and immediate preoccupation with source and channel coding. All of probability theory, stochastic processes, and mathematical statistics, including decision and estimation theory, are fundamental, and in large part prerequisite to an understanding and appreciation of information theory, but they can hardly be considered central.
Examples of noncentral fields that utilize information-theoretic results and techniques are pattern recognition, most of analog modulation theory, radar, and the analysis of practical communication systems, not to mention the distant suburbs of thermodynamics, psychology, and physiology. The Wiener theory has long been popularly considered to be a core discipline of information theory, and indeed it did serve initially as a natural companion and intellectual partner of the Shannon theory.* But in the decade of the sixties, it became increasingly clear that Wiener theory belonged to a wider audience than just information and communication theory. In fact it was in 1960 that the emergence of recursive filtering as a discipline within control theory, and its connection to optimal control, gave new impetus to the Wiener theory and established for it a role outside the main stream of information theory.

To maintain reasonable space and time limitations, we shall concentrate only on the core areas of information theory and summarize very briefly at the end the major trends in the noncentral fields. Also, with the exception of references to the hardcover literature, we avoid the cult of personality by omitting all author surnames other than the inevitable one of Shannon. In fact, references are included only to serve as examples for work cited. No attempt is made to reference the most significant work exhaustively or even systematically.

Manuscript received December 11, 1972. The author is with the Department of System Science, University of California, Los Angeles, Calif. 90024.

* The key observation in this respect, made by Brockway McMillan in a 1970 seminar at Bell Laboratories, was that Wiener's use of an a priori distribution on the message space, heresy to mathematical statisticians at the time, was fundamental to the evolution of modern communication theory.
CHANNEL CODING

The central theme of information theory continues to follow closely the original outline of Shannon's initial papers [1]: discrete source coding, coding for a noisy channel, and source coding with a fidelity criterion. Of these three main problems, it was the second that progressed most during the sixties. The problem of discrete source coding was resolved, in its most basic form, by the very definition of entropy. While improved source coding algorithms appeared in the fifties and sixties, and research is still active today on further refinements and applications, a rudimentary coding algorithm was available from the start. This was not the case for noisy channel coding. The celebrated coding theorem was supported only by an existence proof. Thus it was natural that throughout the fifties and sixties this was the problem of greatest weight and urgency.

In the fifties, much effort was devoted to generating explicit constructions for classes of codes that might produce asymptotically small error probabilities for all rates up to channel capacity. In the sixties, for the most part there was less obsession with this seemingly futile goal; rather, channel coding research seemed to break up into at least three distinct schools, each with its own literature, enthusiasts, and followers.

Probably the largest was that concerned with algebraic, or error-correcting, codes. In the fifties this had been an active area, beginning with the algebraic representation [2] of all linear error-correcting codes, and culminating at the end of the decade with the important class of Bose-Chaudhuri-Hocquenghem (BCH) codes [3], [4]. Even though algebraic codes were clearly limited to a few, somewhat simplistic, channel models, their development appeared in the early sixties to be an area of considerable potential
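The remark above, that discrete source coding was resolved "by the very definition of entropy" and that a rudimentary coding algorithm was available early on, can be made concrete with a small sketch (not from the paper itself; the distribution and the use of Huffman's construction as the illustrative prefix code are this editor's assumptions). It computes the source entropy and the average length of a binary Huffman code, which coincide for dyadic probabilities:

```python
import heapq
from math import log2

def entropy(probs):
    """Shannon entropy H(X) = -sum p*log2(p) of a discrete source."""
    return -sum(p * log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given distribution."""
    # Heap items: (subtree probability, tie-breaking id, symbol indices in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    uid = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # each merge adds one bit to these codewords
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, uid, s1 + s2))
        uid += 1
    return lengths

# Illustrative dyadic source (assumed for the example, not taken from the paper).
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)                                             # 1.75 bits/symbol
L = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))  # average code length
# For dyadic probabilities the prefix code meets the entropy exactly: L == H.
```

For non-dyadic distributions the same sketch shows H <= L < H + 1, the sense in which entropy by itself settles the basic lossless coding problem.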
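To give a flavor of the linear error-correcting codes whose algebraic representation is discussed above, here is a hedged sketch using the (7,4) Hamming code, chosen by this editor as the simplest single-error-correcting linear code; it is not a code singled out in the survey, and the matrices below are one conventional systematic form:

```python
import numpy as np

# Systematic generator matrix G = [I | P] and parity-check matrix H = [P^T | I]
# for the (7,4) Hamming code (illustrative choice, not from the survey).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg):
    """Map a 4-bit message to a 7-bit codeword over GF(2)."""
    return (msg @ G) % 2

def decode(received):
    """Syndrome decoding: locate and flip a single bit error, then strip parity."""
    syndrome = (H @ received) % 2
    if syndrome.any():
        # The syndrome equals the column of H at the error position.
        for j in range(7):
            if np.array_equal(H[:, j], syndrome):
                received = received.copy()
                received[j] ^= 1
                break
    return received[:4]   # systematic code: first 4 bits are the message
```

Because the seven columns of H are exactly the seven distinct nonzero binary triples, every single-bit error produces a unique syndrome, which is the algebraic structure that the cited representation of linear codes, and later the BCH construction, generalizes.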