1198 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 8, AUGUST 1998

A Modular Neural Network Vector Predictor for Predictive Image Coding

Lin-Cheng Wang, Member, IEEE, Syed A. Rizvi, Member, IEEE, and Nasser M. Nasrabadi, Senior Member, IEEE

Abstract—In this paper, we present a modular neural network vector predictor that improves the predictive component of a predictive vector quantization (PVQ) scheme. The proposed vector prediction technique consists of five dedicated predictors (experts), where each expert predictor is optimized for a particular class of input vectors. An input vector is classified into one of five classes, based on its directional variances. One expert predictor is optimized for stationary blocks, and the other four expert predictors are optimized to predict horizontal, vertical, 45° diagonal, and 135° diagonal edge-blocks, respectively. An integrating unit is then used to select or combine the outputs of the experts in order to form the final output of the modular network. Therefore, no side information is transmitted to the receiver about the selected predictor or the integration of the predictors. Experimental results show that the proposed scheme gives an improvement of 1.7 dB over a single multilayer perceptron (MLP) predictor. Furthermore, if the information about the predictor selection is sent to the receiver, the improvement could be up to 3 dB over a single MLP predictor. The perceptual quality of the predicted images is also significantly improved.

Index Terms—Mixture of experts, modular vector prediction, neural networks, predictive vector quantization.

I. INTRODUCTION

Prediction is a powerful tool commonly used by a class of image compression techniques called predictive coding schemes. In a typical predictive coding scheme, a predictor estimates the current pixel by using the information from several past pixels.
The estimated (predicted) pixel is then subtracted from the original pixel to form a residual signal that has a lower entropy. The residual signal is then encoded by a coarse quantizer. The success of prediction relies on the presence of a high correlation among neighboring pixels. A stationary signal (image) is predictable, while a white-noise random signal (image) is totally unpredictable. The pixels in natural images are considered to be highly correlated because the objects and scenes in natural images tend to have a certain consistency in structure. Thus, image prediction is, in principle, possible.

Manuscript received October 10, 1996; revised December 1, 1997.
L.-C. Wang is with SONY Semiconductor Company of America, San Jose, CA 95134 USA (e-mail: lwang@ssa-de.sel.sony.com).
S. A. Rizvi is with the Department of Applied Sciences, College of Staten Island, City University of New York, Staten Island, NY 10314 USA.
N. M. Nasrabadi is with the U.S. Army Research Laboratories, Adelphi, MD 20783-1197 USA.
Publisher Item Identifier S 1057-7149(98)05311-1.

A particular predictive coding scheme is the predictive vector quantization (PVQ) scheme [1], which is a vector extension of the scalar differential pulse coded modulation (DPCM) technique [17]. PVQ has been used for compression of images [1], [2] and has shown excellent promise in terms of compression performance and peak signal-to-noise ratio (PSNR). In PVQ, an image is partitioned into several contiguous, nonoverlapping blocks (vectors). A PVQ coding scheme, as shown in Fig. 1, uses a vector predictor that predicts the current block from the previously encoded blocks and constructs a residual block (the difference between the original and predicted blocks). The residual block is then quantized (encoded) using a relatively small vector quantization (VQ) codebook [1]. The index of the best-matched codevector (the one with the least distortion) is then transmitted to the decoder.
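The PVQ encoding loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predictor is a trivial stand-in that repeats the previously reconstructed block (the paper's predictors are neural networks), blocks are flat lists of pixel values, and the names `predict`, `nearest_index`, `pvq_encode`, and `pvq_decode` are hypothetical. A matching decoder is included to show that both sides can track the same reconstruction when they run the same predictor.

```python
def predict(prev_block):
    """Stand-in predictor (an assumption): repeat the previous reconstructed block."""
    return list(prev_block)


def nearest_index(residual, codebook):
    """Return the index of the codevector with the least squared error."""
    def sq_err(codevector):
        return sum((r - c) ** 2 for r, c in zip(residual, codevector))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def pvq_encode(blocks, codebook, initial):
    """Encode each block as the index of its quantized prediction residual."""
    indices = []
    prev = initial  # last reconstructed block, also known to the decoder
    for block in blocks:
        pred = predict(prev)
        residual = [b - p for b, p in zip(block, pred)]
        idx = nearest_index(residual, codebook)
        indices.append(idx)
        # Predict from the reconstruction, not the original, so the
        # encoder stays in step with what the decoder can compute.
        prev = [p + c for p, c in zip(pred, codebook[idx])]
    return indices


def pvq_decode(indices, codebook, initial):
    """Rebuild blocks by adding looked-up codevectors to the predictions."""
    reconstructed = []
    prev = initial
    for idx in indices:
        pred = predict(prev)
        prev = [p + c for p, c in zip(pred, codebook[idx])]
        reconstructed.append(prev)
    return reconstructed
```

With a four-entry codebook, each four-pixel block in this sketch costs only two bits, at the price of the quantization error left in the residual.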
The decoder, a simple look-up table, fetches the codevector that corresponds to the transmitted index. The predicted block is produced by an identical vector predictor at the receiver and is added to the fetched codevector, so both sides obtain the same reconstruction. Compression is achieved by transmitting (storing) only the index, which requires far fewer bits than representing all the raw pixels in the original block. A variation of the above PVQ scheme is obtained when a set of multistage VQ codebooks is used; this technique is referred to as predictive residual vector quantization (PRVQ) [2].

The performance of a predictive coding scheme can be improved by increasing the prediction gain in the areas where a traditional predictor provides a relatively low prediction gain (blocks containing object boundaries, textural information, etc.). Furthermore, the human visual system is very sensitive to edge degradation, which is usually introduced by a predictive coding scheme. Therefore, a good prediction of edge pixels (blocks) is important for achieving good perceptual quality in a predictively encoded image.

1057–7149/98$10.00 © 1998 IEEE

Conventional (linear) predictors exploit only first-order correlation (locally similar luminance values) and do provide reasonable performance. However, better performance can be achieved if the higher order (structural) correlations are exploited as well. A neural network predictor should be able to exploit structural correlations because it processes the input signal nonlinearly [4]. However, a single neural network predictor generally does not provide a significant improvement in the perceptual quality of the predicted image. This behavior of a neural network predictor can be attributed to two main problems: i) the first-order correlation dominates in the training