SCIMA 2003 - International Workshop on Soft Computing Techniques in Instrumentation, Measurement and Related Applications, Provo, Utah, USA, 17 May 2003

Simplifying OCR Neural Networks with Oracle Learning

Joshua Menke and Tony Martinez
Department of Computer Science, Brigham Young University, Provo, UT, 84604
Email: josh@axon.cs.byu.edu, martinez@cs.byu.edu

Abstract – Often the best model to solve a real-world problem is relatively complex. The following presents oracle learning, a method using a larger model as an oracle to train a smaller model on unlabeled data in order to obtain (1) a simpler acceptable model and (2) improved results over standard training methods on a similarly sized smaller model. In particular, this paper looks at oracle learning as applied to multi-layer perceptrons trained using standard backpropagation. For optical character recognition, oracle learning results in an 11.40% average decrease in error over direct training while maintaining 98.95% of the initial oracle accuracy.

I. INTRODUCTION

As Le Cun, Denker, and Solla observed in [3], often the best artificial neural network (ANN) to solve a real-world problem is relatively complex. They point to the large ANNs Waibel used for phoneme recognition in [2] and the ANNs of Le Cun et al. for handwritten character recognition in [1]. "As applications become more complex, the networks will presumably become even larger and more structured" [3]. The following research presents the oracle learning algorithm, a training method that seeks to create less complex ANNs that (1) still maintain an acceptable degree of accuracy, and (2) provide improved results over standard training methods.

Designing a neural network for a given application requires first determining the optimal size for the network in terms of accuracy on a test set, usually by increasing its size until there is no longer a significant decrease in error.
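The oracle learning loop itself is simple: a large, accurate model labels a pool of unlabeled data, and the smaller model is then trained on those oracle-produced labels. The following is a minimal sketch of that loop. The linear "oracle" and the least-squares "student" here are illustrative stand-ins for the paper's 2048- and 64-hidden-node backpropagation networks, and the synthetic data replaces the unlabeled character data; none of these names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_classes = 10, 3

# Hypothetical stand-in for the trained, accurate "oracle" model:
# here just a fixed linear classifier with known weights.
W_oracle = rng.normal(size=(n_features, n_classes))

def oracle_label(X):
    """Label raw inputs with the oracle's predicted class."""
    return np.argmax(X @ W_oracle, axis=1)

# Step 1: gather unlabeled data (synthetic here).
X_unlabeled = rng.normal(size=(5000, n_features))

# Step 2: let the oracle create as much labeled training data as needed.
y_oracle = oracle_label(X_unlabeled)

# Step 3: train the smaller student model on the oracle-labeled data.
# A linear least-squares fit to one-hot targets stands in for
# backpropagation on a small network.
T = np.eye(n_classes)[y_oracle]                      # one-hot targets
W_student, *_ = np.linalg.lstsq(X_unlabeled, T, rcond=None)

# Step 4: the student now approximates the oracle on fresh data.
X_test = rng.normal(size=(1000, n_features))
agreement = np.mean(
    np.argmax(X_test @ W_student, axis=1) == oracle_label(X_test))
```

The key property the sketch demonstrates is that the student never sees hand-labeled data at all; its training targets come entirely from the oracle, so the supply of training examples is limited only by the supply of unlabeled inputs.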
Once found, the preferred size for more complex problems is often relatively large. One method of reducing the complexity is to use a smaller ANN still trained using standard methods. However, using ANNs smaller than the optimal size results in a decrease in accuracy. The goal of this research is to increase the accuracy of these smaller, less resource-intensive ANNs using oracle learning.

As an example, consider designing an ANN for optical character recognition in a small, handheld scanner. The network has to be small, fast, and accurate. Now suppose the most accurate digit-recognizing ANN given the available training data has 2048 hidden nodes, but the resources on the scanner allow for only 64 hidden nodes. One solution is to train a 64 hidden node ANN using standard methods, resulting in a compromise of significantly reduced accuracy for a smaller size. This research demonstrates that applying oracle learning to the same problem results in a 64 hidden node ANN that does not suffer from nearly as significant a decrease in accuracy. Oracle learning uses the original 2048 hidden node ANN as an oracle to create as much training data as necessary from unlabeled character data. The oracle-labeled data is then used to train a 64 hidden node network to approximate the 2048 hidden node network. The results in section IV show the oracle learning ANN retains 98.9% of the 2048 hidden node ANN's accuracy on average, while being 1/32 the size. The resulting oracle-trained network (OTN) is almost 18% more accurate on average than the standard-trained 64 hidden node ANN.

Fig. 1. Oracle Learning Summary

Although the previous example deals exclusively with ANNs, oracle learning can be used to train any model using a more accurate model of any type. Both the oracle model and the oracle-trained model (OTM) in figure 1 can be any machine learning model (e.g. an ANN, a nearest neighbor model, a Bayesian learner, etc.). In fact, the oracle model can be any ar-