ACCEPTED MANUSCRIPT SMCA-13-04-0221

Multi-source Data Ensemble Modeling for Clinker Free Lime Content Estimate in Rotary Kiln Sintering Processes

Weitao Li, Dianhui Wang, Senior Member, IEEE, and Tianyou Chai, Fellow, IEEE

Abstract—Clinker free lime (f-CaO) content plays a crucial role in determining the quality of cement. However, existing measurement methods rely mainly on laboratory analysis and incur significant time delays, which makes closed-loop control of f-CaO content impossible. In this paper, a multi-source data ensemble learning-based soft sensor model is developed for online estimation of clinker f-CaO content. To build such a soft sensor model, input flame images, process variables, and the corresponding output f-CaO content data were collected from the No. 2 rotary kiln at Jiuganghongda Cement Plant, which produces 2,000 t of clinker per day. The raw data were pre-processed to distinguish the flame image regions of interest (ROI) and to remove process variable outliers. Three types of flame image ROI features, i.e., color, global configuration, and local configuration features, were then extracted without segmentation. Further, a kernel partial least squares (KPLS) technique was applied to extract compressed score matrix features from the concatenated flame image features and filtered process variables, in order to avoid problems of high dimensionality, nonlinearity, and correlation. Feed-forward neural networks with random weights were employed as base learners in the proposed ensemble modeling framework, which aims to enhance the model’s reliability and prediction performance. A total of 157 flame images, the associated process variable data, and the experimentally measured f-CaO content data were used in our experiments. A comparative study of f-CaO content estimators built with various feature compression techniques and learner models was carried out, together with a robustness analysis.
The results indicate that the proposed multi-source data ensemble soft sensor model performs favorably and has good potential in real-world applications.

Index Terms—f-CaO content, soft sensor, multi-source data, ensemble modeling, neural networks with random weights.

W. T. Li is with Department of Electric Engineering and Automation (Hefei University of Technology), Hefei, Anhui Province 230009, China (e-mail: wtli@hfut.edu.cn)
D. H. Wang is with Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, VIC 3086, Australia; he is also with The State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang, Liaoning Province 110004, China (e-mail: dh.wang@latrobe.edu.au)
T. Y. Chai is with The State Key Laboratory of Synthetical Automation for Process Industries (Northeastern University), Shenyang, Liaoning Province 110004, China (e-mail: tychai@mail.neu.edu.cn)

I. INTRODUCTION

The rotary kiln, as a large-scale heat exchange facility, is widely used in the metallurgical, cement, chemical, and environmental protection industries. A major issue in the rotary kiln sintering process is online measurement of the system output index, i.e., clinker quality. Unfortunately, no analyzer is yet available for real-time sensing of clinker quality, owing to the kiln's special structure. Some relevant work utilizing either process variables or flame image features has been done based on statistical approaches [1]–[3].

It is well known that clinker quality directly affects the quality of cement. Thus, it is important to estimate f-CaO content effectively and efficiently as feedback for controller design. Generally, f-CaO content is obtained by offline lab analysis of manually taken samples at 1 h intervals. Therefore, a significant time delay separates real-time control of clinker quality from the availability of the f-CaO content feedback signal.
This makes f-CaO content-based closed-loop control impossible. So far, open-loop control schemes are implemented by observing burning zone flame images and process variables. Operators estimate the current burning state via flame images and process variables, and then regulate the manipulated variables to drive the controlled variables into preset ranges so that the f-CaO content can be estimated. Nevertheless, the accuracy of the estimated f-CaO content can be affected by an operator’s mental state, work experience, and attitude. Fluctuating f-CaO content significantly impacts the stability of the cement quality, and lab analysis can only serve as a reference and guide for operators in subsequent operations. Therefore, any method for online estimation will greatly help in reducing the amount of rejected clinker and in implementing closed-loop control strategies.

With the development of computational intelligence techniques, soft sensor techniques have received considerable attention in process industries. The soft sensor, as a signal reconstruction modeling approach, is used to estimate hard-to-measure process variables from online easy-to-measure process variables. Moreover, recent developments in measurement techniques enable us to collect, store, and analyze a large amount of process data, and make data-driven soft sensor modeling methods possible. Although soft sensor techniques have been applied in various application domains [4]–[6], they share common components and properties, such as input variable selection and estimator design. It is important to select a subset of the full feature set in order to build a robust model with better generalization capability. As for regressor design, many learner models, such as the support vector regressor (SVR) [7] and random vector functional-link networks (RVFL) [8], [9], can be employed.
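
To make the regressor-design step above concrete, the following is a minimal sketch of an RVFL-style learner: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by regularized least squares. It is an illustration under stated assumptions, not the implementation used in this paper. The function names, the synthetic data, the hidden-layer size, the uniform weight range, and the ridge parameter are all hypothetical, and averaging the base learners is only one simple way to combine them.

```python
import numpy as np


def _design_matrix(X, W, b):
    """Hidden-layer outputs plus direct input links and a bias column."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden nodes
    return np.hstack([H, X, np.ones((X.shape[0], 1))])


def fit_random_weight_net(X, y, n_hidden=20, ridge=1e-3, rng=None):
    """Fit one base learner: random fixed hidden weights, solved output weights."""
    rng = np.random.default_rng(rng)
    # Assumed weight range [-1, 1]; the paper's setting is not given here.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    D = _design_matrix(X, W, b)
    # Ridge-regularized least squares for the output weights.
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ y)
    return W, b, beta


def predict_random_weight_net(model, X):
    W, b, beta = model
    return _design_matrix(X, W, b) @ beta


def fit_ensemble(X, y, n_learners=10, seed=0, **kwargs):
    """Train base learners that differ only in their random hidden weights."""
    return [fit_random_weight_net(X, y, rng=seed + k, **kwargs)
            for k in range(n_learners)]


def predict_ensemble(models, X):
    """Combine base learners by simple averaging (an assumed combination rule)."""
    return np.mean([predict_random_weight_net(m, X) for m in models], axis=0)


# Synthetic stand-in data; NOT the flame-image or process data of the paper.
data_rng = np.random.default_rng(0)
X = data_rng.uniform(-1.0, 1.0, size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * data_rng.standard_normal(200)

X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

models = fit_ensemble(X_train, y_train)
pred = predict_ensemble(models, X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print("test RMSE:", rmse)
```

Because the hidden weights are never trained, fitting reduces to one linear solve per base learner, which is what makes it cheap to train many learners and combine them.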
Based on the operator’s experience, it is believed that both flame images and process variables have a close relationship with the clinker f-CaO content. From an operator’s understanding, the color and configuration features of ROI, i.e., the material region and flame region, of burning zone