Neurocomputing 69 (2006) 862–865
Letters
A reliable method for the diagnosis of gastric carcinoma
Loris Nanni
DEIS, IEIIT-CNR, Università di Bologna, Viale Risorgimento 2, Bologna, Italy
Received 29 June 2005; received in revised form 22 August 2005; accepted 24 August 2005
Available online 21 November 2005

Abstract
Predicting the different levels of gastric carcinoma from clinical and histopathological investigations is an important problem in bioinformatics and a challenging task for machine learning algorithms. In this paper, we investigate an ensemble of classifiers and test it on a real-world dataset. A genetic algorithm is applied to select the most relevant features. The results obtained are very encouraging: they improve on the average predictive accuracy reported in previously published works.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Gastric carcinoma; Ensemble of classifiers; Feature selection

1. Introduction
Cancer of the stomach, also called gastric cancer, is a disease in which cancer (malignant) cells are found in the tissues of the stomach. Stomach cancer can sometimes be present for a long time and grow very large before it causes any symptoms. In the early stages of stomach cancer, a patient may have indigestion and stomach discomfort; in more advanced stages, the patient may have blood in the stool, vomiting, weight loss, or stomach pain. Factors that increase the chances of developing stomach cancer include a stomach disorder called atrophic gastritis, a blood disorder called anemia, and a hereditary condition in which growths called polyps form in the large intestine. Stomach cancer is difficult to detect in its early stages because its early symptoms are absent or mild. Unfortunately, it is a highly aggressive cancer, and the overall survival rate is very low.
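The abstract states that a genetic algorithm is used to select the most relevant features, but it does not describe the encoding or the genetic operators. As a rough illustration only, the sketch below assumes a common wrapper-style formulation: each individual is a binary mask over the feature vector (1 = keep the feature), and a caller-supplied fitness function scores a mask (in practice this might be the cross-validated accuracy of a classifier restricted to the kept features). The function name `ga_select_features`, the use of tournament selection, one-point crossover, bit-flip mutation and elitism, and all parameter defaults are assumptions made for this example; they are not taken from this paper or from [3].

```python
import random
from typing import Callable, List

Mask = List[int]  # hypothetical encoding: 1 = feature kept, 0 = feature dropped


def _tournament(rng: random.Random, population: List[Mask], scores: List[float]) -> Mask:
    """Binary tournament: return the fitter of two randomly drawn individuals."""
    i, j = rng.randrange(len(population)), rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]


def ga_select_features(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Return the best feature mask found by a simple generational GA (illustrative sketch)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)[:]

    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        next_pop: List[Mask] = [best[:]]  # elitism: the best mask found so far always survives
        while len(next_pop) < pop_size:
            # Copy the parents so that mutation never alters the current population.
            a = _tournament(rng, population, scores)[:]
            b = _tournament(rng, population, scores)[:]
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randrange(1, n_features)  # one-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for k in range(n_features):
                    if rng.random() < mutation_rate:
                        child[k] = 1 - child[k]  # bit-flip mutation
                next_pop.append(child)
        population = next_pop[:pop_size]
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best[:]
    return best


if __name__ == "__main__":
    # Toy fitness for demonstration only: features 0, 3 and 5 are assumed to be "relevant".
    relevant = {0, 3, 5}

    def toy_fitness(mask: Mask) -> float:
        hits = sum(1 for k in relevant if mask[k])
        return hits - 0.1 * sum(mask)  # reward relevant features, penalise subset size

    print(ga_select_features(8, toy_fitness))
```

Replacing `toy_fitness` with a cross-validated accuracy estimate turns this into a wrapper-style feature selector; subtracting a small penalty proportional to `sum(mask)` is one simple way to favour compact feature subsets.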
If there are symptoms of cancer, a physician will usually order an upper gastrointestinal X-ray or look inside the stomach with a gastroscope. This procedure, called gastroscopy, is useful in detecting most stomach cancers. Early gastric cancer is defined as gastric cancer confined to the mucosa or submucosa, regardless of the presence or absence of lymph node metastasis [8]. In advanced gastric cancer (AGC), as defined by Bormann, the tumour has invaded the proper muscle layer or beyond [3]. According to the Bormann classification, AGCs are divided into four groups: Bormann I, Bormann II, Bormann III, and Bormann IV. Knowledge of these types permits a preliminary assessment of tumour spread.

In [3], the authors propose an inductive supervised learning algorithm called benefit maximizing classifier on feature projections (BCFP) to diagnose gastroenterological tumours. They also propose using a genetic algorithm (GA) to find a relevant subset of the features in the data. In this paper, we investigate ensembles of classifiers [4] and propose an ensemble of classifiers for handling missing feature values. Moreover, we show that a very low error rate can be obtained using only a subset of the histopathological features. The results obtained are very encouraging: they improve on the average predictive accuracy reported in previously published works.

www.elsevier.com/locate/neucom
0925-2312/$ - see front matter. doi:10.1016/j.neucom.2005.08.001
E-mail address: lnanni@deis.unibo.it.

2. System
Patient records collected for diagnosis and prognosis typically contain the values of clinical and histopathological investigations. In this domain, each record is represented as a vector of 68 features [3], 61 of which are categorical. We have substituted each categorical value with a fixed numerical value. The dataset