Data Mining Approaches to Diffuse Large B–Cell Lymphoma Gene Expression Data Interpretation

Jesús S. Aguilar-Ruiz¹, Francisco Azuaje², and José C. Riquelme³

¹ University of Seville, Seville, Spain, aguilar@lsi.us.es
² University of Ulster, Northern Ireland, fj.azuaje@ulster.ac.uk
³ University of Seville, Seville, Spain, riquelme@lsi.us.es

Abstract. This paper presents a comprehensive study of gene expression patterns originating from a diffuse large B–cell lymphoma (DLBCL) database. It focuses on the implementation of feature selection and classification techniques. It first tackles the identification of genes relevant to the prediction of DLBCL types. It also allows the determination of key biomarkers that differentiate two subtypes of DLBCL samples: Activated B–Like and Germinal Centre B–Like DLBCL. Decision trees provide knowledge–based models to predict types and subtypes of DLBCL. This research suggests that the data may be insufficient to accurately predict DLBCL types or even to detect functionally relevant genes. However, these methods represent reliable and understandable tools with which to start exploring potentially interesting non–linear interdependencies.

1 Introduction

Lymphomas are divided into two general categories: Hodgkin’s disease (HD) and non–Hodgkin’s lymphoma (NHL). Over the past 20 years HD rates have declined, and HD now accounts for only 1% of all cancers. By contrast, NHL cases have increased by more than 50% during the same period in the United States [10]. They represent 4% of all cancer cases, making NHL the fifth most common malignancy in that country. An analysis of NHL incidence trends between 1985 and 1992 in seven European countries showed an average increase of 4.2% per year, in the absence of an increase in the incidence of HD. In Spain, their death rate per 100,000 people increased between the periods 1965–69 and 1995–98 by 212.7% for men and 283.9% for women [13].
These figures reveal the significance of developing advanced diagnostic and prognostic systems for these diseases.

In the last two decades, a better understanding of the immune system and of the genetic abnormalities associated with NHL has led to the identification of several previously unrecognized types of lymphoma. However, this is a complex and expensive task. For instance, distinctions between Burkitt’s lymphoma

Y. Kambayashi et al. (Eds.): DaWaK 2004, LNCS 3181, pp. 279–288, 2004. © Springer-Verlag Berlin Heidelberg 2004