Computers in Biology and Medicine 37 (2007) 1672 – 1675
www.intl.elsevierhealth.com/journals/cobm

A web server for automatic analysis and extraction of relevant biological knowledge

Juan Cedano a,*,1, Mario Huerta a,1, Irene Estrada b, Frederic Ballllosera b, Oscar Conchillo a, Pedro Delicado c, Enrique Querol a

a Institut de Biotecnologia i Biomedicina and Departament de Bioquímica i Biología Molecular, UAB, Spain
b Escola Tècnica Superior d’Enginyeria, UAB, Spain
c Departament d’Estadística i Investigació Operativa, UPC, Spain

Received 1 September 2006; received in revised form 27 March 2007; accepted 29 March 2007

Abstract

Motivation: This application aims to assist researchers in extracting significant medical and biological knowledge from data sets with complex relationships among their variables.

Results: Non-hypothesis-driven approaches such as Principal Curves of Oriented Points (PCOP) are well suited to this objective. PCOP makes it possible to obtain a representative pattern from large amounts of data on independent variables in a flexible and direct way. A web server has been designed to automatically perform ‘non-linear pattern’ analysis, ‘hidden-variable-dependent’ clustering and ‘local-dispersion-dependent’ classification of new samples, using new statistical techniques based on the PCOP calculus. The tools facilitate the management, comparison and visualization of results through a user-friendly graphical interface.

Availability: http://ibb.uab.es/revresearch

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Web server; Principal curves analysis; Multivariate analysis; Clustering; Classification

1. Introduction

Many free software packages, such as Tooldiag, LNKnet and Weka [1], implement state-of-the-art machine learning and data mining algorithms.
With our web-based applications, an in-depth analysis can be performed using only a web browser, even when the data show complex relationships among the variables and high levels of noise. The first application of this web framework is to study the complex interactions among several variables simultaneously in a multi-dimensional space. This is possible because our ‘non-linear pattern’ analysis tool seeks out the inner pattern of any continuous cloud of sample points, provided that a curve exists that describes its sample distribution. Next, the ‘hidden-variable-dependent’ clustering tool groups the input samples according to their contribution to the non-linear inner pattern. In this way, the samples that make up the different local behaviours can be examined separately and accurately. Finally, the ‘local-dispersion-dependent’ classification tool assigns each new sample to one of the inner patterns previously obtained. Once a new sample has been classified, the properties of the previously well-known data set can be assigned to the new, unknown sample. As we will show, these tools can provide new analytical information that is very useful for medical and biological research.

* Corresponding author. Tel.: +34 935812807; fax: +34 935812011. E-mail address: jcedano@servet.uab.es (J. Cedano).
1 Both authors contributed equally to this report.
0010-4825/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiomed.2007.03.008

2. The principal curve of oriented points calculus

The mathematics behind this system uses the Principal Curves of Oriented Points (PCOP) calculus to obtain the non-linear, inner-pattern relationships [2,3]. The variables can be independent because the PCOP method identifies a hidden dependent variable that orders the data (in contrast to other non-linear analyses, such as regression curves, in which one variable must be designated as dependent).
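To make the idea of a hidden ordering variable concrete, the following is a minimal Python sketch, not the authors' PCOP implementation. It assumes that the fitted principal curve is already available as a polyline (for instance, the ordered sequence of principal oriented points) and takes as each sample's hidden variable the arc-length position of its closest point on that polyline, which is the usual projection index used with principal curves. The function names `hidden_variable` and `order_by_hidden_variable` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def hidden_variable(samples: Sequence[Vector], curve: Sequence[Vector]) -> List[float]:
    """Arc-length position of each sample's closest point on a polyline curve.

    Assumption (not from the paper): the fitted principal curve is given as an
    ordered list of vertices, e.g. the principal oriented points.
    """
    if len(curve) < 2:
        raise ValueError("the curve needs at least two vertices")
    # Cumulative arc length at each vertex of the polyline.
    cum = [0.0]
    for p, q in zip(curve, curve[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    positions = []
    for x in samples:
        best_d2, best_s = math.inf, 0.0
        for i, (p, q) in enumerate(zip(curve, curve[1:])):
            seg = _sub(q, p)
            seg_len2 = _dot(seg, seg)
            # Parameter of the orthogonal projection, clamped to the segment.
            t = 0.0 if seg_len2 == 0.0 else _dot(_sub(x, p), seg) / seg_len2
            t = max(0.0, min(1.0, t))
            proj = [pi + t * si for pi, si in zip(p, seg)]
            diff = _sub(x, proj)
            d2 = _dot(diff, diff)
            # Keep the segment whose projection is nearest to the sample.
            if d2 < best_d2:
                best_d2, best_s = d2, cum[i] + t * math.sqrt(seg_len2)
        positions.append(best_s)
    return positions


def order_by_hidden_variable(samples: Sequence[Vector], curve: Sequence[Vector]) -> List[int]:
    """Indices of the samples sorted by their position along the curve."""
    s = hidden_variable(samples, curve)
    return sorted(range(len(samples)), key=lambda k: s[k])


if __name__ == "__main__":
    curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]       # hypothetical L-shaped pattern
    samples = [(0.2, 0.1), (1.1, 0.5), (0.9, 0.05)]
    print(hidden_variable(samples, curve))            # [0.2, 1.5, 0.9]
    print(order_by_hidden_variable(samples, curve))   # [0, 2, 1]
```

Sorting by this value orders the samples along the pattern without designating any of the original variables as the response. The brute-force nearest-segment search costs O(n·m) for n samples and m segments, which is adequate for an illustration but not tuned for large data sets.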
The PCOP is defined by generalizing, at the local level, the variance properties of principal components. From the sample-space data,