Classifying very-high-dimensional data with random forests of oblique decision trees

Thanh-Nghi Do, Philippe Lenca, Stéphane Lallich, and Nguyen-Khang Pham

Abstract The random forests method is one of the most successful ensemble methods. However, random forests perform poorly on very-high-dimensional data in the presence of dependencies. In this case one can expect many informative combinations of variables, and unfortunately the usual random forests method does not effectively exploit this situation. We investigate a new approach for supervised classification with a huge number of numerical attributes. We propose a random oblique decision trees method: it randomly chooses a subset of predictive attributes and uses an SVM on these attributes as the split function. On 25 datasets, we compare the effectiveness, using classical measures (e.g. precision, recall, F1-measure and accuracy), of random forests of random oblique decision trees against SVMs and random forests of C4.5 trees. Our proposal has significantly better performance on very-high-dimensional datasets, with slightly better results on lower-dimensional datasets.

Thanh-Nghi Do
Institut Telecom; Telecom Bretagne, UMR CNRS 3192 Lab-STICC, Université européenne de Bretagne, France
Can Tho University, Vietnam
e-mail: tn.do@telecom-bretagne.eu

Philippe Lenca
Institut Telecom; Telecom Bretagne, UMR CNRS 3192 Lab-STICC, Université européenne de Bretagne, France
e-mail: philippe.lenca@telecom-bretagne.eu

Stéphane Lallich
Université de Lyon, Laboratoire ERIC, Lyon 2, France
e-mail: stephane.lallich@univ-lyon2.fr

Nguyen-Khang Pham
IRISA, Rennes, France
Can Tho University, Vietnam
e-mail: pnguyenk@irisa.fr
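The idea of an oblique split over a random attribute subset can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a simple perceptron update stands in for a real SVM solver, labels are assumed to be in {-1, +1}, and all function names are hypothetical.

```python
import random

def random_oblique_split(X, y, n_sub, rng):
    """Fit an oblique split: a linear separator over a random attribute subset.

    Toy stand-in for the paper's SVM split: a perceptron update replaces
    a real SVM solver so the sketch stays dependency-free. Labels are
    assumed to be in {-1, +1}.
    """
    d = len(X[0])
    subset = rng.sample(range(d), min(n_sub, d))  # random attribute subset
    w = [0.0] * len(subset)
    b = 0.0
    for _ in range(300):  # enough epochs to converge on separable toy data
        for xi, yi in zip(X, y):
            s = sum(w[j] * xi[a] for j, a in enumerate(subset)) + b
            if yi * s <= 0:  # misclassified (or on the boundary): update
                for j, a in enumerate(subset):
                    w[j] += yi * xi[a]
                b += yi
    return subset, w, b

def split_side(x, subset, w, b):
    """Route a sample to the +1 or -1 side of the oblique hyperplane."""
    s = sum(w[j] * x[a] for j, a in enumerate(subset)) + b
    return 1 if s > 0 else -1

# Usage on a small linearly separable toy set:
rng = random.Random(0)
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 3], [3, 2], [3, 3]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
subset, w, b = random_oblique_split(X, y, 2, rng)
preds = [split_side(x, subset, w, b) for x in X]
```

A full tree would apply such a split recursively at each node, and a forest would build many such trees on bootstrap samples; the combination of a random subspace with a multivariate (oblique) separator is what distinguishes this split from C4.5's single-attribute tests.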