Int. J. Bioinformatics Research and Applications, Vol. 1, No. 3, 2006
Copyright © 2006 Inderscience Enterprises Ltd.

Improved protein fold assignment using support vector machines

Robert E. Langlois, Alice Diec, Ognjen Perisic, Yang Dai and Hui Lu*
Department of Bioengineering, University of Illinois at Chicago, 60607 Illinois, USA
Fax: 312 413 2018
E-mail: rlangl1@uic.edu
E-mail: adiec1@uic.edu
E-mail: operis1@uic.edu
E-mail: yangdai@uic.edu
E-mail: huilu@uic.edu
*Corresponding author

Abstract: Because of the relatively large gap between the number of known protein sequences and the number of known protein structures, the ability to construct a computational model that predicts structure from sequence information has become an important area of research. Knowledge of a protein’s structure is crucial to understanding its biological role. In this work, we present a support vector machine based method for recognising a protein’s fold from sequence information alone, for cases in which the sequence has little similarity to sequences of known structure. We have focused on improving multi-class classification, parameter tuning, descriptor design, and feature selection. The current implementation achieves better prediction accuracy than previous similar approaches and performs comparably to straightforward threading.

Keywords: fold recognition; support vector machines; machine learning; proteomics; structure prediction.

Reference to this paper should be made as follows: Langlois, R.E., Diec, A., Perisic, O., Dai, Y. and Lu, H. (2006) ‘Improved protein fold assignment using support vector machines’, Int. J. Bioinformatics Research and Applications, Vol. 1, No. 3, pp.319–334.

Biographical notes: Robert Ezra Langlois is a second-year PhD student in Bioinformatics in the Department of Bioengineering at the University of Illinois at Chicago (UIC). He earned his BS degree in Bioengineering at UIC in May 2003.
Currently, he is supported by an NIH training grant, Cellular Signaling in Cardiovascular System. His research interests include machine learning, protein folding, structure prediction, protein function prediction, and binding prediction of signaling proteins.

Alice Diec earned her Master's degree in Bioinformatics from the Department of Bioengineering at UIC in October 2004. She currently works at the Washington University Genome Center.

Ognjen Perisic is a third-year PhD student in Bioinformatics in the Department of Bioengineering at UIC. His research interests are in computational biophysics, free energy calculation, non-equilibrium statistical physics in biology, and protein structure prediction.