2011, Vol.16 No.1, 073-078
Article ID 1007-1202(2011)01-073-06
DOI 10.1007/s11859-011-0714-2

Evaluation of the Occurrence Possibility of SNP in Brassica napus with Sliding Window Features by Using RBF Networks

HU Xuehai 1 , LI Ruiyuan 2 , MENG Jinling 2 , XIONG Huijuan 1 , XIA Jingbo 1 , LI Zhi 1†
1. College of Science, Huazhong Agricultural University, Wuhan 430070, Hubei, China;
2. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China

© Wuhan University and Springer-Verlag Berlin Heidelberg 2011

Abstract: We extract physical and chemical features related to the occurrence of single nucleotide polymorphisms (SNPs) from three groups of sliding windows around the SNP site, and then use radial basis function (RBF) networks to predict SNP occurrence and assess the prediction accuracy. The results for the forward sliding windows suggest that the accuracies and Matthews correlation coefficient (MCC) values increase with the length of the sliding windows. The accuracies range from 73.27% to 80.69%, and the MCC values range from 0.465 to 0.614. The backward sliding windows and the sliding windows with a fixed length of three are designed to find the crucial sites related to SNP. The results imply that the possibility of SNP occurrence relies heavily on these physical and chemical features at sites about 20 bases away from the SNP site. Compared with the support vector machine (SVM), our RBF network approach achieves more satisfactory results.

Key words: single nucleotide polymorphism (SNP); radial basis function (RBF) network; Brassica napus; sliding windows
CLC number: Q 755; TP 183

Received date: 2010-04-20
Foundation item: Supported by the Discipline-Crossing Research Foundation of Huazhong Agricultural University (2008XKJC006) and the Fundamental Research Funds for the Central Universities of China
Biography: HU Xuehai, male, Ph.D., research direction: data mining and bioinformatics.
E-mail: huxuehai@mail.hzau.edu.cn
† To whom correspondence should be addressed. E-mail: hzau_lizhi@mail.hzau.edu.cn.

0 Introduction

Research on molecular markers is one of the pivotal facets of genomic studies. Among the various types of available markers, single nucleotide polymorphisms (SNPs), i.e., single base differences in the nucleotide sequence, represent the most common class of genome variation and have been among the most popular markers in recent years. They provide an abundant source of DNA variation and are efficient for marker-assisted selection in crop-breeding programs.

An important aspect of SNP technology is the prediction of SNPs. In 1999, based on the idea of statistical inference, Marth et al [1] developed a general method to find SNPs. Their method first matches the gene sequences of different varieties to the anchor gene and then finds the SNP sites by using the so-called “PolyBayes” algorithm. Subsequently, many authors [2-4] made predictions by using machine-learning techniques. For example, Kong et al [3] used a support vector machine (SVM), and Unneberg [4] used artificial neural networks (ANN). In most machine-learning methods, feature extraction from a given DNA sequence is an essential step. The key factors for success are the information and features extracted from the DNA sequences and whether they are strongly related to SNP occurrence. Among the existing methods, we note the result of Kong et