2011, Vol.16 No.1, 073-078
Article ID 1007-1202(2011)01-073-06
DOI 10.1007/s11859-011-0714-2
Evaluation of the Occurrence Possibility of SNP in Brassica napus with Sliding Window Features by Using RBF Networks
HU Xuehai¹, LI Ruiyuan², MENG Jinling², XIONG Huijuan¹, XIA Jingbo¹, LI Zhi¹†
1. College of Science, Huazhong Agricultural University,
Wuhan 430070, Hubei, China;
2. National Key Laboratory of Crop Genetic Improvement,
Huazhong Agricultural University, Wuhan 430070, Hubei, China
© Wuhan University and Springer-Verlag Berlin Heidelberg 2011
Abstract: We extract physical and chemical features related to the occurrence of single nucleotide polymorphisms (SNPs) from three groups of sliding windows around the SNP site, and then predict SNP occurrence and evaluate the prediction accuracy by using radial basis function (RBF) networks. The results for the forward sliding windows suggest that the accuracy and the Matthews correlation coefficient (MCC) increase with the length of the sliding window: the accuracies range from 73.27% to 80.69%, and the MCC values range from 0.465 to 0.614. The backward sliding windows and the sliding windows with a fixed length of three are designed to find the crucial sites related to SNPs. The results imply that the occurrence possibility of an SNP relies heavily on the above physical and chemical features of sites located about 20 bases from the SNP site. Compared with the support vector machine (SVM), our RBF network approach achieves more satisfactory results.
Key words: single nucleotide polymorphism (SNP); radial basis
function (RBF) network; Brassica napus; sliding windows
CLC number: Q 755; TP 183
Received date: 2010-04-20
Foundation item: Supported by the Discipline-Crossing Research Foundation of Huazhong Agricultural University (2008XKJC006) and the Fundamental Research Funds for the Central Universities of China
Biography: HU Xuehai, male, Ph.D., research direction: data mining and
bioinformatics. E-mail: huxuehai@mail.hzau.edu.cn
† To whom correspondence should be addressed. E-mail: hzau_lizhi@mail.hzau.edu.cn
0 Introduction
Research on molecular markers is one of the pivotal facets of genomic studies. Among the various types of available markers, single nucleotide polymorphisms (SNPs), i.e., single-base differences in the nucleotide sequence, represent the most common class of genome variation and have been among the most popular markers in recent years. They provide an abundant source of DNA variation and are efficient for marker-assisted selection in crop-breeding programs.
An important aspect of SNP technology is the prediction of SNPs. In 1999, based on the idea of statistical inference, Marth et al. [1] developed a general method to find SNPs. Their method first matches the gene sequences of different varieties to the anchor gene, and then finds the SNP sites by using the so-called "PolyBayes" algorithm.
Afterwards, many authors [2-4] made predictions by using machine-learning techniques. For example, Kong et al. [3] used a support vector machine (SVM), and Unneberg made use of artificial neural networks (ANN) [4]. In the implementation of most machine-learning methods, feature extraction from a given DNA sequence is an essential step. It is important to note that the key factors for success are the information and features that we extract from the DNA sequences and whether they are strongly related to SNPs. Among the existing methods, we take notice of the result of Kong et