L. Lamontagne and M. Marchand (Eds.): Canadian AI 2006, LNAI 4013, pp. 110 – 121, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Bayesian Learning for Feed-Forward Neural Network
with Application to Proteomic Data: The Glycosylation
Sites Detection of the Epidermal Growth Factor-Like
Proteins Associated with Cancer as a Case Study
Alireza Shaneh and Gregory Butler
Research Laboratory for Bioinformatics Technology,
Department of Computer Science and Software Engineering, Concordia University,
1515 St-Catherine West, Montreal, Quebec H3G 2W1, Canada
{a_dariss, gregb}@cs.concordia.ca
http://www.cs.concordia.ca
Abstract. Neural networks have found a number of applications in proteomics; however, the design and use of a neural network depend on the nature of the problem and the dataset studied. The Bayesian framework is a consistent learning paradigm for inferring knowledge from experimental data with a feed-forward neural network. Bayesian regularization automates the learning process by pruning the unnecessary weights of a feed-forward neural network, a technique that we present in this paper and apply, as a case study, to detecting glycosylation sites in epidermal growth factor-like repeat proteins involved in cancer. After applying the Bayesian framework, the number of network parameters decreased by 47.62%, and model performance increased by more than 34.92% compared to the One Step Secant method. Bayesian learning produced more consistent outcomes than the One Step Secant method did; however, it is computationally complex and slow, and the role of prior knowledge and its correlation with model selection should be studied further.
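As a rough illustration of the weight pruning the abstract refers to, the following toy sketch (our own construction, not the paper's implementation) applies MacKay-style Bayesian regularization to a linear model: training minimizes F = beta·E_D + alpha·E_W, the hyperparameters alpha and beta are re-estimated from the evidence, and the effective number of parameters gamma reports how many of the N weights the data actually determines.

```python
import numpy as np

# Illustrative sketch only (not the authors' implementation): evidence-
# framework Bayesian regularization on a toy linear model.  Training
# minimizes F = beta*E_D + alpha*E_W; gamma = N - alpha*tr(H^-1) is the
# effective number of parameters -- weights the data does not determine
# are shrunk toward zero by the prior, i.e. effectively pruned.

rng = np.random.default_rng(0)

n, N, k = 30, 10, 4                       # samples, weights, latent factors
Z = rng.normal(size=(n, k))               # inputs with only k strong directions
X = Z @ rng.normal(size=(k, N)) + 0.01 * rng.normal(size=(n, N))
w_true = rng.normal(size=N)
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha, beta = 1.0, 1.0                    # hyperparameter initial guesses
for _ in range(30):
    H = beta * X.T @ X + alpha * np.eye(N)          # Hessian of F
    w = beta * np.linalg.solve(H, X.T @ y)          # posterior mode of weights
    E_W = 0.5 * w @ w                               # weight-decay term
    E_D = 0.5 * np.sum((y - X @ w) ** 2)            # data-misfit term
    gamma = N - alpha * np.trace(np.linalg.inv(H))  # effective parameters
    alpha = gamma / (2 * E_W)                       # evidence re-estimation
    beta = (n - gamma) / (2 * E_D)

print(f"effective number of parameters: {gamma:.2f} of {N}")
```

Because the toy inputs have only a few strong directions, gamma converges to well below N, mirroring (in miniature) the 47.62% reduction in network parameters reported above.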
1 Introduction
Proteomic data are large, sparse, and redundant, and dealing with these characteristics is challenging and requires powerful methods for inferring knowledge. Soft computing techniques have been used extensively to mine and extract as much necessary information as possible from large sets of protein sequences. Generally speaking, the soft computing methods used to solve optimization problems fall into computational, statistical, and metaheuristic frameworks. Computational approaches optimize the learning algorithm by predicting the future state of a solution based on past evaluations of the data, in both supervised and unsupervised ways. Artificial neural networks are obvious examples of such methods. Statistical learning theory emphasizes the statistical methods used for automated learning; for example, kernel-based methods such as support vector machines [24] find an optimal separating hyperplane by mapping data to a higher-dimensional search space. Metaheuristic algorithms are