An Optimization Rule for In Silico Identification of Targeted Overproduction in Metabolic Pathways

Mouli Das, C.A. Murthy, and Rajat K. De

Abstract: In an extension of previous work, here we introduce a second-order optimization method for determining optimal paths from the substrate to a target product of a metabolic network, through which the amount of the target is maximum. An objective function for the said purpose, along with certain linear constraints, is considered and minimized. The basis vectors spanning the null space of the stoichiometric matrix, depicting the metabolic network, are computed, and their convex combinations satisfying the constraints are considered as flux vectors. A set of other constraints, incorporating weighting coefficients corresponding to the enzymes in the pathway, are considered. These weighting coefficients appear in the objective function to be minimized. During minimization, the values of these weighting coefficients are estimated and learned. These values, on minimization, represent an optimal pathway, depicting optimal enzyme concentrations, leading to overproduction of the target. The results on various networks demonstrate the usefulness of the methodology in the domain of metabolic engineering. A comparison with the standard gradient descent and the extreme pathway analysis technique is also performed. Unlike the gradient descent method, the present method, being independent of the learning parameter, exhibits improved results.

Index Terms: Local minima, Newton-Raphson method, underdetermined problem, metabolic pathways, learning parameter

1 INTRODUCTION

It is well known that the enzyme-catalyzed biochemical reactions within the cell are grouped as metabolic pathways [1]. The importance of modeling these biochemical pathways has been extensively described in the literature.
Consequently, mathematical (computational) modeling approaches for analyzing functionality and regulation of biochemical pathways are rapidly gaining importance. Various optimization algorithms such as the Levenberg-Marquardt method, genetic programming, simulated annealing and evolutionary algorithms have been applied to infer optimal pathways in biochemical models [2], [3], [4]. The flux balance analysis (FBA) technique, a constraint-based optimization approach applied to genome-scale metabolic models, can be used to make predictions of flux distributions and optimal pathways based on linear optimization [5].

Many learning algorithms find their roots in function minimization that can be classified into local minimization and global minimization [6]. Local minimization algorithms, such as gradient descent (GD), are fast but usually converge to local minima. In contrast, global minimization algorithms have heuristic strategies to help escape from local minima. Many techniques in data mining and machine learning follow a GD paradigm in the iterative process for optimization [7]. However, several drawbacks of the GD learning method have been observed: 1) its convergence speed is usually too low, 2) its convergence accuracy is hard to control, 3) it is easily stuck in bad local minima, and 4) the choice of proper learning constant largely depends on trial and error [8].

One common approach is to upgrade the normal GD learning, which is a first-order learning algorithm, to a second-order one. Since the second-order method is an optimization algorithm with quadratic convergence speed, it can be used to improve the learning speed and accuracy of the normal backpropagation (BP) [6]. We have recently presented a supervised second-order learning algorithm, a modification of the Newton-Raphson method that identifies some optimal metabolic pathways accurately and efficiently leading to the overproduction of a biochemical product of interest [9].
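The contrast between first-order and second-order updates described above can be illustrated with a minimal one-dimensional sketch. It is not the modified Newton-Raphson rule of [9]: the objective f(w) = w^4/4 + w^2/2 - 2w, the starting point, the learning constants, and all function names are hypothetical choices made purely for illustration. Plain GD needs a hand-picked learning constant, whereas the textbook Newton-Raphson step scales the gradient by the second derivative and has no such parameter.

```python
import math

# Hypothetical convex test objective (not from the paper):
# f(w) = w**4 / 4 + w**2 / 2 - 2*w, with its unique minimum at w = 1.
def grad(w):
    """First derivative f'(w)."""
    return w**3 + w - 2.0

def hess(w):
    """Second derivative f''(w); strictly positive for this objective."""
    return 3.0 * w**2 + 1.0

def gradient_descent(w, lr, max_steps=200, tol=1e-10):
    """First-order update w <- w - lr * f'(w); lr must be chosen by hand."""
    for step in range(max_steps):
        g = grad(w)
        if abs(g) < tol:
            return w, step
        w -= lr * g
        if abs(w) > 1e6:          # iterate has blown up: report divergence
            return math.nan, step + 1
    return w, max_steps

def newton_raphson(w, max_steps=50, tol=1e-10):
    """Second-order update w <- w - f'(w) / f''(w); no learning constant."""
    for step in range(max_steps):
        g = grad(w)
        if abs(g) < tol:
            return w, step
        w -= g / hess(w)          # safe here because f''(w) >= 1
    return w, max_steps

w_gd, n_gd = gradient_descent(3.0, lr=0.1)    # a learning constant that works
w_bad, n_bad = gradient_descent(3.0, lr=0.5)  # a learning constant that diverges
w_nr, n_nr = newton_raphson(3.0)

print("GD  (lr=0.1):", w_gd, "after", n_gd, "steps")
print("GD  (lr=0.5):", w_bad, "after", n_bad, "steps")
print("Newton-Raphson:", w_nr, "after", n_nr, "steps")
```

On this toy problem the GD result depends strongly on the learning constant, while the Newton-Raphson iteration reaches the minimum in fewer steps without one. The sketch relies on the second derivative being positive everywhere; where it vanishes or changes sign, the plain Newton-Raphson step is not reliable.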
The learning method can be considered as a nonlinear global optimization problem in which the goal is to minimize a nonlinear objective function that involves the weights using heuristic strategies [10]. In applying our proposed second-order learning method to biochemical modeling, we implemented two subsidiary improvements: 1) Since it is difficult to model nonlinear biochemical systems, we adopted a second-order derivative transformation in the updated learning rule, thereby facilitating the optimization. 2) Our proposed method always converges to the global minimum experimentally (in contrast to local optima). However, for cases where the first-order derivative is zero, there may be a local minimum and also a point of inflection. The second-order method will not perform in such situations.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 10, No. 4, July/August 2013.
The authors are with the Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata, West Bengal 700 108, India. E-mail: mouli.das@gmail.com, {murthy, rajat}@isical.ac.in.
Manuscript received 18 July 2012; revised 23 Apr. 2013; accepted 22 May 2013; published online 10 June 2013.
For information on obtaining reprints of this article, please send e-mail to: tcbb@computer.org, and reference IEEECS Log Number TCBB-2012-07-0172.
Digital Object Identifier no. 10.1109/TCBB.2013.67.
1545-5963/13/$31.00 © 2013 IEEE. Published by the IEEE CS, CI, and EMB Societies & the ACM.

3) Our proposed