Parameter Estimation in a Cell Cycle Model for Frog Egg Extracts

Jason W. Zwolak ∗, John J. Tyson ∗∗, and Layne T. Watson ∗
Departments of Computer Science ∗ and Biology ∗∗
Virginia Polytechnic Institute and State University
Blacksburg, Virginia 24061-0106
e-mail: jzwolak@vt.edu

Keywords: Computational biology, ordinary differential equations, parameter estimation

Abstract

Cell cycle models used in biology can be very complex, involving many parameters with initially unknown values. The values of the parameters strongly affect the accuracy of a model in representing real biological cells. Typically, people search for the best parameters of a model using the computer only as a tool to run simulations. This paper describes methods and results for a computer program that searches for the parameters of a specific model using well-tested algorithms. The code for this program uses ODRPACK for parameter estimation and LSODAR to solve the differential equations that comprise the model.

1. INTRODUCTION

Computational models of cell growth and division involve digital representation of a complex network of biochemical reactions within cells. These reactions can be described by a system of nonlinear ordinary differential equations, according to the principles of biochemical kinetics. Rate constants and binding constants enter as parameters in the differential equations, and must be estimated by fitting solutions of the equations to experimental data.

This work concerns some classical experiments on activation of MPF (M-phase promoting factor) in frog egg extracts. MPF is a dimer of cyclin and Cdc2 (a protein kinase that drives egg nuclei into mitosis). In the experimental preparation, a fixed amount of cyclin is added to an extract containing an excess of Cdc2 subunits. If the amount of cyclin added is below a threshold, MPF activity never appears. Above the threshold, MPF is activated, but only after a characteristic time lag.
The time lag decreases abruptly as the total amount of cyclin added increases above the threshold. The goal is to fit these data with a reasonable model of the underlying biochemistry, which keeps track of cyclin monomers, Cdc2 monomers, and the phosphorylation state of cyclin/Cdc2 dimers.

ODRPACK, based on the orthogonal distance between experimental data and the model, is used for the nonlinear regression to estimate the unknown rate constants (ODE parameters). The ability of this algorithm to weight data values arbitrarily, and to treat both abscissae and ordinates as uncertain, is crucial, given the sparsity and uncertainty of available biological data.

Constructing the model’s predictions of experimental data requires simulating MPF activity as a function of time after addition of cyclin. These simulations yield the cyclin threshold for MPF activation, and the time lag (the time necessary for MPF activity to reach one-half of its asymptotic value, for supra-threshold amounts of cyclin added to the extracts).

The complete calculation is expensive, because the ODEs are stiff and must be solved numerous times for the nonlinear regression. Also, because of local minima, the nonlinear regression must be done from many starting points to adequately explore the parameter space. There are potential sources of parallelism in the ODE solution itself, the estimation of partial derivatives of the ODE solution, and multiple starting points for regression. Numerical results are presented for a four-component, ten-parameter model.

The model described in this paper is a realistic model of the biochemical kinetics of MPF activity in frog egg extracts. However, the model ignores a number of other proteins that affect the cell cycle. To study more complete models of cell cycle control, more components must be added to the model, and other measurable phenomena incorporated in the cost function.
As the modeling fidelity is increased, the mathematical and computational complexities of the problem grow rapidly. Efficient and accurate tools for parameter estimation will be needed to build computational models of the complex control networks operating within cells, which is one of the main goals of bioinformatics in the postgenomic era.

Section 2 outlines the biological model and provides the experimental data for said model. An overview of the code along with descriptions of the tools (ODRPACK and LSODAR) used by the code can be found in Section 3. Section 4 contains a more detailed pseudocode for the algorithm. The results of the parameter estimation are in Section 5. The conclusion and future work are in Section 6.
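
The time lag used in this introduction (the time necessary for MPF activity to reach one-half of its asymptotic value) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the code described in this paper: the names `time_lag` and `sigmoid_trace` are hypothetical, the sigmoid curve is a stand-in for a simulated MPF time course rather than the ODE model, the last sample is assumed to be close enough to the asymptote to represent it, and linear interpolation between samples is one possible choice among several.

```python
import math


def time_lag(times, activity):
    """Return the first time at which `activity` rises to half its final value.

    Assumption: the last sample approximates the asymptotic activity.
    Returns None when there is no upward crossing of the half-maximum,
    for example a flat trace from a sub-threshold amount of cyclin.
    """
    if len(times) != len(activity) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    half = 0.5 * activity[-1]
    for i in range(1, len(times)):
        a0, a1 = activity[i - 1], activity[i]
        if a0 < half <= a1:
            # Linear interpolation between the two bracketing samples.
            frac = (half - a0) / (a1 - a0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


def sigmoid_trace(midpoint, steepness=0.5, t_end=100.0, n=1001):
    """Hypothetical stand-in for a simulated MPF time course (not the model)."""
    times = [t_end * k / (n - 1) for k in range(n)]
    activity = [1.0 / (1.0 + math.exp(-steepness * (t - midpoint))) for t in times]
    return times, activity


# A trace that switches on around t = 30 has a time lag close to 30.
times, activity = sigmoid_trace(midpoint=30.0)
lag = time_lag(times, activity)
print(lag)
```

In a fitting loop, a quantity of this kind would be evaluated on each simulated time course and compared with the measured time lag; a `None` result would correspond to an amount of cyclin below the activation threshold.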