Computers and Chemical Engineering 28 (2004) 425–434

Pharmaceutical product design using combinatorial optimization

S. Siddhaye a, K. Camarda a,*, M. Southard a, E. Topp b

a Department of Chemical and Petroleum Engineering, University of Kansas, Lawrence, KS 66045, USA
b Department of Pharmaceutical Chemistry, University of Kansas, Lawrence, KS 66045, USA

* Corresponding author. E-mail address: camarda@ku.edu (K. Camarda).

Received 17 June 2002; received in revised form 11 August 2003; accepted 11 August 2003

Abstract

A two-step computational method for designing new molecules in medicinal chemistry is described. In the first step, topological indices are used to develop structure-based correlations for properties of interest. Zeroth- and first-order connectivity indices are employed to develop linear correlations for three physical properties of interest in pharmaceutical chemistry: octanol–water partition coefficient (OWPC), melting point and water solubility. In the second step, these correlations are used within an optimization framework to design molecules having the desired properties. This step involves formulating a mixed-integer linear program (MILP) that includes the property correlations, structural constraints that ensure a stable, connected molecule is formed, and an objective function that minimizes the deviation from a set of property targets. A new data structure, known as a partitioned adjacency matrix, allows the connectivity index definitions to be written linearly, so that they can be included in an MILP and solved using a standard branch-and-bound method. The connectivity of the molecule is ensured by including network flow constraints in the formulation. Three examples demonstrate the efficacy of this approach.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Molecular design; Optimization

1. Introduction

The current experimental trial-and-error approach to discovering new drug molecules starts by identifying a large number of potential candidate molecules using heuristic rules that generate variants of an initial structure, often derived from a natural product. These candidate molecules are then synthesized and tested to determine whether they possess the desired biological effect. However, most of these lead compounds are ineffective, often because physical properties such as solubility are not in the required range. Because this synthesis-and-testing approach is time-consuming and expensive, a computational method for discovering and screening pharmaceutical compounds is desirable. The challenges in developing such a method are significant: physical and biochemical properties must be estimated from the molecular structure alone, and a large combinatorial optimization problem must be solved to find the best molecular structure for a given pharmaceutical application. This work applies computer-aided molecular design (CAMD) techniques to the pharmaceutical design problem. While previous researchers have employed a rule-based approach along with computational property predictions (Kier & Hall, 1976) or a brute-force analysis of a very large number of alternatives (Hairston, 1998), this research uses a new formulation that solves the drug design problem as a mixed-integer linear program (MILP), which greatly improves the efficiency of the method while still finding globally optimal solutions. Properties are estimated using four topological indices, which are numerical values that accurately describe a molecular structure and can thus serve as descriptors for correlations.
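For readers unfamiliar with connectivity indices, the two simple ones have standard definitions on the hydrogen-suppressed molecular graph: with δ_i the number of non-hydrogen neighbours of atom i, the zeroth-order index is ⁰χ = Σ_i δ_i^(-1/2) and the first-order index is ¹χ = Σ over bonds (i, j) of (δ_i δ_j)^(-1/2). The Python sketch below computes both, assuming a molecule given as an atom count plus a bond list; the function names are hypothetical, and the valence variants (two of the four indices used here) are omitted. The last function regroups ¹χ by bond class (δ_i, δ_j): because each class weight 1/√(kl) is a constant, the index becomes a linear function of integer bond-class counts. This illustrates the general idea that lets such definitions enter an MILP; it is an assumption that it mirrors the details of the partitioned adjacency matrix formulation.

```python
import math
from collections import Counter

# Hypothetical representation: atoms 0..n-1 of the hydrogen-suppressed graph,
# bonds as (i, j) pairs. Bond order is ignored (simple, not valence, indices).
# Assumes every atom has at least one heavy-atom neighbour (delta_i >= 1).

def degrees(n_atoms, bonds):
    """Return delta_i, the number of non-hydrogen neighbours of each atom."""
    delta = [0] * n_atoms
    for i, j in bonds:
        delta[i] += 1
        delta[j] += 1
    return delta

def chi0(n_atoms, bonds):
    """Zeroth-order connectivity index: sum over atoms of delta_i ** -0.5."""
    return sum(d ** -0.5 for d in degrees(n_atoms, bonds))

def chi1(n_atoms, bonds):
    """First-order connectivity index: sum over bonds of (delta_i * delta_j) ** -0.5."""
    delta = degrees(n_atoms, bonds)
    return sum((delta[i] * delta[j]) ** -0.5 for i, j in bonds)

def chi1_by_bond_class(n_atoms, bonds):
    """Same index regrouped as sum_kl n_kl / sqrt(k * l).

    The weights are constants, so the index is linear in the counts n_kl.
    """
    delta = degrees(n_atoms, bonds)
    counts = Counter(tuple(sorted((delta[i], delta[j]))) for i, j in bonds)
    return sum(n_kl / math.sqrt(k * l) for (k, l), n_kl in counts.items())

# Isobutane, hydrogen-suppressed: central carbon 0 bonded to carbons 1, 2, 3.
isobutane = (4, [(0, 1), (0, 2), (0, 3)])
print(round(chi0(*isobutane), 4))  # 3.5774
print(round(chi1(*isobutane), 4))  # 1.7321
```

In this illustration both routes give the same ¹χ for isobutane (√3), but only the bond-class form is written as a constant-weighted sum of counts, which is the property an MILP formulation needs.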
Structural feasibility and connectivity constraints are added to the formulation to ensure that a stable, connected molecule is formed. To solve the resulting optimization problem for a novel molecular structure, a new data structure known as a partitioned adjacency matrix is used to convert the problem into a mixed-integer linear program, which can then be solved by standard techniques.

0098-1354/$ – see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compchemeng.2003.08.011

CAMD methods have been used by many researchers to design a wide variety of molecules. Venkatasubramanian, Chan, and Caruthers (1994) used a group contribution method to predict properties and employed a genetic algorithm to solve the resulting mixed-integer nonlinear programs (MINLPs). Maranas