Please cite this article in press as: Palczewska, A., et al. Towards model governance in predictive toxicology. International Journal of Information Management (2013), http://dx.doi.org/10.1016/j.ijinfomgt.2013.02.005

Journal homepage: www.elsevier.com/locate/ijinfomgt

Towards model governance in predictive toxicology

Anna Palczewska a, Xin Fu b,∗, Paul Trundle a, Longzhi Yang a, Daniel Neagu a, Mick Ridley a, Kim Travis c

a School of Computing, Informatics and Media, University of Bradford, Richmond Road, BD7 1DP Bradford, UK
b Department of Management Science, School of Management, Xiamen University, Xiamen 361005, China
c Syngenta Jealott’s Hill International Research Centre, Bracknell RG42 6EY, UK

Article info
Article history: Available online xxx
Keywords: Model governance; Data governance; Predictive toxicology; Information representation; Knowledge management; Quality assessment

Abstract
Efficient management of toxicity information as an enterprise asset is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. Many organisations focus on better information organisation and reuse in an attempt to reduce the costs of testing and manufacturing in the product development phase. Toxicity information is extracted not only from toxicity data but also from predictive models. Accurate and appropriately shared models can bring a number of benefits if we are able to make effective use of existing expertise. Although existing models may provide high-impact insights into the relationships between chemical attributes and specific toxicological effects, they can also lead to incorrect decisions when misused. Thus, there is a need for a framework for efficient model management.
To address this gap, this paper introduces the concept of model governance, which is based upon data governance principles. We extend the data governance processes by adding procedures that allow the evaluation of model use and governance for enterprise purposes. The core aspect of model governance is model representation. We propose six rules that form the basis of a model representation schema, called Minimum Information About a QSAR Model Representation (MIAQMR). As a proof-of-concept of our model governance framework, we develop a web application called Model and Data Farm (MADFARM), in which models are described by the MIAQMR-ML markup language.

© 2013 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +86 5922181207.
E-mail addresses: a.m.wojak@bradford.ac.uk (A. Palczewska), xfu@xmu.edu.cn (X. Fu), p.r.trundle@bradford.ac.uk (P. Trundle), L.Yang8@bradford.ac.uk (L. Yang), D.Neagu@bradford.ac.uk (D. Neagu), M.J.Ridley@bradford.ac.uk (M. Ridley), kim.travis@syngenta.com (K. Travis).

1. Introduction

Efficient access to integrated platforms for toxicological modelling is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. It supports the decision-making process for the discovery and development of products such as drugs, pesticides, cosmetics and food protection agents. The whole process of product development may last approximately ten years and is divided into four phases: discovery, profile, evaluation and support. In the first phase, thousands of chemical compounds are selected from millions according to their biological, chemical or physical properties. This group of compounds is profiled against various targets (e.g. biochemical and physiological targets related to metabolism, growth, development and nervous communication), and tens of them pass to the evaluation phase. After the evaluation phase, usually only a very limited number of chemicals are selected as products that can be introduced into the market.
Thus, many organisations focus on better information organisation and reuse in order to reduce the cost of testing and manufacturing in the product development phase.

Over several years, many different types of computational methods, such as structure–activity relationships (SAR), quantitative structure–activity relationships (QSAR), kinetic methods and expert systems, have been developed to identify or predict toxic effects on human beings, animals and the environment. A large number and variety of models have been, and are still being, created thanks to the continuously increasing amount of available experimental data that covers various domains of chemical space. Currently, good quality models are considered to be a cost-efficient alternative to in vivo and in vitro testing. Because the safety of humans, animals and the environment must be ensured, they may never become a complete substitute for in vivo experiments. However, these models can be used to reduce the cost and negative impacts of animal testing. Thus, for domains such as pharmacy, cosmetics or food production, experimental toxicity data and toxicity models have become valuable information assets. The collection and management of data and predictive models are required to support the decision to exclude chemicals that may fail in the profile and evaluation phases.

0268-4012/$ – see front matter

Having such a wealth of previously developed models at our disposal can bring a number of benefits if we are able to make