Please cite this article in press as: Palczewska, A., et al. Towards model governance in predictive toxicology. International Journal of Information
Management (2013), http://dx.doi.org/10.1016/j.ijinfomgt.2013.02.005
ARTICLE IN PRESS
JJIM-1227; No. of Pages 17
International Journal of Information Management xxx (2013) xxx–xxx
International Journal of Information Management
journal homepage: www.elsevier.com/locate/ijinfomgt
Towards model governance in predictive toxicology
Anna Palczewskaᵃ, Xin Fuᵇ,∗, Paul Trundleᵃ, Longzhi Yangᵃ, Daniel Neaguᵃ, Mick Ridleyᵃ, Kim Travisᶜ
ᵃ School of Computing, Informatics and Media, University of Bradford, Richmond Road, BD7 1DP Bradford, UK
ᵇ Department of Management Science, School of Management, Xiamen University, Xiamen 361005, China
ᶜ Syngenta Jealott’s Hill International Research Centre, Bracknell RG42 6EY, UK
Article info
Article history:
Available online xxx
Keywords:
Model governance
Data governance
Predictive toxicology
Information representation
Knowledge management
Quality assessment
Abstract
Efficient management of toxicity information as an enterprise asset is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. Many organisations focus on better information organisation and reuse in an attempt to reduce the costs of testing and manufacturing in the product development phase. Toxicity information is extracted not only from toxicity data but also from predictive models. Accurate and appropriately shared models can bring a number of benefits if we are able to make effective use of existing expertise. Although existing models may provide high-impact insights into the relationships between chemical attributes and specific toxicological effects, they can also introduce the risk of incorrect decisions. Thus, there is a need for a framework for efficient model management. To address this gap, this paper introduces the concept of model governance, which is based upon data governance principles. We extend the data governance processes by adding procedures that allow the evaluation of model use and governance for enterprise purposes. The core aspect of model governance is model representation. We propose six rules that form the basis of a model representation schema called Minimum Information About a QSAR Model Representation (MIAQMR). As a proof of concept of our model governance framework, we develop a web application called Model and Data Farm (MADFARM), in which models are described by the MIAQMR-ML markup language.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Efficient access to integrated platforms for toxicological modelling is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. It supports the decision-making process in product discovery and development, e.g. of drugs, pesticides, cosmetics and food protection products. The whole process of product development may last approximately ten years and is divided into four phases: discovery, profile, evaluation and support. In the first phase, thousands of chemical compounds are selected from millions according to their biological, chemical or physical properties. This group of compounds is then profiled against various targets (e.g. biochemical and physiological targets related to metabolism, growth, development and nervous communication), and tens of them pass to the evaluation phase. After the evaluation phase, usually only a very limited number of chemicals are selected as products that can be introduced into the market. Thus, many organisations focus on better information organisation and reuse in order to reduce the cost of testing and manufacturing in the product development phase.
∗ Corresponding author. Tel.: +86 5922181207.
E-mail addresses: a.m.wojak@bradford.ac.uk (A. Palczewska), xfu@xmu.edu.cn (X. Fu), p.r.trundle@bradford.ac.uk (P. Trundle), L.Yang8@bradford.ac.uk (L. Yang), D.Neagu@bradford.ac.uk (D. Neagu), M.J.Ridley@bradford.ac.uk (M. Ridley), kim.travis@syngenta.com (K. Travis).
Over the years, many types of computational methods, such as structure–activity relationship (SAR), quantitative structure–activity relationship (QSAR), kinetic methods and expert systems, have been developed to identify or predict toxic effects on humans, animals and the environment. A large number and variety of models have been, and continue to be, created thanks to the continuously increasing amount of available experimental data covering various domains of chemical space. Currently, good-quality models are considered a cost-efficient alternative to in vivo and in vitro testing. Because the safety of humans, animals and the environment must be ensured, they may never become a complete substitute for in vivo experiments. Nevertheless, these models can be used to reduce the cost and negative impacts of animal testing. Thus, for domains such as pharmacy, cosmetics or food production, experimental toxicity data and toxicity models have become valuable information assets. Collecting and managing data and predictive models is required to support decisions to exclude chemicals that may fail in the profile and evaluation phases.
Having such a wealth of previously developed models at our
disposal can bring a number of benefits if we are able to make
0268-4012/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.ijinfomgt.2013.02.005