ERK'2014, Portorož, B:146-149

Integration of bioactive substances data for preclinical testing with Cheminformatics and Bioinformatics resources

Branko Arsic 1, Marija Djokic 1, Vladimir Cvjetkovic 1, Petar Spalevic 2, Marko Zivanovic 3, Milan Mladenovic 4

1 Faculty of Science, University of Kragujevac, Department of Mathematics and Informatics
2 Faculty of Technical Sciences, University of Kosovska Mitrovica
3 Faculty of Science, University of Kragujevac, Department of Biology and Ecology
4 Faculty of Science, University of Kragujevac, Department of Chemistry

Email: brankoarsic@kg.ac.rs, m.djokic@kg.ac.rs, vladimir@kg.ac.rs, petar.spalevic@pr.ac.rs, zivanovicm@kg.ac.rs, mmladenovic@kg.ac.rs

Abstract

Finding and comparing information published by institutions with similar goals can be a real challenge, because it is often necessary to interpret large amounts of data that carry equivalent meaning but use different nomenclature and structure. Using Semantic Web technologies to publish data as Linked Open Data (LOD) encourages the integration of these datasets. In this paper, we integrate and extend a previously developed ontology-based information system with datasets from PubChem, ChEMBL, DrugBank, ChemProt, etc. The information system supports the Research Center for Preclinical Testing (RC), which monitors the in vitro effects of active substances on cell lines of different origin, primarily cancer cell lines and primary cells isolated from different tissues. In this way, researchers can focus on tested drugs with low IC50 values when planning new experiments, saving resources and time. The available data and Semantic Web technologies significantly facilitate the synthesis of new substances, QSAR (Quantitative/Qualitative Structure Activity Relationships) analyses, the development of advanced search algorithms, and the establishment of a future cancer bank for personalized medicine.
1 Introduction

The analyses carried out at the RC [1] include monitoring the in vitro effects of active substances on cell lines of different origin, primarily cancer cell lines and primary cells isolated from different tissues. Experiments involve cytotoxic active substances in human cancer cell lines, while monitoring covers the type of cell death, the mechanisms of apoptosis, migration and angiogenesis, and the prooxidant-antioxidant mechanisms that are important for the regulation of these processes. Experiments are based on protocols such as the MTT cytotoxicity test, AO/EtBr staining of cells for examining the type of cell death, the Western blot technique for examining proteins, Multiplex PCR, etc. Complete testing procedures consist of specific and complex relationships among various terms and concepts from the RC work area. As we predicted, the structure of an experiment is expected to expand further as a consequence of complex research tasks, which require flexible modeling and a representation that can be easily updated [2].

Nowadays, biomedical researchers frequently need to use datasets derived from other systems, and data integration has become an important precondition for successful biomedical research. Over the past decade, the huge accumulation of various data (compounds, targets, cell lines, experiments, etc.) in cheminformatics and bioinformatics research has generated a significant amount of knowledge. Institutions with the same purpose and goals work independently, so their information models, protocols, cancer cell lines, compounds, areas of interest and naming systems differ. This leads to heterogeneous data models and heterogeneous methods for data integration and retrieval. Semantic Web [3] technologies have mechanisms to solve these shortcomings.
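The heterogeneity described above can be illustrated with a minimal sketch in plain Python. It is not the Semantic Web machinery used in the system; it only shows the underlying idea of mapping source-specific names onto shared terms so that one filter works across sources. All field names, identifiers, units and values below are hypothetical and chosen purely for illustration.

```python
# Illustrative sketch only: every field name, identifier and value is made up.

# Two hypothetical sources that describe the same kind of measurement
# with different field names and different IC50 units.
SOURCE_A = [
    {"compound": "cmpd-1", "cell_line": "line-X", "ic50_uM": 0.8},
    {"compound": "cmpd-2", "cell_line": "line-X", "ic50_uM": 25.0},
]
SOURCE_B = [
    {"molecule_id": "cmpd-3", "cells": "line-Y", "IC50_nM": 450.0},
    {"molecule_id": "cmpd-1", "cells": "line-Y", "IC50_nM": 12000.0},
]

# Per-source mapping from local field names to shared terms, plus the
# divisor that converts the local IC50 unit to micromolar.
MAPPINGS = {
    "A": {
        "fields": {"compound": "compound", "cell_line": "cell_line",
                   "ic50_uM": "ic50_um"},
        "ic50_divisor": 1.0,
    },
    "B": {
        "fields": {"molecule_id": "compound", "cells": "cell_line",
                   "IC50_nM": "ic50_um"},
        "ic50_divisor": 1000.0,
    },
}


def normalize(record, source):
    """Rename fields to the shared terms and convert IC50 to micromolar."""
    mapping = MAPPINGS[source]
    out = {shared: record[local] for local, shared in mapping["fields"].items()}
    out["ic50_um"] = out["ic50_um"] / mapping["ic50_divisor"]
    out["source"] = source  # keep provenance of each record
    return out


def integrate(sources):
    """Merge all sources into one list of records with shared field names."""
    return [normalize(r, name) for name, records in sources.items() for r in records]


def potent(records, max_ic50_um):
    """Return records at or below the IC50 threshold, lowest IC50 first."""
    hits = [r for r in records if r["ic50_um"] <= max_ic50_um]
    return sorted(hits, key=lambda r: r["ic50_um"])


unified = integrate({"A": SOURCE_A, "B": SOURCE_B})
for row in potent(unified, max_ic50_um=1.0):
    print(row["compound"], row["cell_line"], row["ic50_um"], row["source"])
```

In this toy version the mapping is a hand-written dictionary per source. The point of a shared, formally defined vocabulary is that such mappings are stated once and reused by every query, instead of being re-implemented in each script.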
Ontology [4], as a main component of the Semantic Web, gives us a mechanism for connecting similar data sources that were published with different structures and for specific needs, providing semantic context by adding semantic information to models. The ability to integrate heterogeneous datasets using a common terminology defined in an ontology, and to search them in a single SPARQL [5] query, is a powerful tool for solving these problems. In this paper we extended the existing ontologies in the RC and integrated them with datasets from PubChem [6], ChEMBL [7], ChemProt [8] and DrugBank [9]. For example, this integration gives us a complete set of data for the compounds used in an experiment: related compounds, chemical and physical properties, identifiers, IC50 values for targets in a cancer cell line, and more. The same holds for cancer cell lines, the targets in cell lines and the applied protocols. Researchers can now find tested drugs with low IC50 values when planning experiments with different conditions and protocols. The chemists can synthesize