Future Medicinal Chemistry
Editorial

Does ‘Big Data’ exist in medicinal chemistry, and if so, how can it be harnessed?

Igor V Tetko*,1,2, Ola Engkvist3 & Hongming Chen3

1 Helmholtz Zentrum München-German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764 Neuherberg, Germany
2 BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764 Neuherberg, Germany
3 Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
*Author for correspondence: Tel.: +49 89 3187 3575; Fax: +49 89 3187 3585; itetko@vcclab.org

Future Med. Chem. (2016) 8(15), 1801–1806. ISSN 1756-8919. 10.4155/fmc-2016-0163. © Igor V Tetko, Ola Engkvist & Hongming Chen

First draft submitted: 1 August 2016; Accepted for publication: 12 August 2016; Published online: 15 September 2016

Keywords: applicability domain • Big Data • chemoinformatics • education in chemistry and informatics • local and global models • multitask learning • neural networks • virtual chemical spaces

The term ‘Big Data’ has gained increasing popularity within the chemistry field and across science broadly in recent years [1]. Chemical databases have seen dramatic growth over the past decade, with, for example, ChEMBL, REAXYS and PubChem providing hundreds of millions of experimental facts for tens of millions of compounds [1]. Moreover, even larger datasets of experimental measurements are held within in-house data collections at pharma companies [2]. Overall, the total number of entries across these databases is in the range of a billion, 10^9; however, although this number may seem impressive, it pales in comparison with other fields [3], where the amount of data is frequently measured in exabytes, 10^18. Thus, does Big Data really exist within the chemistry field? What are such data within medicinal chemistry specifically, and where do the challenges lie in the analysis of these data?
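The comparison above sets a count of entries (10^9) against a volume in bytes (10^18), so a rough conversion helps to put the two on the same scale. The following is a minimal back-of-envelope sketch in Python; the figure of 100 bytes per entry is a purely hypothetical assumption chosen for illustration and does not come from the cited databases.

```python
# Back-of-envelope scale comparison.
# BYTES_PER_ENTRY is an assumed, illustrative figure, not a value from the article.
ENTRIES = 10**9          # approximate total number of database entries cited in the text
BYTES_PER_ENTRY = 100    # hypothetical average storage per entry
EXABYTE = 10**18         # bytes in one exabyte (decimal definition)


def total_bytes(entries: int, bytes_per_entry: int) -> int:
    """Return the estimated storage volume in bytes."""
    return entries * bytes_per_entry


def fraction_of_exabyte(n_bytes: int) -> float:
    """Return the given volume as a fraction of one exabyte."""
    return n_bytes / EXABYTE


size = total_bytes(ENTRIES, BYTES_PER_ENTRY)
print(f"Estimated volume: {size / 10**9:.0f} GB")
print(f"Fraction of one exabyte: {fraction_of_exabyte(size):.0e}")
```

Under this assumption the combined collections would occupy about 100 GB, roughly seven orders of magnitude below one exabyte. A different per-entry size would shift the estimate proportionally.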
Big Data refers to data beyond the scale of traditional applications, which require efforts beyond traditional analysis [1]. In this article, we discuss how the term applies to medicinal chemistry and provide an overview of some of the most important trends in the medicinal chemistry–Big Data field.

Does Big Data exist in medicinal chemistry?

A dataset could be classified as ‘big’ if the available technical resources (speed, memory) are not capable of analyzing the data using existing methods. Big Data in a field such as the analysis of particle collisions at CERN [3] is driven by physical challenges (the hardware, computer speed and physical computer memory required to store and analyze such data), which may be addressed by the development of new and more advanced software.

Medicinal chemistry-related data are created and curated in the pharmaceutical industry via high-throughput screening (HTS) and drug discovery campaigns, and are also available in databases sourced from scientific journals, patents, etc. For example, the AstraZeneca in-house screening database contains over 150 million structure–activity relationship (SAR) data points [2]. The HTS data from pharma companies are usually very sparse, and for each screened target there is only a small number of active hits. Further development is carried out on relatively small series of compounds, usually ranging from hundreds to thousands of compounds per series. Specialists who work on these target-specific data do not encounter Big Data in their daily work; traditional modeling algorithms are sufficient to handle their datasets. When the focus is on chemogenomics data, the situation is different.
The biggest medicinal chemistry data reservoir, PubChem, currently comprises 91 million chemical structures and 230 million bioactivity data points corresponding to over

“...further progress will critically depend on training programs and advances in chemoinformatics, a discipline bridging chemistry and informatics.”

Special Focus: Computational chemistry & computer-aided drug discovery – Part II