Use of CTI Index for Perception of Duplicated Chemical Structures in Large Chemical Databases

Emil Petrov 1, Borislav Stoyanov 1, Nikolay Kochev 2, Ivan Bangov 3*

1 Department of Computer Informatics, Faculty of Mathematics and Informatics, Konstantin Preslavski University of Shumen, 115 Universitetska Str., Shumen, Bulgaria, epetrov1990@gmail.com, borislav.stoyanov@shu-bg.net
2 Department of Analytical Chemistry and Computer Chemistry, Faculty of Chemistry, University of Plovdiv, 24 Tsar Asen Str., Plovdiv, Bulgaria, nick@uni-plovdiv.net
3 Faculty of Natural Sciences, Konstantin Preslavski University of Shumen, 115 Universitetska Str., Shumen, Bulgaria, ivan.bangov@gmail.com

(Received June 21, 2013)

Abstract: The use of the Charge-related Topological Index (CTI), devised by one of the authors (IB), for the perception of duplicated structures in large structure collections has been studied. It is shown on a structural database of 249 000 chemical structures that CTI values with a precision of more than 7 digits after the decimal point can safely discriminate between equivalent (isomorphic) and non-equivalent structures. The tests also show that the CTI gives non-degenerate values for all alkane isomers with 17 carbon atoms.

Introduction

Duplicated structures frequently emerge in large chemical databases. Mathematically, they are represented by isomorphic molecular graphs. Their perception and recognition by computers is a serious problem. The task of structure identification is particularly important in the context of modern chemical databases, where multiple information sources (both free and commercial) are used and merged in order to provide large chemical collections.
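As a rough illustration of how an index value stored to a fixed number of decimal digits could be used to flag candidate duplicates, consider the minimal Python sketch below. It does not compute the CTI itself; the function name `group_by_index`, the structure identifiers, and the numeric values are hypothetical placeholders chosen only for this example, and the default of 8 digits is an assumption that merely reflects the "more than 7 digits" threshold stated in the abstract.

```python
from collections import defaultdict


def group_by_index(records, digits=8):
    """Group structure IDs whose index values agree after rounding to `digits` decimals."""
    groups = defaultdict(list)
    for struct_id, value in records:
        # Structures with the same rounded index value share one dictionary key.
        groups[round(value, digits)].append(struct_id)
    # Keep only keys shared by two or more structures: candidate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical placeholder values, NOT real CTI values.
records = [
    ("mol_A", 12.34567891),
    ("mol_B", 12.34567891),
    ("mol_C", 12.34567902),
    ("mol_D", 7.10000000),
]

candidates = group_by_index(records)            # index compared at 8 decimals
coarse = group_by_index(records, digits=6)      # index compared at 6 decimals
```

With these placeholder values, the 8-digit comparison groups only `mol_A` and `mol_B`, whereas the 6-digit comparison also merges `mol_C` into the same group, which is the kind of false coincidence that a higher precision is meant to avoid.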
* Corresponding author
MATCH Communications in Mathematical and in Computer Chemistry
MATCH Commun. Math. Comput. Chem. 71 (2014) 645-656, ISSN 0340-6253

There are several approaches to the solution of this problem: use of hash codes, one-to-one comparison by pairwise mapping of the chemical structures, creation of a unique linear notation form or