Indonesian Journal of Electrical Engineering and Computer Science
Vol. 18, No. 1, April 2020, pp. 343~350
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v18.i1.pp343-350
Journal homepage: http://ijeecs.iaescore.com

Functional analysis of cancer gene subtype from co-clustering and classification

Logenthiran Machap 1, Afnizanfaizal Abdullah 2, Zuraini Ali Shah 3
1,2 Synthetic Biology Research Group, Universiti Teknologi Malaysia, Malaysia
3 Artificial Intelligence and Bioinformatics Group, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Malaysia

Article history: Received Aug 27, 2019; Revised Sep 16, 2019; Accepted Oct 3, 2019

ABSTRACT
Cancer is a heterogeneous genetic disease with large phenotypic differences among different cancer types and even within the same cancer type. Recent advances in genome-wide profiling technologies offer a chance to explore molecular changes throughout the progression of cancer. Accordingly, various statistical and machine learning algorithms have been designed and developed for handling and interpreting high-throughput microarray molecular data. Studies on the discovery of molecular subtypes have allowed cancers to be divided into groups that are considered to share similar molecular and clinical characteristics. The main objective of this research is therefore to discover cancer gene subtypes and to classify genes with higher accuracy. In particular, an improved co-clustering algorithm is used to discover cancer subtypes. A supervised infinite feature selection method for gene selection is then combined with a multi-class SVM to classify the selected genes and to support further biological analysis. The analysis of breast cancer and glioblastoma multiforme provides evidence of the top genes involved in cancer and of the pathways present among the top genes of both cancers. The functional analysis is useful in the medical and pharmaceutical fields for cancer diagnosis and prognosis.
Keywords: Biological analysis, Cancer subtypes, Classification, Co-clustering, Microarray

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Logenthiran Machap, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Skudai Johor Malaysia. Email: logmac_87@yahoo.com

1. INTRODUCTION
Abnormalities of the cancer genome, observed through basic research, have been used to categorize patients in order to improve clinical decision making and deliver more effective treatments. Although this type of categorization has improved the treatment of various cancers, heterogeneity among patient populations remains a main challenge. Advances in DNA microarray technology have permitted an extensive understanding of genes, especially in oncology, for the onset, diagnosis and prognosis of cancers. These diagnostics are useful for different types of cancer and lead to individual treatment plans and accurate estimation of clinical outcomes [1, 2]. The initial stage in organizing and investigating high-throughput gene expression datasets, through artificial intelligence and machine learning approaches, is to group (cluster) the data by similar biological features (genes) or conditions (samples) according to some similarity measure [3-5]. Because prior knowledge about both features and conditions is typically inadequate, clustering is conducted as an unsupervised process that groups features and conditions [6]. Conventional clustering is not an ideal method for complicated and heterogeneous cancers, because only certain genes, in only a subset of samples, are expressed as cancer genes in cellular processes among similar clinical types of cancer in a specific tissue.
Hence, a limitation arises: a single gene might play a role in regulating, and participating in, numerous clusters and pathways under different conditions [7].
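The point that certain genes behave coherently in only a subset of samples can be illustrated with a small sketch. The expression values, the gene names `gene_a` and `gene_b`, and the chosen sample subset below are hypothetical and exist only for illustration; they are not data from this study, and Pearson correlation is used here as one possible similarity measure, not necessarily the one used by the authors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values for two genes across eight samples.
# The genes rise together in samples 0-3 but not in samples 4-7.
gene_a = [1, 2, 3, 4, 4, 3, 2, 1]
gene_b = [2, 4, 6, 8, 1, 2, 3, 4]
subset = [0, 1, 2, 3]  # assumed subset of samples sharing a condition

# Similarity over all samples, as a one-way gene clustering would see it.
full_r = pearson(gene_a, gene_b)

# Similarity restricted to the subset, as a co-cluster would see it.
subset_r = pearson([gene_a[i] for i in subset],
                   [gene_b[i] for i in subset])

print(f"all samples: r = {full_r:.2f}")
print(f"subset only: r = {subset_r:.2f}")
```

In this toy case the two genes are perfectly correlated within the subset but only weakly correlated (about 0.26) over all samples, so a method that compares genes across every sample would be unlikely to group them, whereas grouping genes and samples simultaneously can recover the shared pattern.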