Computer Methods and Programs in Biomedicine 112 (2013) 640–648
journal homepage: www.intl.elsevierhealth.com/journals/cmpb

TMT-HCC: A tool for text mining the biomedical literature for hepatocellular carcinoma (HCC) biomarkers identification

Rania A. Abul Seoud a, Mai S. Mabrouk b,*

a Faculty of Engineering, Department of Electrical Engineering, Communication and Electronics Section, El Fayoum University, Fayoum 63514, Egypt
b Faculty of Engineering, Department of Biomedical Engineering, Misr University for Science and Technology (MUST University), Al Motamyez Distinct Al, Al Mehwar Road, 00202, Egypt

Article info

Article history:
Received 1 June 2013
Received in revised form 4 July 2013
Accepted 22 July 2013

Keywords:
Text mining
Hepatocellular carcinoma (HCC)
Copy number variation
Biomedical literature
Biomarkers

Abstract

Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality worldwide. New insights into the pathogenesis of this lethal disease are urgently needed. Chromosomal copy number alterations (CNAs) can lead to activation of oncogenes and inactivation of tumor suppressors in human cancers. Thus, identification of cancer-specific CNAs will not only provide new insight into the molecular basis of tumorigenesis but also facilitate the identification of HCC biomarkers.

This paper presents the TMT-HCC system, a tool for text mining the biomedical literature to identify hepatocellular carcinoma (HCC) biomarkers. TMT-HCC provides researchers with a powerful way to identify and discern molecular biomarkers of HCC to inform diagnosis, prognosis, and treatment. One way to find driver genes with causal roles in carcinogenesis is to detect genomic regions that undergo frequent alterations in cancers (CNAs). TMT-HCC also extracts protein–protein interactions from the full text of the scientific papers.
The results show that the integration of genomic and transcriptional data offers powerful potential for identifying novel cancer genes in HCC pathogenesis.

© 2013 Elsevier Ireland Ltd. All rights reserved.

* Corresponding author. Tel.: +20 1001662403.
E-mail addresses: r-abulseoud@k-space.org (R.A.A. Seoud), msm eng@yahoo.com (M.S. Mabrouk).
0169-2607/$ see front matter
http://dx.doi.org/10.1016/j.cmpb.2013.07.014

1. Introduction

Hepatocellular carcinoma (HCC) is the fifth most common cancer worldwide and the third most common cause of cancer-related death, with an overall 5-year survival rate of <5% [1]. Long-term survival of HCC patients is poor, partly due to HCC recurrence, which up to 80% of patients experience even after curative resection [2]. In Egypt, HCC is one of the most prevalent cancer types. It is the second most common malignancy in males and the fifth in females; as a result, liver cancer causes more deaths in Egypt than other types of cancer [3].

Chronic hepatitis and liver cirrhosis have been recognized as important risk factors for the development of hepatocellular carcinoma (HCC). Prognosis and survival of HCC are still poor, mainly because of diagnosis at a late stage and/or recurrence of the disease [4,5]. The outcome of HCC patients remains dismal due to the difficulty in detecting the disease at its early stage;