Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses

Maryam Esmaeili, Hassan Mohabatkar*, Sasan Mohsenzadeh

Department of Biology, College of Sciences, Shiraz University, Shiraz 71454, Iran

Article history: Received 19 July 2009; received in revised form 18 November 2009; accepted 20 November 2009; available online 2 December 2009

Keywords: E6 protein; Bioinformatics; Low risk; High risk

Abstract

High-risk types of human papillomaviruses (HPVs) are the etiological agents in nearly all cases (99.7%) of cervical cancer, and the HPV E6 protein is one of the two viral oncoproteins expressed in virtually all HPV-positive cancers. Classifying the risk type of HPVs is therefore useful for the diagnosis and treatment of cervical cancer. To predict and classify HPV risk types by bioinformatics analysis, 96 E6 protein sequences were obtained from available databases. To investigate the risk type of these sequences, the PseAAC server, ROC curves and statistical analysis were applied. Our classification was based on physicochemical characteristics of HPV E6 proteins: hydrophobicity, hydrophilicity, side-chain mass, pK of the α-COOH group, pK of the α-NH3+ group and pI at 25 °C. The risk types of 4 unknown HPV types and 25 non-reported HPV types were also predicted. These results show that theoretical bioinformatics approaches can guide and simplify experimental studies.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Human papillomaviruses (HPVs) are a group of small, non-enveloped DNA tumor viruses with a virion diameter of 55 nm (Zheng and Baker, 2006) that infect cutaneous or mucosal epithelial cells, causing papillomas or warts on the skin, genital tissues and upper respiratory tract (Beaudenon and Huibregtse, 2008). Over 100 HPV genotypes have been identified, of which 40 infect the genital mucosa (Yugawa and Kiyono, 2009).
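The abstract names Chou's pseudo amino acid composition (PseAAC) as the sequence representation behind the risk-type classification. As a rough illustration only, the Python sketch below computes a type-1 PseAAC vector: 20 normalized amino acid frequencies followed by λ sequence-order correlation factors, built from three standardized properties (hydrophobicity, hydrophilicity, side-chain mass). It is not the PseAAC server's code. The property values are the ones commonly reproduced for Chou's original formulation and should be checked against the published tables; the names `standardize`, `correlation` and `pseaac`, the defaults λ = 5 and weight w = 0.05, and the choice to skip non-standard residues are all assumptions. The paper's other properties (pK of α-COOH, pK of α-NH3+, pI) could be added as further entries in `SCALES`.

```python
import statistics

# Standard one-letter amino acid codes; residues outside this set are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Raw property scales in AMINO_ACIDS order. These are the values commonly
# reproduced for Chou's type-1 PseAAC (an assumption here; verify before use).
HYDROPHOBICITY = dict(zip(AMINO_ACIDS, [
    0.62, 0.29, -0.90, -0.74, 1.19, 0.48, -0.40, 1.38, -1.50, 1.06,
    0.64, -0.78, 0.12, -0.85, -2.53, -0.18, -0.05, 1.08, 0.81, 0.26]))
HYDROPHILICITY = dict(zip(AMINO_ACIDS, [  # Hopp-Woods scale
    -0.5, -1.0, 3.0, 3.0, -2.5, 0.0, -0.5, -1.8, 3.0, -1.8,
    -1.3, 0.2, 0.0, 0.2, 3.0, 0.3, -0.4, -1.5, -3.4, -2.3]))
SIDE_CHAIN_MASS = dict(zip(AMINO_ACIDS, [
    15, 47, 59, 73, 91, 1, 82, 57, 73, 57,
    75, 58, 42, 72, 101, 31, 45, 43, 130, 107]))


def standardize(scale):
    """Rescale a property to zero mean and unit population SD over the 20 amino acids."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {a: (scale[a] - mean) / sd for a in AMINO_ACIDS}


SCALES = [standardize(s) for s in (HYDROPHOBICITY, HYDROPHILICITY, SIDE_CHAIN_MASS)]


def correlation(r1, r2):
    """Chou's correlation function: mean squared difference of the standardized properties."""
    return sum((s[r2] - s[r1]) ** 2 for s in SCALES) / len(SCALES)


def pseaac(sequence, lam=5, weight=0.05):
    """Return a (20 + lam)-dimensional type-1 pseudo amino acid composition vector."""
    seq = [a for a in sequence.upper() if a in AMINO_ACIDS]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must contain more than lam standard residues")
    # Normalized occurrence frequency of each of the 20 amino acids.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # k-th tier sequence-order correlation factor, averaged over residue pairs k apart.
    thetas = [
        sum(correlation(seq[i], seq[i + k]) for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    # Shared normalizer, so the whole vector sums to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

For example, `pseaac(seq, lam=5)` returns a 25-dimensional vector for any sequence with more than five standard residues: the first 20 entries reflect amino acid composition and the last five encode sequence-order effects. Vectors of this kind would then be passed to a classifier or a ROC analysis, which is not shown here.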
HPV types are classified as either high- or low-risk, depending on whether they are associated with malignant or benign lesions, respectively (Zur Hausen, 1991; Gussione et al., 2002). Epidemiologic studies have shown that the association of genital HPV with cervical cancer is strong, independent of other risk factors, and consistent across several countries (Kim et al., 2009). High-risk HPVs, such as HPV16, 18 and 31, are associated with more than 90% of cervical cancers (Narisawa-Saito and Kiyono, 2007; Walboomers et al., 1999; Yugawa and Kiyono, 2009), and cervical cancer is the second leading cause of cancer death among women worldwide (Roux and Moroianu, 2002; Zur Hausen, 2000). At present, approximately 500,000 new cases of cervical cancer are diagnosed worldwide each year, and approximately one-third of these cases are fatal (Yugawa and Kiyono, 2009). High-risk HPV types are also associated with 25% of head and neck carcinomas (of the mouth, tonsils, esophagus and larynx) (Gillison et al., 2000).

All HPVs contain a double-stranded circular DNA genome of about 8 kb that can be divided into three major regions: early genes (E1–E7), late genes (L1 and L2) and a long control region (LCR, also called the noncoding region, NCR) (Motoyama et al., 2004; Zheng and Baker, 2006). In brief, three oncogenes, E5, E6 and E7, modulate the transformation process; two regulatory proteins, E1 and E2, modulate transcription and replication; and two structural proteins, L1 and L2, form the viral capsid (Munger and Howley, 2002; Villiers et al., 2004). E6 and E7 are the only two viral genes expressed in virtually all HPV-positive cervical carcinomas, and multiple lines of experimental evidence have shown that their products act as cooperating viral oncoproteins (Munger et al., 2004).
The activities of E6 and E7 most clearly linked to carcinogenesis are their abilities to inactivate the p53 and retinoblastoma (pRb) tumor suppressors, respectively (Beaudenon and Huibregtse, 2008). Overexpression of E6 and E7 is a necessary, rate-limiting step toward cancer. It is usually achieved by accidental integration of the viral genome into a host chromosome, and it causes polyploidy through deregulation of Plk1 following the loss of p53 via E6 and of pRb family members via E7 (Incassati et al., 2006; Narisawa-Saito and Kiyono, 2007; Yugawa and Kiyono, 2009). Acute loss of pRb family members through E7 has also been shown to induce centrosome amplification and aneuploidy (Iovino et al., 2006).

Journal of Theoretical Biology 263 (2010) 203–209
Journal homepage: www.elsevier.com/locate/yjtbi
0022-5193/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jtbi.2009.11.016
* Corresponding author. Tel.: +98 711 6137426; fax: +98 711 2280926. E-mail addresses: mohabat@shirazu.ac.ir, h_mohabatkar@yahoo.com (H. Mohabatkar).

Also, the structural and functional characteristics of HPV E6