Human microRNA target identification by RRSM
Wan J. Hsieh, Hsiuying Wang*
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan

Article info
Article history: Received 26 November 2010; received in revised form 26 May 2011; accepted 17 June 2011; available online 29 June 2011
Keywords: Microarray expression; miRNA; Relative R² method; Regression model; Correlation

Abstract
MicroRNAs (miRNAs) are small, endogenously expressed non-coding RNAs that regulate target messenger RNAs (mRNAs) in various biological processes. In recent years, many studies have concentrated on discovering new miRNAs and identifying their mRNA targets. Although researchers have identified many miRNAs, few miRNA targets have been verified experimentally. To expedite the identification of miRNA targets for experimental verification, approaches based on sequence analysis or microarray expression analysis have been established in the literature to discover potential miRNA targets. In this study, we focus on human miRNA target prediction and propose a generalized relative R² method (RRSM) to find many high-confidence targets. Many of these targets have been confirmed by previous studies, and the targets of several miRNAs discovered by the HITS-CLIP method in a recent study have also been selected by our method.
© 2011 Elsevier Ltd. All rights reserved.

1. Introduction
MicroRNAs (miRNAs) are endogenous, single-stranded RNAs of approximately 23 nt that play crucial gene-regulatory roles in animals and plants by pairing to the 3′ untranslated regions (UTRs) of target messenger RNAs (mRNAs) of protein-coding genes, thereby directing their post-transcriptional repression (Carrington and Ambros, 2003; Bartel, 2004; Mattick and Makunin, 2006). Extensive research has revealed the existence of more than 700 different human miRNAs (Griffiths-Jones et al., 2008). Griffiths-Jones et al.
(2008) and several other studies have demonstrated the importance of miRNA-mediated regulation in a wide range of basic biological processes, such as proliferation, apoptosis, cellular identity and pathogen–host interactions (Pillai et al., 2007; Carthew and Sontheimer, 2009).

The discovery of many miRNAs in various multicellular species has raised many questions, such as how these small non-coding RNAs function in cells. The key to answering this question is to identify their regulatory targets. The most general feature of miRNA regulation is the recognition of sequence motifs in the 3′ UTR of target mRNAs that are complementary to the miRNA (Lewis et al., 2003; Grimson et al., 2007). Several computational algorithms that predict targets from such complementary motifs have been developed, for example, miRanda (John et al., 2004), TargetScan (Lewis et al., 2003; Lewis et al., 2005) and PicTar (Krek et al., 2005), but their predictions overlap poorly, which may be caused by a number of false-negative and probably also false-positive predictions (Bartel, 2009).

In addition to predictions based on sequence complementarity, gene expression profiling can also provide useful information for studying the biological functions of miRNAs. Therefore, expression data analysis has been used as a complementary method for discovering miRNA targets (Lim et al., 2005). However, such analysis can become computationally complicated when multiple miRNAs and their effects across multiple tissues are considered. To overcome this difficulty, Huang et al. (2007b) and Wang and Li (2009b) proposed statistical methods to build a network of associations between miRNAs and their target mRNAs. Huang et al. (2007b) established a method, GenMiR++, that uses variational Bayesian analysis to explore miRNA targets. However, this method is complicated and computationally intensive.
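The complementarity principle behind sequence-based prediction can be illustrated with a minimal seed-match scan in Python. This is a sketch under stated assumptions, not the method of miRanda, TargetScan or PicTar: it takes the seed to be miRNA nucleotides 2–8 (a commonly used definition), requires exact Watson–Crick pairing, and ignores G:U wobble, site context and conservation. The function names `reverse_complement` and `seed_sites` and the toy sequences are hypothetical.

```python
from __future__ import annotations

# Illustrative seed-match scan; a simplified sketch, NOT the algorithm used by
# miRanda, TargetScan or PicTar. Assumptions: RNA alphabet (A, C, G, U) only,
# seed = miRNA nucleotides 2-8 (1-based, inclusive), exact Watson-Crick
# pairing; G:U wobble, site context and conservation are ignored.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (raises KeyError on other letters)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based positions in `utr` that match the reverse complement of the seed.

    `start` and `end` are 1-based, inclusive positions on the miRNA (5'->3').
    Overlapping matches are all reported.
    """
    seed = mirna.upper()[start - 1:end]   # positions 2-8 give a 7-nt seed
    motif = reverse_complement(seed)      # sequence the target 3' UTR must contain
    utr = utr.upper()
    return [i for i in range(len(utr) - len(motif) + 1)
            if utr[i:i + len(motif)] == motif]


if __name__ == "__main__":
    # Hypothetical toy sequences, not real miRNA or UTR data.
    toy_mirna = "UACGUACGAAGUCCAUGGCAUA"
    toy_utr = "AAACGUACGUAAUUCGUACGUGG"
    print(seed_sites(toy_mirna, toy_utr))  # -> [3, 14]
```

Published predictors layer site-type rules, alignment or free-energy scoring, and conservation filters on top of this kind of exact matching.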
In order to provide a more effective approach, Wang and Li (2009b) proposed the relative R² method to select high-confidence miRNA targets from predicted targets; this method is easy to interpret and less computationally expensive. In Wang and Li (2009b), it successfully identified many high-confidence targets of mouse miRNAs. In this study, we generalize the relative R² method to a more flexible form, which we call RRSM. We also provide program code for performing RRSM on both original and normalized data.

RRSM has several advantages for discovering high-confidence targets. Although paired correlation analysis between miRNAs and their targets has been discussed (Ritchie et al., 2009; Wang and Li, 2009a; Liu et al., 2010), several experimentally confirmed targets reported in the literature indicate that, for many miRNAs, the correlation coefficient between the microarray expression of a miRNA and that of its confirmed target is nearly zero. RRSM and the existing correlation analysis methods (Ritchie et al., 2009; Wang and Li, 2009a; Liu et al., 2010; Wang et al., in press) are discussed and compared in Section 3.

Journal of Theoretical Biology 286 (2011) 79–84. Journal homepage: www.elsevier.com/locate/yjtbi. ISSN 0022-5193. doi:10.1016/j.jtbi.2011.06.022
* Corresponding author. Tel.: +886 3 5712121 ext. 56813; fax: +886 3 5728745. E-mail address: wang@stat.nctu.edu.tw (H. Wang).

When the correlation coefficient is low, it is difficult to apply standard statistical approaches to explore miRNA targets because