Screen of MicroRNA Targets in Zebrafish Using Heterogeneous Data Sources: A Case Study for Dre-miR-10 and Dre-miR-196

Yanju Zhang 1, Joost M. Woltering 2, Fons J. Verbeek 1

Abstract: It has been established that microRNAs (miRNAs) play an important role in gene expression through post-transcriptional regulation of messenger RNAs (mRNAs). However, the precise relationships between miRNAs and their target genes, in terms of number, type and biological relevance, remain largely unclear. Dissecting the miRNA-target relationships will provide more insight for miRNA target identification and validation and thereby promote the understanding of miRNA function. In miRBase, miRanda is the key algorithm used for target prediction in zebrafish. This algorithm is high-throughput but produces many false positives (noise). Since validating a large number of targets through laboratory experiments is very time consuming, computational methods for miRNA target validation should be developed. In this paper, we present an integrative method to investigate several aspects of the relationships between miRNAs and their targets, with the final purpose of extracting high-confidence targets from the pool of targets predicted by miRanda. This is achieved using techniques ranging from statistical tests to clustering and association rules. Our research focuses on zebrafish. We found that validated targets are not necessarily associated with the highest sequence matching. In addition, for some miRNA families, the frequency of predicted targets is significantly higher in the genomic region near their own physical location. Finally, in a case study of dre-miR-10 and dre-miR-196, we found that the predicted target genes hoxd13a, hoxd11a, hoxd10a and hoxc4a of dre-miR-10 and hoxa9a, hoxc8a and hoxa13a of dre-miR-196 have characteristics similar to those of validated target genes and therefore represent high-confidence target candidates.
1 Leiden Institute of Advanced Computer Science, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands. Email: {yanju, fverbeek}@liacs.nl
2 Institute of Biology Leiden, Wassenaarseweg 64, 2333 AL Leiden, The Netherlands.

Keywords: MicroRNA targets validation, microRNA-target relationships, dre-miR-10, dre-miR-196.

I. INTRODUCTION

The microRNA (miRNA) field started with the discovery of lin-4 in 1993 [1]. lin-4 was initially considered an isolated case, but miRNAs were later found to be widely present in multicellular organisms, ranging from plants to humans. MiRNAs are single-stranded noncoding RNA molecules of approximately 22 nucleotides that repress messenger RNA (mRNA) translation or mediate mRNA degradation through sequence-specific base pairing [2], [3]. Several miRNAs have been found to play an important role in life and development. To name a few: the miRNAs lin-4 and let-7 regulate developmental timing in C. elegans [1], [4]; bantam and miR-14 are involved in the gene regulation of apoptosis in Drosophila [5]; miR-181 modulates hematopoietic lineage differentiation in mice [6]; and miR-32 regulates primate foamy virus type 1 (PFV-1) proliferation in humans [7].

MiRNAs function by binding to target sites in mRNAs, thereby preventing their translation or promoting their decay. To better understand the biological function of miRNAs, it is of fundamental importance to identify miRNA targets. Identifying miRNA targets in animals is not as straightforward as in plants. Computational approaches have been successful in plants, where known target sites tend to be almost perfectly complementary to miRNAs [8], [9]. In animals, by contrast, miRNA-target binding is only loosely complementary [10]. This inexact sequence matching has significantly complicated computational approaches to target site identification.

Several computational high-throughput methods to predict miRNA targets have been described [3], [11], [12], [13].
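The contrast between near-perfect complementarity in plants and loose complementarity in animals can be made concrete with a small sketch. The Python snippet below is illustrative only: the function names `reverse_complement` and `paired_fraction`, the toy sequence, and the decision to count only Watson-Crick pairs (no G:U wobble, gaps or bulges) are assumptions introduced here and do not reproduce any published prediction algorithm.

```python
# Illustrative sketch: fraction of miRNA positions that form Watson-Crick
# pairs with a candidate target site. Names and sequences are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def paired_fraction(mirna: str, site: str) -> float:
    """Fraction of miRNA positions that Watson-Crick pair with the site.

    Both sequences are written 5'->3'. The duplex is antiparallel, so the
    site is read in reverse. Equal lengths are required because gaps and
    bulges are not modelled in this sketch.
    """
    if len(mirna) != len(site):
        raise ValueError("sequences must have equal length")
    pairs = zip(mirna.upper(), reversed(site.upper()))
    matches = sum(COMPLEMENT[a] == b for a, b in pairs)
    return matches / len(mirna)


# Made-up 22-nt sequence used purely for illustration (not a real miRNA).
toy_mirna = "AUGGCUACGUUAGCCAUGGCAU"

# Plant-like case: the site is the exact reverse complement of the miRNA.
perfect_site = reverse_complement(toy_mirna)

# Animal-like case: the first six site bases are swapped for their
# complements, which guarantees six mismatches at the miRNA 3' end.
loose_site = "".join(COMPLEMENT[b] for b in perfect_site[:6]) + perfect_site[6:]

perfect_score = paired_fraction(toy_mirna, perfect_site)
loose_score = paired_fraction(toy_mirna, loose_site)
```

Under these assumptions the fully complementary site scores 1.0, while the altered site scores 16/22 (about 0.73). A search that demands near-perfect matching would find only the first kind of site, which is one way to see why inexact matching in animals makes target site identification harder.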
The miRanda algorithm is one of the most frequently used methods. For each miRNA, target genes are selected on the basis of three properties: sequence complementarity, assessed with a position-weighted local alignment algorithm; free energy of the RNA-RNA duplexes; and conservation of target sites in related genomes [3], [14]. This computational method has one crucial problem: too much noise. Most likely, not all of the predicted targets for a miRNA represent true biological targets, and only a few have been experimentally confirmed as either positive or negative. For example, for lin-4 in C. elegans, 554 targets are predicted and to date only 2 have been confirmed through laboratory experiments. The challenge, therefore, is to find an effective way to filter out false positive predicted targets. Accurate target prediction and validation are still major obstacles in miRNA research.

Recently, as opposed to other computational methods like miRanda, a few bottom-up approaches for high-throughput miRNA target validation have been reported. Zhou et al. suggest that targets identified by multiple prediction algorithms are better candidates for verification [15]. Stark et al. describe an algorithm that screens targets according to sequence and free energy features shared by validated targets [16]. Unlike the methods described above, we explore a bottom-up approach that focuses on selecting targets based on genomic location and physical association on the genome.

International Journal of Mathematical, Physical and Engineering Sciences Volume 2 Number 1 10

An integrative method is presented to analyze the relationships between miRNAs and targets in order to extract high-confidence miRNA targets. The method consists of three layers: data retrieval, data analysis and data visualization. A panel of algorithms such as clustering and association rules are applied