funMotifs: Tissue-specific transcription factor motifs

Husen M. Umer 1,2, Karolina Smolinska-Garbulowska 1, Nour-al-dain Marzouka 3, Zeeshan Khaliq 1, Claes Wadelius 4, Jan Komorowski* 1,5

1 Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University
2 Science for Life Laboratory, Department of Oncology-pathology, Karolinska Institutet, Sweden
3 Department of Clinical Sciences, Faculty of Medicine, Lund University
4 Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
5 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

*To whom correspondence should be addressed. Tel: +46 (0) 18 471 66 92; Email: jan.komorowski@icm.uu.se

ABSTRACT

Transcription factors (TFs) regulate gene expression by binding to specific sequences known as motifs. A bottleneck in our knowledge of gene regulation is the lack of functional characterization of TF motifs, which is mainly due to the large number of predicted TF motifs and the tissue specificity of TF binding. We built a framework to identify tissue-specific functional motifs (funMotifs) across the genome based on thousands of annotation tracks obtained from large-scale genomics projects, including ENCODE, Roadmap Epigenomics and FANTOM. The annotations were weighted using a logistic regression model trained on regulatory elements obtained from massively parallel reporter assays. Overall, genome-wide predicted motifs of 519 TFs were characterized across fifteen tissue types. funMotifs summarizes the weighted annotations into a functional activity score for each of the predicted motifs. funMotifs enabled us to measure the tissue specificity of different TFs and to identify candidate functional variants in TF motifs from the 1000 Genomes Project, the GTEx project, the GWAS catalogue, and in 2,515 cancer samples from the Pan-cancer analysis of whole genome sequences (PCAWG) cohort.
To enable researchers to annotate genomic variants or regions of interest, we have implemented a command-line pipeline and a web-based interface that is publicly accessible at http://bioinf.icm.uu.se/funmotifs.

Keywords: transcription factor, motifs, regulatory elements, noncoding variants, tissue specificity, annotation database.

bioRxiv preprint doi: https://doi.org/10.1101/683722; this version posted June 27, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.
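The weighting step summarized in the abstract (annotations weighted by a logistic regression model, then combined into a per-motif functional activity score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the funMotifs pipeline: the three binary annotation features, the toy reporter-assay labels, the plain gradient-descent fit, and the choice to score a motif as the weighted sum of its annotations.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns (weights, bias). Hypothetical helper for illustration only.
    """
    n = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this training element.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def activity_score(annotations, weights):
    """Summarize a motif's annotations as a weighted sum (assumed scoring rule)."""
    return sum(w * a for w, a in zip(weights, annotations))


# Toy training set (made up): each row is one reporter-assay element with
# assumed binary annotations [open_chromatin, tf_chipseq_peak, cage_peak].
X = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 1],
    [0, 0, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0],
]
# Toy labels (made up): 1 = element was active in the reporter assay.
y = [1, 1, 1, 1, 0, 0, 1, 1]

weights, bias = fit_logistic(X, y)

# Score two hypothetical motif instances with the learned weights.
score_supported = activity_score([1, 1, 0], weights)
score_unsupported = activity_score([0, 0, 0], weights)
print("weights:", [round(w, 3) for w in weights])
print("supported motif score:", round(score_supported, 3))
print("unsupported motif score:", round(score_unsupported, 3))
```

In this toy setting, a motif overlapping both open chromatin and a ChIP-seq peak receives a higher score than one with no supporting annotations, which is the qualitative behavior the abstract describes.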