The GLOBE 3D Genome Platform
A New Paper Tool for Systems Biological/Medical Data Integration

Michael Lesnussa 1, Frank N. Kepper 1,4, Hubert B. Eussen 2, Petros Kolovos 1, Frank G. Grosveld 3 & Tobias A. Knoch 1,4
in cooperation with the virtual EpiGenSys laboratories of K. Rippe 4, P. R. Cook 5, G. Längst 6 & G. Wedemann 7

1 Biophysical Genomics, 2 Dept. Clinical Genetics, 3 Dept. Cell Biology, Erasmus MC, Dr. Molewaterplein 50, NL-3015 GE Rotterdam, The Netherlands
4 Genome Organization & Function, BioQuant Center / German Cancer Research Center, Im Neuenheimer Feld 267, D-69120 Heidelberg, Germany
5 Sir William Dunn School of Pathology, University of Oxford, OX1 3RE Oxford, United Kingdom
6 Biochemistry III, University of Regensburg, Universitätsstr. 31, D-93053 Regensburg, Germany
7 System Engineering & Information Management, University of Applied Sciences Stralsund, Zur Schwedenschanze 15, D-18435 Stralsund, Germany

http://www.erasmusmc.nl/ or TA.Knoch@taknoch.org

Introduction

Combining genome sequence and structure with their annotation and experimental data in an accessible and comprehensible way is a major challenge for systems biology and medicine. The number of extremely divergent data sets keeps growing: the sequence itself, genes, regulatory regions, various forms of recurring sequence features, clone sets, etc. Currently, one way to represent this information visually, and thus to reveal its scientific meaning, is to use genome browsers such as "Ensembl" or the "UCSC Genome Browser". These browsers have been beneficial for understanding the complex organization of genomes. However, they are limited by their focus on linear presentation, standardized input and database accessibility. Customization by a remote user with special requirements is also difficult.

Here we show how the GLOBE 3D Genome Platform visualizes multi-dimensional data sets from various sources in an easily accessible way. This allows these data sets to be integrated into a single holistic virtual display system, giving a systems biological/medical view of genomes that advances basic research, diagnostics and new treatments.

Multi-Mapping

The platform allows classical and experimental data tracks to be mapped and projected onto metaphase chromosomes simultaneously (Fig. 1). The general track layout and the layout of every single track element are customizable, e.g. in position, shape and colour. In principle, the viewer can visualize an unlimited number of elements.

Fig. 1: Complete merged clone set (UCSC, NCBI, Ensembl) of chr. 15: colours represent association with duplicon regions.

Inter-Relations

In addition to the simultaneous mapping on one chromosome, the platform allows the analysis of inter-chromosomal relationships based either on an external input (Fig. 2) or on internal correlation analysis (Fig. 1-8). Every genome-dependent item can be related to others, e.g. syndromes to duplicons or gene families to breakpoints.

Fig. 2: Multi-chromosomal relation view of duplicon regions between chr. 15 & 21. Colours: duplicon spreading degree.

Intra-Relations

Using the dynamic scaling range, the intra-chromosomal relationships can be studied in detail in relation to the track mapping (Fig. 1, 2, 4) for basic research, diagnostics and treatments. Assays can be projected, related, reviewed and redefined, leading to scale-free insights on various genome levels.

Fig. 5: Intra-chromosomal duplicons (Eichler et al.) compared to syndromes (blue/green), literature hot-spots (orange), and our defined hot-spots (pink) of the chr. 22q11 region.

Structure

There are several physical levels of genetic information storage, e.g. DNA, chromatin and chromosomes. The interaction between the information and its structural carrier is of critical importance for genome function. The platform visualizes 3D genomic structures and can project and link these to a classical linear representation (Fig. 6).

Fig. 8: Correlation of a simulated 3D chromatin/chromosome topology with the, in principle, linear information content of the DNA sequence and the multi-dimensional mapping of chr. 15.

Resolution Scale

The platform has a large dynamic range in the size and resolution of the features it can display: from whole genomes (Fig. 7) or chromosomes (Fig. 8) to individual bases (Fig. 6). This new environment creates entirely new possibilities for understanding genome organization.

Fig. 6: Dynamic zoom into the level of the DNA.
Fig. 7: Background image: Multi-chromosomal relation between the breakpoints of chr. 15 and all other chromosomes. Colours: as in Fig. 2.

Features

Flexible Customizable Intuitive Navigation
Real-Time Interaction & Analysis
Dynamic Resolution & Arrangement
Extremely Large & Multi-Dimensional Data
Bridge ALL Scales from Sequence to Morphology

Conclusion

The GLOBE 3D Genome Platform presented here enables researchers to visualize and analyse the multi-dimensional aspects of genomes in a new intuitive way. In combination with a data warehouse and a computing grid, which are being set up in parallel, an environment with entirely new and inspiring possibilities has been created. This opens new perspectives for future research leading to a better understanding of the holistic systems biological/medical properties of genomes, which is necessary for advanced diagnostic services and perhaps ultimate treatments.

GLOBE - Consortium, Erasmus Medical Center Rotterdam, Towards Holistic Genomics
EpiGenSys Consortium, Systems - Structure - Function
Bundesministerium für Bildung und Forschung, MediGRID, Erasmus MC (Erasmus University Medical Center Rotterdam), University of Regensburg, University Heidelberg, European Commission, German Cancer Research Center

[Figure labels, data tracks, panels A-D: Syndrome, Ideograms, Break Points, Chromosome, Duplicon, Chrom. Loops, Repeats, Chromatin, Epigenetic, Histone, Genes/SNP, DNA]
[Figure labels, data tracks, panels A-D: BACs, 3D-FISH, Fosmids, M-FISH, Gen. Arrays, CGH, Prot. Arrays, Expression, Restr. Sites, 3C, Primers, QPCR]

Fig. 3: Various data tracks.
Fig. 4: Various data tracks.
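
The poster does not describe how the platform stores its data tracks internally. As an illustration only, the idea of customizable track elements that are mapped onto chromosome coordinates, shown at a chosen zoom window and related to elements of other tracks can be sketched in Python as follows. All names (`TrackElement`, `Track`, `visible_elements`, `relate`) and all coordinates are hypothetical and are not taken from the GLOBE 3D software; the relation shown is a simple positional overlap on the same chromosome, not the platform's correlation analysis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class TrackElement:
    """One annotated interval on a chromosome (hypothetical model)."""
    chrom: str             # e.g. "chr15"
    start: int             # start position in base pairs (inclusive)
    end: int               # end position in base pairs (exclusive)
    label: str
    colour: str = "grey"   # per-element layout options
    shape: str = "box"


@dataclass
class Track:
    """A named collection of elements, e.g. a clone set or duplicons."""
    name: str
    elements: List[TrackElement] = field(default_factory=list)

    def visible_elements(self, chrom: str, start: int, end: int) -> List[TrackElement]:
        """Return the elements overlapping the window [start, end) on chrom."""
        return [
            e for e in self.elements
            if e.chrom == chrom and e.start < end and e.end > start
        ]


def relate(a: Track, b: Track) -> List[Tuple[TrackElement, TrackElement]]:
    """Pair elements of two tracks that overlap on the same chromosome."""
    return [
        (x, y)
        for x in a.elements
        for y in b.visible_elements(x.chrom, x.start, x.end)
    ]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    duplicons = Track("duplicons", [
        TrackElement("chr15", 100, 200, "dup-A", colour="red"),
    ])
    syndromes = Track("syndromes", [
        TrackElement("chr15", 150, 400, "syn-1", colour="blue"),
        TrackElement("chr21", 150, 400, "syn-2", colour="green"),
    ])
    for dup, syn in relate(duplicons, syndromes):
        print(dup.label, "overlaps", syn.label)
```

In this sketch, zooming corresponds to calling `visible_elements` with a narrower window, so only the elements inside the current view need to be drawn. An inter-chromosomal relation, as in Fig. 2, would require an explicit list of linked element pairs rather than a positional overlap.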