Balanced Layouts Using the Composite Data-Variable Matrix

Shenghui Cheng, Bing Wang, Zhiyuan Zhang, Klaus Mueller
Visual Analytics and Imaging Lab, Computer Science Department, Stony Brook University, NY, USA and SUNY Korea, Songdo, Korea

ABSTRACT
Numerous methods have been described that allow the visualization of the data-variable matrix. But all suffer from a common problem: they visualize the data points and the variable points separately, which makes it hard to grasp the relations among data and variables together. We describe a method that produces balanced layouts of data and variables. We achieve this by combining two distance matrices typically used in isolation: the distance matrix encoding the similarities of the variables and the distance matrix encoding the similarities of the data points. The remaining two submatrices are obtained by creating a fused distance matrix, one that measures the distance of data points with respect to the variables or vice versa. We then use MDS to simultaneously optimize the placement of data points and variable points, producing a display that allows users to appreciate all three types of relationships in a single display: (1) the patterns of the collection of data items, (2) the patterns of the collection of variables, and (3) the relationships of data items with the variables and vice versa.

Fig. 1: The inverse relationships of D and V in the data matrix.

1 INTRODUCTION
The data matrix is a common representation of high-dimensional datasets. Let N be the number of samples (or data points) drawn from a given population and let D be the number of attributes (or variables) measured per sample; we then obtain an N×D data matrix. In this data matrix, the samples and attributes can change roles.
For example, for a data matrix storing the results of a DNA microarray experiment for multiple specimens, one research objective might consider the genes expressed in the microarray to be the samples and the specimens to be the attributes, or vice versa. Switching from one objective to the other formally requires a transposition of the data matrix.

Numerous methods have been described that allow the visualization of the data matrix. A common approach is to embed the high-dimensional space onto a 2D canvas via a suitable optimization strategy. In a low-dimensional space embedding, such as multi-dimensional scaling (MDS) [1], linear discriminant analysis (LDA), and others, the attributes are even completely suppressed and only clusters of samples can be visually appreciated.

While changing the roles of samples and attributes is easy (it requires a simple transpose of the data matrix), the unequal treatment of attributes and samples represents a significant problem. It makes it difficult to observe patterns formed by attributes and samples simultaneously, and it also makes it difficult to see the samples in the proper context of the attributes. The method we propose provides such a comprehensive display. It uses MDS to simultaneously optimize the placement of samples and attributes.

2 THE COMPOSITE DISTANCE MATRIX
Let $X$ be the data matrix with $m$ rows and $n$ columns, where the rows denote the data points, the columns denote the variables, and $x_{ij}$ is the data value in the $i$-th row and $j$-th column. Without loss of generality, we assume $x_{ij}$ is normalized to [0, 1]. Now let $D$ be the data space with $m$ data points:

$$D = \{d_1, d_2, \ldots, d_m\}, \quad d_i = (x_{i1}, x_{i2}, \ldots, x_{in})$$

Let $V$ be the variable space with $n$ variables:

$$V = \{v_1, v_2, \ldots, v_n\}, \quad v_j = (x_{1j}, x_{2j}, \ldots, x_{mj})^T$$

where $T$ is the transpose function. Thus, we can look at $X$ in two ways. We can map it into variable space $V$ in which $D$ represents the points, or we can map it into data space $D$ where $V$ represents the points. An illustration of the inverse relationship is provided in Fig. 1.
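The two views of the data matrix can be illustrated with a minimal sketch. The 3×2 matrix, the helper names, and the column-wise min-max scaling are illustrative assumptions; the text only states that the values are normalized to [0, 1], not how.

```python
# Hypothetical 3x2 data matrix: 3 data points (rows), 2 variables (columns).
X = [
    [2.0, 10.0],
    [4.0, 30.0],
    [6.0, 20.0],
]

def normalize_columns(matrix):
    """Min-max scale each column to [0, 1] (an assumed normalization)."""
    scaled_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # guard against constant columns
        scaled_columns.append([(v - lo) / span for v in col])
    # Convert back from column-major to row-major order.
    return [list(row) for row in zip(*scaled_columns)]

def transpose(matrix):
    """Swap rows and columns, i.e. swap the roles of data and variables."""
    return [list(col) for col in zip(*matrix)]

X_norm = normalize_columns(X)
data_space = X_norm                 # one vector of length n per data point
variable_space = transpose(X_norm)  # one vector of length m per variable
```

Here `data_space` holds three vectors of length two and `variable_space` holds two vectors of length three, both read from the same normalized matrix.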
2.1 Extending the Data Matrix
As mentioned, current visualization methods tend to look at the two spaces, data space and variable space, in an imbalanced fashion. The usual resort is to visualize either the data matrix or its transpose with the algorithm at hand, which then lowers the fidelity of one space in favor of the other. But it can often be beneficial to see both spaces at the same time and to do so in a balanced way, where all relationships (data to data, data to variables, and variables to variables) are conveyed at equal fidelity.

Visualization of relationships in a data matrix can be made explicit by transforming it into a distance matrix. The notion of distance (also often called dissimilarity) can take many forms: Euclidean distance, cosine similarity, correlation, etc. But in all cases, the matrix stores the pairwise distances of two data matrix vectors, either V or D, but not both. So only one type of relation gets expressed, V or D.

Our solution is to create a distance matrix in which both types of relations are equally expressed. We call this matrix the composite distance matrix and the space the composite space (see Fig. 2). In this composite space, both data and variables can be located at the same time. The composite space is symmetrical since data and variables are complementary.

2.2 Creating the Composite Distance Matrix
We can derive the composite distance matrix as follows:

$$\text{composite distance matrix} = \begin{bmatrix} DD & DV \\ VD & VV \end{bmatrix}$$

Here, DD stores the pairwise dissimilarities of the data points, VD and DV store the pairwise dissimilarities of the variables with the data points, and VV stores the pairwise dissimilarities of the variables.

As mentioned, there are various measures suitable to express distance or dissimilarity. However, these measures sometimes have opposite meanings: a similarity measure such as cosine similarity or correlation grows as two vectors become more alike, whereas a distance shrinks. Similarity measures are therefore converted to dissimilarities by subtracting them from 1. Let F be the function of dissimilarity metrics, where

F = Euclidean distance || 1 - cosine similarity || 1 - correlation || …

2.2.1 The Data to Data Distance Matrix (DD)
The data points are vectors of equal length.
The dissimilarity can be obtained using any of the functions in F. Then the DD matrix is an m×m matrix with elements:

$$DD_{ij} = F(d_i, d_j)$$

where $d_i$ and $d_j$ denote the $i$-th and $j$-th data points, i.e., the $i$-th and $j$-th rows of the data matrix.

To demonstrate our method with a controlled experiment, we generated a test dataset consisting of six 6-D Gaussian distributions.

IEEE Symposium on Visual Analytics Science and Technology 2014, November 9-14, Paris, France. 978-1-4799-6227-3/14/$31.00 ©2014 IEEE
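The DD construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the cluster centers, the spread, the number of points per cluster, and the random seed are all assumptions, since the text specifies only that the test data consist of six 6-D Gaussian distributions.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_minus_cosine(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def one_minus_correlation(a, b):
    """1 - Pearson correlation; assumes neither vector is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    # Pearson correlation equals the cosine of the mean-centered vectors.
    return one_minus_cosine(centered_a, centered_b)

def dd_matrix(points, F=euclidean):
    """Build the m x m matrix with entries F(d_i, d_j)."""
    m = len(points)
    return [[F(points[i], points[j]) for j in range(m)] for i in range(m)]

def gaussian_clusters(n_clusters=6, dim=6, per_cluster=5, spread=0.05, seed=0):
    """Sample points from several isotropic Gaussians (all parameters assumed)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_clusters):
        center = [rng.uniform(0.2, 0.8) for _ in range(dim)]
        for _ in range(per_cluster):
            points.append([rng.gauss(c, spread) for c in center])
    return points

points = gaussian_clusters()
DD = dd_matrix(points, F=euclidean)
```

Passing `F=one_minus_cosine` or `F=one_minus_correlation` to `dd_matrix` swaps the metric without changing the rest of the construction.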