MEMSCALE™: a Scalable Environment for Databases

Héctor Montaner*, Federico Silla*, Holger Fröning†, and José Duato*

* Universitat Politècnica de València, Departament d'Informàtica de Sistemes i Computadors
Camino de Vera, s/n 46022 Valencia, Spain. hmontaner@gap.upv.es, {fsilla,jduato}@disca.upv.es
† University of Heidelberg, Computer Architecture Group
B6, 26, Building B (3rd floor) 68131 Mannheim, Germany. froening@uni-hd.de
Abstract—In this paper we propose a new memory architecture for clusters, referred to as MEMSCALE. This architecture provides a distributed, non-coherent shared-memory view of the memory resources present in the cluster. With this aggregation technique, a given processor can directly access any memory address located at other nodes in the cluster and, therefore, the whole memory present in the cluster can be granted to a single application.
In this study we focus on in-memory databases as a memory-hungry application in order to show the possibilities of our new architecture. To prove the feasibility of our idea, a 16-node prototype cluster serves as a demonstrator. Part of the memory in each node is used to create a global memory pool of 128 GB, which hosts an entire database. First we show that providing more memory than is usually available in a typical commodity node for a database server makes the execution of queries more than one order of magnitude faster than using regular SSD drives. We then go one step further and show that simultaneously accessing the database from all the nodes in the cluster turns our prototype into a powerful database server capable of beating current commercial solutions in terms of latency and throughput.
Keywords: memory architecture, cluster computer, non-
coherent memory, in-memory databases
I. INTRODUCTION
Commodity computers have become the common building
block for scalable high performance computing. As a matter
of fact, 83% of the systems included in the Top500 list are
cataloged as cluster computers [1]. The main reason is that
clusters based on commodity computers are noticeably more
cost-effective than their massively parallel processing
counterparts.
However, the cluster architecture partitions the system
memory into isolated pieces, each located at a different
node. Communication among nodes is therefore carried out
by exchanging messages, but this way of accessing foreign
memory incurs extra overhead from the message-handling
layer (in addition to the higher latency caused by the
distance between nodes). Nevertheless, this paradigm is
commonly used by MPI-based applications.
The latency inherent in reaching remote memory through
messages, together with the extra effort that explicit
messages demand from the programmer, encourages the use of
shared-memory applications where possible. However, as a
processor can only directly access memory allocated at its
own node, the habitat of a shared-memory application is
restricted to a single motherboard, thus hindering its use
across a cluster. The current trend in the number of cores
per socket alleviates this restriction in terms of computing
resources: nowadays it is easy to configure a motherboard
with 32 cores and, as this number is expected to grow to
80 cores in the near term, the number of execution flows
hosted in a single node can be quite high. Note, however,
that many shared-memory applications do not scale beyond
a few tens of threads [2], either because of synchronization
problems or because of imbalances in the system, such as I/O
bottlenecks in some data-intensive applications.
The situation changes with regard to memory needs, which
are a harder requirement than computing power: a decrease
in the number of available cores produces a linear increase
in execution time, whereas a decrease in the amount of
available memory produces an exponential increase. This
behavior is due to the fact that secondary storage makes up
for the lack of main memory, although the two differ in
performance by several orders of magnitude. This is why
memory is overprovisioned at each node in clusters, just to
prevent the critical situation in which an application runs
out of main memory. However, most of the time this
just-in-case memory remains idle (while still consuming
power). This economic cost and energy inefficiency is not
the only problem. As described in [3], current trends in
DIMM technology predict that the amount of available memory
per core will drop by 30% every two years. This means that
applications will become increasingly memory restricted
and, thus, a remedy for the memory capacity wall is
urgently needed.
We proposed a solution in [4][5] to increase the memory
available to an application by leveraging main memory from
the other nodes in a cluster. As we explain later, this
approach, called MEMSCALE™, can be seen as a memory
aggregation mechanism that does not require coherency
among nodes in the cluster because the global memory
pool is treated as an exclusive distributed memory, that is,
only one application, located in one of the nodes, can use
this memory at a time. In this paper we apply our remote
memory architecture to databases and analyze how this kind
of application can benefit from a large main memory pool.
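The exclusivity idea can be illustrated with a toy model. The sketch below is purely conceptual, with invented names; the actual MEMSCALE mechanism operates at the hardware and interconnect level, not in application code:

```python
# Toy model of an exclusive, non-coherent global memory pool
# (conceptual sketch only; all names here are invented, and the real
# MEMSCALE mechanism is implemented in hardware, not in software).

class GlobalMemoryPool:
    """Aggregates per-node memory segments into one flat address space
    that a single application owns at a time, so no coherency protocol
    among nodes is needed."""

    def __init__(self, node_segment_bytes, num_nodes):
        self.segment = node_segment_bytes
        # One bytearray per node stands in for each node's contributed RAM.
        self.nodes = [bytearray(node_segment_bytes) for _ in range(num_nodes)]
        self.owner = None  # exclusive access: at most one owner at a time

    def acquire(self, app_id):
        if self.owner is not None:
            raise RuntimeError("pool already granted to " + self.owner)
        self.owner = app_id

    def release(self, app_id):
        if self.owner == app_id:
            self.owner = None

    def write(self, app_id, addr, data):
        # A global address maps to (hosting node, offset within node).
        assert self.owner == app_id, "only the owner may access the pool"
        node, offset = divmod(addr, self.segment)
        self.nodes[node][offset:offset + len(data)] = data

    def read(self, app_id, addr, length):
        assert self.owner == app_id, "only the owner may access the pool"
        node, offset = divmod(addr, self.segment)
        return bytes(self.nodes[node][offset:offset + length])


# A 4-node pool with 1 KB per node yields a 4 KB global address space.
pool = GlobalMemoryPool(node_segment_bytes=1024, num_nodes=4)
pool.acquire("db-server")
pool.write("db-server", 2500, b"row42")  # lands on node 2, offset 452
print(pool.read("db-server", 2500, 5))   # b'row42'
pool.release("db-server")
```

Because ownership is exclusive, a write never has to be propagated to caches on other nodes, which is what lets the design drop inter-node coherency altogether.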
By nature, databases present an insatiable need for memory.
Due to the large amounts of data that these applications
usually handle, tables have traditionally been stored in
secondary storage such as hard disks. However, as the
2011 IEEE International Conference on High Performance Computing and Communications
978-0-7695-4538-7/11 $26.00 © 2011 IEEE
DOI 10.1109/HPCC.2011.51