342 IETE TECHNICAL REVIEW | VOL 26 | ISSUE 5 | SEP-OCT 2009

Benchmarking Database Systems for the Requirements of Sensor Readings

Ciprian Pungilă, Teodor-Florin Fortiş and Ovidiu Aritoni

Research Institute e-Austria, Bvd. C.Coposu, 4, Room 045B, 300223, Timişoara, Romania

Abstract

Improving energy efficiency in order to reduce CO₂ emissions is a permanent challenge in the European space. Smart metering could help improve energy efficiency by providing information about the way in which energy is used. Smart metering will be based on large volumes of sensor data, since energy monitoring will bring together sensor data from various critical areas. The main purpose of this paper is to present the selection mechanism for a scalable storage solution, based on the requirements of the DEHEMS (Digital Environment Home Energy Management System) project. With regular sensor readings arriving every 6 seconds, a large amount of data is collected even for the minimal target of about 250 households with 10 sensors per user. Because these data streams are large, non-stationary time series collected at discrete intervals, the DEHEMS project has to offer a solution for storing and retrieving sensor data in a responsive way. We have tested both collection speed and aggregation speed for reasonable streams of sensor data. The tests were performed on various database models, with their associated representations, including relational databases, key-value stores, column stores, self-tuning databases, as well as time-series enabled database systems. These experiments confirmed that column stores and key-value stores perform better than relational databases, while time-series databases outperform all the others.

Keywords

Benchmarking, Energy monitoring, Sensor data, Smart metering, Time-series data.

1. Introduction

Improving energy efficiency in order to reduce CO₂ emissions is a permanent challenge in the EU-27 space.
Different projects and initiatives were developed around this target in order to improve energy consumption at different levels; databases on residential and on tertiary sector electricity consumption were developed, and recommendations based on these developments were issued (for example, by the REMODECE and EL-TERTIARY projects). As households account for about 30% of EU CO₂ emissions, with a significant part of household energy used for heating, the DEHEMS project will help improve energy efficiency by supporting households in reducing their energy usage through better analysis and management of their energy consumption.

Different studies in energy conservation [1,2] show that behavior change in households is central to improving energy usage. According to [3], automated room temperature control offers potential savings of around 20-30 kWh/m².

Digital Environment Home Energy Management System (DEHEMS) developments will be based on smart metering energy ‘performance’ models, which will monitor not only the levels of energy being used by a household, but also the way in which the energy is used. In order to achieve these ‘performance’ models, energy monitoring will bring together sensor data from critical areas such as appliance performance or household heating.

As sensor readings are expected to arrive from various sources, the resulting data collection is expected to be, in essence, a very large database that supports storing sensor data and generating statistical measures over it. Even with a target of only 250 households, the amount of data generated is already extremely large. In order to select a scalable storage solution, we intended to test the capabilities and limits of different database storage engines against the requirements of a large-scale implementation of the DEHEMS project.
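To give a rough sense of the scale involved, the figures quoted in the abstract (250 households, 10 sensors per user, one reading every 6 seconds) can be turned into a back-of-the-envelope estimate. The sketch below is only illustrative: the function name and the assumption that every sensor reports continuously, without gaps, are ours and are not taken from the DEHEMS specification.

```python
# Back-of-the-envelope estimate of the DEHEMS reading volume.
# Assumption (ours): every sensor reports continuously at a fixed interval.

SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(households: int, sensors_per_household: int, interval_s: int) -> int:
    """Number of sensor readings produced in one day."""
    readings_per_sensor = SECONDS_PER_DAY // interval_s
    return households * sensors_per_household * readings_per_sensor


# Figures from the abstract: 250 households, 10 sensors each, one reading per 6 s.
total = readings_per_day(250, 10, 6)
average_rate = total / SECONDS_PER_DAY  # sustained inserts per second

print(f"{total:,} readings per day")
print(f"{average_rate:.1f} readings per second on average")
```

Under these assumptions the minimal target already yields 36 million readings per day, or roughly 417 inserts per second sustained, which is the kind of load the storage engine has to absorb while still answering aggregation queries.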
The remainder of this paper is organized as follows: Section 2 offers the necessary background information, as well as the state-of-the-art analysis related to the objectives of the paper; Section 3 presents the specific requirements for data storage, as well as some methodological issues related to the tests performed on various database systems; Section 4 includes setting details and synthetic views of the benchmarking results; Section 5 concludes with the results of the tests performed and offers an overview of future research.