Contents lists available at ScienceDirect

Big Data Research
www.elsevier.com/locate/bdr

Machine Learning with Big Data: An Efficient Electricity Generation Forecasting System

Mohammad Naimur Rahman, Amir Esmailpour, Junhui Zhao
Department of Electrical & Computer Engineering and Computer Science, University of New Haven, West Haven, CT 06516, United States

Article info

Article history: Received 1 June 2015; received in revised form 26 January 2016; accepted 19 February 2016; available online xxxx

Keywords: Artificial neural network; Backpropagation; Big Data; Electricity generation forecast; Hadoop; MapReduce

Abstract

Machine Learning (ML) is a powerful tool for predicting the future behavior of data from its past history. ML algorithms build a model from input examples and use it to make data-driven predictions or decisions. The growing concept of “Big Data” has brought much success to the field of data science by providing data scalability in a variety of ways. ML can also be used in conjunction with Big Data to build effective predictive systems or to solve complex data analytics problems. In this work, we propose an electricity generation forecasting system that predicts the amount of power required at a rate close to the electricity consumption of the United States. The proposed scheme uses Big Data analytics to process power management data collected over the past 20 years. It then applies an ML model to train the system for the prediction stage. The model forecasts future power generation from the collected data, and our test results show that the proposed system can predict the required power generation at close to 99% of the actual usage.
Our results indicate that ML combined with Big Data can be integrated into forecasting techniques to improve efficiency and to solve complex data analytics problems in power management systems.

© 2016 Elsevier Inc. All rights reserved.

This article belongs to Analytics & Applications.
* Corresponding author.
E-mail addresses: MRahm1@unh.newhaven.edu (M. Naimur Rahman), aesmailpour@newhaven.edu (A. Esmailpour), JZhao@newhaven.edu (J. Zhao).

1. Introduction

The United States (U.S.) is currently the second largest electricity producer and consumer in the world [1]. The U.S. has great geographical diversity among its states, together with a high level of power consumption. This makes it challenging to deploy a centralized power management system that can control power generation and regulate consumption. Electricity is mostly generated from natural resources such as coal, gas, nuclear, petroleum, oil, and renewable energy. Consumption can be broken down into commercial, industrial, residential, and other user sectors.

Due to the lack of centralized control, there is a large disparity in the ratio of power consumption to power generation from one state to the next. This imbalance wastes large quantities of power in states where generation significantly exceeds consumption, while other states suffer from insufficient power generation. Given the size and geographical diversity of the states in the U.S., it is impractical to prescribe centralized control over the power system. Only the interstate segments are regulated by the federal government [2,3]; most of the rest of the nation is regulated by individual states.

Fig. 1 shows the electricity generation and consumption in the U.S. during 1980–2014.
In this figure, the green line at the bottom shows consumption, the red line in the middle represents actual generation, and the blue line on top indicates total generation including net imports (i.e., from neighboring countries). The difference between generation (red line) and consumption (green line) is attributed to system losses, uncounted loads, and the lack of centralized control.

Fig. 2 shows electricity generation in the U.S. by state. States shown in the lighter brown color do not produce enough electricity to meet their demand. The other states (shown in the darker orange color) produce excess electricity, which could be used to compensate for the states that lack sufficient power generation. Remaining deficits are met by importing electricity from neighboring countries.

http://dx.doi.org/10.1016/j.bdr.2016.02.002
2214-5796/© 2016 Elsevier Inc. All rights reserved.

Power generation is directly correlated with the amount of resources used to generate electricity, such as coal, gas, nuclear, petroleum, oil, and renewable energy. In Fig. 1, the red line in the middle (representing power generation in the U.S.) conveys two types of information: the amount of energy consumed and the quantity to be imported. Therefore, predicting power generation might provide vague information about power demand; hence