International Conference on "Emerging Trends in Computer Engineering, Science and Information Technology" - 2015
Special Issue of International Journal of Electronics, Communication & Soft Computing Science and Engineering, ISSN: 2277-9477

A New Effective Approach to Move Huge Data to Cloud Minimizing Cost

Leena B. Dhangar
Prof. Amitkumar Manekar

Abstract — Big Data is a promising focus in the cloud computing arena and is receiving considerable attention. Information plays an ever larger role in our daily lives: we access the Internet every day to perform searches and run other applications, and processing the resulting huge volumes of data demands well-suited resources as well as a great deal of time. In many scenarios, however, data are geographically distributed across multiple data centers, and it sometimes becomes necessary to transfer data from one server to another. Cost minimization for processing data has therefore become an important issue in the big data era. This paper considers three factors, namely data assignment, data placement, and data movement, which drive the operational expenses of data centers. It presents an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm that minimize the cost of moving data to the cloud.

Key Words — Big Data, Cloud Computing, Cost Minimization, Data Movement

I. INTRODUCTION

Big data is a term that refers to data sets, or combinations of data sets, whose size (volume) [4], complexity (variability), and rate of growth (velocity) make them difficult to capture, manage, process, or analyze with conventional technologies and tools. Although the size at which a particular data set counts as big data is not firmly defined and continues to shift over time, most analysts and practitioners currently apply the term to data sets ranging from terabytes to multiple petabytes [1].

Over the past several years there has been a tremendous increase in the amount of data transferred between Internet users. Escalating use of streaming multimedia [3] and other Internet-based applications has contributed to this surge in data transmission. A further facet of this growth is the expansion of Big Data [18], which refers to data sets an order of magnitude larger than the standard files transmitted via the Internet; such data sets can range in size from hundreds of gigabytes to petabytes [11].

Today everything is stored digitally. Within the past decade, everything from banking transactions to medical histories has migrated to digital storage. This change from physical documents to digital files [12] has necessitated the creation of large data sets and, consequently, the transfer of large amounts of data. There is no sign that the amount of data being stored or transmitted by users is holding steady, let alone decreasing: every year, average Internet users move more and more data through their Internet connections [12]. Depending on the bandwidth of these connections and the size of the data sets being transmitted, transfer durations can run to days or even weeks. An efficient transfer technique is therefore needed that can move large amounts of data quickly and easily without impacting other users or applications. Big Data has thus already translated into a big price because of its high demands on computation and communication resources [14].
Consequently, it becomes necessary to devise an approach that minimizes the cost [1] of processing this big data. Big data analysis is one of the key challenges of the current era: what can be accomplished is often limited by how much data can be processed in a given period of time. Big data sets arise naturally as applications generate ever more information to improve their operation and performance, and general-purpose applications such as social networks enable individual users to produce massive amounts of data.

The cloud computing paradigm enables rapid, on-demand provisioning of server resources (CPU, storage, bandwidth) to users with minimal management effort. Contemporary cloud platforms, as exemplified by Amazon EC2 and S3, Microsoft Azure, Google App Engine, Rackspace, etc., organize a shared pool of servers from multiple data centers and serve their users through virtualization technology. The elastic and on-demand nature of resource provisioning makes a cloud platform attractive for the execution of various applications, especially computation-intensive ones [2], [3]. More and more data-intensive Internet applications, such as the Human Genome Project [4], are relying on clouds for processing and analyzing their petabyte-scale data sets.
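To make the lazy-migration intuition behind OLM concrete, the following Python sketch shows one plausible decision rule under strong simplifying assumptions: a single data set, one known operating cost per data center per time slot, and a fixed one-time migration cost. The function and parameter names (olm_step, beta) are illustrative inventions rather than the paper's notation, and the rule shown, migrating only once the accrued cost of staying put exceeds a threshold multiple of the migration cost, is a sketch of the general idea, not the exact algorithm.

    # Illustrative sketch only; names and thresholds are hypothetical.
    def olm_step(current_dc, slot_costs, accrued_gap, migration_cost, beta=1.0):
        """Decide, for one time slot, whether to migrate the data set.

        current_dc     -- index of the data center currently holding the data
        slot_costs     -- this slot's operating cost at each data center
        accrued_gap    -- extra cost accrued so far by not migrating
        migration_cost -- one-time cost of moving the data set
        beta           -- laziness threshold: migrate only once the accrued
                          gap exceeds beta * migration_cost
        """
        cheapest = min(range(len(slot_costs)), key=lambda i: slot_costs[i])
        # Accumulate the per-slot penalty for staying at the current location.
        accrued_gap += slot_costs[current_dc] - slot_costs[cheapest]
        if accrued_gap > beta * migration_cost:
            # Migration now pays for itself; move and reset the accrued gap.
            return cheapest, 0.0
        # Otherwise stay put: the defining "lazy" behavior that avoids
        # over-reacting to short-lived price fluctuations.
        return current_dc, accrued_gap

    # Toy usage: data center 1 becomes persistently cheaper, so after a few
    # slots the accrued gap crosses the threshold and the data migrates.
    if __name__ == "__main__":
        location, gap = 0, 0.0
        for costs in [[5, 4], [5, 3], [5, 2], [5, 2], [5, 2]]:
            location, gap = olm_step(location, costs, gap, migration_cost=4.0)
            print(f"slot costs={costs} -> at DC {location}, accrued gap={gap}")

The deliberate delay before migrating is what guards an online scheme of this kind against transient price dips, trading a bounded amount of extra operating cost for protection from repeated, wasteful migrations.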