A Review on How Big Data Analytics Can Influence Education

Victor Chang 1 a, Qianwen Xu 1, Victor Mendez 2

1 International Business School Suzhou, Xi’an Jiaotong-Liverpool University, Suzhou, China
2 Universitat Autònoma de Barcelona, UAB, Spain
Victor.Chang@xjtlu.edu.cn, Qianwen.Xu18@student.xjtlu.edu.cn, victor.mendez@uab.es

Keywords: Big Data, Big Data Analytics, Education, Ethics

Abstract: This paper defines Big Data and Big Data Analytics and explains how they influence education in terms of benefits and ethical issues. It first illustrates the applications of Big Data Analytics in the educational context and three benefits to education. It then explains the ethical issues that arise from the use of Big Data Analytics, and examines those that arise when Big Data Analytics is employed in education, from the perspectives of individuals and of the goal of education. Finally, it provides several recommendations for dealing with current or potential ethical issues caused by the application of Big Data Analytics in education.

1 INTRODUCTION AND BIG DATA ANALYTICS

With the development of big data, many industries have benefited from applications of big data analytics. Big data draws on information from a variety of sources, which is processed to create new information or knowledge and then used for prediction or other purposes, but the accompanying concerns, such as ethical issues, have also become a hot topic. This paper explains the role of Big Data and Big Data Analytics in education. It aims to provide a holistic view of the education-related literature and discusses the ethical issues in this context in detail.

1.1 What Is Big Data?

Big data is a broad term that refers to the massive amounts of digital information being captured and used to personalize content, predict consumer behaviors, and design interventions (Gregg, Wilson and Parrish, 2018).
There are many definitions of big data. Mills et al. (2012) define big data as vast amounts of data of high velocity, complexity and variety, which require advanced technologies to collect, store, distribute, manage and analyze. After surveying twelve definitions from Gartner, Microsoft, Oracle, Intel and others, Ward and Barker (2013) condense them and characterize big data as large in volume and complex, and as being stored and analyzed with a set of technical tools such as NoSQL, MapReduce, and machine learning.

a https://orcid.org/0000-0002-8012-5852

Nowadays, it is common to describe big data using the Three V’s: Volume, Variety, and Velocity.

Volume refers to the magnitude of data. According to Gandomi and Haider (2014), definitions of big data volume vary with factors such as time. At present, the size of big data is measured in multiple terabytes and petabytes, but as storage capacities keep growing, data classified as ‘big’ today may not meet the threshold in the future, when the unit of measurement may reach the exabyte.

Variety refers to diversity in the structure of a dataset. Data can be structured (e.g. tabular data), semi-structured, or unstructured (e.g. text, audio). Extensible Markup Language (XML), a language for flexibly encoding documents, is an example of a semi-structured format.

Velocity refers to the speed at which data is generated, analyzed and acted upon (Gandomi and Haider, 2014). In recent years, more and more areas, e.g. markets and mobile apps, have come to require real-time information, and this is expected to continue. High-frequency data