International Journal of Research in Engineering, Science and Management, Volume-2, Issue-12, December-2019
www.ijresm.com | ISSN (Online): 2581-5792

A Review on YouTube Data Analysis Using MapReduce on Hadoop

Krishna Bhatter¹, Siddhi Gavhane², Priyanka Dhamne³, Shardul Rabade⁴, G. B. Aochar⁵
¹,²,³,⁴ Student, Dept. of Computer Engineering, Modern Education Society’s College of Engineering, Pune, India
⁵ Assistant Professor, Dept. of Computer Engg., Modern Education Society’s College of Engineering, Pune, India

Abstract: We live in a digitalized world today. Analysis of structured data has seen tremendous success in the past. However, analysis of large-scale unstructured data in video form remains a challenging area. YouTube, a Google company, has over a billion users and generates billions of views. Since YouTube data is created in very large volumes and at an equally great speed, there is a strong demand to store, process, and carefully study this data to make it useful. About 300 hours of video are uploaded to YouTube every minute. Because this enormous volume of data is publicly available, YouTube has become a powerful source for data analysts who want to determine which YouTube channel is trending or which YouTube category will help increase sales and reach customers with quality products. Big Data mining of such an enormous quantity of data is performed using Hadoop and MapReduce to measure performance. This project aims to analyze different information from the YouTube datasets using the MapReduce framework provided by Hadoop.

Keywords: Unstructured data, YouTube data analysis, Big Data, Hadoop, HDFS, MapReduce.

1. Introduction

Today, the consumption of data is increasing rapidly. Along with this consumption, the amount of data stored on servers is also increasing. The popular video streaming site YouTube is one such example, where the rates of data upload and consumption are growing rapidly [2]. Such data is available in structured, semi-structured, and unstructured formats, in petabytes and beyond. This huge volume of generated data has given rise to what is called Big Data.

Table 1
YouTube Statistics

YouTube Company Statistics                                   Data
Total number of YouTube users                                1,325,000,000
Hours of video uploaded every minute                         300 hours
Number of videos viewed every day                            4,950,000,000
Total number of hours of video watched every month           3.25 billion hours
Number of videos that have generated over 1 billion views    10,113
Average time spent on YouTube per mobile session             40 minutes

Table 1 [4] provides important statistics about YouTube. Data at this scale can be handled using the Hadoop framework. Many companies upload their product launches to YouTube and anxiously await their subscribers’ reviews and comments. Major production companies launch movie trailers there, and viewers post their first reactions and reviews of the trailers [3].

Big data is a collection of large and complex data sets. These massive data sets cannot be analyzed using traditional database management tools [6]. “Big Data is a word for data sets that are huge and complex that data processing applications are insufficient to deal with them. Analysis of data sets can find new correlations to spot business sales, prevent diseases, preventing crime and so on.” [3]. The characteristics of big data are:
High Volume of Data.
High Velocity of Data.
High Variety of Data.

Fig. 1. Characteristics of big data

Apache Hadoop is one technology that can be used for fast, reliable, and distributed processing of large-scale data. Hadoop technologies such as HDFS and MapReduce can be used for processing and retrieving unstructured video data. The limitations of RDBMSs in handling such data gave rise to a new class of database management systems called NoSQL. Hadoop is an open-source project that includes the Hadoop Distributed File System (HDFS) for storing and retrieving data. Hadoop mainly consists of two components:
1. A distributed processing framework named MapReduce, which is now supported by a component called YARN (Yet Another Resource Negotiator).
2. A distributed file system known as the Hadoop Distributed File System, or HDFS. [6]
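To make the MapReduce model concrete, the following is a minimal sketch of the kind of analysis described in the abstract: counting videos per category to see which category dominates. It is not Hadoop code and not the authors' implementation. It only simulates the map, shuffle, and reduce phases in plain Python on one machine. The tab-separated record layout (video ID, uploader, category, views), the sample records, and all function names are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout (an assumption, not taken from the paper):
# video_id <TAB> uploader <TAB> category <TAB> views
SAMPLE_RECORDS: List[str] = [
    "v1\talice\tMusic\t1200",
    "v2\tbob\tComedy\t300",
    "v3\tcarol\tMusic\t4500",
    "v4\tdave\tSports\t800",
    "malformed line without enough fields",
    "v5\terin\tComedy\t150",
]


def map_category(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (category, 1) for every well-formed record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 4:  # skip records that do not match the assumed layout
        yield fields[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key.

    In Hadoop the framework does this between the map and reduce phases.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_count(category: str, ones: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the values collected for one category."""
    return category, sum(ones)


def count_videos_per_category(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_category(line))
    grouped = shuffle(pairs)
    return dict(reduce_count(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = count_videos_per_category(SAMPLE_RECORDS)
    # List categories from most to least videos, ties broken alphabetically.
    for category, total in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{category}\t{total}")
```

On a real cluster, the input lines would be read from HDFS, the map and reduce functions would run in parallel on many nodes, and the grouping step would be carried out by the framework rather than by a local dictionary.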