Vinod Kumar, Ravi Mohan Sharma, R.S. Thakur, International Journal of Innovations & Advancement in Computer Science IJIACS ISSN 2347 – 8616 Volume 5, Issue 6 June 2016

Big Data Analytics: Bioinformatics Perspective

Vinod Kumar, MCA Dept., MANIT, Bhopal (MP), India
Ravi Mohan Sharma, Computer Applications, MCRPVV, Bhopal, India
R.S. Thakur, MCA Dept., MANIT, Bhopal (MP), India

Abstract - In the current digital era, experimental instruments, clinical systems, and laboratory apparatus are embedded with digital devices. Due to this digitization of research and experimental processes, biological databases have grown tremendously in volume. However, providing high-performance computing devices and software tools that can deal with this complex and growing volume of data remains a major challenge for computer scientists and biologists. This paper introduces various analysis and visualization tools in bioinformatics, surveys big bioinformatics databases, and proposes a high-level bioinformatics system architecture to handle the voluminous data in bioinformatics.

Keywords: Big Data, Bioinformatics, Analysis Tools, Hadoop, MapReduce

I. INTRODUCTION

In today's era of information and technology, bioinformatics is used as an umbrella term for almost all aspects of computational biology. Bioinformatics research will have an impact on all of biology. Bioinformatics can be defined in many ways. Broadly, it can be considered a computer-based discipline applied to the life sciences. More specifically, it is a science that uses computation-based technology for capturing, storing, representing, retrieving, and analyzing biological information and/or for simulating the processes of biotic systems. Along with biology and computer science, it also involves genetics, statistics, software engineering (SE), mathematics, systems biology, molecular evolution, and so on.
Currently, bioinformatics is found in many facets: biomedical literature, clinical sciences, statistics, biological sciences, biomedical sciences, computer science and engineering, mathematics, systems biology, genetics, proteomics, evolutionary biology, genomics, pharmacokinetics, pharmacogenomics, high-throughput chemistry, high-throughput biology, metabolomics, etc. Biological data are huge in volume and also exhibit variety and velocity, so they are termed biological Big Data with the 3 V's. Since biological data are massive in volume, it is sensible not to move the data: the data should reside in its own storage place, and only the code for analyzing the data should be sent to the data for processing.

Data that are massive in volume need high-performance computing devices to process them all. The major reasons for the production of such voluminous data include the following. The cost of producing, acquiring, and disseminating data is decreasing day by day. In healthcare systems, the digitization of all clinical examinations and medical records is becoming standard in hospitals.

In the rest of the paper, Section II describes the exponential growth of data in bioinformatics, and Section III focuses on the various available bioinformatics databases. Section IV proposes a high-level bioinformatics system architecture and briefly describes its components. Section V discusses the associated challenges and opportunities, Section VI describes various application areas of bioinformatics, Section VII summarizes the paper, and Section VIII points toward future directions.

II. DATA EXPLOSION

Today, the biological data repositories used in bioinformatics research are growing dynamically and rapidly in volume as well as in variety. Big data sources are no longer limited to search engines, web logs, indexes, social media, or particle physics.
Due to the digitization of processes and the easy availability of high-throughput electronic and computing devices at lower cost, the size of the data is rising exponentially.

Figure 1: Bioinformatics Data Explosion [6]
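The data-locality principle stated in the introduction (leave the data where it is stored and send only the analysis code to it) can be illustrated in the MapReduce style named in the keywords. The following is a minimal single-process Python sketch under stated assumptions, not the architecture proposed in this paper: the `partitions` dictionary, the node names, and the functions `count_bases`, `merge_counts`, and `run_job` are hypothetical, and the sequences are made-up toy data. Each "node" is simulated by a dictionary entry; on a real cluster the map function would run where each partition is stored.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy data: each key stands for a storage node,
# each value for the DNA reads that already reside on that node.
partitions = {
    "node-a": ["ACGT", "GGCA"],
    "node-b": ["TTAA"],
    "node-c": ["CCGG", "ATAT", "G"],
}


def count_bases(reads):
    """Map step: runs next to the data and returns only a small summary."""
    counts = Counter()
    for read in reads:
        counts.update(read)  # count each character (base) in the read
    return counts


def merge_counts(left, right):
    """Reduce step: combine two per-node summaries into one."""
    return left + right


def run_job(parts, map_fn, reduce_fn):
    """Apply map_fn to every partition, then merge the summaries.

    Only the per-partition summaries are passed to the reduce step;
    the raw reads are never gathered into one collection.
    """
    summaries = [map_fn(reads) for reads in parts.values()]
    return reduce(reduce_fn, summaries, Counter())


totals = run_job(partitions, count_bases, merge_counts)
gc_fraction = (totals["G"] + totals["C"]) / sum(totals.values())
print(dict(totals), round(gc_fraction, 3))
```

The point of the sketch is that `count_bases` returns a summary whose size depends on the alphabet rather than on the number of reads, so what would cross the network stays small even when the partitions themselves are large.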