Development of Big Seismic Data Processing Tools

Mulugeta Dugda 1, Alemayehu B. Kassa 2, Line Pouchard 3, Hunter Saylor 2

1. Morgan State University, Department of Electrical and Computer Engineering
2. Morgan State University, Department of Computer Science
3. Brookhaven National Laboratory, Computational Science Initiative

ABSTRACT

Seismology is a data-driven science, and seismic data have been gathered for more than a century. With so much data available, new processing and management tools are needed to help analyze the data and find new and better seismic models. Although seismic data recording started in 1900, the volume of seismic data has grown exponentially over the last three decades. One of the largest seismological data centers in the world illustrates this growth: holdings at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) in the United States grew from less than 10 tebibytes in 1992 to more than 750 tebibytes in 2022. The objective of this investigation is to develop tools for splitting, merging, converting, processing and managing big seismic data from different data sources. The tools are being developed in the Python programming language with open-source Python libraries, which offer powerful facilities for processing and managing data and for building new tools. We also build on software developed over roughly the last decade: ObsPy and obspyDMT, which were written specifically for seismic data processing, and Apache Spark, which is useful for all kinds of big data analytics.

Introduction

A large amount of seismic data has been gathered for over a century. Data at this scale motivates the development of new or more efficient big seismic data analytics (processing and management) tools to help analyze the data and find new or better seismic models. Powerful Python libraries for processing and managing seismic data already exist, such as ObsPy and obspyDMT. We assess the feasibility of developing a new Python-based seismic library to extract, split, merge, convert, encrypt and manage big seismic data. So far, we have been developing demo tools to split, merge, convert and process seismic data.

Methods

The first task of this study is to investigate how the existing Python-based seismic libraries can be used to split, merge, convert and manage big seismic data. The second is to develop demo tools that split, merge, convert and manage big seismic data. The third is to assess the feasibility of developing a new Python-based seismic library or framework to extract, split, merge, convert, encrypt and manage big seismic data. Future work is to develop a new Python framework or library that can efficiently split, merge, convert, process and manage big seismic data from different data centers.

Python

We use the Python programming language and Python-based software to develop data splitting, merging, converting and management tools for big seismic data. We installed Python version 3.9.7 with the Anaconda Distribution.

ObsPy

ObsPy is an open-source project dedicated to providing a Python framework for processing seismological data. It provides parsers for common file formats, clients to access data centers, and seismological signal processing routines that allow the manipulation of seismological time series. The goal of the ObsPy project is to facilitate rapid application development for seismology. It supports work with seismological data such as waveform data, station metadata and event metadata [2]. We installed ObsPy version 1.3.0 with the Anaconda Distribution.

obspyDMT

obspyDMT (ObsPy Data Management Tool) is a tool for retrieving, processing and managing seismological datasets in a fully automatic way. It can be used as a stand-alone command-line tool or integrated as a module into other Python code. It comes with diagnostic and plotting tools for checking the retrieved data and metadata. obspyDMT is written in Python and runs on Linux, Mac OS and Windows [1]. We installed obspyDMT version 2.2.10 with the Anaconda Distribution.

Apache Spark

Apache Spark is an open-source, distributed processing system used for big data workloads. It is increasingly used for big data analysis and for artificial intelligence problems. We installed PySpark version 3.1.2 with the Anaconda Distribution.

Results

In this study, we retrieve big seismic data from North American seismic data centers. We map the geographical locations of the seismic data using ObsPy and obspyDMT. We then develop demo tools that split, merge, convert and manage the retrieved seismic data using the existing Python libraries. In addition, we assess the feasibility of developing a new library or framework based on the existing Python-based seismic tools. Based on this study, we propose and plan to develop a new Python framework or library that can efficiently extract, split, merge, convert, encrypt and manage big seismic data from different data centers.

Figures #1 to #12 (images not reproduced in this text)

Conclusions

In this research, we have developed seismic data management demo tools by combining different Python tools and libraries. The demo is written in Python with PySpark and other Python-based libraries. The data are retrieved from different seismic data centers using ObsPy and obspyDMT. In the future, we plan to advance the development of new Python frameworks or libraries capable of extracting, splitting, merging and encrypting seismic data and converting it from one format to others.

Acknowledgement

We thank the US National Science Foundation (NSF); this project is supported by NSF Research grant #2101080. However, NSF is not responsible for any of the results published in this paper.

References

[1] Kasra Hosseini and Karin Sigloch, ObspyDMT: a Python toolbox for retrieving and processing large seismological data sets, published 12 October 2017.
[2] The ObsPy Development Team (devs@obspy.org), ObsPy Tutorial, Release 1.2.0.
[3] Jason Brownlee, Deep learning with python, developing deep learning models using theano and tensorflow using keras, 2020.
[4] The VERCE portal, a user's guide, Version 1.1, September 2015.
[5] Yuri Demchenko, Paola Grosso, Cees de Laat, Peter Membrey, Addressing Big Data Issues in Scientific Data Infrastructure, IEEE, 2013.
[6] Machine learning with python, Tutorials Point, 2019.
... ata mining, American Geoscience Union Fall Meeting 2015, December 14-18, San Francisco, USA.
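
The split and merge operations named throughout this poster can be illustrated with a small, library-free sketch. It is not the demo tool described here: the `Segment` class and the `split_segment` and `merge_segments` functions are hypothetical names introduced only for illustration, and a waveform is assumed to be a start time, a sampling rate and a list of samples. The sketch cuts one segment into fixed-size chunks and then joins chunks that are contiguous in time, leaving a gap as a boundary between separate segments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A contiguous run of samples (hypothetical stand-in for a waveform trace)."""
    start: float          # start time in seconds since an arbitrary epoch
    sampling_rate: float  # samples per second
    samples: List[float]

    @property
    def end(self) -> float:
        """Time at which the next sample after this segment would fall."""
        return self.start + len(self.samples) / self.sampling_rate


def split_segment(seg: Segment, chunk_size: int) -> List[Segment]:
    """Cut a segment into consecutive chunks of at most chunk_size samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for i in range(0, len(seg.samples), chunk_size):
        chunks.append(Segment(
            start=seg.start + i / seg.sampling_rate,  # shift start by i samples
            sampling_rate=seg.sampling_rate,
            samples=seg.samples[i:i + chunk_size],
        ))
    return chunks


def merge_segments(segments: List[Segment], tolerance: float = 1e-6) -> List[Segment]:
    """Join segments that share a sampling rate and are contiguous in time.

    Segments separated by a gap (or with different sampling rates) stay separate.
    """
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if (last is not None
                and last.sampling_rate == seg.sampling_rate
                and abs(last.end - seg.start) <= tolerance):
            last.samples.extend(seg.samples)
        else:
            # Copy the sample list so the input segments are never modified.
            merged.append(Segment(seg.start, seg.sampling_rate, list(seg.samples)))
    return merged


# Ten seconds of synthetic data sampled at 100 Hz.
original = Segment(start=0.0, sampling_rate=100.0,
                   samples=[float(i) for i in range(1000)])
pieces = split_segment(original, chunk_size=300)
restored = merge_segments(pieces)
```

Splitting and then merging returns the original samples, which is a useful round-trip check for any tool of this kind. In ObsPy, comparable operations are available on its own objects, for example `Trace.slice` and `Stream.merge`; those also carry network, station and channel metadata, which this sketch ignores.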