Journal of Artificial Intelligence and Big Data, 2025, Volume 5, Number 1
www.scipublications.com/journal/index.php/jaibd
DOI: https://doi.org/10.31586/jaibd.2025.6049

Review Article

Enhancing Scalability and Performance in Analytics Data Acquisition through Spark Parallelism

Hanza Parayil Salim 1*, Yanas Rajindran 2

1 Staff Engineer, Neiman Marcus, Texas, USA
2 Lead Engineer, AT&T, Texas, USA
*Correspondence: Hanza Parayil Salim (hanzapsalim@gmail.com)

How to cite this paper: Salim, H. P., & Rajindran, Y. (2025). Enhancing Scalability and Performance in Analytics Data Acquisition through Spark Parallelism. Journal of Artificial Intelligence and Big Data, 5(1), 38-46. DOI: 10.31586/jaibd.2025.6049

Received: February 2, 2025
Revised: March 8, 2025
Accepted: March 19, 2025
Published: March 22, 2025

Copyright: © 2025 by the authors. Submitted for possible open access publication under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Abstract: Data acquisition is a critical component of modern data architecture, and REST API integration has emerged as one of the most common approaches for sourcing external data. This study evaluates the efficiency of various methodologies for collecting data via REST APIs and benchmarks their performance. It explores how the Spark distributed computing platform can optimize large-scale REST API calls, improving scalability and processing speed to meet the demands of high-volume data workflows.

Keywords: Distributed computing, Parallel processing, Data Acquisition, Apache Spark, RESTful Web Services, REST API, Data Analytics

1. Introduction

REST APIs are widely used in enterprises to acquire data from external sources because of their flexibility, scalability, and standardization. Such workloads often require large volumes of API calls at once, which raises challenges such as latency, rate limits, and error handling. This paper specifically examines the latency of traditional API calls and explores how it can be addressed using Spark's parallel processing architecture. It discusses how Apache Spark [6] DataFrames, RDDs, and UDFs (user-defined functions) can be used to parallelize REST API calls and improve overall performance.

2. REST API and Apache Spark

A REST API (also called a RESTful web API) is an application programming interface (API) that follows the design principles of the representational state transfer (REST) architectural style. REST APIs provide a lightweight, flexible way to integrate applications and are known for their scalability, flexibility, portability, and independence, since the client and server are separated. They offer a simple and efficient method for accessing data from various sources, letting developers integrate systems easily and retrieve information through basic HTTP requests. This streamlines data acquisition across different platforms and programming languages.

Many real-life scenarios require parallel REST API [2] calls. In high-traffic applications such as e-commerce platforms or real-time dashboards, making API requests in parallel keeps the application responsive by reducing the time spent fetching external data. Parallel API calls also help meet real-time requirements in scenarios such as real-time analytics or monitoring systems, where data must be collected from multiple sources without delay, stored in high-performance scalable storage such as Delta Lake [4], and then used for analytics and machine learning [11] applications.
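
The latency argument for parallel REST API calls can be illustrated with a minimal sketch. It is not the Spark-based approach this paper evaluates: it uses a plain Python thread pool as a single-machine stand-in for parallel workers. The function `fetch_record`, the 50 ms latency, and the worker count are hypothetical, and the "API call" is simulated with a sleep so that the example runs without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed per-call latency, chosen only for illustration.
SIMULATED_LATENCY_S = 0.05


def fetch_record(record_id: int) -> dict:
    """Hypothetical stand-in for one REST call; sleeps instead of using the network."""
    time.sleep(SIMULATED_LATENCY_S)
    return {"id": record_id, "status": "ok"}


def fetch_sequential(record_ids):
    """Issue the calls one after another; total time grows with the number of calls."""
    return [fetch_record(i) for i in record_ids]


def fetch_parallel(record_ids, max_workers=8):
    """Issue the calls concurrently; waiting periods of different calls overlap."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(fetch_record, record_ids))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(16))
    _, t_seq = timed(fetch_sequential, ids)
    _, t_par = timed(fetch_parallel, ids)
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

Because each simulated call spends its time waiting rather than computing, overlapping the waits shortens the total elapsed time while returning the same records in the same order. Real endpoints add concerns this sketch ignores, such as rate limits and error handling.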