TECHNICAL PAPER

Efficient HW/SW partitioning of Halo: FPGA-accelerated recursive proof composition in blockchain

Rami Akeela 1,2 • Mitchell P. Krawiec-Thayer 1,2

Received: 5 November 2020 / Accepted: 21 November 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2021

Abstract
The blockchain space has seen tremendous innovation and advancement in the last few years, with an explosion of functionality and use cases. However, several challenges arise from the nature of these distributed systems: energy efficiency, privacy, and scalability challenges due to the computational resources required to generate, validate, and store the cryptographic proofs that provide immutable security. New applications of recursive proof composition offer paradigmatic improvements that effectively address these challenges. This paper addresses the practical implementation of these theoretical advances. We demonstrate how HW/SW co-design methods can be algorithmically applied to identify practical hardware optimizations for the cryptographic verification of these zero-knowledge proofs, using Halo as an example. We offer a partitioning methodology for blockchain operations and then discuss the use of the Binary Particle Swarm Optimization (BPSO) algorithm for systemic optimization. To demonstrate our methodology, we implement the Halo algorithm on the Xilinx Zynq-7000 System-on-Chip (SoC). We achieve a speedup of 2.2x compared to a software-only implementation on a CPU.

1 Introduction

Blockchain technology refers to shared digital ledgers, typically hosted on permissionless decentralized networks. Every participant has access to all data on the blockchain, and can thus cryptographically verify the correctness of the entire ledger and its transactions. These systems offer significant benefits in terms of accessibility, censorship resistance, and immutability.
The cryptographic proofs within transactions offer strong security guarantees; however, verifying and storing these proofs requires significant resources, which we address below.

Using a distributed platform to verify sensitive information without moving data assets has a number of applications:

– ID and verification
– Enablement of Self Sovereign Identity
– Proof of funds
– Data privacy and compliance with the European Union's General Data Protection Regulation (GDPR)
– Anonymous transactions (e.g., ZCash)
– Determining the origin of sensitive data (e.g., matters of national security)

The significance of the blockchain market is illustrated by Bitcoin alone, whose market cap is projected to reach $1 trillion (Benjamin Pirus 2020). Market trends for blockchain reported by McKinsey Digital (2018) include monitoring supply chains and managing Internet of Things (IoT) networks. Blockchain has been adopted for a wide range of applications (Monrat et al. 2019), including healthcare systems (Ray et al. 2020), IoT applications (Lao et al. 2020), tokenization (Li et al. 2019), and electronic voting (Kshetri and Voas 2018).

Rami Akeela: rakeela@scu.edu
1 Department of Electrical and Computer Engineering, Santa Clara University, Santa Clara, CA 95053, USA
2 Head of Research, Insight, San Francisco, CA 94107, USA
Microsystem Technologies, https://doi.org/10.1007/s00542-020-05138-4

Blockchain and its implementation methods are studied continually to achieve more efficient use of processing power and energy. Among the hardware platforms used to implement blockchain processes are General Purpose Processors (GPPs), Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs). Each platform offers a number of advantages and challenges. A major challenge is the careful balance between processing power and energy consumption. Hardware optimizations in