Performance Evaluation and Optimisation for Kyber on the MULTOS IoT Trust-Anchor

Keith Mayes
Information Security Group
Royal Holloway, University of London
Egham, UK
keith.mayes@rhul.ac.uk

Abstract—The Internet of Things (IoT) may be considered as a distributed, critical infrastructure, consisting of billions of devices, many of which have limited processing capability. However, the security of IoT must not be compromised by these limitations, and defenses need to protect against today’s threats, and those predicted for the future. This requires protection against implementation attacks, as well as the ability to load, replace and run best-practice cryptographic algorithms. Post-Quantum cryptographic algorithms are attracting great interest, and NIST is running a standardization competition to find the best. Prior research demonstrated that a Learning With Errors candidate algorithm could be implemented on a smart card chip; however, this was a low-level implementation, and not representative of loading the algorithm onto a secured IoT chip platform. In this paper we present analysis from a practical implementation of the Kyber768 CPAPKE public key encryption component on a MULTOS IoT Trust-Anchor chip. The investigation considered memory and speed requirements, and optimizations, and compared the NTT version of Kyber, presented in Round 1 of the NIST competition, with the Kronecker multiplier technique that exploits a hardware crypto-coprocessor. The work began with a generic multi-round multiplier approach, which was then improved using a novel modification of the input data, allowing a built-in modular multiply function to be used, significantly increasing the speed of a multiplication round, and doubling the usable size of the hardware multiplier.

Index Terms—MULTOS, Kyber, Post Quantum, embedded, performance, IoT

I. INTRODUCTION

The Internet of Things (IoT) is fast evolving into a critical enabler for future society.
Much focus is on new functionality and services; however, ensuring the security of IoT is crucial. As yet there is no clear solution for securing the entire IoT; however, a lot is known about providing system security in legacy systems, using best-practice cryptographic algorithms and protocols, and there is considerable industry expertise in protecting security-sensitive devices from attacks on their implementation. Complications for IoT security include the long potential life of the deployed infrastructure, the difficulty of physically accessing and/or replacing security-sensitive devices, and their processing resource limitations. Maintaining an effective defense against evolving attacks requires flexible security devices, which, even after deployment, can be loaded with new algorithms. The greatest test for such devices may eventually be attackers equipped with quantum computers, implying that we cannot rely on legacy Public Key Infrastructure (PKI) for confidentiality, integrity and availability. In this paper we do not offer a magic bullet for IoT security, but practically investigate trustworthy security foundations for IoT that support traditional algorithms, yet are sufficiently flexible to support the future loading of post-quantum algorithms. The scenario considered was an IoT seeded with post-quantum capable security anchors.

There are numerous security-sensitive systems in use today, which have protection from strongly attack-resistant hardware, e.g., the chips in our bank cards and passports. They include specialist hardware to resist physical, side-channel and fault attacks (summarized in [11]), supported by software defensive measures, and are typically assessed under Common Criteria (CC) [2]. The secured microcontrollers within bank cards are normally of small register size (16-bit is common) and have limited memory and processing speed; the Infineon SLE78 [7] is typical.
The software defensive measures required for high-level CC evaluation significantly degrade performance compared to unprotected native-mode implementations, with a slowdown of one or two orders of magnitude not unusual (see [10]). Therefore, a secured microcontroller has a crypto-coprocessor (CCoP), with special hardware for executing specific functions much faster than the CPU. The functions may be complete algorithms, e.g., RSA [15], or general utilities such as block multiplies. A secure chip platform will offer an API for functions that map onto the underlying CCoP. The CCoP cryptographic operations may be fast relative to simple byte or bit manipulation via the main CPU, making results and optimisation strategies unusual compared to a CPU without a CCoP. The MULTOS [13] platform has a CCoP, but also offers generic software primitives that still have defensive coding, yet are optimized for faster execution compared to implementation at the application level. Secure chips are typically initialized and personalized before first use, which may include the storage of identities and cryptographic keys for operation and management. To overcome processing restrictions, some values may be pre-computed, for example, storing a diversified ID rather than calculating it, or adding small look-up tables to speed execution; our work made extensive use of the personalisation phase.

In this research we chose to use MULTOS security platform(s) based on the high levels of security assurance that they have achieved, and the availability