DeepFuzzSL: Generating Simulink Models with Deep Learning to Find Bugs in the Simulink Toolchain

Sohil Lal Shrestha, Computer Science & Eng. Dept., University of Texas at Arlington, Arlington, Texas, USA
Shafiul Azam Chowdhury, Computer Science & Eng. Dept., University of Texas at Arlington, Arlington, Texas, USA
Christoph Csallner, Computer Science & Eng. Dept., University of Texas at Arlington, Arlington, Texas, USA

ABSTRACT
Testing cyber-physical system (CPS) development tools such as MathWorks' Simulink is very important, as they are widely used in the design, simulation, and verification of CPS models. Existing randomized differential testing frameworks such as SLforge leverage semi-formal Simulink specifications to guide random model generation. This approach requires significant research and engineering investment, along with the need to manually update the tool whenever MathWorks updates its model validity rules. To address these limitations, we propose to learn validity rules automatically, by training a language model with our framework DeepFuzzSL on an existing corpus of Simulink models. In our experiments DeepFuzzSL consistently generated over 90% valid Simulink models and also found 2 bugs in Simulink versions R2017b and R2018b, confirmed by MathWorks Support.

ACM Reference Format:
Sohil Lal Shrestha, Shafiul Azam Chowdhury, and Christoph Csallner. 2020. DeepFuzzSL: Generating Simulink Models with Deep Learning to Find Bugs in the Simulink Toolchain. In Proceedings of 2nd Workshop on Testing for Deep Learning and Deep Learning for Testing (DeepTest '20). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Cyber-physical systems (CPS) integrate cyberspace and the physical world through a network of interconnected components such as actuators and sensors. Engineers typically prototype CPS as graphical block diagrams using commercial development tools such as MathWorks Simulink [29] (a de-facto industry standard), which enable them to model, simulate, and analyze their systems. Furthermore, these toolchains can automatically generate embedded code that is often deployed on the target hardware of safety-critical systems. It is thus very important to find and remove bugs in such development toolchains.

In software engineering, there are a number of ways to find bugs. Ideally, one would formally verify the entire Simulink toolchain, but this is not feasible due to its large and complex code base and the lack of a complete formal specification, which can partly be attributed to its commercial nature [11]. As with many other software systems, toolchain testing suffers from the test oracle problem [2]. An alternative is fuzzing, or random test case generation, which is an effective way to identify bugs [6, 7]. The state-of-the-art Simulink testing tool SLforge combined randomized fuzzing with differential testing and found 8 new bugs in Simulink [11]. Since Simulink does not have a complete, publicly available language specification, Chowdhury et al. [11] automatically parsed semi-formal specifications from Simulink's web pages and rigorously incorporated them into SLforge's random model generator. While SLforge has proven effective, it inherently relies on documented specifications to update its random model generator.
To overcome the engineering effort of maintaining the tool with respect to subtle specification changes and new features, while also preserving reasonable fidelity to real-world Simulink models, we propose to build a neural network model that can automatically generate Simulink models by learning directly from third-party Simulink models. We hypothesize that a neural network model should be able to capture undocumented Simulink specifications that are missed by the earlier approach. This hypothesis is motivated by recent developments in deep learning and natural language processing research that have constructed probabilistic language models of how humans write code. Such approaches have shown the efficacy of random program generation without the need to rigorously define rules or a grammar in a random program generator [15, 25]. For example, DeepSmith [15], a deep-learning-based fuzzer, has reported 50+ bugs in OpenCL compilers such as LLVM and claims to be easily extensible to other programming languages with minimal engineering effort.

Earlier work on applying deep learning to compiler fuzzing has mostly focused on programming languages (such as C and OpenCL) whose complete specifications are publicly available. In contrast, we focus on Simulink, which lacks a complete specification, making it a better candidate to validate the language-agnostic deep learning approach that earlier work claims [15].

In this work, we cast the random Simulink model generation task as a language modeling problem (Section 2.1). Traditional statistical language modeling approaches such as n-grams fail to capture semantic relations and are thus not useful for our work. In contrast, a neural language model (Section 2.1) captures the semantic and syntactic structure of a given language. While there are different types of neural network architectures (such as feed-forward, convolutional, and recurrent), we chose Long Short-Term Memory (LSTM) [18], a variant of recurrent neural networks, which has proven effective in language modeling [32]; a minimal sketch of such a model is given at the end of this section.

In our DeepFuzzSL framework, we extend the DeepSmith architecture to generate random Simulink models. In doing so, we verify their earlier claim and validate our hypothesis. In our preliminary evaluation, our trained DeepFuzzSL model is able to generate over 90% valid Simulink models and has found 2 bugs in Simulink versions R2017b and R2018b, confirmed by MathWorks Support.

To summarize, this paper makes the following major contributions.
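For concreteness, the listing below is a minimal sketch of the kind of character-level LSTM language model discussed above: it is trained on the textual representation of corpus Simulink models and then sampled one character at a time to emit new model text, which is subsequently handed to Simulink for validation. The framework choice (PyTorch), layer sizes, and sampling routine are illustrative assumptions, not the exact DeepFuzzSL implementation.

# Illustrative sketch of a character-level LSTM language model
# (assumed framework and hyperparameters; not the exact DeepFuzzSL code).
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, seq_len) indices of characters from the training corpus
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state  # logits over the next character

def sample(model, vocab, seed_text, max_len=2000, temperature=1.0):
    # Generate one textual Simulink model, one character at a time.
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    seq = torch.tensor([[char_to_idx[c] for c in seed_text]])
    out, state = list(seed_text), None
    model.eval()
    with torch.no_grad():
        for _ in range(max_len):
            logits, state = model(seq, state)
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            nxt = torch.multinomial(probs, 1).item()
            out.append(vocab[nxt])
            seq = torch.tensor([[nxt]])
    return "".join(out)

In this sketch, training would minimize cross-entropy between the predicted next-character distribution and the actual next character of the corpus text, and generation starts from a short seed (e.g., the opening characters of a model file); the emitted text is then checked by attempting to open and compile it in Simulink.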