Journal of Computer Languages 70 (2022) 101105
Journal homepage: www.elsevier.com/locate/cola

Automatic compiler/interpreter generation from programs for Domain-Specific Languages: Code bloat problem and performance improvement

Željko Kovačević a,*, Miha Ravber a, Shih-Hsi Liu b, Matej Črepinšek a

a University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, 2000 Maribor, Slovenia
b California State University Fresno, Department of Computer Science, CA 93740, Fresno, USA

Keywords: Semantic inference; Genetic programming; Attribute grammars; Domain-Specific Languages; Code bloat

ABSTRACT
Advanced AI approaches can make the development of Domain-Specific Languages (DSLs) easier for domain experts who are not proficient in programming language development. In this paper, we first address this problem using Semantic Inference. However, this approach is very time-consuming: the generated language specifications contain a lot of code bloat, which increases the time required to evaluate a solution. To improve this, we introduce a multi-threaded approach, which accelerates the evaluation process by over 9.5 times, while the improved Long Term Memory Assistance (LTMA) reduces the number of fitness evaluations by up to 7.3%. Finally, we propose reducing the number of input samples (fitness cases), which lowers CPU consumption further.

1. Introduction
Can we envision how software development will be carried out in the future? Is it possible that end-users will describe solutions to their own problems themselves? Indeed, the emergence of End-User Development (EUD) [1] is on the horizon.
The central aim of EUD is to make Programming Languages (PLs) easier to adopt and use; various Artificial Intelligence (AI) and Machine Learning (ML) approaches (e.g., programming by example, learning from the user's actions) can be applied to support end-user programming. AI is a combination of hardware and software that allows a machine to exhibit intelligent behavior. The core of modern AI includes reasoning, knowledge, planning, learning, human language skills, perception, and manipulation skills. Nowadays, AI and ML techniques are an integral part of many systems in which, even a few years ago, we would not have expected them. A good example is compilers, where well-designed deterministic algorithms have traditionally been used in the various phases of compiler implementation (syntax, semantics, and code optimization) [2].

In this work, we tackle a similar problem, already stated in [3]: how can the development of Domain-Specific Languages (DSLs) [36] be made easier for domain experts who are not versed in programming language design? Can a complete compiler/interpreter be generated only from sample programs and their associated meanings? If so, an end-user wanting to express a solution in a new DSL would only need to provide sample programs and their associated meanings. An early attempt was the work in [7], where only syntax of a special form was inferred from samples and proposed to language designers. To generate a compiler/interpreter completely automatically from sample programs, a further step is needed: inferring the semantics. Without AI and ML techniques, this task would be unmanageable. Therefore, we resorted to Evolutionary Algorithms (EAs) [8], such as Genetic Programming [9] and Memetic Algorithms [10].

* Corresponding author. E-mail addresses: zeljko.kovacevic@student.um.si (Ž. Kovačević), miha.ravber@um.si (M. Ravber), shliu@mail.fresnostate.edu (S.-H. Liu), matej.crepinsek@um.si (M. Črepinšek).
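The idea of inferring semantics from sample programs and their associated meanings can be made concrete through the fitness function that guides the search. The Python sketch below is a deliberately tiny, hypothetical illustration, not the authors' attribute-grammar-based approach: a candidate "semantics" is simply a mapping from each command of a toy counter DSL (`inc`, `dec`) to the amount it adds to an accumulator, and its fitness is the fraction of fitness cases (sample program, expected meaning) it reproduces. The names `interpret`, `fitness`, `FitnessCase`, and the toy DSL itself are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# A fitness case pairs a sample program of a hypothetical toy DSL
# (a sequence of command tokens) with its expected meaning (a final integer).
FitnessCase = Tuple[List[str], int]


def interpret(semantics: Dict[str, int], program: List[str]) -> int:
    """Run a program under a candidate semantics that maps each command
    to the amount it adds to a single accumulator (hypothetical model)."""
    value = 0
    for token in program:
        value += semantics.get(token, 0)  # unknown commands have no effect
    return value


def fitness(semantics: Dict[str, int], cases: List[FitnessCase]) -> float:
    """Fraction of fitness cases whose computed meaning equals the expected one."""
    hits = sum(interpret(semantics, program) == meaning for program, meaning in cases)
    return hits / len(cases)


# Hypothetical sample programs with their expected meanings.
cases: List[FitnessCase] = [
    (["inc", "inc"], 2),
    (["inc", "dec", "inc"], 1),
    (["dec", "dec", "dec"], -3),
]
```

Under this sketch, `{"inc": 1, "dec": -1}` reproduces all three meanings (fitness 1.0), whereas `{"inc": 1, "dec": 0}` reproduces only the first (fitness 1/3). Because every fitness evaluation interprets every sample program, its cost grows both with the number of fitness cases and with the size of the candidate specification, which is why code bloat and the number of input programs both affect CPU time.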
The principles behind Evolutionary Algorithms differ considerably from those of deterministic algorithms. First, an EA maintains a population of individuals, each representing a possible solution. The quality of a particular individual is determined by its fitness. Better individuals have a higher chance of surviving into the next generation, where new solutions (individuals) are obtained by crossover and mutation [8]. To improve convergence towards (sub-)global solutions, local search is often added, leading to the Memetic Algorithm [11]. However, Genetic Programming often suffers from code bloat [12], and in this work we show that code bloat is also present in Semantic Inference. As a result, the correct solutions found (Attribute Grammars) are often difficult to understand. Code bloat also means that more CPU time is needed to evaluate a solution. To tackle this problem, three approaches are presented: an improved Long Term Memory Assistance (LTMA) [13], a multi-threaded implementation, and a reduction in the number of input programs (fitness cases).

https://doi.org/10.1016/j.cola.2022.101105
Received 29 October 2021; Received in revised form 6 March 2022; Accepted 9 March 2022
Available online 24 March 2022
2590-1184/© 2022 Elsevier Ltd. All rights reserved.
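The first two approaches, LTMA and multi-threaded evaluation, can be illustrated together. The Python sketch below is not the authors' implementation; it assumes, as a simplification of LTMA [13], that the core idea is to remember every genotype already evaluated and to look duplicates up instead of re-evaluating them, and it evaluates only the genuinely new genotypes in parallel threads. The class `MemoizedEvaluator`, its parameters, and the bit-string fitness in the usage note are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Sequence


class MemoizedEvaluator:
    """Hypothetical LTMA-style evaluator: each genotype is evaluated at most
    once; later duplicates are answered from memory."""

    def __init__(self, fitness: Callable[[Hashable], float], workers: int = 4):
        self.fitness = fitness
        self.workers = workers
        self.memory: Dict[Hashable, float] = {}
        self.evaluations = 0  # number of real (costly) fitness calls

    def evaluate(self, population: Sequence[Hashable]) -> List[float]:
        # Keep genotypes not seen before, dropping duplicates within this
        # population while preserving first-seen order.
        pending = [g for g in dict.fromkeys(population) if g not in self.memory]
        # Evaluate only the new genotypes, spread over worker threads.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            results = list(pool.map(self.fitness, pending))
        # Store results from the calling thread, so no locking is needed.
        for genotype, value in zip(pending, results):
            self.memory[genotype] = value
        self.evaluations += len(pending)
        # Every individual, duplicate or not, receives its fitness.
        return [self.memory[g] for g in population]
```

With `fitness = lambda g: g.count("1")`, evaluating `["1010", "1010", "1111", "0000"]` costs three real fitness calls rather than four, and a later generation `["1111", "0011"]` costs only one more. In CPython, threads speed up evaluation only when the fitness function releases the global interpreter lock (for example, in native code or while waiting on I/O); for pure-Python, CPU-bound fitness functions, `concurrent.futures.ProcessPoolExecutor` is the usual substitute, at the cost of requiring a picklable fitness function.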