Journal of Computer Languages 70 (2022) 101105
Automatic compiler/interpreter generation from programs for
Domain-Specific Languages: Code bloat problem and performance
improvement
Željko Kovačević a,∗, Miha Ravber a, Shih-Hsi Liu b, Matej Črepinšek a
a University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, 2000 Maribor, Slovenia
b California State University Fresno, Department of Computer Science, CA 93740, Fresno, USA
ARTICLE INFO
Keywords:
Semantic inference
Genetic programming
Attribute grammars
Domain-Specific Languages
Code bloat
ABSTRACT
Using advanced AI approaches, the development of Domain-Specific Languages (DSLs) can be facilitated for
domain experts who are not proficient in programming language development. In this paper, we first addressed
the aforementioned problem using Semantic Inference. However, this approach is very time-consuming.
Namely, a lot of code bloat is present in the generated language specifications, which increases the time
required to evaluate a solution. To improve this, we introduced a multi-threaded approach, which accelerates
the evaluation process by over 9.5 times, while the number of fitness evaluations using the improved Long
Term Memory Assistance (LTMA) was reduced by up to 7.3%. Finally, a reduction in the number of input
samples (fitness cases) was proposed, which reduces CPU consumption further.
∗ Corresponding author.
E-mail addresses: zeljko.kovacevic@student.um.si (Ž. Kovačević), miha.ravber@um.si (M. Ravber), shliu@mail.fresnostate.edu (S.-H. Liu),
matej.crepinsek@um.si (M. Črepinšek).
1. Introduction
Can we envision how software development will be carried out in
the future? Is it possible that end-users will describe solutions
to their problems on their own? Indeed, the appearance of End-User
Development (EUD) [1] is on the horizon. The central point of
EUD is to make Programming Languages (PL) easier to adopt and use,
where various Artificial Intelligence (AI) and Machine Learning (ML)
approaches (e.g., programming by example, learning the user's actions)
can be applied to help in end-user programming. AI is a combination
of hardware and software that allows a machine to exhibit intelligent
behavior. The central part of modern AI includes reasoning, knowledge,
planning, learning, human language skills, perception, and manipulation
skills. AI and ML techniques are nowadays an integral part
of many systems in which, even a few years ago, we would not
have expected them. A good example is compilers, where well-designed,
deterministic algorithms were used in the various phases of compiler
implementation (syntax, semantics and code optimization) [2]. In this
work, we tackle a similar problem, already stated in [3]: how the
development of Domain-Specific Languages (DSLs) [3–6] can be
made easier for domain experts not versed in programming language
design. Can a complete compiler/interpreter be generated only
from sample programs and their associated meanings? In that case,
an end-user wanting to express a solution in a new DSL
would only need to provide sample programs and their associated
meanings. An early attempt was the work of [7], where only syntax
of a special form was inferred from samples and proposed to language
designers. To generate a compiler/interpreter completely automatically
from samples (programs), a further step is needed: inferring the
semantics. As expected, without AI and ML techniques this
task would be unmanageable. Therefore, we resorted to Evolutionary
Algorithms (EAs) [8], such as Genetic Programming [9] and Memetic
Algorithms [10]. The principles behind Evolutionary Algorithms
differ considerably from those of deterministic algorithms. First of all,
there is a population of individuals, which represent possible solutions.
The success of a particular individual is determined by its fitness.
Better individuals have a higher chance of surviving into
the next generation, where new solutions (individuals) are obtained
by crossover and mutation [8]. To increase convergence towards
(sub-)global solutions, local search is often added, leading to the Memetic
Algorithm [11]. However, Genetic Programming often suffers from
code bloat [12], which, as we show in this work, is also present in Semantic
Inference. This means that the correct solutions found (Attribute
Grammars) are often difficult to understand. Code bloat also means that more
CPU time is needed to evaluate a solution. To tackle this problem, three
approaches are presented: an improved Long Term Memory Assistance
(LTMA) [13], a multi-threaded implementation, and a reduction of the
number of input programs (fitness cases).
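The evolutionary loop described above (a population, fitness-based survival, crossover and mutation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy problem (finding integer coefficients a, b such that a*x + b reproduces a handful of fitness cases), the truncation selection, and all function names. The plain dictionary cache only hints at the general idea of remembering solutions that were already evaluated; it is not the LTMA of [13].

```python
import random

# Hypothetical fitness cases: (input, expected output) pairs for y = 3*x + 2.
FITNESS_CASES = [(x, 3 * x + 2) for x in range(5)]

def fitness(individual, cases=FITNESS_CASES):
    """Negative total absolute error over all fitness cases (0 is perfect)."""
    a, b = individual
    return -sum(abs(a * x + b - y) for x, y in cases)

def crossover(p1, p2):
    """Take the first gene from one parent and the second from the other."""
    return (p1[0], p2[1])

def mutate(individual, rng):
    """Change one randomly chosen gene by +1 or -1."""
    genes = list(individual)
    i = rng.randrange(len(genes))
    genes[i] += rng.choice((-1, 1))
    return tuple(genes)

def evolve(pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    cache = {}  # individual -> fitness, filled on first evaluation
    stats = {"requested": 0, "evaluated": 0}

    def cached_fitness(ind):
        # Only individuals not seen before cost a real fitness evaluation.
        stats["requested"] += 1
        if ind not in cache:
            stats["evaluated"] += 1
            cache[ind] = fitness(ind)
        return cache[ind]

    population = [(rng.randint(-5, 5), rng.randint(-5, 5))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=cached_fitness, reverse=True)
        if cached_fitness(ranked[0]) == 0:
            break  # a candidate reproduces every fitness case
        # Truncation selection: the better half survives unchanged.
        parents = ranked[: pop_size // 2]
        # New individuals are obtained by crossover followed by mutation.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=cached_fitness)
    return best, cached_fitness(best), stats

if __name__ == "__main__":
    best, best_fitness, stats = evolve()
    print("best individual:", best, "fitness:", best_fitness)
    print("fitness requests:", stats["requested"],
          "actual evaluations:", stats["evaluated"])
```

In this sketch the cost of one evaluation grows with the number of fitness cases, and repeated individuals are served from the cache, which loosely mirrors two of the cost factors named in the text. The evaluations within a generation are independent of each other, so they could in principle be distributed over several workers; that is not shown here.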
https://doi.org/10.1016/j.cola.2022.101105
Received 29 October 2021; Received in revised form 6 March 2022; Accepted 9 March 2022
Available online 24 March 2022
2590-1184/© 2022 Elsevier Ltd. All rights reserved.