On the Prevalence of Indirect Function Calls in Middleware Software Systems

Zachary M. Blasczyk *, Yanting Liang +, Keith Ecker +, and Melissa Sarnowski
Computer Science Department, University of Wisconsin-{Sheboygan *, Fox-Valley +}, Wisconsin, USA
{blasz7012,liany8529,eckek2561,sarnm6825}@uwc.edu

Saleh M. Alnaeli and Mark Hall
CSEPA Department, University of Wisconsin-Colleges, Madison, WI, USA
{saleh.alnaeli, mark.hall}@uwc.edu

Abstract: An empirical study investigating the pervasiveness and distribution of indirect function calls via function pointers and virtual methods in middleware software systems is presented. The study covers a broad range of software systems, from high-performance, distributed real-time embedded systems to fully featured professional 3D game engines, comprising nine software systems and nearly five million lines of code in aggregate. The systems were statically examined at the inter-procedural level to determine the distribution of function pointers and virtual method calls; function pointers were further classified by type and complexity. Results indicate that function pointers are typically used in situations that make static analysis costly and impractical to conduct. A five-year analysis of archived data shows an increase in the usage of both function-pointer calls and virtual methods over the lifetime of open-source middleware systems, thus posing additional obstacles for inter-procedural analysis.

Keywords: middleware; function pointers; virtual methods; static analysis.

I. INTRODUCTION

C/C++ software developers often use techniques that are known to make static analysis of their programs very challenging. Indirect function calls via function pointers and via virtual methods are two such techniques [1-3].
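As a minimal illustration of the two kinds of indirect call discussed here, the C++ sketch below shows a call through a function pointer and a call through a virtual method. All names (`add_one`, `twice`, `choose`, `Shape`, `Circle`) are hypothetical and are not taken from the studied systems; the sketch only shows why the callee is not visible at the call site.

```cpp
#include <cassert>
#include <string>

// Two free functions with the same signature. Either one can be the
// target of a pointer of type int (*)(int).
int add_one(int x) { return x + 1; }
int twice(int x) { return x * 2; }

using UnaryOp = int (*)(int);

// The pointer handed out depends on a run-time value.
UnaryOp choose(bool doubling) { return doubling ? &twice : &add_one; }

// Indirect call through a function pointer: the call site names no
// function, so the callee is whatever 'op' holds at run time.
int apply(UnaryOp op, int value) {
    assert(op != nullptr);
    return op(value);
}

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Indirect call through a virtual method: the override that runs depends
// on the dynamic type of 's', not on its static type (const Shape&).
std::string describe(const Shape& s) { return s.name(); }
```

In this sketch, `apply(choose(true), 5)` invokes `twice`, while `apply(choose(false), 5)` invokes `add_one`; likewise, `describe` runs `Circle::name` when passed a `Circle` and `Shape::name` when passed a plain `Shape`. A static analyzer that sees only `apply` or `describe` cannot tell which callee is meant without further reasoning.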
The reason for the difficulty is that a function pointer can alias any function with a matching signature, and a virtual method call can resolve to any method with the same signature in a derived class, which leads to exponential complexity during static analysis. The function that is actually invoked can be determined accurately only at runtime. When software engineers perform static analysis, they therefore take an approximate but sound conservative approach: function pointers and virtual methods are resolved by assuming that the system may invoke every potential target function or method. Consequently, any such static analysis carries additional complexity and imprecision, because not all of these functions are actually called at runtime. Many studies have shown the problem to be NP-hard, depending on how function pointers are implemented and how they interact within a system [1, 3]. For accurate inter-procedural analysis, function-alias analysis is a paramount step that must be handled properly [4].

One software-engineering problem domain that faces comparable complexity is automatic parallelization. A for-loop that calls a function with side effects is generally considered unparallelizable with APIs such as OpenMP, and when the call is made through a function pointer, the set of possible targets cannot be determined statically. This uncertainty about runtime calls prompts software engineers to adopt a conservative approach to parallelization: if even one possible target has a side effect, parallelization may be unsafe [2, 5]. This approach, while safe, is impractical for other software-engineering problems where accuracy is a crucial concern, and it may adversely affect the robustness of the program. Furthermore, applications developed in object-oriented languages, such as C++ or Java, may face additional hurdles in inter-procedural analysis because of the nature of object inheritance and function overloading [3].
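The conservative reasoning described above can be sketched as a toy model in C++. This is an illustration under stated assumptions, not the analysis used in this study: `FunctionInfo`, `possible_targets`, and `loop_is_parallelizable` are hypothetical names, signatures are compared as plain strings, and side effects are assumed to be known for each function in advance.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of one function, as a static analyzer might record it.
struct FunctionInfo {
    std::string name;
    std::string signature;   // e.g. "int(int)"; compared as plain text here
    bool has_side_effects;   // assumed to be known in advance
};

// Conservative resolution: every function whose signature matches the
// pointer's type is treated as a possible target of the indirect call.
std::vector<FunctionInfo> possible_targets(
        const std::vector<FunctionInfo>& program,
        const std::string& pointer_signature) {
    std::vector<FunctionInfo> targets;
    for (const FunctionInfo& f : program) {
        if (f.signature == pointer_signature) {
            targets.push_back(f);
        }
    }
    return targets;
}

// A loop that calls through the pointer is treated as parallelizable only
// if none of the possible targets has a side effect. One offending
// candidate is enough to reject the loop, even if it is never called.
bool loop_is_parallelizable(const std::vector<FunctionInfo>& program,
                            const std::string& pointer_signature) {
    for (const FunctionInfo& f : possible_targets(program, pointer_signature)) {
        if (f.has_side_effects) {
            return false;
        }
    }
    return true;
}
```

For a program containing a pure `square` and a side-effecting `log_and_square`, both of signature `int(int)`, this model rejects every loop that calls through an `int (*)(int)` pointer, even when only `square` is ever assigned to it. That over-approximation is the source of the imprecision the text describes.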
A fully precise analysis would require virtual method calls to be resolved at runtime, because virtual methods allow parent-class methods to be overridden by derived classes in these languages [6]; the literature offers many algorithms on this topic [2, 7-9].

Our motivation to expand empirically driven research on the prevalence and distribution of indirect calls made via function pointers and virtual methods stems directly from the influx of emerging open-source software systems. We aim to conduct research that helps software engineers estimate the complexity and challenges involved when static analysis is required. Such research can be used to analyze the complexity of a middleware system as that system evolves over time. The research community lacks in-depth studies of open-source middleware systems and of how their use of function pointers and virtual methods evolves over time. We believe that an extensive effort should be made to capitalize on the insights such research would offer into the complexity of these systems, and to help us understand the obstacles to analyzing them statically.

In this study, we empirically examined nine mature software systems, listed in TABLE I. These systems were reviewed based on multiple metrics; the number of function pointers, virtual methods, and indirect calls were tabulated for