Understanding Scripting Language Extensions

Daniel L. Moise, Kenny Wong, H. James Hoover
Department of Computing Science
University of Alberta
Edmonton, AB, Canada
{moise, kenw, hoover}@cs.ualberta.ca

Daqing Hou
Avra Software Lab. Inc.
Edmonton, AB, Canada
daqing@cs.ualberta.ca

Abstract

Software systems are often written in more than one programming language. During development, programmers need to understand not only the dependencies among code in a particular language, but also the dependencies that span languages. In this paper, we focus on the problem of scripting languages (such as Perl) and their extension mechanisms for calling functions with a C interface. Our general approach involves building a fact extractor for each scripting language, typically by hooking into the language interpreter itself. The produced facts conform to a common schema, and an analyzer is extended to recognize the cross-language dependencies. We present how these statically discovered dependencies can be represented, visualized, and explored in the Eclipse environment.

1. Introduction

There is an important need to understand software systems written in more than one programming language. For example, a web application may contain a mix of code in Java, HTML, JavaScript, SQL, etc. Legacy systems are typically heterogeneous, with various languages used in their constituent parts. Also, many systems are written with entity, control, and boundary layers, each implemented or generated by a different suitable language.

It is not enough to have program understanding tools that consider each language independently, as an island in isolation. We also need to bridge these islands to form a more complete understanding. For example, programmers often need to follow control flows in software, and this activity should not be constrained by language boundaries.
It would be useful to know if, say, a C function was ultimately called from Perl code to better assess the impact of potential changes. Also, a more integrated understanding can help in looking for inconsistencies or anomalies, such as malformed or missing stubs in the cross-language mechanism. If a C function is declared to be called from Perl, then a static program analysis can check that the C function indeed exists. Finally, a comprehensive understanding can aid in recovering system architecture [3].

There are a number of reasons why multi-language systems exist.

Efficiency: For performance reasons, a high-level language may invoke fragments of code in another lower-level language (e.g., C with embedded assembly). An interpreted language may call functions written in a natively compiled language (e.g., Perl with calls to a C library).

Suitability: For certain tasks, some languages and notations may be more suitable than others. For example, SQL is the standard notation for manipulating relational data. Scripting languages are useful in gluing together programs. In particular, Perl is very effective at text processing.

Reuse: A software system may need to interoperate with another one as is, even if written in another language, rather than rewriting everything into a single language. The different teams working on each system may continue to use the language with which they are most familiar.

The space of languages and cross-language interoperability mechanisms is huge. Rather than considering analyses between every pair of languages, it is helpful to divide the space, narrow our focus, and look for general approaches for each partition.

Consequently, interactions between program entities can be broadly categorized as being either loosely coupled or tightly coupled.
Loosely coupled interactions may be enabled by sharing a database or file, communicating through network channels, or invoking procedures remotely through the use of middleware. Such interactions typically cross