Using a Variant of Sliding Window to Reduce Event Trace Data

Andy Zaidman, Serge Demeyer
Lab On Reengineering
University of Antwerp
Middelheimlaan 1, 2020 Antwerpen, Belgium
Andy.Zaidman@ua.ac.be, Serge.Demeyer@ua.ac.be

Abstract. Understanding how components interact with their neighboring components is a necessary prerequisite for the evolution of legacy software systems. Dynamic program analysis is known to provide deep insight into component interaction protocols; however, such techniques must all cope with a tremendous scalability problem. Therefore, this paper proposes a heuristic which reduces program traces based on a frequency spectrum analysis of program events. Based on a small case study, we conclude that the heuristic is able to identify interesting component interaction patterns in program traces that consist of one to two million events.

1 Introduction

Software engineers who specialize in dynamic analysis often have to contend with large amounts of trace data. This trace data is used for regaining architectural insight, profiling an application, measuring performance, and many other purposes. To give an idea of the amount of trace data that is generated: a well-structured Java program consisting of 5 classes generates approximately 6000 events while running for 1 second. Although this execution scenario lasts only one second, it is easy to see that tracing large-scale industrial applications would lead to volumes of trace data that are very difficult to handle [1].

Moreover, for the purpose of regaining architectural insight, this level of detail is not needed. The event trace contains all the method calls that a program makes, whereas we are mainly interested in the component interaction protocol. Many events do not influence this interaction protocol, so they are of little interest to us.
The heuristic we propose is based on a combination of Frequency Spectrum Analysis [2] and a sliding window mechanism, well known in the world of telecommunications [3]. Using both techniques, we are able to identify key events (the events we find interesting) and mark regions of the trace that contain many of these key events.

For the purpose of regaining architectural knowledge through dynamic analysis, reducing the amount of trace data is very important in order to keep the algorithms that detect patterns in the trace as efficient as possible. The bottom line is that we want to identify key events and their surrounding events, and continue working with those rather than with the entire trace.
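To make the combination of the two techniques concrete, the following is a minimal sketch and not the implementation used in this paper. The function name `key_regions` and the parameters `max_freq`, `window`, and `min_key` are hypothetical. This section does not state the criterion for a key event, so the sketch assumes, purely for illustration, that an event is a key event when its method occurs at most `max_freq` times in the whole trace.

```python
from collections import Counter


def key_regions(trace, max_freq, window, min_key):
    """Return (start, end) index ranges, end exclusive, that are dense in key events.

    trace    -- list of event names (e.g. method identifiers), in execution order
    max_freq -- ASSUMPTION: an event is "key" if its name occurs <= max_freq times
    window   -- size of the sliding window, in events
    min_key  -- minimum number of key events a window needs to be marked
    """
    # Frequency spectrum: how often each event name occurs in the trace.
    freq = Counter(trace)
    is_key = [freq[event] <= max_freq for event in trace]

    regions = []
    if len(trace) < window:
        return regions

    # Number of key events in the first window.
    count = sum(is_key[:window])
    for start in range(len(trace) - window + 1):
        if start > 0:
            # Slide by one event: add the entering event, drop the leaving one.
            count += is_key[start + window - 1] - is_key[start - 1]
        if count >= min_key:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Merge with the previous region if the windows overlap or touch.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# A toy trace: a frequently called logging method surrounds three rare calls.
trace = ["log"] * 5 + ["open", "read", "close"] + ["log"] * 5
print(key_regions(trace, max_freq=1, window=3, min_key=3))
```

Only the events inside the returned ranges would be passed on to later pattern-detection steps. Lowering `min_key` widens the marked regions, so that more of the events surrounding the key events are retained.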