Technical Report: Feedback-Based Generation of Hardware Characteristics

Marcus Jägemar, Sigrid Eldh, Andreas Ermedahl
Ericsson AB
first.last at ericsson.com

Björn Lisper
Mälardalen University
bjorn.lisper at mdh.se

ABSTRACT

In large, complex, server-like computer systems it is difficult to characterise hardware usage in the early stages of system development. Often the applications that will run on the platform are not ready when the platform is deployed, which postpones the measurement of metrics. In our study we seek answers to two questions: (1) Can we use a feedback-based control system to create a characteristics model of a real production system? (2) Can such a model be sufficiently accurate to detect characteristics changes instead of executing the production application?

The model we have created runs a signalling application, similar to the production application, together with a PID-regulator generating L1 and L2 cache misses to the same extent as the production system. Our measurements indicate that we have managed to mimic a similar environment regarding cache characteristics. Additionally, we have applied the model to a software update for a production system and detected characteristics changes using the model. This has later been verified on the complete production system, which in this study is a large-scale telecommunication system with a substantial market share.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics—performance measures; B.3.2 [Memory Structures]: Design styles—Cache Memories; C.4 [Performance of Systems]: Measurement techniques

General Terms
Measurement, Performance

Keywords
Control Theory, Feedback computing, Performance Analysis, Characteristics, Cache memories, Simulation, Load Testing and Design Aids

1.
INTRODUCTION

Measuring behavioural characteristics of complex large-scale computer systems is difficult, since it requires either a full production system or advanced test programs with large test systems. After a software update, it is essential to measure behavioural characteristics and check that the nature of the system has not changed. Behavioural changes result in costly and time-consuming verification in the development cycle. Late detection of unfulfilled requirements due to characteristics changes leads to increased lead time, since parts of the system must be re-investigated and re-implemented. Such an increase in development time may not be acceptable, since a short time-to-market is essential [16, 7, 13, 19, 20].

The system we are investigating is a telecommunication system with a market share of about 38% in 2011 [7]. It consists of 5M SLOC [4] and runs on more than 20 types of boards with different hardware layouts and functionality, servicing both voice and data communication.

We define one instance of the computer system we are investigating as a node. One node can consist of many CPUs, but from a system point of view they are grouped as one execution entity. A large-scale node has many CPUs; a small-scale node may consist of only a single CPU. A node communicates extensively using signals, i.e., operating system messages, both internally and externally with other nodes.

We introduce two concepts central to our investigation. The first, behavioural characteristics, is in our case CPI, CPU load or signal turnaround time, but it can be any metric that describes the behaviour or performance of the system. The second, load characteristics, is described by metrics that affect the behavioural characteristics of the system. In our investigation we have concentrated on cache misses, but any other metric could be used, such as TLB usage, branch statistics, or the number of system calls or interrupts.
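To illustrate how a load characteristic such as the cache miss rate could be steered towards a value measured on a production system, the following Python sketch runs a discrete PID regulator against a toy load generator. It is a sketch under stated assumptions, not the regulator used in this study: the linear relation in `toy_miss_rate`, the gains, and all names are hypothetical and chosen only so that the loop is stable.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID regulator (positional form)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, target: float, measured: float, dt: float = 1.0) -> float:
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_miss_rate(intensity: float, base: float = 0.02, gain: float = 0.001) -> float:
    """Hypothetical plant: miss rate grows linearly with generator intensity.

    A real system would read hardware performance counters instead.
    """
    return base + gain * intensity


def regulate(target: float, steps: int = 200) -> float:
    """Drive the toy miss rate towards `target`; return the final miss rate."""
    # Assumed gains; the derivative term is switched off in this toy run.
    pid = PID(kp=200.0, ki=500.0, kd=0.0)
    intensity = 0.0
    for _ in range(steps):
        measured = toy_miss_rate(intensity)
        # The regulator output is used directly as the generator intensity,
        # clamped at zero because a negative intensity has no meaning.
        intensity = max(0.0, pid.step(target, measured))
    return toy_miss_rate(intensity)


if __name__ == "__main__":
    print(regulate(target=0.05))
```

In this toy setting the integral term supplies the steady-state intensity, so the measured miss rate settles at the target. On real hardware the relation between generator intensity and miss rate is neither linear nor noise-free, and the gains would have to be tuned against measurements.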
For early detection of changes in behavioural characteristics, we suggest creating a model of the production system on a small-scale node. The benefit is that we do not have to wait for large-scale nodes, which are expensive and difficult to obtain, to become available. Additionally, changes in the platform may require modifications to the application software, which further extends the time before characteristics measurements can be made. Our approach is to alter load characteristics, in our case the cache miss rate, to change the behavioural characteristics. Our model system consists