SICOSYS: An Integrated Framework for Studying Interconnection Network Performance in Multiprocessor Systems

V. Puente, J.A. Gregorio and R. Beivide
Computer Architecture Group
University of Cantabria – Spain
{vpuente, jagm, mon}@atc.unican.es

Abstract

An environment has been developed which is capable of determining the impact that a multiprocessor interconnection subsystem has on real application execution time. A general-purpose interconnection network simulator, called SICOSYS, able to capture essential aspects of the low-level implementation, has been integrated into two execution-driven simulators for multiprocessors: RSIM and SimOS. The enhancement of both tools allows the analysis of new proposals for the interconnection subsystem of a cc-NUMA machine, from the VLSI level up to the real application level. Any new proposal can be translated to a specific message router architecture and, by using a low-level implementation tool, the parameter delays of a detailed router model to be used by SICOSYS can be obtained.

Introduction

The interconnection network is an essential element of multiprocessor systems, and determining its performance is critical. Although a lot of work has been carried out in this direction, it remains incomplete, mainly in two respects. On the one hand, very few studies consider the impact of the low-level implementation. Thus, in spite of pioneering studies like Chien's work [3], it is usual to analyze new proposals without considering whether the increased complexity of their VLSI implementation will neutralize the supposed improvements. On the other hand, in contrast to what already happens with practically all other computer building blocks, the performance of the interconnection network continues to be analyzed without paying much attention to real workloads.
However, it is evident that numerous proposals lose relevance when the characteristics of traffic corresponding to real applications are considered.

The main reasons for these deficiencies are the cost and complexity of taking both aspects into account. The simulation, at VLSI level, of a medium-sized interconnection network (128 nodes) can take several days to obtain some basic parameters. Likewise, accurately simulating a few seconds of the traffic generated by a real parallel application can go beyond what could be considered a reasonable design cycle time. The situation is even worse when the special characteristics of structures as successful as Distributed Shared Memory machines are considered. Coherence maintenance, normally performed in hardware, makes the traffic applied to the network uncertain. An application will generate different traffic distributions depending on aspects such as data location, out-of-order execution, cache size, interference between processes, characteristics of the interconnection network, etc.

Obtaining real measurements from this type of system is not even feasible until very advanced stages of the design phase. The high cost prevents the construction of prototypes of multiprocessor systems, even with a reduced number of processors, without a guarantee that they will work. Analytical results are equally difficult to obtain. Although the parameters that affect the performance of the interconnection network can be fixed, their stochastic relationships are complex and application dependent, so it is difficult to obtain results without making unacceptable approximations.

The only way to accurately determine the performance is simulation. However, the number and characteristics of the simulators are as diverse as the research groups working on this subject. They range from a few tens of lines of code [10] (coarse results at low cost) up to tens of thousands of lines for a VHDL simulator (higher precision at higher cost).
For this reason, over the last five years our research group has been developing a simulator called SICOSYS [14], which incorporates the key parameters of the low-level implementation and provides results close to those of VHDL simulators, but at lower computational cost.

This paper describes the integration of this high-precision simulator into RSIM [9] and SimOS [12], two of the most powerful public-domain simulation tools for multiprocessor systems. RSIM allows simulation of everything from the characteristics of a superscalar processor to the type of protocol used to maintain data coherence. SimOS represents the next step in the simulation of a multiprocessor system, introducing the effect of the operating system. However, both simulators have the same drawback. Their capacity for modeling the

Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing (EUROMICRO-PDP02) 1066-6192/02 $17.00 © 2002 IEEE